FlatBuffers

FlatBuffers is an efficient cross-platform serialization library for C++, C#, C, Go, Java, Kotlin, JavaScript, Lobster, Lua, TypeScript, PHP, Python, Rust, and Swift. It was originally created at Google for game development and other performance-critical applications.

It is available as Open Source on GitHub under the Apache license, v2 (see LICENSE.txt).

Why use FlatBuffers?

  • Access to serialized data without parsing/unpacking – What sets FlatBuffers apart is that it represents hierarchical data in a flat binary buffer in such a way that it can still be accessed directly without parsing/unpacking, while also still supporting data structure evolution (forwards/backwards compatibility).
  • Memory efficiency and speed – The only memory needed to access your data is that of the buffer. It requires 0 additional allocations (in C++, other languages may vary). FlatBuffers is also very suitable for use with mmap (or streaming), requiring only part of the buffer to be in memory. Access is close to the speed of raw struct access with only one extra indirection (a kind of vtable) to allow for format evolution and optional fields. It is aimed at projects where spending time and space (many memory allocations) to be able to access or construct serialized data is undesirable, such as in games or any other performance-sensitive applications. See the benchmarks for details.
  • Flexible – Optional fields mean that not only do you get great forward and backward compatibility (increasingly important for long-lived games: you don’t have to update all data with each new version!), you also have a lot of choice in what data you write and what data you don’t, and in how you design data structures.
  • Tiny code footprint – Small amounts of generated code, and just a single small header as the minimum dependency, which is very easy to integrate. Again, see the benchmark section for details.
  • Strongly typed – Errors happen at compile time, so you don’t have to manually write repetitive and error-prone run-time checks. Useful code can be generated for you.
  • Convenient to use – Generated C++ code allows for terse access & construction code. Then there’s optional functionality for parsing schemas and JSON-like text representations at runtime efficiently if needed (faster and more memory efficient than other JSON parsers).

    Java, Kotlin and Go code supports object reuse. C# has efficient struct based accessors.

  • Cross platform code with no dependencies – C++ code will work with any recent gcc/clang and VS2010. Comes with build files for the tests & samples (Android .mk files, and cmake for all other platforms).

Why not use Protocol Buffers, or .. ?

Protocol Buffers is indeed relatively similar to FlatBuffers, with the primary difference being that FlatBuffers does not need a parsing/unpacking step to a secondary representation before you can access data, often coupled with per-object memory allocation. The Protocol Buffers code is an order of magnitude bigger, too. Protocol Buffers has no optional text import/export.

But all the cool kids use JSON!

JSON is very readable (which is why we use it as our optional text format) and very convenient when used together with dynamically typed languages (such as JavaScript). When serializing data from statically typed languages, however, JSON not only has the obvious drawback of runtime inefficiency, but also forces you to write more code to access data (counterintuitively) due to its dynamic-typing serialization system. In this context, it is only a better choice for systems that have very little to no information ahead of time about what data needs to be stored.

If you do need to store data that doesn’t fit a schema, FlatBuffers also offers a schema-less (self-describing) version!

Read more about the “why” of FlatBuffers in the white paper.

Who uses FlatBuffers?

  • Cocos2d-x, the #1 open source mobile game engine, uses it to serialize all their game data.
  • Facebook uses it for client-server communication in their Android app. They have a nice article explaining how it speeds up loading their posts.
  • Fun Propulsion Labs at Google uses it extensively in all their libraries and games.

Usage in brief

This section is a quick rundown of how to use this system. Subsequent sections provide a more in-depth usage guide.

  • Write a schema file that allows you to define the data structures you may want to serialize. Fields can have a scalar type (ints/floats of all sizes), or they can be a: string; array of any type; reference to yet another object; or, a set of possible objects (unions). Fields are optional and have defaults, so they don’t need to be present for every object instance.
  • Use flatc (the FlatBuffer compiler) to generate a C++ header (or Java/Kotlin/C#/Go/Python.. classes) with helper classes to access and construct serialized data. This header (say mydata_generated.h) only depends on flatbuffers.h, which defines the core functionality.
  • Use the FlatBufferBuilder class to construct a flat binary buffer. The generated functions allow you to add objects to this buffer recursively, often as simply as making a single function call.
  • Store or send your buffer somewhere!
  • When reading it back, you can obtain the pointer to the root object from the binary buffer, and from there traverse it conveniently in-place with object->field().

C++ Benchmarks

Comparing against other serialization solutions, running on Windows 7 64bit. We use the LITE runtime for Protocol Buffers (less code / lower overhead), Rapid JSON (one of the fastest C++ JSON parsers around), and pugixml, also one of the fastest XML parsers.

We also compare against code that doesn’t use a serialization library at all (the column “Raw structs”), which is what you get if you write hardcoded code that just writes structs. This is the fastest possible, but of course is not cross platform nor has any kind of forwards / backwards compatibility.

We compare against FlatBuffers with the binary wire format (as intended), and also with JSON as the wire format with the optional JSON parser (which, using a schema, parses JSON into a binary buffer that can then be accessed as before).

The benchmark object is a set of about 10 objects containing an array, 4 strings, and a large variety of int/float scalar values of all sizes, meant to be representative of game data, e.g. a scene format.

| | FlatBuffers (binary) | Protocol Buffers LITE | Rapid JSON | FlatBuffers (JSON) | pugixml | Raw structs |
|---|---|---|---|---|---|---|
| Decode + Traverse + Dealloc (1 million times, seconds) | 0.08 | 302 | 583 | 105 | 196 | 0.02 |
| Decode / Traverse / Dealloc (breakdown) | 0 / 0.08 / 0 | 220 / 0.15 / 81 | 294 / 0.9 / 287 | 70 / 0.08 / 35 | 41 / 3.9 / 150 | 0 / 0.02 / 0 |
| Encode (1 million times, seconds) | 3.2 | 185 | 650 | 169 | 273 | 0.15 |
| Wire format size (normal / zlib, bytes) | 344 / 220 | 228 / 174 | 1475 / 322 | 1029 / 298 | 1137 / 341 | 312 / 187 |
| Memory needed to store decoded wire (bytes / blocks) | 0 / 0 | 760 / 20 | 65689 / 4 | 328 / 1 | 34194 / 3 | 0 / 0 |
| Transient memory allocated during decode (KB) | 0 | 1 | 131 | 4 | 34 | 0 |
| Generated source code size (KB) | 4 | 61 | 0 | 4 | 0 | 0 |
| Field access in handwritten traversal code | typed accessors | typed accessors | manual error checking | typed accessors | manual error checking | typed but no safety |
| Library source code (KB) | 15 | some subset of 3800 | 87 | 43 | 327 | 0 |

Some other serialization systems we compared against but did not benchmark (yet), in rough order of applicability:

  • Cap’n’Proto promises to reduce the overhead of Protocol Buffers much like FlatBuffers does, though with a more complicated binary encoding and less flexibility (no optional fields to allow deprecating fields or serializing with missing fields for which defaults exist). It currently also isn’t fully cross-platform portable (lack of VS support).
  • msgpack: has very minimal forwards/backwards compatibility support when used with the typed C++ interface. Also lacks VS2010 support.
  • Thrift: very similar to Protocol Buffers, but appears to be less efficient, and to have more dependencies.
  • YAML: a superset of JSON and otherwise very similar. Used by e.g. Unity.
  • C# comes with built-in serialization functionality, as used by Unity also. Being tied to the language, and having no automatic versioning support limits its applicability.
  • Project Anarchy (the free mobile engine by Havok) comes with a serialization system, that however does no automatic versioning (have to code around new fields manually), is very much tied to the rest of the engine, and works without a schema to generate code (tied to your C++ class definition).

Code for benchmarks

Code for these benchmarks sits in benchmarks/ in git branch benchmarks. It sits in its own branch because it has submodule dependencies that the main project doesn’t need, and the code standards do not meet those of the main project. Please read benchmarks/cpp/README.txt before working with the code.

FlatBuffers white paper

This document tries to shed some light on the “why” of FlatBuffers, a new serialization library.

Motivation

Back in the good old days, performance was all about instructions and cycles. Nowadays, processing units have run so far ahead of the memory subsystem, that making an efficient application should start and finish with thinking about memory. How much you use of it. How you lay it out and access it. How you allocate it. When you copy it.

Serialization is a pervasive activity in a lot of programs, and a common source of memory inefficiency, with lots of temporary data structures needed to parse and represent data, and inefficient allocation patterns and locality.

If it were possible to do serialization with no temporary objects, no additional allocation, no copying, and good locality, this could be of great value. Serialization systems usually don’t manage this because it runs counter to forwards/backwards compatibility and to platform specifics like endianness and alignment.

FlatBuffers is what you get if you try anyway.

In particular, FlatBuffers’ focus is on mobile hardware (where memory size and memory bandwidth are even more constrained than on desktop hardware), and on applications that have the highest performance needs: games.

FlatBuffers

This is a summary of FlatBuffers functionality, with some rationale. A more detailed description can be found in the FlatBuffers documentation.

Summary

A FlatBuffer is a binary buffer containing nested objects (structs, tables, vectors,..) organized using offsets so that the data can be traversed in-place just like any pointer-based data structure. Unlike most in-memory data structures however, it uses strict rules of alignment and endianness (always little) to ensure these buffers are cross platform. Additionally, for objects that are tables, FlatBuffers provides forwards/backwards compatibility and general optionality of fields, to support most forms of format evolution.

You define your object types in a schema, which can then be compiled to C++ or Java for low to zero overhead reading & writing. Optionally, JSON data can be dynamically parsed into buffers.

Tables

Tables are the cornerstone of FlatBuffers, since format evolution is essential for most applications of serialization. Typically, dealing with format changes is something that can be done transparently during the parsing process of most serialization solutions out there. But a FlatBuffer isn’t parsed before it is accessed.

Tables get around this by using an extra indirection to access fields, through a vtable. Each table comes with a vtable (which may be shared between multiple tables with the same layout), which records where the fields of this particular table instance are stored. The vtable may also indicate that a field is not present (because this FlatBuffer was written with an older version of the software, or simply because the information was not necessary for this instance, or was deemed deprecated), in which case a default value is returned.

Tables have a low overhead in memory (since vtables are small and shared) and in access cost (an extra indirection), but provide great flexibility. Tables may even cost less memory than the equivalent struct, since fields do not need to be stored when they are equal to their default.

FlatBuffers additionally offers “naked” structs, which do not offer forwards/backwards compatibility, but can be even smaller (useful for very small objects that are unlikely to change, e.g. a coordinate pair or an RGBA color).

Schemas

While schemas reduce some generality (you can’t just read any data without having its schema), they have a lot of upsides:

  • Most information about the format can be factored into the generated code, reducing memory needed to store data, and time to access it.
  • The strong typing of the data definitions means less error checking/handling at runtime (less can go wrong).
  • A schema enables us to access a buffer without parsing.

FlatBuffer schemas are fairly similar to those of the incumbent, Protocol Buffers, and generally should be readable to those familiar with the C family of languages. We chose to improve upon the features offered by .proto files in the following ways:

  • Deprecation of fields instead of manual field id assignment. Extending an object in a .proto means hunting for a free slot among the numbers (preferring lower numbers since they have a more compact representation). Besides being inconvenient, it also makes removing fields problematic: either you keep the field, which still generates accessors and does not make it obvious that the field shouldn’t be read/written anymore, or you remove it, and risk that old data using that field is still around by the time someone reuses the field id, with nasty consequences.
  • Differentiating between tables and structs (see above). Effectively all table fields are optional, and all struct fields are required.
  • Having a native vector type instead of repeated. This gives you a length without having to collect all items, and in the case of scalars provides for a more compact representation, and one that guarantees adjacency.
  • Having a native union type instead of using a series of optional fields, all of which must be checked individually.
  • Being able to define defaults for all scalars, instead of having to deal with their optionality at each access.
  • A parser that can deal with both schemas and data definitions (JSON compatible) uniformly.

FlatBuffer Internals

This section is entirely optional for the use of FlatBuffers. In normal usage, you should never need the information contained herein. If you’re interested however, it should give you more of an appreciation of why FlatBuffers is both efficient and convenient.

Format components

A FlatBuffer is a binary file and in-memory format consisting mostly of scalars of various sizes, all aligned to their own size. Each scalar is also always represented in little-endian format, as this corresponds to all commonly used CPUs today. FlatBuffers will also work on big-endian machines, but will be slightly slower because of additional byte-swap intrinsics.

It is assumed that the following conditions are met, to ensure cross-platform interoperability:

  • The binary IEEE-754 format is used for floating-point numbers.
  • The two's complement representation is used for signed integers.
  • The endianness is the same for floating-point numbers as for integers.
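
As a rough illustration of the little-endian rule above, the following sketch assembles a scalar from its bytes in a way that gives the same result on little-endian and big-endian hosts. The name ReadLittleEndian is hypothetical and is not taken from flatbuffers.h; it only shows the byte order the format assumes.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical helper (not part of flatbuffers.h): reads an unsigned
// little-endian scalar of `width` bytes starting at `p`. Byte 0 is the
// least significant byte, regardless of the host's own byte order.
inline uint64_t ReadLittleEndian(const uint8_t *p, size_t width) {
  assert(width == 1 || width == 2 || width == 4 || width == 8);
  uint64_t value = 0;
  for (size_t i = 0; i < width; ++i) {
    value |= static_cast<uint64_t>(p[i]) << (8 * i);
  }
  return value;
}
```

On a little-endian machine a compiler can typically reduce such a loop to a plain load, which is consistent with the claim that only big-endian machines pay for a byte swap.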

On purpose, the format leaves a lot of details about where exactly things live in memory undefined, e.g. fields in a table can have any order, and objects to some extent can be stored in many orders. This is because the format doesn’t need this information to be efficient, and it leaves room for optimization and extension (for example, fields can be packed in a way that is most compact). Instead, the format is defined in terms of offsets and adjacency only. This may mean two different implementations may produce different binaries given the same input values, and this is perfectly valid.

Format identification

The format also doesn’t contain information for format identification and versioning, which is also by design. FlatBuffers is a statically typed system, meaning the user of a buffer needs to know what kind of buffer it is. FlatBuffers can of course be wrapped inside other containers where needed, or you can use its union feature to dynamically identify multiple possible sub-objects stored. Additionally, it can be used together with the schema parser if full reflective capabilities are desired.

Versioning is something that is intrinsically part of the format (the optionality / extensibility of fields), so the format itself does not need a version number (it’s a meta-format, in a sense). We’re hoping that this format can accommodate all data needed. If format breaking changes are ever necessary, it would become a new kind of format rather than just a variation.

Offsets

The most important and generic offset type (see flatbuffers.h) is uoffset_t, which is currently always a uint32_t, and is used to refer to all tables/unions/strings/vectors (these are never stored in-line). 32bit is intentional, since we want to keep the format binary compatible between 32 and 64bit systems, and a 64bit offset would bloat the size for almost all uses. A version of this format with 64bit (or 16bit) offsets is easy to set when needed. Unsigned means they can only point in one direction, which typically is forward (towards a higher memory location). Any backwards offsets will be explicitly marked as such.

The format starts with a uoffset_t to the root table in the buffer.

We have two kinds of objects, structs and tables.

Structs

These are the simplest, and as mentioned, intended for simple data that benefits from being extra efficient and doesn’t need versioning / extensibility. They are always stored inline in their parent (a struct, table, or vector) for maximum compactness. Structs define a consistent memory layout where all components are aligned to their size, and structs aligned to their largest scalar member. This is done independent of the alignment rules of the underlying compiler to guarantee a cross platform compatible layout. This layout is then enforced in the generated code.
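
The layout rule for structs can be sketched as follows, assuming a struct made only of scalar members (nested structs are left out). StructLayout and LayOutStruct are hypothetical names for this illustration, not part of the FlatBuffers code base.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

struct StructLayout {
  std::vector<size_t> offsets;  // byte offset of each member
  size_t size = 0;              // total size, including trailing padding
  size_t alignment = 1;         // alignment of the struct as a whole
};

// Hypothetical helper: places scalar members of the given sizes (1, 2, 4 or 8
// bytes) so that each is aligned to its own size, then pads the struct to the
// alignment of its largest scalar member.
inline StructLayout LayOutStruct(const std::vector<size_t> &member_sizes) {
  StructLayout layout;
  size_t cursor = 0;
  for (size_t member_size : member_sizes) {
    assert(member_size == 1 || member_size == 2 || member_size == 4 ||
           member_size == 8);
    // Round the cursor up to the next multiple of the member's size.
    cursor = (cursor + member_size - 1) / member_size * member_size;
    layout.offsets.push_back(cursor);
    cursor += member_size;
    if (member_size > layout.alignment) layout.alignment = member_size;
  }
  layout.size =
      (cursor + layout.alignment - 1) / layout.alignment * layout.alignment;
  return layout;
}
```

For three 4-byte floats this yields a size of 12 and an alignment of 4. For members of sizes 1, 4 and 2 it yields offsets 0, 4 and 8 and a total size of 12, with the padding made explicit rather than left to the compiler.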

Tables

Unlike structs, these are not stored inline in their parent, but are referred to by offset.

They start with an soffset_t to a vtable. This is a signed version of uoffset_t, since vtables may be stored anywhere relative to the object. This offset is subtracted (not added) from the object start to arrive at the vtable start. This offset is followed by all the fields as aligned scalars (or offsets). Unlike structs, not all fields need to be present. There is no set order and layout. A table may contain field offsets that point to the same value if the user explicitly serializes the same offset twice.

To be able to access fields regardless of these uncertainties, we go through a vtable of offsets. Vtables are shared between any objects that happen to have the same vtable values.

The elements of a vtable are all of type voffset_t, which is a uint16_t. The first element is the size of the vtable in bytes, including the size element. The second one is the size of the object, in bytes (including the vtable offset). This size could be used for streaming, to know how many bytes to read to be able to access all inline fields of the object. The remaining elements are the N offsets, where N is the number of fields declared in the schema when the code that constructed this buffer was compiled (thus, the vtable has N + 2 elements).

All accessor functions in the generated code for tables contain the offset into this vtable as a constant. This offset is checked against the first element (the size of the vtable in bytes), to protect against newer code reading older data. If this offset is out of range, or the vtable entry is 0, that means the field is not present in this object, and the default value is returned. Otherwise, the entry is used as the offset to the field to be read.
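
The lookup described above can be sketched in a few lines. This is not the GetField implementation from flatbuffers.h. ReadScalar and GetFieldOr are hypothetical names, and the sketch assumes a little-endian host so that no byte swap is needed.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>

// Reads a scalar of type T from possibly unaligned memory.
// Assumption: the host is little-endian, so no byte swap is performed.
template <typename T>
T ReadScalar(const uint8_t *p) {
  T value;
  std::memcpy(&value, p, sizeof(T));
  return value;
}

// Hypothetical stand-in for the lookup done by generated accessors.
// `table` points at the start of a table (its soffset_t). `field` is the byte
// offset of the field's entry within the vtable (4, 6, 8, ...).
template <typename T>
T GetFieldOr(const uint8_t *table, uint16_t field, T default_value) {
  assert(field >= 4 && field % 2 == 0);
  // The soffset_t is subtracted from the table start to find the vtable.
  const uint8_t *vtable = table - ReadScalar<int32_t>(table);
  const uint16_t vtable_size = ReadScalar<uint16_t>(vtable);
  if (field >= vtable_size) return default_value;  // written by older code
  const uint16_t entry = ReadScalar<uint16_t>(vtable + field);
  if (entry == 0) return default_value;  // field not present in this object
  return ReadScalar<T>(table + entry);
}
```

Both "not present" cases collapse into the same default-returning path, which is why reading old data with new code needs no special handling at the call site.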

Unions

Unions are encoded as the combination of two fields: an enum representing the union choice and the offset to the actual element. FlatBuffers reserves the enumeration constant NONE (encoded as 0) to mean that the union field is not set.

Strings and Vectors

Strings are simply a vector of bytes, and are always null-terminated. Vectors are stored as contiguous aligned scalar elements prefixed by a 32bit element count (not including any null termination). Neither is stored inline in their parent, but are referred to by offset. A vector may consist of more than one offset pointing to the same value if the user explicitly serializes the same offset twice.
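
A minimal sketch of reading a string laid out this way, assuming the caller has already followed the offset and holds a pointer to the length prefix. ReadString is a hypothetical helper and not the flatbuffers::String API.

```cpp
#include <cassert>
#include <cstdint>
#include <string>

// Hypothetical helper: `p` points at the 32-bit little-endian element count
// that prefixes a serialized string.
inline std::string ReadString(const uint8_t *p) {
  const uint32_t length = static_cast<uint32_t>(p[0]) |
                          static_cast<uint32_t>(p[1]) << 8 |
                          static_cast<uint32_t>(p[2]) << 16 |
                          static_cast<uint32_t>(p[3]) << 24;
  const char *chars = reinterpret_cast<const char *>(p + 4);
  // The null terminator is stored after the data but not counted in `length`.
  assert(chars[length] == '\0');
  return std::string(chars, length);
}
```

Because the terminator is always present, the same bytes can also be handed to C APIs that expect a null-terminated string without copying.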

Construction

The current implementation constructs these buffers backwards (starting at the highest memory address of the buffer), since that significantly reduces the amount of bookkeeping and simplifies the construction API.
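
To make the idea of back-to-front construction concrete, here is a heavily simplified sketch. BackwardBuffer is a hypothetical class and not FlatBufferBuilder: it has a fixed capacity, does no alignment, and reports positions as distances from the end of the buffer, which is a choice made for this illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <vector>

// Hypothetical, simplified storage that fills from the highest address down.
class BackwardBuffer {
 public:
  explicit BackwardBuffer(size_t capacity)
      : bytes_(capacity), head_(capacity) {}

  // Prepends `size` bytes in front of everything written so far and returns
  // the distance of the new data from the end of the buffer.
  size_t Push(const void *data, size_t size) {
    assert(size <= head_);  // the sketch does not grow the buffer
    head_ -= size;
    std::memcpy(bytes_.data() + head_, data, size);
    return bytes_.size() - head_;
  }

  // Start of the data written so far (the lowest used address).
  const uint8_t *data() const { return bytes_.data() + head_; }
  size_t size() const { return bytes_.size() - head_; }

 private:
  std::vector<uint8_t> bytes_;
  size_t head_;
};
```

In this sketch, data pushed first (for example a child string) ends up at a higher address than data pushed later (its parent), so the parent can refer to the child with an unsigned offset that points forward, in line with the description of uoffset_t above.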

Code example

Here’s an example of the code that gets generated for the samples/monster.fbs. What follows is the entire file, broken up by comments:

// automatically generated, do not modify

#include "flatbuffers/flatbuffers.h"

namespace MyGame {
namespace Sample {

Nested namespace support.

enum {
  Color_Red = 0,
  Color_Green = 1,
  Color_Blue = 2,
};

inline const char **EnumNamesColor() {
  static const char *names[] = { "Red", "Green", "Blue", nullptr };
  return names;
}

inline const char *EnumNameColor(int e) { return EnumNamesColor()[e]; }

Enums and convenient reverse lookup.

enum {
  Any_NONE = 0,
  Any_Monster = 1,
};

inline const char **EnumNamesAny() {
  static const char *names[] = { "NONE", "Monster", nullptr };
  return names;
}

inline const char *EnumNameAny(int e) { return EnumNamesAny()[e]; }

Unions share a lot with enums.

struct Vec3;
struct Monster;

Predeclare all data types since circular references between types are allowed (circular references between objects are not, though).

FLATBUFFERS_MANUALLY_ALIGNED_STRUCT(4) Vec3 {
 private:
  float x_;
  float y_;
  float z_;

 public:
  Vec3(float x, float y, float z)
    : x_(flatbuffers::EndianScalar(x)), y_(flatbuffers::EndianScalar(y)), z_(flatbuffers::EndianScalar(z)) {}

  float x() const { return flatbuffers::EndianScalar(x_); }
  float y() const { return flatbuffers::EndianScalar(y_); }
  float z() const { return flatbuffers::EndianScalar(z_); }
};
FLATBUFFERS_STRUCT_END(Vec3, 12);

These ugly macros do a couple of things: they turn off any padding the compiler might normally do, since we add padding manually (though none in this example), and they enforce alignment chosen by FlatBuffers. This ensures the layout of this struct will look the same regardless of compiler and platform. Note that the fields are private: this is because these store little endian scalars regardless of platform (since this is part of the serialized data). EndianScalar then converts back and forth, which is a no-op on all current mobile and desktop platforms, and a single machine instruction on the few remaining big endian platforms.

struct Monster : private flatbuffers::Table {
  const Vec3 *pos() const { return GetStruct<const Vec3 *>(4); }
  int16_t mana() const { return GetField<int16_t>(6, 150); }
  int16_t hp() const { return GetField<int16_t>(8, 100); }
  const flatbuffers::String *name() const { return GetPointer<const flatbuffers::String *>(10); }
  const flatbuffers::Vector<uint8_t> *inventory() const { return GetPointer<const flatbuffers::Vector<uint8_t> *>(14); }
  int8_t color() const { return GetField<int8_t>(16, 2); }
};

Tables are a bit more complicated. A table accessor struct is used to point at the serialized data for a table, which always starts with an offset to its vtable. It derives from Table, which contains the GetField helper functions. GetField takes a vtable offset, and a default value. It will look in the vtable at that offset. If the offset is out of bounds (data from an older version) or the vtable entry is 0, the field is not present and the default is returned. Otherwise, it uses the entry as an offset into the table to locate the field.

struct MonsterBuilder {
  flatbuffers::FlatBufferBuilder &fbb_;
  flatbuffers::uoffset_t start_;
  void add_pos(const Vec3 *pos) { fbb_.AddStruct(4, pos); }
  void add_mana(int16_t mana) { fbb_.AddElement<int16_t>(6, mana, 150); }
  void add_hp(int16_t hp) { fbb_.AddElement<int16_t>(8, hp, 100); }
  void add_name(flatbuffers::Offset<flatbuffers::String> name) { fbb_.AddOffset(10, name); }
  void add_inventory(flatbuffers::Offset<flatbuffers::Vector<uint8_t>> inventory) { fbb_.AddOffset(14, inventory); }
  void add_color(int8_t color) { fbb_.AddElement<int8_t>(16, color, 2); }
  MonsterBuilder(flatbuffers::FlatBufferBuilder &_fbb) : fbb_(_fbb) { start_ = fbb_.StartTable(); }
  flatbuffers::Offset<Monster> Finish() { return flatbuffers::Offset<Monster>(fbb_.EndTable(start_, 7)); }
};

MonsterBuilder is the base helper struct to construct a table using a FlatBufferBuilder. You can add the fields in any order, and the Finish call will ensure the correct vtable gets generated.

inline flatbuffers::Offset<Monster> CreateMonster(flatbuffers::FlatBufferBuilder &_fbb,
                                                  const Vec3 *pos, int16_t mana,
                                                  int16_t hp,
                                                  flatbuffers::Offset<flatbuffers::String> name,
                                                  flatbuffers::Offset<flatbuffers::Vector<uint8_t>> inventory,
                                                  int8_t color) {
  MonsterBuilder builder_(_fbb);
  builder_.add_inventory(inventory);
  builder_.add_name(name);
  builder_.add_pos(pos);
  builder_.add_hp(hp);
  builder_.add_mana(mana);
  builder_.add_color(color);
  return builder_.Finish();
}

CreateMonster is a convenience function that calls all functions in MonsterBuilder above for you. Note that if you pass values which are defaults as arguments, it will not actually construct that field, so you can probably use this function instead of the builder class in almost all cases.

inline const Monster *GetMonster(const void *buf) { return flatbuffers::GetRoot<Monster>(buf); }

This function is only generated for the root table type, to be able to start traversing a FlatBuffer from a raw buffer pointer.

}  // namespace Sample
}  // namespace MyGame

Encoding example

Below is a sample encoding for the following JSON corresponding to the above schema:

{ pos: { x: 1, y: 2, z: 3 }, name: "fred", hp: 50 }

Resulting in this binary buffer:

// Start of the buffer:
uint32_t 20  // Offset to the root table.

// Start of the vtable. Not shared in this example, but could be:
uint16_t 16 // Size of vtable in bytes, starting from here.
uint16_t 22 // Size of object inline data.
uint16_t 4, 0, 20, 16, 0, 0  // Offsets to fields from start of (root) table, 0 for not present.

// Start of the root table:
int32_t 16     // Offset to vtable used (default negative direction)
float 1, 2, 3  // the Vec3 struct, inline.
uint32_t 8     // Offset to the name string.
int16_t 50     // hp field.
int16_t 0      // Padding for alignment.

// Start of name string:
uint32_t 4  // Length of string.
int8_t 'f', 'r', 'e', 'd', 0, 0, 0, 0  // Text + 0 termination + padding.

Note that this is not the only possible encoding, since the writer has some flexibility in which of the children of the root object to write first (though in this case there’s only one string), and what order to write the fields in. Different orders may also cause different alignments to happen.
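
The buffer above can be walked by hand to check the offset arithmetic. The sketch below spells out the same 56 bytes and decodes them without the FlatBuffers library. All names in it (kExampleBuffer, Load, VtableEntry, DecodeExample) are hypothetical, a little-endian host is assumed, and the vtable offsets 4, 6, 8 and 10 and the defaults 150 and 100 are the ones used by the generated Monster accessors shown earlier.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>
#include <string>

// The example buffer, byte for byte.
static const uint8_t kExampleBuffer[] = {
    20, 0, 0, 0,                                         // offset to root table
    16, 0, 22, 0, 4, 0, 0, 0, 20, 0, 16, 0, 0, 0, 0, 0,  // vtable (16 bytes)
    16, 0, 0, 0,                                         // table: soffset to vtable
    0, 0, 0x80, 0x3F, 0, 0, 0, 0x40, 0, 0, 0x40, 0x40,   // pos = (1, 2, 3)
    8, 0, 0, 0,                                          // offset to name
    50, 0, 0, 0,                                         // hp, then padding
    4, 0, 0, 0, 'f', 'r', 'e', 'd', 0, 0, 0, 0,          // name string
};

// Assumption: little-endian host, so a plain copy yields the stored value.
template <typename T>
T Load(const uint8_t *p) {
  T v;
  std::memcpy(&v, p, sizeof(T));
  return v;
}

// Returns the vtable entry for `field`, or 0 if the vtable is too short.
inline uint16_t VtableEntry(const uint8_t *vtable, uint16_t field) {
  if (field >= Load<uint16_t>(vtable)) return 0;
  return Load<uint16_t>(vtable + field);
}

struct DecodedMonster {
  float x, y, z;
  int16_t mana, hp;
  std::string name;
};

inline DecodedMonster DecodeExample(const uint8_t *buf) {
  const uint8_t *table = buf + Load<uint32_t>(buf);      // 0 + 20 = 20
  const uint8_t *vtable = table - Load<int32_t>(table);  // 20 - 16 = 4
  // This sketch assumes pos and name are present, as they are in the example.
  assert(VtableEntry(vtable, 4) != 0 && VtableEntry(vtable, 10) != 0);
  DecodedMonster m;
  const uint8_t *pos = table + VtableEntry(vtable, 4);   // inline struct at 24
  m.x = Load<float>(pos);
  m.y = Load<float>(pos + 4);
  m.z = Load<float>(pos + 8);
  const uint16_t mana = VtableEntry(vtable, 6);          // 0: not present
  m.mana = mana ? Load<int16_t>(table + mana) : static_cast<int16_t>(150);
  const uint16_t hp = VtableEntry(vtable, 8);            // 20: stored at 40
  m.hp = hp ? Load<int16_t>(table + hp) : static_cast<int16_t>(100);
  const uint8_t *name_field = table + VtableEntry(vtable, 10);    // 36
  const uint8_t *name = name_field + Load<uint32_t>(name_field);  // 36 + 8 = 44
  m.name.assign(reinterpret_cast<const char *>(name + 4), Load<uint32_t>(name));
  return m;
}
```

Decoding kExampleBuffer this way gives pos (1, 2, 3), hp 50 and name "fred", while mana falls back to its default of 150 because its vtable entry is 0.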

Additional reading

The author of the C language implementation has made a similar document that may further help clarify the format.

FlexBuffers

The schema-less version of FlatBuffers has its own encoding, detailed here.

It shares many properties mentioned above, in that all data is accessed over offsets, all scalars are aligned to their own size, and all data is always stored in little endian format.

One difference is that FlexBuffers are built front to back, so children are stored before parents, and the root of the data starts at the last byte.

Another difference is that scalar data is stored with a variable number of bits (8/16/32/64). The current width is always determined by the parent, i.e. if the scalar sits in a vector, the vector determines the bit width for all elements at once. Selecting the minimum bit width for a particular vector is something the encoder does automatically and thus is typically of no concern to the user, though being aware of this feature (and not sticking a double in the same vector as a bunch of byte sized elements) is helpful for efficiency.
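
How an encoder might pick the width can be sketched as below. This is an illustration under simplifying assumptions (unsigned integer elements only, no floats or offsets), not the logic in flexbuffers.h, and the function names are hypothetical. The size is included in the calculation because the vector's size field is stored at the same width as its elements.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Smallest byte width (1, 2, 4 or 8) that can hold `value`.
inline int ByteWidthFor(uint64_t value) {
  if (value <= 0xFFu) return 1;
  if (value <= 0xFFFFu) return 2;
  if (value <= 0xFFFFFFFFu) return 4;
  return 8;
}

// Hypothetical helper: one width for the whole vector, driven by the widest
// element and by the element count.
inline int VectorByteWidth(const std::vector<uint64_t> &elements) {
  int width = ByteWidthFor(elements.size());
  for (uint64_t element : elements) {
    const int needed = ByteWidthFor(element);
    if (needed > width) width = needed;
  }
  return width;
}
```

A single large element raises the width of every element in the vector, which is the efficiency concern mentioned above.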

Unlike FlatBuffers there is only one kind of offset, and that is an unsigned integer indicating the number of bytes in a negative direction from the address of itself (where the offset is stored).

Vectors

The representation of the vector is at the core of how FlexBuffers works (since maps are really just a combination of 2 vectors), so it is worth starting there.

As mentioned, a vector is governed by a single bit width (supplied by its parent). This includes the size field. For example, a vector that stores the integer values 1, 2, 3 is encoded as follows:

uint8_t 3, 1, 2, 3, 4, 4, 4

The first 3 is the size field, and is placed before the vector (an offset from the parent to this vector points to the first element, not the size field, so the size field is effectively at index -1). Since this is an untyped vector (TYPE_VECTOR), it is followed by 3 type bytes (one per element of the vector), which always follow the vector, and are always a uint8_t even if the vector is made up of bigger scalars.

A vector may include more than one offset pointing to the same value if the user explicitly serializes the same offset twice.
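
The layout of the example vector can be expressed as a small view type. ByteVectorView is a hypothetical name, and the sketch only covers the case shown above, where the byte width is 1.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical view over an untyped FlexBuffers vector with byte width 1.
struct ByteVectorView {
  const uint8_t *elements;  // what the parent's offset points at

  // The size field sits directly before the first element (index -1).
  size_t size() const { return elements[-1]; }

  uint8_t value(size_t i) const {
    assert(i < size());
    return elements[i];
  }

  // One type byte per element, stored right after the last element.
  uint8_t type_byte(size_t i) const {
    assert(i < size());
    return elements[size() + i];
  }
};
```

For the bytes 3, 1, 2, 3, 4, 4, 4 with the view pointing at the second byte, size() is 3, the values are 1, 2, 3, and every type byte is 4.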

Types

A type byte is made up of 2 components (see flexbuffers.h for exact values):

  • 2 lower bits representing the bit-width of the child (8, 16, 32, 64). This is only used if the child is accessed over an offset, such as a child vector. It is ignored for inline types.
  • 6 bits representing the actual type (see flexbuffers.h).

Thus, in this example 4 means 8 bit child (value 0, unused, since the value is in-line), type TYPE_INT (value 1), i.e. (1 << 2) | 0.
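
Packing and unpacking a type byte can be sketched as follows. The function names are hypothetical. The text above only states that an 8 bit child is encoded as 0, so the mapping of 16, 32 and 64 bit to 1, 2 and 3 is an assumption of this sketch that should be checked against flexbuffers.h.

```cpp
#include <cassert>
#include <cstdint>

// Packs a type (upper 6 bits) and a child byte width of 1, 2, 4 or 8
// (lower 2 bits, assumed to be stored as log2 of the byte width).
inline uint8_t PackTypeByte(uint8_t type, int child_byte_width) {
  assert(type < 64);
  uint8_t width_code = 0;
  switch (child_byte_width) {
    case 1: width_code = 0; break;
    case 2: width_code = 1; break;
    case 4: width_code = 2; break;
    case 8: width_code = 3; break;
    default: assert(false && "byte width must be 1, 2, 4 or 8");
  }
  return static_cast<uint8_t>((type << 2) | width_code);
}

inline uint8_t TypeOf(uint8_t type_byte) {
  return static_cast<uint8_t>(type_byte >> 2);
}

inline int ChildByteWidthOf(uint8_t type_byte) {
  return 1 << (type_byte & 3);
}
```

With type value 1 and a 1-byte child this produces 4, matching the type bytes in the vector example.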

Typed Vectors

These are like the Vectors above, but omit the type bytes. The type is instead determined by the vector type supplied by the parent. Typed vectors are only available for a subset of types for which these savings can be significant, namely inline signed/unsigned integers (TYPE_VECTOR_INT / TYPE_VECTOR_UINT), floats (TYPE_VECTOR_FLOAT), and keys (TYPE_VECTOR_KEY, see below).

Additionally, for scalars, there are fixed length vectors of sizes 2 / 3 / 4 that don’t store the size (TYPE_VECTOR_INT2 etc.), for an additional savings in space when storing common vector or color data.

Scalars

FlexBuffers supports integers (TYPE_INT and TYPE_UINT) and floats (TYPE_FLOAT), available in the bit-widths mentioned above. They can be stored both inline and over an offset (TYPE_INDIRECT_*).

The offset version is useful to encode costly 64bit (or even 32bit) quantities into vectors / maps of smaller sizes, and to share / repeat a value multiple times.

Booleans and Nulls

Booleans (TYPE_BOOL) and nulls (TYPE_NULL) are encoded as inlined unsigned integers.

Blobs, Strings and Keys

A blob (TYPE_BLOB) is encoded similarly to a vector, with one difference: the elements are always uint8_t. The parent bit width only determines the width of the size field, allowing blobs to be large without the elements being large.

Strings (TYPE_STRING) are similar to blobs, except they have an additional 0 termination byte for convenience, and they MUST be UTF-8 encoded (since an accessor in a language that does not support pointers to UTF-8 data may have to convert them to a native string type).

A “Key” (TYPE_KEY) is similar to a string, but doesn’t store the size field. They’re so named because they are used with maps, which don’t care for the size, and can thus be even more compact. Unlike strings, keys cannot contain bytes of value 0 as part of their data (size can only be determined by strlen), so while you can use them outside the context of maps if you so desire, you’re usually better off with strings.

Maps

A map (TYPE_MAP) is like an (untyped) vector, but with 2 prefixes before the size field:

| index | field |
|---|---|
| -3 | An offset to the keys vector (may be shared between tables). |
| -2 | Byte width of the keys vector. |
| -1 | Size (from here on it is compatible with TYPE_VECTOR) |
| 0 | Elements. |
| Size | Types. |

Since a map is otherwise the same as a vector, it can be iterated like a vector (which is probably faster than lookup by key).

The keys vector is a typed vector of keys. Both the keys and corresponding values have to be stored in sorted order (as determined by strcmp), such that lookups can be made using binary search.

The reason the key vector is a separate structure from the value vector is so that it can be shared between multiple value vectors, and also to allow it to be treated as its own individual vector in code.

An example map { foo: 13, bar: 14 } would be encoded as:

0 : uint8_t 'b', 'a', 'r', 0
4 : uint8_t 'f', 'o', 'o', 0
8 : uint8_t 2      // key vector of size 2
// key vector offset points here
9 : uint8_t 9, 6   // offsets to bar_key and foo_key
11: uint8_t 2, 1   // offset to key vector, and its byte width
13: uint8_t 2      // value vector of size 2
// value vector offset points here
14: uint8_t 14, 13 // values
16: uint8_t 4, 4   // types
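
The example bytes can be used to sketch a key lookup. Follow and FindKey are hypothetical names, and the sketch only handles the case shown here, where both the map and its keys vector have a byte width of 1. It relies on the offset rule stated earlier: an offset is an unsigned distance backwards from the address where the offset itself is stored.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>

// Follows a 1-byte-wide FlexBuffers offset stored at `offset_location`.
inline const uint8_t *Follow(const uint8_t *offset_location) {
  return offset_location - *offset_location;
}

// Hypothetical lookup. `map` points at element 0 of the values, so the size
// is at map[-1], the keys byte width at map[-2] and the keys offset at
// map[-3]. Returns the index of `key`, or -1 if it is not in the map.
inline int FindKey(const uint8_t *map, const char *key) {
  assert(map[-2] == 1);  // the sketch only handles 1-byte-wide key vectors
  const uint8_t *keys = Follow(map - 3);
  int lo = 0;
  int hi = static_cast<int>(map[-1]) - 1;
  while (lo <= hi) {  // binary search over keys sorted by strcmp
    const int mid = lo + (hi - lo) / 2;
    const char *candidate = reinterpret_cast<const char *>(Follow(keys + mid));
    const int cmp = std::strcmp(candidate, key);
    if (cmp == 0) return mid;
    if (cmp < 0) {
      lo = mid + 1;
    } else {
      hi = mid - 1;
    }
  }
  return -1;
}
```

With the 18 bytes above and map pointing at byte 14, looking up "bar" yields index 0 (value 14) and "foo" yields index 1 (value 13).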

The root

As mentioned, the root starts at the end of the buffer. The last uint8_t is the width in bytes of the root (normally the parent determines the width, but the root has no parent). The uint8_t before this is the type of the root, and the bytes before that are the root value (of the number of bytes specified by the last byte).

So for example, the integer value 13 as root would be:

uint8_t 13, 4, 1    // Value, type, root byte width.
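
Locating the root from the end of the buffer can be sketched as follows. RootInfo and LocateRoot are hypothetical names, and the sketch stops at finding the root's bytes without interpreting its type.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

struct RootInfo {
  uint8_t byte_width;    // last byte of the buffer
  uint8_t type_byte;     // the byte before it
  const uint8_t *value;  // `byte_width` bytes before the type byte
};

// Hypothetical helper: reads the trailer of a FlexBuffer of `size` bytes.
inline RootInfo LocateRoot(const uint8_t *buf, size_t size) {
  assert(size >= 3);  // at least one value byte, a type byte and a width byte
  RootInfo root;
  root.byte_width = buf[size - 1];
  root.type_byte = buf[size - 2];
  assert(size >= 2u + root.byte_width);
  root.value = buf + size - 2 - root.byte_width;
  return root;
}
```

For the three bytes 13, 4, 1 this gives a byte width of 1, a type byte of 4, and a value pointer at the byte holding 13.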
