Should I Use X-Buffers to Serialize Data?
X-Buffers? What do I mean? Simple: FlatBuffers and Protocol Buffers (this is not an official term, just a way I decided to group the two).
Every time a new technology or framework emerges, it is tempting to assume it is better than its predecessors in every way simply because it is newer. It is important, however, to understand what problem it aims to solve and whether it performs well in all contexts. That is why I decided to write this post: to find out whether the X-Buffers are really better than JSON.
To determine whether using X-Buffers is a good idea, I will walk through different scenarios using Go benchmarks. This is the schema used for the comparisons (full schema here).
type Customer struct {
    FirstName   string
    LastName    string
    Age         uint32
    Balance     float64
    Debt        float64
    Preferences *Preferences
    Friends     []*Customer
    Addresses   map[string]*Location
}
If you are new to Protocol Buffers or FlatBuffers, you can take a look at these resources before reading further.
- Protocol Buffer IDL guide and Intro to Protocol buffers
- Flat Buffer intro and Flat Buffer benchmarking
Experiments
Now let's proceed with the benchmarks. I will use three different levels of complexity so we can see how each serialization format behaves under increasing stress:
- Level 1, The customer object with 1 address.
- Level 2, The customer object with 10 addresses, 10 friends and each friend contains 10 addresses.
- Level 3, The customer object with 100 addresses, 200 friends and each friend contains 100 addresses.
All tests were run on a machine with an Intel(R) Core(TM) i5-6267U CPU @ 2.90GHz using Go 1.16.2; the results may vary on other machines or in other languages.
Level 1 Marshaling metrics
JSON  354464  3442 ns/op  1075 B/op  14 allocs/op
Proto 467773  2188 ns/op  1088 B/op  15 allocs/op
FBS   697542  1690 ns/op  1304 B/op  10 allocs/op
Size in bytes: JSON: 371, Proto: 143, FBS: 296
I have to admit that the first time I ran these tests I had to revisit the whole implementation, because I had the misconception that FlatBuffers was far better than JSON and Protocol Buffers in every way. As you can see, that is not the case in our simplest scenario.
Level 2 Marshaling metrics
JSON  6504   177418 ns/op  52443 B/op  619 allocs/op
Proto 9921   128793 ns/op  43157 B/op  717 allocs/op
FBS   10000  116492 ns/op  62473 B/op  326 allocs/op
Size in KB: JSON: 17, Proto: 8.9, FBS: 13
In this scenario JSON is no longer the winner in any category, and Protocol Buffers has closed the time gap with FlatBuffers. In conclusion, Protocol Buffers looks like our best option for this case. Now let's contrast that behavior with a more extreme scenario.
Level 3 Marshaling metrics
JSON  39  29045645 ns/op  9810394 B/op  84209 allocs/op
Proto 55  21001219 ns/op  6334196 B/op  103901 allocs/op
FBS   78  13158922 ns/op  8333920 B/op  42052 allocs/op
Size in MB: JSON: 2.5, Proto: 1.2, FBS: 2.0
Remember that this time I used 100 addresses and 200 friends, each friend with 100 addresses. That is an uncommon use case, but a possible one; I came across objects this big in a project once.
For this extreme case JSON is the worst in every sense. FlatBuffers is now even further ahead of Protocol Buffers in speed than in the previous level, and the trend in payload size is consistent with the other tests.
Level 1 Unmarshalling stats
JSON 134371 10778 ns/op 1000 B/op 26 allocs/op
Proto 728599 1592 ns/op 733 B/op 13 allocs/op
FBS 510885903 2.365 ns/op 0 B/op 0 allocs/op
Yes, the numbers are right: FlatBuffers performs zero allocations when parsing binary data into the object, and the time is almost zero (2 ns). Protocol Buffers allocates about 0.7 KB per operation and JSON about 1 KB, and in time spent Protocol Buffers takes around 0.001 ms against 0.01 ms for JSON. Let's see what comes next.
Level 2 Unmarshalling stats
JSON 2886 353497 ns/op 32025 B/op 1000 allocs/op
Proto 13471 91999 ns/op 31446 B/op 705 allocs/op
FBS 444298354 2.490 ns/op 0 B/op 0 allocs/op
FlatBuffers is unaffected by the size or complexity of the object and keeps its previous numbers. Protocol Buffers and JSON, on the other hand, are close to each other in memory allocation (31 KB against 32 KB), but in time spent Protocol Buffers is better, with 0.09 ms against 0.3 ms.
Level 3 Unmarshalling stats
JSON 22 50498996 ns/op 5849893 B/op 148547 allocs/op
Proto 78 14708673 ns/op 4989886 B/op 105029 allocs/op
FBS 527564692 2.248 ns/op 0 B/op 0 allocs/op
Again FlatBuffers kept the pace. Protocol Buffers allocated around 4.9 MB and JSON 5.8 MB per operation, and in time spent Protocol Buffers took 14 ms while JSON needed 50 ms to parse the object, quite an important difference.
But why is FlatBuffers so performant at unmarshalling the object? To be fair, FlatBuffers does not even unmarshal the bytes. What it does during marshaling is lay the bytes out with offsets and vtables so the data can be accessed later. Instead of rebuilding the whole object, it keeps the byte array as-is and looks up fields on demand based on their data type and/or length (example here).
If you are very happy with the FlatBuffers results shown above, wait a minute: first you need to know the downsides of using it. These are some I have come across:
- Debugging binary messages is very hard. You cannot just log the payload, or use middleboxes (firewalls, proxies, pub/sub, etc.) to analyze the payload and make decisions based on it. This also applies to Protocol Buffers.
- The implementation is tedious. As you can see here, building the FlatBuffers object requires many more steps, and much more care, than Protocol Buffers or JSON.
- Some features are missing in some languages; for instance, binary search over maps is not yet available for Go or Rust.
Before wrapping up let me share some final thoughts:
- The reason memory allocation matters is memory management. In Go this is handled by the Garbage Collector (GC), which is in charge of freeing memory no longer used by our programs. To make a long story short, the GC is triggered (by default) when the heap grows beyond 4 MB or when it has not run in the last 2 minutes, so if your GC has to work hard cleaning up memory all the time, it consumes CPU your program needs.
- If after reading this you think using JSON is a bad idea, let me tell you the opposite: for regular scenarios it is my first choice. It is widely supported, easy to debug, and, as you saw, the difference in normal cases is not excessive.
- Given the previous point, you might be wondering when it is a good idea to use X-Buffers. I would personally use them for service-to-service communication, cache storage, when the business requires very low latency (as in games), or when working in very constrained environments (network, disk, memory) such as IoT.
Bonus:
- There is a fork of the Go protobuf project called GoGo Protobuf which promises to reduce the bytes allocated. Let's see some stats using the gogofaster generator:
Marshaling
GogoProtoL1 1244660 944.4 ns/op 627 B/op 9 allocs/op
GogoProtoL2 17931 67348 ns/op 31112 B/op 433 allocs/op
GogoProtoL3 104 11383685 ns/op 4968503 B/op 62899 allocs/op
The GoGo Protobuf implementation is a real improvement over the standard Protocol Buffers one:
- Level 1: byte allocation is 44% lower and the speed 56% better
- Level 2: byte allocation is 27% lower and the speed 47% better
- Level 3: byte allocation is 22% lower and the speed 47% better
- The payload size is exactly the same in all scenarios
- The GoGo Protobuf marshaling stats are the best across all scenarios, beating both FlatBuffers and JSON.
Unmarshalling
GogoProtoL1 1863813 628.5 ns/op 528 B/op 11 allocs/op
GogoProtoL2 34784 34081 ns/op 22206 B/op 465 allocs/op
GogoProtoL3 193 5784670 ns/op 3625775 B/op 64031 allocs/op
- Level 1: byte allocation is 31% lower and the speed 40% better
- Level 2: byte allocation is 29% lower and the speed 66% better
- Level 3: byte allocation is 25% lower and the speed 59% better
- You can also use FlatBuffers with gRPC and leverage HTTP/2 for better performance; check here for more information and here for an example.