JSON vs msgpack

Hao Gao
Hadoop Noob
Sep 14, 2018

Which is better? It is hard to say without some context or constraints. If I could build the pipeline from scratch, I might choose neither.

I have a cluster of Fluentd aggregators that streams data to Treasure Data. I need to fork the stream to Kinesis so that I can process the data with Flink. Fluentd has built-in JSON and msgpack formatters, so I need to choose between them. Although the community offers some Protobuf and Avro plugins, we want to keep the data schema-less. We also try combining each format with zlib, since zlib is built in as well.
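
If the records are zlib-compressed, the Kinesis consumer has to inflate each one before handing it to the JSON or msgpack decoder. Below is a minimal sketch of that round trip using the JDK's java.util.zip.Deflater and Inflater. It assumes each record carries one complete payload in plain zlib framing, which is Deflater's default. That framing is an assumption, since the post does not say how records are laid out. The class name ZlibCodec and the sample record are hypothetical. If the producer emits gzip framing instead, java.util.zip.GZIPInputStream is the JDK class to reach for.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;
import java.util.zip.DataFormatException;
import java.util.zip.Deflater;
import java.util.zip.Inflater;

// Hypothetical helper: zlib-compress on the producer side, inflate on the consumer side.
final class ZlibCodec {
    // Compress with zlib framing (Deflater's default, i.e. nowrap = false).
    static byte[] compress(byte[] input) {
        Deflater deflater = new Deflater(Deflater.DEFAULT_COMPRESSION);
        try {
            deflater.setInput(input);
            deflater.finish();
            ByteArrayOutputStream out = new ByteArrayOutputStream();
            byte[] buf = new byte[4096];
            while (!deflater.finished()) {
                int n = deflater.deflate(buf);
                out.write(buf, 0, n);
            }
            return out.toByteArray();
        } finally {
            deflater.end(); // release native zlib resources
        }
    }

    // Inflate a zlib-framed payload back into the original bytes.
    static byte[] decompress(byte[] input) throws DataFormatException {
        Inflater inflater = new Inflater();
        try {
            inflater.setInput(input);
            ByteArrayOutputStream out = new ByteArrayOutputStream();
            byte[] buf = new byte[4096];
            while (!inflater.finished()) {
                int n = inflater.inflate(buf);
                // Guard against looping forever on a truncated stream.
                if (n == 0 && (inflater.needsInput() || inflater.needsDictionary())) {
                    throw new DataFormatException("truncated or dictionary-compressed zlib stream");
                }
                out.write(buf, 0, n);
            }
            return out.toByteArray();
        } finally {
            inflater.end();
        }
    }

    public static void main(String[] args) throws DataFormatException {
        // Hypothetical record; the real Fluentd payload fields are not given in the post.
        String json = "{\"host\":\"web-1\",\"status\":200,\"path\":\"/index.html\"}";
        byte[] packed = compress(json.getBytes(StandardCharsets.UTF_8));
        String restored = new String(decompress(packed), StandardCharsets.UTF_8);
        System.out.println(restored.equals(json)); // prints true
    }
}
```

Inflating adds CPU work on the consumer side on top of deserialization. That matters here, because consumer-side cost is the thing being measured.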

After reading some blog posts online, I decided to run the benchmark myself. I don't care much about serialization speed, since our aggregator clusters have plenty of spare capacity. What I care about is deserialization speed on the consumers (e.g. Flink).
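
On the JVM, a deserialization micro-benchmark needs some care with JIT warm-up. Below is a minimal timing harness, assuming the decoder under test can be passed in as a Function from bytes to a decoded object. The class name DeserBench, the warm-up count and the toy records in main are hypothetical. Gson, dsl-json or msgpack calls would be plugged in as the decoder, and the output line mimics the log format shown below. For more trustworthy numbers, a dedicated harness such as JMH is the usual choice.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import java.util.function.Function;

// Hypothetical harness: times one decoder over a fixed list of raw records.
final class DeserBench {
    // Written after every pass so the JIT cannot treat the decoded results as dead code.
    static volatile long blackhole;

    // Decode every record once and return the elapsed wall-clock time in milliseconds.
    static <T> double timeMillis(List<byte[]> records, Function<byte[], T> decoder) {
        long sink = 0;
        long start = System.nanoTime();
        for (byte[] record : records) {
            T decoded = decoder.apply(record);
            sink += (decoded == null) ? 0 : decoded.hashCode();
        }
        long elapsed = System.nanoTime() - start;
        blackhole = sink;
        return elapsed / 1_000_000.0;
    }

    // Run untimed warm-up passes first so the decoder is JIT-compiled before the measured pass.
    static <T> double benchmark(List<byte[]> records, Function<byte[], T> decoder, int warmupRounds) {
        for (int i = 0; i < warmupRounds; i++) {
            timeMillis(records, decoder);
        }
        return timeMillis(records, decoder);
    }

    public static void main(String[] args) {
        // Hypothetical stand-in records and decoder; swap in real Gson, dsl-json or msgpack calls.
        List<byte[]> records = new ArrayList<>();
        for (int i = 0; i < 18667; i++) {
            records.add(("{\"id\":" + i + "}").getBytes(StandardCharsets.UTF_8));
        }
        double ms = benchmark(records, bytes -> new String(bytes, StandardCharsets.UTF_8), 5);
        System.out.printf("Deserialization takes %f ms%n", ms);
    }
}
```

Timing only the measured pass, after warm-up, keeps one-off class loading and JIT compilation out of the comparison between decoders.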

For the JSON data, I decode with Gson and dsl-json. With dsl-json we use reflection, since the data is schema-less. For the msgpack data, I decode with the msgpack unpacker.
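
MessagePack is type-tagged and length-prefixed, which helps explain why an unpacker has less work to do than a JSON parser. Below is a minimal decoder for a small subset of the MessagePack spec: nil, booleans, fixint, int8/16/32, uint8/16/32, fixstr/str8/str16, fixarray/array16 and fixmap/map16. It decodes schema-less into plain Java maps and lists. This is a hypothetical illustration written for this post, not msgpack-java's unpacker API, and the class name TinyMsgpack is made up.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical subset decoder; real code would use a full msgpack library.
final class TinyMsgpack {
    // Decode one value; the first byte tells us its type (and often its size).
    static Object decode(ByteBuffer buf) {
        int b = buf.get() & 0xFF;
        if (b <= 0x7f) return (long) b;                            // positive fixint
        if (b >= 0xe0) return (long) (byte) b;                     // negative fixint
        if ((b & 0xf0) == 0x80) return decodeMap(buf, b & 0x0f);   // fixmap
        if ((b & 0xf0) == 0x90) return decodeArray(buf, b & 0x0f); // fixarray
        if ((b & 0xe0) == 0xa0) return decodeStr(buf, b & 0x1f);   // fixstr
        switch (b) {
            case 0xc0: return null;
            case 0xc2: return Boolean.FALSE;
            case 0xc3: return Boolean.TRUE;
            case 0xcc: return (long) (buf.get() & 0xFF);                  // uint8
            case 0xcd: return (long) (buf.getShort() & 0xFFFF);           // uint16
            case 0xce: return buf.getInt() & 0xFFFFFFFFL;                 // uint32
            case 0xd0: return (long) buf.get();                           // int8
            case 0xd1: return (long) buf.getShort();                      // int16
            case 0xd2: return (long) buf.getInt();                        // int32
            case 0xd9: return decodeStr(buf, buf.get() & 0xFF);           // str8
            case 0xda: return decodeStr(buf, buf.getShort() & 0xFFFF);    // str16
            case 0xdc: return decodeArray(buf, buf.getShort() & 0xFFFF);  // array16
            case 0xde: return decodeMap(buf, buf.getShort() & 0xFFFF);    // map16
            default: throw new IllegalArgumentException(String.format("unsupported type byte 0x%02x", b));
        }
    }

    // The length prefix says exactly how many bytes to copy: no scanning for a closing quote or escapes.
    static String decodeStr(ByteBuffer buf, int len) {
        byte[] bytes = new byte[len];
        buf.get(bytes);
        return new String(bytes, StandardCharsets.UTF_8);
    }

    static List<Object> decodeArray(ByteBuffer buf, int n) {
        List<Object> list = new ArrayList<>(n);
        for (int i = 0; i < n; i++) list.add(decode(buf));
        return list;
    }

    static Map<Object, Object> decodeMap(ByteBuffer buf, int n) {
        Map<Object, Object> map = new LinkedHashMap<>();
        for (int i = 0; i < n; i++) {
            Object key = decode(buf);
            map.put(key, decode(buf));
        }
        return map;
    }

    public static void main(String[] args) {
        // {"status": 200, "ok": true} encoded by hand:
        // 0x82 = fixmap(2), 0xa6 + "status", 0xcc 0xc8 = uint8 200, 0xa2 + "ok", 0xc3 = true
        byte[] packed = {
            (byte) 0x82,
            (byte) 0xa6, 's', 't', 'a', 't', 'u', 's',
            (byte) 0xcc, (byte) 0xc8,
            (byte) 0xa2, 'o', 'k',
            (byte) 0xc3
        };
        System.out.println(decode(ByteBuffer.wrap(packed))); // prints {status=200, ok=true}
    }
}
```

A JSON decoder reading the same record has to scan character by character for quotes, commas and digits, and convert "200" from text. This sketch instead reads type bytes and copies fixed-length spans, which is one plausible reason for the gap in the numbers below.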

#################### Deserialization Benchmarks ####################
18667 json records read. 18667 records will be processed
18667 msgpack records read. 18667 records will be processed
########################################
dslJson test
Deserialization takes 469.865097 ms
########################################
Gson test
Deserialization takes 709.607553 ms
########################################
msgpack test with unpacker
Deserialization takes 264.010102 ms
########################################

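In this run, msgpack decodes the 18,667 records roughly 1.8x faster than dsl-json and 2.7x faster than Gson, so it looks like msgpack is the winner :)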
