Understanding Smile — A data format based on JSON
JSON is widely used for transporting data over a network because it is lightweight and human readable. I don’t remember my first encounter with JSON, but it has crossed my path quite frequently ever since I started working in the tech industry. XML and JSON are both used extensively to send data to and receive data from servers, with the latter being predominant in recent years.
Recently I came across Smile, a JSON-based data format, being used to send data onto a Kafka pipeline. Besides being more efficient than JSON, it also lives up to its cheerful name quite literally: the first 2 bytes of its header are 0x3A (ASCII ‘:’) and 0x29 (ASCII ‘)’), which form a smiley :) and make the format easy to tell apart from others.
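A quick way to see this for yourself is to look at the first two bytes of a payload. Here is a minimal Python sketch of that check. The function name `looks_like_smile` is my own and not part of any library, and it only tests the two bytes mentioned above rather than validating the rest of the header.

```python
# The two leading header bytes described above: 0x3A (':') and 0x29 (')').
SMILE_PREFIX = bytes([0x3A, 0x29])


def looks_like_smile(payload: bytes) -> bool:
    """Return True if the payload starts with the ':)' marker.

    Hypothetical helper for illustration; it does not parse or
    validate anything beyond the first two bytes.
    """
    return payload[:2] == SMILE_PREFIX


# A made-up payload that begins with the marker, and a plain JSON text.
smile_like = b":)" + b"\x00\x00"
plain_json = b'{"id": 1}'

print(looks_like_smile(smile_like))  # True
print(looks_like_smile(plain_json))  # False
```

Since a JSON document normally starts with `{` or `[`, a check like this is enough to tell the two apart when a consumer might receive either.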
How is Smile more efficient than regular JSON?
Smile uses back referencing for properties (keys/values) in addition to binary encoding, which makes it considerably more compact than JSON.
Smile has two key features:
- Binary Encoding
- Back Referencing
What distinguishes Smile from other compact alternatives to JSON is back referencing. By default, Smile assigns a 1- or 2-byte reference id to each key when it is first encountered. Every later occurrence of that key is replaced by its reference id. For data with many repeated keys, such as a list of similarly shaped objects, this can reduce the size considerably.
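To make the idea concrete, here is a toy Python sketch of back referencing for keys. It is not the actual Smile wire format: the function `back_reference_keys` and the `("name", ...)` / `("ref", ...)` tokens are made up for illustration, and the reference ids are plain integers instead of the 1- or 2-byte ids Smile writes.

```python
def back_reference_keys(keys):
    """Replace repeated keys with the id assigned at their first occurrence.

    Toy illustration only: a real Smile encoder writes binary tokens,
    not Python tuples.
    """
    seen = {}    # key -> reference id, assigned in order of first appearance
    tokens = []
    for key in keys:
        if key in seen:
            # Key was written before: emit a short reference instead.
            tokens.append(("ref", seen[key]))
        else:
            # First occurrence: write the key in full and remember its id.
            seen[key] = len(seen)
            tokens.append(("name", key))
    return tokens


# Two records with the same shape, as is typical for messages on a pipeline.
records = [
    {"id": 1, "name": "a"},
    {"id": 2, "name": "b"},
]
keys = [key for record in records for key in record]
tokens = back_reference_keys(keys)
print(tokens)
# [('name', 'id'), ('name', 'name'), ('ref', 0), ('ref', 1)]
```

The second record never spells out its keys again, which is where the savings come from: the more records share the same keys, the larger the share of keys written as short references.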
The size can be reduced further by enabling the CHECK_SHARED_STRING_VALUES flag while initializing SmileFactory. Enabling this flag is…