Understanding Smile — A data format based on JSON

Ayush Gupta
Code with Ayush
Feb 10, 2019

JSON is widely used for transporting data over the network because of its lightweight, human-readable format. I don’t remember my first encounter with JSON, but it has crossed my path frequently ever since I started working in the tech industry. XML and JSON are both used extensively to send and receive data from servers, with the latter being predominant in recent years.
Recently I came across Smile, a JSON-based binary data format, which was being used to send data into a Kafka pipeline. Besides being more efficient than JSON, it lives up to its cheerful name quite literally: the first 2 bytes of its header are 0x3A (ASCII ‘:’) and 0x29 (ASCII ‘)’), forming a smiley :) that uniquely identifies the format.
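To make the header concrete, here is a minimal Python sketch that checks whether a byte buffer starts with a Smile header. It relies on the Smile format specification, where the ‘:’ and ‘)’ bytes are followed by a line feed (0x0A) and a fourth byte holding a 4-bit version number plus feature flags. The function name `parse_smile_header` is hypothetical, and the sketch only inspects the header; it does not decode the payload.

```python
from typing import Optional

# Smile's 4-byte header: ':' ')' '\n' followed by a version/flags byte.
SMILE_MAGIC = b":)\n"

# Bits of the 4th header byte, per the Smile format specification.
FLAG_SHARED_NAMES = 0x01          # back-references for property names enabled
FLAG_SHARED_STRING_VALUES = 0x02  # back-references for string values enabled
FLAG_RAW_BINARY = 0x04            # raw (not 7-bit escaped) binary data may appear


def parse_smile_header(data: bytes) -> Optional[dict]:
    """Return the decoded header flags if `data` starts with a Smile header, else None."""
    if len(data) < 4 or not data.startswith(SMILE_MAGIC):
        return None
    flags = data[3]
    return {
        "version": flags >> 4,  # upper 4 bits hold the format version
        "shared_names": bool(flags & FLAG_SHARED_NAMES),
        "shared_string_values": bool(flags & FLAG_SHARED_STRING_VALUES),
        "raw_binary": bool(flags & FLAG_RAW_BINARY),
    }


if __name__ == "__main__":
    print(parse_smile_header(b":)\n\x01"))  # Smile header, shared names only
    print(parse_smile_header(b'{"a": 1}'))  # plain JSON text -> None
```

Because a JSON document can never start with “:)”, a consumer reading from a mixed pipeline could use a check like this to decide which decoder to hand a message to.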

How is Smile more efficient than regular JSON?

Smile combines binary encoding with back-referencing of repeated property names (keys) and, optionally, string values, which makes it considerably more compact than JSON.

Smile has two key features:

  • Binary Encoding
  • Back Referencing
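As one illustration of binary encoding, the sketch below writes small integers the way the Smile specification describes its single-byte “small integer” tokens: a value from -16 to 15 is zigzag-encoded into 5 bits and added to the token prefix 0xC0. The helper names are hypothetical, and larger numbers, which Smile writes with multi-byte tokens, are deliberately not handled.

```python
SMALL_INT_PREFIX = 0xC0  # first byte of Smile's single-byte small-integer token range


def zigzag(n: int) -> int:
    """Map signed to unsigned so small magnitudes stay small: 0->0, -1->1, 1->2, -2->3."""
    return 2 * n if n >= 0 else -2 * n - 1


def encode_small_int(n: int) -> bytes:
    """Encode -16..15 as a single byte: prefix 0xC0 plus the 5-bit zigzag value."""
    if not -16 <= n <= 15:
        raise ValueError("value needs a multi-byte integer token")
    return bytes([SMALL_INT_PREFIX + zigzag(n)])


if __name__ == "__main__":
    for n in (0, -1, 15, -16):
        text = str(n).encode("ascii")  # how the number appears in JSON text
        print(n, len(text), "JSON byte(s) vs Smile byte", encode_small_int(n).hex())
```

In JSON text a value such as -16 takes three bytes (‘-’, ‘1’, ‘6’); as a single binary token it takes one, and the type is carried in the same byte.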

What distinguishes Smile from other formats that are more efficient than JSON is back-referencing. By default, Smile assigns a 1- or 2-byte reference ID to each key (property name) the first time it is encountered. Every later occurrence of that key is replaced by its reference ID. For payloads that repeat the same keys, such as arrays of similar objects, this reduces the size drastically.
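The idea behind key back-referencing can be sketched in a few lines of Python. This is a simplified model, not Smile’s actual byte layout: a key written in full is assumed to cost one type/length byte plus its UTF-8 bytes, and a back-reference is assumed to cost 1 byte for the first 64 distinct keys and 2 bytes afterwards, mirroring the 1- or 2-byte reference IDs described above. The function names are hypothetical.

```python
from typing import List, Tuple, Union

# ("name", key) = key written out in full; ("ref", index) = back-reference.
Token = Tuple[str, Union[str, int]]


def encode_keys(records: List[dict]) -> List[Token]:
    """Emit each key in full the first time, then only its reference index."""
    table: dict = {}  # key -> reference index, assigned in order of first use
    tokens: List[Token] = []
    for record in records:
        for key in record:
            if key in table:
                tokens.append(("ref", table[key]))
            else:
                table[key] = len(table)
                tokens.append(("name", key))
    return tokens


def json_key_bytes(records: List[dict]) -> int:
    """Bytes spent on keys in JSON text: every key is repeated, in quotes."""
    return sum(len(key.encode("utf-8")) + 2 for record in records for key in record)


def backref_key_bytes(tokens: List[Token]) -> int:
    """Approximate key bytes under the simplified back-referencing model."""
    total = 0
    for kind, value in tokens:
        if kind == "name":
            total += 1 + len(str(value).encode("utf-8"))  # assumed 1 type/length byte
        else:
            total += 1 if int(value) < 64 else 2  # 1-byte ref for the first 64 keys
    return total


if __name__ == "__main__":
    rows = [{"user_id": i, "country": "IN"} for i in range(3)]
    tokens = encode_keys(rows)
    print(tokens)
    print(json_key_bytes(rows), backref_key_bytes(tokens))
```

For these three rows the keys cost 54 bytes as JSON text and 20 bytes in this model. Each additional row adds 18 bytes of keys in JSON but only 2 bytes of references, which is where the large savings on repetitive data come from.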
The size can be reduced further by enabling the CHECK_SHARED_STRING_VALUES feature when configuring the SmileFactory, which applies the same back-referencing to string values. Enabling this flag is…
