Faster and smaller serialized data in Android

Published in Jumbotail, Jul 3, 2016

Source https://medium.com/@duhroach/how-png-works-f1174e3cc7b7

Serialization is the process of converting an in-memory object into a formatted chunk of data that can later be converted back into the in-memory object.

The easiest solution is to implement the Serializable interface, which pretty much solves our problem.

import java.io.Serializable;

class Person implements Serializable {
   String name;
   int age;
   String phone;
}
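To make the round trip concrete, here is a minimal sketch using only the standard java.io streams. The class names SerializablePerson and SerializableDemo, and the helper methods toBytes and fromBytes, are made up for this illustration and are not part of any library.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.io.UncheckedIOException;

// Hypothetical stand-in for the Person class above, with a constructor added.
class SerializablePerson implements Serializable {
    private static final long serialVersionUID = 1L;
    String name;
    int age;
    String phone;

    SerializablePerson(String name, int age, String phone) {
        this.name = name;
        this.age = age;
        this.phone = phone;
    }
}

class SerializableDemo {
    // Converts the in-memory object into a chunk of bytes.
    static byte[] toBytes(SerializablePerson person) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
            out.writeObject(person);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return bytes.toByteArray();
    }

    // Converts the bytes back into an in-memory object.
    static SerializablePerson fromBytes(byte[] data) {
        try (ObjectInputStream in = new ObjectInputStream(new ByteArrayInputStream(data))) {
            return (SerializablePerson) in.readObject();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        } catch (ClassNotFoundException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        SerializablePerson original = new SerializablePerson("Rahul Kumar", 21, "+91 9876543210");
        byte[] data = toBytes(original);
        SerializablePerson copy = fromBytes(data);
        System.out.println(copy.name + " serialized into " + data.length + " bytes");
    }
}
```

Printing data.length is a quick way to see how much the stream header and class metadata add on top of the three field values.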

But this solution is not memory efficient (it creates many transient memory allocations) and is slow on Android. We can use the Gson library to make serialization faster and more memory efficient.

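import com.google.gson.annotations.SerializedName;

class Person {
   // Short serialized names keep the JSON keys small.
   @SerializedName("n")
   String name;

   @SerializedName("a")
   int age;

   @SerializedName("p")
   String phone;
}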

But since Gson uses the JSON format, it produces bloated output. Formats like JSON and XML tend to decode slowly and produce verbose results. Of course, these formats have their merits: they are human readable and easy to edit. But they cost extra data to be sent to the user. The good news is that binary serializers like FlatBuffers get rid of these problems.

What are FlatBuffers?

FlatBuffers (broadly similar to Protocol Buffers) is an efficient cross-platform serialization library for C++, C#, C, Go, Java, JavaScript, PHP, and Python. It was originally created at Google for game development and other performance-critical applications.

Why use FlatBuffers?

Access to serialized data without parsing/unpacking: it represents hierarchical data in a flat binary buffer that can be accessed directly, while still supporting data structure evolution (forwards/backwards compatibility).
Memory efficiency and speed: the only memory needed to access the data is that of the buffer. Reading requires no additional allocations.
Flexible: fields are optional, so we have a lot of choice in what data we write, what we leave out, and how we design our data structures.
Tiny code footprint: only a small amount of generated code.
Strongly typed: type errors surface at compile time, so there is no need to write those checks by hand at run time.
Supports object reuse in Java: accessor objects can be reused, which avoids the allocations that would otherwise trigger garbage collection events.
Cross-platform
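The first two points can be illustrated without the FlatBuffers library at all. The sketch below is not the FlatBuffers API or its wire format; it is a hand-rolled fixed-size record layout (the class FlatRecords, its 12-byte record, and the choice to store the phone number as a long are all assumptions made for this example). It only shows the underlying idea: a field is read straight out of the buffer by offset, with no parsing step and no per-record objects.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

// Hypothetical fixed layout: each record is [int age][long phone] = 12 bytes.
class FlatRecords {
    static final int RECORD_SIZE = Integer.BYTES + Long.BYTES;

    // Writes all records back to back into one flat buffer.
    static ByteBuffer write(int[] ages, long[] phones) {
        ByteBuffer buffer = ByteBuffer.allocate(ages.length * RECORD_SIZE)
                .order(ByteOrder.LITTLE_ENDIAN);
        for (int i = 0; i < ages.length; i++) {
            buffer.putInt(ages[i]);
            buffer.putLong(phones[i]);
        }
        buffer.flip(); // position = 0, limit = bytes written
        return buffer;
    }

    // Reads one field in place using an absolute offset; nothing is unpacked.
    static int ageAt(ByteBuffer buffer, int index) {
        return buffer.getInt(index * RECORD_SIZE);
    }

    static long phoneAt(ByteBuffer buffer, int index) {
        return buffer.getLong(index * RECORD_SIZE + Integer.BYTES);
    }
}
```

Reading the age of the third person is then just FlatRecords.ageAt(buffer, 2). Real FlatBuffers adds offset tables on top of this idea so that fields can be optional and variable in length.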

FlatBuffers’ performance over other formats

It produces a smaller encoded file than the other formats, which saves the user’s bandwidth when transferring data over the internet and makes calls faster.
Encoding and decoding time is almost negligible compared to the others. Parsing a 20 KB JSON file takes about 35 ms, which exceeds the UI frame refresh interval of 16.6 ms. With FlatBuffers, we won’t miss any UI frame.
Parser initialization: a model-based JSON parser takes 100–200 ms to initialize.
Garbage collection: a lot of small objects are created during JSON parsing. Parsing a 20 KB stream typically creates about 100 KB of transient memory, which results in GC events and slows down the app.

Check out this benchmark for a detailed comparison.

We can go beyond this to make serialized data even smaller.

Array-of-Structures vs Structure-of-Arrays format

We store data as objects, so serializing an array of those objects preserves their structure in the data stream. This is called Array-of-Structures.

Considering the Person class above, this is how an array of Person objects looks in a serialized stream:

"persons" : [
       {"name":"Rahul Kumar", "age":21, "phone":"+91 9876543210"},
       {"name":"Shivani Singh", "age":21, "phone":"+91 1231231231"},
       {"name":"Rishi Dua", "age":26, "phone":"+91 1234567890"},
       ....
      ]

Notice that every object repeats the full property name next to each value, which bloats the stream. You might say that we can apply GZIP compression on top of it to shrink the stream. But GZIP compresses by finding duplicate data, and it only finds duplicates that lie within a 32 KB sliding window of each other. Larger serialized classes put more distance between similar values, so fewer duplicates fall inside that window.

To curb this problem, we can use the Structure-of-Arrays form: given an array of objects, take each property of the class and make an array of that property’s values. The persons array above would look like this:

"persons" : {
       "name":["Rahul Kumar", "Shivani Singh", "Rishi Dua"],
       "age":[21, 21, 26],
       "phone":["+91 9876543210", "+91 1231231231", "+91 1234567890"]
      }
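In Java, the transposition could be sketched as follows. PersonRecord and PersonColumns are hypothetical names chosen for this example; the only idea taken from the text is that each property becomes its own array.

```java
import java.util.List;

// Hypothetical immutable version of Person, used as the Array-of-Structures element.
class PersonRecord {
    final String name;
    final int age;
    final String phone;

    PersonRecord(String name, int age, String phone) {
        this.name = name;
        this.age = age;
        this.phone = phone;
    }
}

// Structure-of-Arrays holder: one array per property, all of the same length.
class PersonColumns {
    final String[] names;
    final int[] ages;
    final String[] phones;

    private PersonColumns(int size) {
        names = new String[size];
        ages = new int[size];
        phones = new String[size];
    }

    // Transposes a list of objects into per-property arrays.
    static PersonColumns fromList(List<PersonRecord> persons) {
        PersonColumns columns = new PersonColumns(persons.size());
        for (int i = 0; i < persons.size(); i++) {
            PersonRecord p = persons.get(i);
            columns.names[i] = p.name;
            columns.ages[i] = p.age;
            columns.phones[i] = p.phone;
        }
        return columns;
    }

    // Rebuilds a single object on demand, for code that still wants one.
    PersonRecord get(int index) {
        return new PersonRecord(names[index], ages[index], phones[index]);
    }
}
```

Serializing a PersonColumns instance gives the layout shown above, and the ages live in a primitive int[] rather than as a field inside each object.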

This is not object oriented, but it removes the repeated property names from the serialized stream. Along with that, GZIP has a better chance of finding duplicates within its 32 KB window, because similar values now sit close together.
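If you want to measure the effect on your own data, a rough sketch like the one below builds both layouts as JSON text and compresses each with java.util.zip.GZIPOutputStream. The class GzipLayoutComparison and its methods are hypothetical, and the hand-built JSON assumes the values need no escaping. The actual compressed sizes depend on your data, so no numbers are claimed here.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.zip.GZIPOutputStream;

class GzipLayoutComparison {
    // Array-of-Structures: every record repeats the property names.
    static String arrayOfStructs(String[] names, int[] ages, String[] phones) {
        StringBuilder sb = new StringBuilder("[");
        for (int i = 0; i < names.length; i++) {
            if (i > 0) sb.append(',');
            sb.append("{\"name\":\"").append(names[i])
              .append("\",\"age\":").append(ages[i])
              .append(",\"phone\":\"").append(phones[i]).append("\"}");
        }
        return sb.append(']').toString();
    }

    // Structure-of-Arrays: each property name appears once, followed by all its values.
    static String structOfArrays(String[] names, int[] ages, String[] phones) {
        StringBuilder sb = new StringBuilder("{\"name\":[");
        for (int i = 0; i < names.length; i++) {
            if (i > 0) sb.append(',');
            sb.append('"').append(names[i]).append('"');
        }
        sb.append("],\"age\":[");
        for (int i = 0; i < ages.length; i++) {
            if (i > 0) sb.append(',');
            sb.append(ages[i]);
        }
        sb.append("],\"phone\":[");
        for (int i = 0; i < phones.length; i++) {
            if (i > 0) sb.append(',');
            sb.append('"').append(phones[i]).append('"');
        }
        return sb.append("]}").toString();
    }

    // Returns the number of bytes the text occupies after GZIP compression.
    static int gzippedSize(String text) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (GZIPOutputStream gzip = new GZIPOutputStream(bytes)) {
            gzip.write(text.getBytes(StandardCharsets.UTF_8));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return bytes.size();
    }

    public static void main(String[] args) {
        String[] names = {"Rahul Kumar", "Shivani Singh", "Rishi Dua"};
        int[] ages = {21, 21, 26};
        String[] phones = {"+91 9876543210", "+91 1231231231", "+91 1234567890"};
        String aos = arrayOfStructs(names, ages, phones);
        String soa = structOfArrays(names, ages, phones);
        System.out.println("AoS raw=" + aos.length() + " gzip=" + gzippedSize(aos));
        System.out.println("SoA raw=" + soa.length() + " gzip=" + gzippedSize(soa));
    }
}
```

With only three records the GZIP header dominates, so try it with a few thousand realistic records before drawing conclusions.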

This also reduces the raw object count (and thus memory overhead), which improves data locality and makes better use of precious memory bandwidth and CPU cache space. (I will discuss memory performance in Android in another post.)

Conclusion

These are a few techniques for making a serialized stream much smaller. Once you have transposed your data into Structure-of-Arrays form, you can achieve better compression and faster serialization by adopting FlatBuffers. Applying these layers of awesomeness on top of each other helps make your app as smooth as possible.
