The Story of Building a Compile-Time JSON Serializer

Hussachai Puripunpinyo
Published in Zappos Engineering
Dec 20, 2015


At Zappos, we love trying new things. Fear of failure is not in our veins. I feel like I have to show the quote that you might have seen a thousand times in your Facebook feed. Here you go! Add one more time to your record :)

I have not failed. I’ve just found 10,000 ways that won’t work.
Thomas A. Edison

Why did I mention failure in the first place? Am I talking about my failed attempt? It depends on your perspective. I don’t think this project will gain much attention or many users. The JSON parser battlefield is very tough. There are already a lot of JSON parsers out there, and the strongest among them are Jackson and Gson. Do you think my hobby project stands a chance in this battle? Of course not, but it’s really fun to challenge the giants.

I was working on a cache system that we use internally when I noticed something bad in my code: I had to copy the same code over and over, all over the place. We use Spring IoC, and that framework helps us eliminate a lot of duplicate code by providing an elegant way to compose objects.

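    // Assumes the Ehcache 2.x API (net.sf.ehcache.CacheManager, Cache, Element).
    @Autowired
    private CacheManager cacheManager;

    public User getUser(Integer id) {
        // Look in the cache first.
        Cache cache = cacheManager.getCache("userCache");
        Element elem = cache.get(id);
        if (elem != null) return (User) elem.getObjectValue();

        // Cache miss: load from the database, then remember the result.
        User user = getThisFromDb(id);
        cache.put(new Element(id, user));
        return user;
    }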

All the cache-handling lines (everything around the getThisFromDb call) get copied over and over. I could create an object that wraps CacheManager and put all the common code in a helper method. It would be even shorter if I could use Java 8 lambda expressions, which let me pass anonymous functions around without defining a new interface. Unfortunately, I couldn’t use Java 8, and there was still a lot of code that I thought should not be there. I didn’t want to pollute the business logic with code unrelated to it. That’s when AspectJ came into play! With AspectJ, I can create a pointcut that applies my advice code to all the matching join points. Sorry for the AOP jargon. Basically, I can change the behavior of an existing method without modifying its code by providing advice and letting AspectJ do the weaving at compile-time. It’s fascinating, isn’t it? I can introduce cache functionality to existing code without modifying it.
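
The helper-method idea could look roughly like this with Java 8 lambdas. This is only a sketch, not code from the project: CacheHelper, getOrLoad, and loadUserFromDb are hypothetical names, and a ConcurrentHashMap stands in for the real cache so the example runs without Ehcache or Spring.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Supplier;

// Hypothetical helper: keeps the "look up, load on miss, store" steps in one place.
// A ConcurrentHashMap stands in for the real cache (assumption for this sketch).
class CacheHelper {
    private final Map<Object, Object> store = new ConcurrentHashMap<>();

    @SuppressWarnings("unchecked")
    <T> T getOrLoad(Object key, Supplier<T> loader) {
        Object cached = store.get(key);
        if (cached != null) {
            return (T) cached;       // cache hit: the loader is never called
        }
        T value = loader.get();      // cache miss: run the expensive call
        if (value != null) {
            store.put(key, value);   // ConcurrentHashMap does not accept null values
        }
        return value;
    }
}

class CacheHelperDemo {
    static int dbCalls = 0;

    // Stand-in for the database lookup in the post.
    static String loadUserFromDb(int id) {
        dbCalls++;
        return "user-" + id;
    }

    public static void main(String[] args) {
        CacheHelper cache = new CacheHelper();
        String first = cache.getOrLoad(42, () -> loadUserFromDb(42));
        String second = cache.getOrLoad(42, () -> loadUserFromDb(42));
        // prints "user-42 user-42 dbCalls=1"
        System.out.println(first + " " + second + " dbCalls=" + dbCalls);
    }
}
```

The business method shrinks to a single getOrLoad call, but every cached method still has to mention the cache explicitly, which is the part an aspect removes.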

After I had done that, I started working on a small library that is a client for some web services. I wanted to add cache functionality to my library, and I found that AspectJ made it no longer simple and small. In addition, I wanted the users of my library to be able to decide whether to use the cache or not. So I needed a pluggable solution, and that led me to explore how AspectJ works. I didn’t dig into the AspectJ code because I think that would have been overkill. I just took a bird’s-eye view. As I suspected, I would need byte code manipulation. Before that, I looked at Java’s dynamic proxy (java.lang.reflect.Proxy). A proxy does not change byte code, but it can intercept method calls. Unfortunately, its functionality is quite limited: it only works with interfaces, not with concrete classes. So I looked at cglib. After looking into it, I thought I could use it for a pluggable cache strategy by letting users choose whether they want caching. If they do, cglib can generate a subclass of the target class at run-time that adds the cache code.
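
To illustrate the proxy option and its limitation, here is a minimal sketch of caching through java.lang.reflect.Proxy. UserService, SlowUserService, CachingHandler, and ProxyDemo are hypothetical names made up for this example. Note that the proxy is created from an interface, which is exactly the restriction mentioned above.

```java
import java.lang.reflect.InvocationHandler;
import java.lang.reflect.Method;
import java.lang.reflect.Proxy;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical service interface; java.lang.reflect.Proxy can only implement interfaces.
interface UserService {
    String getUser(int id);
}

class SlowUserService implements UserService {
    int calls = 0;   // counts how often the real method runs

    public String getUser(int id) {
        calls++;
        return "user-" + id;
    }
}

// Intercepts every call on the proxy and caches results by method name and arguments.
class CachingHandler implements InvocationHandler {
    private final Object target;
    private final Map<List<Object>, Object> cache = new HashMap<>();

    CachingHandler(Object target) {
        this.target = target;
    }

    @Override
    public Object invoke(Object proxy, Method method, Object[] args) throws Throwable {
        Object[] params = (args == null) ? new Object[0] : args;
        List<Object> key = Arrays.<Object>asList(method.getName(), Arrays.asList(params));
        if (cache.containsKey(key)) {
            return cache.get(key);                    // served from the cache
        }
        Object result = method.invoke(target, args);  // delegate to the real object
        cache.put(key, result);
        return result;
    }
}

class ProxyDemo {
    static UserService withCache(UserService target) {
        return (UserService) Proxy.newProxyInstance(
                UserService.class.getClassLoader(),
                new Class<?>[] { UserService.class },
                new CachingHandler(target));
    }

    public static void main(String[] args) {
        SlowUserService real = new SlowUserService();
        UserService cached = withCache(real);
        cached.getUser(7);
        cached.getUser(7);
        System.out.println("real calls: " + real.calls);   // prints "real calls: 1"
    }
}
```

The caller decides whether to wrap the service, so the cache stays optional. A class that implements no interface cannot be wrapped this way, which is where byte code libraries come in.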

cglib — Byte Code Generation Library is high level API to generate and transform Java byte code. It is used by AOP, testing, data access frameworks to generate dynamic proxy objects and intercept field access

cglib is very easy to use because it’s quite simple and its APIs are pretty high level. You don’t have to know how byte code works; you just have to learn the API. I was able to make it work in a couple of hours without any prior cglib experience. As I was learning cglib, though, I found that many developers avoid it because they doubt the future of the library. So I looked for a library backed by a big organization, which should be less likely to be abandoned. Though really, the reason I decided not to rely on cglib was that I just wanted to explore more :).

I found javassist, another byte code manipulation library, provided by JBoss. Javassist takes a different approach from cglib. It looks a bit more difficult at first, but it’s more powerful and flexible. As I got to know javassist better, I realized that the possibilities of this toolkit are endless. You can write Java source code as a string and have it compiled at run-time. After exploring these libraries, my plan was a pluggable cache component where users could choose which byte code manipulation provider to use, but I started thinking that I wanted to play more with javassist. I thought I could do something awesome with it. It had a high chance of failing, but it was fun and I wanted to take the risk. (It’s not actually a risk, though. I might end up creating a project that no one uses, but I will gain useful knowledge along the way. I worked on it mostly in my spare time.)

I had experience coding in Scala prior to joining Zappos, so I knew some Scala, and one of its features that looks like magic to me is macros. They are similar to AOP in certain aspects (no pun intended, maybe a little). In Scala, there is a project named scala/pickling, a serialization framework that generates serialization-related code at compile-time. That’s interesting. In general, a lot of JSON libraries use reflection APIs to introspect the class definition at run-time, which is relatively slow compared with hard-coded access. Scala pickling, however, uses macros to generate the serialization code at compile-time, boosting performance by pre-processing work that would usually happen at run-time. Java has byte code manipulation toolkits that are capable of doing much the same thing. (I might be wrong because I’m a novice in both areas :). Please let me know about my mistakes by commenting below.)
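
For comparison, this is roughly what the reflection-based approach looks like. It is a simplified sketch, not how Jackson or Gson actually work internally: Person and ReflectiveJson are hypothetical names, only String and numeric getters are handled, and there is no escaping.

```java
import java.lang.reflect.Method;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical bean used only for this example.
class Person {
    private final String name;
    private final int age;

    Person(String name, int age) {
        this.name = name;
        this.age = age;
    }

    public String getName() { return name; }
    public int getAge() { return age; }
}

class ReflectiveJson {
    // Discovers the getters on every call. This run-time introspection is the
    // cost that a generated, hard-coded serializer avoids.
    static String toJson(Object bean) {
        Map<String, Object> props = new TreeMap<>();   // sorted keys for stable output
        try {
            for (Method m : bean.getClass().getMethods()) {
                String n = m.getName();
                boolean getter = n.startsWith("get") && n.length() > 3
                        && m.getParameterCount() == 0 && !n.equals("getClass");
                if (getter) {
                    String prop = Character.toLowerCase(n.charAt(3)) + n.substring(4);
                    props.put(prop, m.invoke(bean));
                }
            }
        } catch (ReflectiveOperationException e) {
            throw new IllegalStateException(e);
        }
        StringBuilder sb = new StringBuilder("{");
        for (Map.Entry<String, Object> e : props.entrySet()) {
            if (sb.length() > 1) sb.append(',');
            sb.append('"').append(e.getKey()).append("\":");
            Object v = e.getValue();
            if (v instanceof String) sb.append('"').append(v).append('"');
            else sb.append(v);
        }
        return sb.append('}').toString();
    }

    public static void main(String[] args) {
        // prints {"age":30,"name":"Ann"}
        System.out.println(toJson(new Person("Ann", 30)));
    }
}
```

Real libraries cache the discovered accessors, so the lookup is not repeated for every object, but each property read still goes through Method.invoke or a similar indirection.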

We use JSON in almost all of our projects. When I was thinking about that, something came to mind. Why shouldn’t I try creating a JSON serialization library that uses byte code manipulation? It’s going to be fun, and the library should perform faster than other libraries that use reflection. That was my assumption.

I spent a week, parts of my weekdays and the whole weekend, learning javassist and the JSON spec. I did some research by reading other projects’ source code, and I borrowed some pieces from those projects, such as the JSON parser and JSON escaping, to speed up development. I dug deeper and hit a limitation of the reflection API caused by type erasure. I learned how auto-boxing works behind the scenes, and I had to use some APIs that I hadn’t used before. The first version is surely not complete. It has some limitations, but it works pretty well.
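
The type erasure limitation is easy to reproduce. The sketch below is a general illustration, not the exact problem I ran into: Basket and ErasureDemo are hypothetical names. A list object no longer knows its element type at run-time, while the declared field still carries it in the class file.

```java
import java.lang.reflect.Field;
import java.lang.reflect.ParameterizedType;
import java.lang.reflect.Type;
import java.util.ArrayList;
import java.util.List;

// Hypothetical bean with a generic field.
class Basket {
    List<String> items = new ArrayList<>();
}

class ErasureDemo {
    // The object itself only knows its raw class; the <String> part is erased.
    static Class<?> runtimeClassOf(Object o) {
        return o.getClass();
    }

    // The field declaration keeps its generic signature, so the element type
    // can be recovered from the Field instead of from the value.
    static Type elementTypeOf(Class<?> owner, String fieldName) {
        try {
            Field f = owner.getDeclaredField(fieldName);
            ParameterizedType pt = (ParameterizedType) f.getGenericType();
            return pt.getActualTypeArguments()[0];
        } catch (NoSuchFieldException e) {
            throw new IllegalArgumentException(e);
        }
    }

    public static void main(String[] args) {
        Basket basket = new Basket();
        System.out.println(runtimeClassOf(basket.items));           // class java.util.ArrayList
        System.out.println(elementTypeOf(Basket.class, "items"));   // class java.lang.String
    }
}
```

This is why a bean binder needs the declaring class, not just a value: given only a bare List, there is no way to know which element class to instantiate when parsing.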

That was definitely a fun experience, and I was grateful that my team was as interested in it as I was. The work environment here is very supportive, and people encourage you to do something different. Since Amazon acquired Zappos in 2009, releasing open source software must go through the Amazon process. I took this chance to test that process out. I told my teammate, Darshan Bhatt, that I wanted to release this project as open source. He suggested that I talk to Brian Kirkby, and I also had a brief chat with Bob Stockdale. These people are great drivers of innovation at Zappos. Finally, this project was approved by Henri Yandell from Amazon, who also gave me a lot of useful advice. I don’t know how well this project is going to be accepted by the developer community at large, but one thing I know for sure is that this was fun! Also, the project itself demonstrates Zappos’ principles in a fun way and shows the world that we have a lot of fun things to do. If you can’t find one, you can create one!

Zappos JSON

In short, this is the name of the JSON serializer that I’m working on. We decided to use a generic name to avoid trademark issues. We have used this library for a tiny internal project and are thinking about sharing it. The original version was designed in less than a week, and I built it while I was learning javassist. So its design and implementation are definitely not the best.

Enough talk about its story. Let’s see how it works!

The Idea
Suppose that you have a Java bean like the following:

    public class Foo {
        private String name;
        private int age;

        public String getName() {
            return name;
        }
        public void setName(String name) {
            this.name = name;
        }
        public int getAge() {
            return age;
        }
        public void setAge(int age) {
            this.age = age;
        }
    }

The serializer will analyze the class file then create a JSON serializer class like the following:

    public class FooSerializer {
        // Hard-coded getter calls: no reflection is needed when serializing.
        public String serialize(Foo object) {
            StringBuilder str = new StringBuilder();
            str.append("{\"name\":\"").append(object.getName())
               .append("\", \"age\": ").append(object.getAge())
               .append("}");
            return str.toString();
        }
    }

That’s the basic idea. The library generates hard-coded logic for serializing an object to JSON. The actual implementation is much more complicated than the example above because the library has to deal with null values, arrays, primitive types, collections, maps, enums, custom formatters, nested beans, and annotations. The same goes for the JSON parser: the library analyzes the object graph, creates a class for deserializing JSON into that object, and caches the class definition for later use. The class name is generated randomly to avoid name conflicts when you have a class-reloading agent such as JRebel installed. However, this might cause issues such as “java.lang.OutOfMemoryError: PermGen space” when the agent reloads the class too many times and your JVM’s PermGen space is too small. This may happen during development, but it should not happen in production except in certain circumstances.
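
To give a feel for the extra work, here is a hand-written sketch of the same serializer with null handling and string escaping added. It is an illustration under my own assumptions, not the code the library generates: NullSafeFooSerializer and appendString are hypothetical names, and writing a null property as JSON null (rather than omitting it) is a choice made for this example.

```java
// Same bean as in the post, repeated so the example is self-contained.
class Foo {
    private String name;
    private int age;

    public String getName() { return name; }
    public void setName(String name) { this.name = name; }
    public int getAge() { return age; }
    public void setAge(int age) { this.age = age; }
}

class NullSafeFooSerializer {
    static String serialize(Foo object) {
        if (object == null) {
            return "null";
        }
        StringBuilder sb = new StringBuilder();
        sb.append("{\"name\":");
        appendString(sb, object.getName());
        sb.append(",\"age\":").append(object.getAge()).append('}');
        return sb.toString();
    }

    // Writes a JSON string literal, escaping quotes, backslashes, and control characters.
    static void appendString(StringBuilder sb, String s) {
        if (s == null) {
            sb.append("null");   // assumption: null properties are written as JSON null
            return;
        }
        sb.append('"');
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            switch (c) {
                case '"':  sb.append("\\\""); break;
                case '\\': sb.append("\\\\"); break;
                case '\n': sb.append("\\n"); break;
                case '\r': sb.append("\\r"); break;
                case '\t': sb.append("\\t"); break;
                default:
                    if (c < 0x20) sb.append(String.format("\\u%04x", (int) c));
                    else sb.append(c);
            }
        }
        sb.append('"');
    }

    public static void main(String[] args) {
        Foo foo = new Foo();
        foo.setName("Ann");
        foo.setAge(30);
        System.out.println(serialize(foo));         // {"name":"Ann","age":30}
        System.out.println(serialize(new Foo()));   // {"name":null,"age":0}
    }
}
```

Every additional property type (arrays, collections, nested beans, and so on) needs its own branch like this in the generated code, which is where most of the complexity comes from.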

Features

The main feature of Zappos JSON is bean binding. It does not have a JSON-typed object model because we try to eliminate as many intermediate objects as possible. Zappos JSON is meant to be a drop-in replacement for Google Gson when you use Gson for bean binding only and don’t use its more complex features. The only dependency of Zappos JSON is JBoss javassist. (We may package this small library into the distribution in the future.)

  • Bean binding is easy (it can be done in one line)
  • Primitive types and their wrappers
  • Arrays and collections
  • Nested beans
  • Custom types such as Date, Enum, and other scalar-value types
  • Maps of scalar values and maps of objects (this feature has some limitations)
  • Annotation support
  • You can mix all of the above together

I will create some benchmarks soon. Hopefully it can beat Gson, and ideally it can beat Jackson too, but my guess is likely to be wrong because those libraries have been optimized for many years by brilliant people, while this one is an experimental project from an average developer :)
