Adventures in Rails Fast JSON Serialization
In an ongoing series I like to call “Doing stuff as a Turing student” I’d like to discuss creating JSON output from an API. Wait, let me start from the beginning….
To begin, let’s define some terms. An API is an Application Programming Interface. For the purposes of this post, an API is essentially an interface to a database that operates over the Internet. Another way to look at it: an API is a website for other computer programs. Computers don’t care about how things “look”. What color the screen is, or whether data is aligned in neat clickable boxes, doesn’t matter at all. Computers just want the data they are looking for so they can do, well, whatever their user has told them to do. APIs exist all over the Internet, giving developers access to an enormous and ever-growing amount of data ready for manipulation. The term API is superbly Google-able; you can learn all you want and more from a simple search.
Now, I said that computers don’t care about how things look. That’s technically true, but there’s a corresponding problem. Data that’s accessible over the Internet is available for use, but how that data is formatted for other computer programs is just as big a deal as how it looks is to us humans. Should matching data be presented as key: value, as :key => :value, as “key”: “value”, or in any of the numerous other ways to represent it? Databases use IDs to uniquely identify records. Should those be transmitted with the data or not, and if so, should they be transmitted first or last? All these are decisions that must be made by the API developer.
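To make the formatting question concrete, here is a minimal sketch that sends the same record in two different shapes. The record and its field names are invented for illustration; neither shape is "the" correct one, which is exactly the problem.

```ruby
require "json"

# Shape 1: a flat object, ID included alongside the other fields.
def flat_json(record)
  JSON.generate(record)
end

# Shape 2: the ID pulled out as a string, everything else nested under "fields".
def nested_json(record)
  payload = {
    id: record[:id].to_s,
    fields: record.reject { |key, _value| key == :id }
  }
  JSON.generate(payload)
end

# A hypothetical record, as it might come out of a database.
record = { id: 7, name: "Widget", price_cents: 1299 }

puts flat_json(record)
puts nested_json(record)
```

A program written to consume the first shape breaks on the second, even though both carry identical data.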
This leads to the “bikeshedding” problem. “Bikeshedding” is a developer term that means “the act of wasting time on trivial details while important matters are inadequately attended.” What format API data is presented in is not as important as the data itself, but it still matters. As of 2017 there are over 17,000 public APIs(1). Imagine if all of them presented data in a unique format! Thousands upon thousands of person-hours would be wasted, all of it preventable with a solid standard for presenting API data.
Attempts to standardize APIs have been slow to roll out. In fact, before 2000 there were no real standards for how to either design or use an API at all(2). Since then there have been various efforts, including the adoption of JSON as the standard format for API-transmitted data. Only as recently as 2015 has there been a coherent attempt to standardize the JSON data that APIs transmit(3). From my perspective as a new programmer, the conclusion is that this is a very good time to start getting into API programming, on both the “producer” and the “consumer” side.
Before we get too far, let’s define another term: serialization. In Rails programming, serialization refers to formatting the JSON output of an API so that each record is uniquely identified and carries with it the desired data. There have been a few attempts at standardizing Rails JSON output, with corresponding Ruby gems to do the job. The most common were called “Active Model” serializers. Interestingly, most of them have fallen out of favor (and have subsequently seen their development dry up). The main reason was speed: transforming the raw data into formatted JSON simply took too long. So, in came Netflix…
The Netflix JSON serializer is simply called “Fast JSON API.” It presents data in accordance with the new JSON:API 1.0 specification. But it’s also, well, FAST. According to its own GitHub post “serialization time is at least 25 times faster than Active Model serializers on up to current benchmarks of 1000 records.”(4) I’m not sure that’s entirely true (I personally suspect the deck was somewhat stacked: their tests were probably set up to see them succeed brilliantly), but it’s probably faster than anything else out there. Turing has decided to start using it, probably because they can see the “writing on the wall” that it’s going to take over.
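For a sense of what that standard asks for, here is a hand-built sketch of a single-record document in the JSON:API 1.0 shape, using only Ruby’s standard library. The helper name, the resource type, and the attribute are all hypothetical; the serializer gem produces this structure for you, and this is not its internal code.

```ruby
require "json"

# Wrap one record in a JSON:API-style document.
# The spec requires the "id" and "type" members to be strings.
def json_api_document(type, id, attributes)
  {
    data: {
      id: id.to_s,
      type: type.to_s,
      attributes: attributes
    }
  }
end

# A made-up merchant record, purely for illustration.
doc = json_api_document(:merchant, 42, { name: "Turing Supply Co." })
puts JSON.pretty_generate(doc)
```

Note that the record’s ID lives at the same level as its type, and everything else goes under attributes.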
Using the serializer isn’t hard at all. It really involves just three things: installing/including the gem, creating a serializer, then calling that serializer in the appropriate controller action. You even get access to automatic generation of serializers via “rails generate” commands. It’s all pretty solid, actually! To get going just go to https://github.com/Netflix/fast_jsonapi and follow the very simple instructions.
A word of caution: the Netflix serializer is opinionated. It requires an :id to work, under the assumption that you are in the business of transmitting records and records have associated IDs. This can get in the way a bit if you end up trying to present raw data, for example simply a date or a dollar amount. One of our tasks, in fact, was to sum the total amount of money made on a given day, which would not have any particular ID associated with it, because the data was aggregated from a series of transactions and the costs of numerous items. The “normal” route to get this data into the serializer would be to create a plain old Ruby object (PORO) and tack on the new data, which requires creating a new model class and associated tests. All that seemed like a lot of hassle every time you just want to spit out some data. In trying to avoid all that I discovered a cool trick: the use of Ruby OpenStruct (5).
In Ruby, an OpenStruct is essentially an object created on the fly with whatever attributes you want. Call “OpenStruct.new”, assign it attributes almost the same way you would a hash, and voilà: you have an object with those attributes. This comes in REALLY handy when using the Netflix serializer, since it requires an ID. The strategy is this: if you want to present custom data that the relevant object doesn’t have, or data that doesn’t really associate with a database object to begin with, use an OpenStruct to create a temporary object to present to the Netflix serializer. You can use it to pass any data you want, and it will be automatically formatted to the JSON:API 1.0 standard.
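Here is a minimal sketch of that strategy for a daily revenue total, using only Ruby’s standard library. Everything named here is an assumption for illustration: revenue_struct and serialize_revenue are hypothetical helpers, the date-based ID is an arbitrary choice, and serialize_revenue is a hand-written stand-in for a real serializer, not the gem’s own code.

```ruby
require "json"
require "ostruct"
require "date"

# Build a throwaway object for an aggregate that has no database row.
# The id is arbitrary; it exists only because the serializer expects one.
def revenue_struct(date, total_cents)
  OpenStruct.new(
    id: date.iso8601,
    revenue: format("%.2f", total_cents / 100.0)
  )
end

# Stand-in serializer: accepts any object that responds to id and revenue
# and wraps it in a JSON:API-style document. Like the real thing, it
# refuses to work without an id.
def serialize_revenue(object)
  raise ArgumentError, "object needs an id" if object.id.nil?

  {
    data: {
      id: object.id.to_s,
      type: "revenue",
      attributes: { revenue: object.revenue }
    }
  }
end

day = revenue_struct(Date.new(2018, 3, 16), 123_456)
puts JSON.generate(serialize_revenue(day))
```

In a Rails app using the gem, the same OpenStruct would instead be handed to a serializer class that includes FastJsonapi::ObjectSerializer and declares revenue as an attribute, as described in the gem’s README.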
For the daily revenue total I was working with a new data point (revenue), so I needed to create a custom serializer, but in other instances you can just use a serializer you already have. Overall I’ve found the OpenStruct class really flexible and useful. I may learn other (perhaps better) ways of presenting object data to the serializer, but I’m glad to have this option available. And besides all that, it’s oddly fun to create a fake object! (I might have been coding for too long tonight.)
I’m looking forward to seeing what else I can do with APIs, so expect more posts soon….