Sharing Models across Internal APIs with Apache Thrift

This post is inspired by a recent engineering conference here at Rightster, where my talk focused on Apache Thrift and other tools designed for encoding, decoding and transmitting structured data between internal APIs. The slides are available on Surge here: http://rightster-thrift.surge.sh/#/0 and on GitHub: https://github.com/thepauleh/slides/tree/master/presentation/thrift. It's worth noting that they're written in React using the 'Spectacle' library (https://github.com/FormidableLabs/spectacle), which was a really enjoyable introduction to hot module reloading.

In the Beginning…

One of the first things I learned from watching many of the videos on this subject was that this isn't a fresh concept at all. Back in 1984, an ISO standard was created called 'Abstract Syntax Notation One' (ASN.1). Its purpose was "to define a set of rules and structures for encoding, transmitting and decoding messages". While most of us are blissfully ignorant of its existence, it's reported to be widely used in communications, notably in UMTS and VoIP.

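A minimal ASN.1 definition of such a PDU looks something like this (the module and field names here are purely illustrative):

```asn1
-- Illustrative only: a simple PDU describing a user record
UserModule DEFINITIONS ::= BEGIN
    User ::= SEQUENCE {
        id      INTEGER,
        name    UTF8String,
        email   UTF8String
    }
END
```
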
The above example shows a very basic structure that can define a PDU (Protocol Data Unit) in ASN.1. As a developer you can probably see at this stage how you might structure a basic object using this syntax. The next step would be to implement this structure in a language of your choice. From the videos I watched, this is where things become less than ideal, as most implementations are proprietary, although I did find one for PHP here: http://www.phpkode.com/source/s/mistpark-server/library/asn1.php (use at your own risk; I haven't tried it myself).

When the PDU is sent over the wire, it can be encoded with a choice of encoding rules. The list of these is quite large (BER, DER, PER and XER among them), but the gist is that you can send something as XML, binary or another string format, and when you decode it you'll have a message that can be hydrated into a model built from the same ASN.1 structure, potentially in a different language.

The three examples in the screenshot above show just how wasteful common encodings such as XML or JSON can be. XML is particularly bad, as its schema data is twice as large as JSON's, but even against DER there is a further optimisation available: packed encoding rules remove the schema data entirely. This is most useful on internal APIs, since debugging as a third party would be incredibly hard, but the concept is that both parties already know what the data they're sending and receiving should look like, so the schema data serves no purpose on the wire.
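To make that overhead concrete, here's a rough sketch in Python (this is not real ASN.1 tooling; the packed form is simulated with the struct module, and the record's field names are made up):

```python
import json
import struct

# One record, three encodings of the same values.
record = {"id": 42, "name": "Ada", "active": True}

as_xml = b"<user><id>42</id><name>Ada</name><active>true</active></user>"
as_json = json.dumps(record).encode()  # field names travel with every message

# Packed form: both sides already agree on the schema,
# so only the values themselves go over the wire.
as_packed = struct.pack("<i3s?", record["id"], record["name"].encode(), record["active"])

# The packed form is a fraction of the size of either text encoding.
print(len(as_xml), len(as_json), len(as_packed))
```

The trade-off is exactly the one described above: the packed bytes are meaningless without the shared schema, which is fine between internal services but hostile to outside debugging.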

The New Options

Apache Thrift and Google Protocol Buffers aren't the only modern alternatives to ASN.1; the options available vary greatly by the languages you need to support and the type of application you're building.

- Apache Thrift (Most suitable if support for many languages is required)

Originally published by Facebook in 2007, it's the one I've taken a particular shine to, due to the massive number of language implementations it has. I have absolutely no doubt that it's not the most optimal of the bunch, but the fact of the matter is that we don't intend on building our APIs in C++ (any time soon…)

- Google Protocol Buffers

Open sourced a few years ago, Google Protocol Buffers is one of the most commonly floated solutions in this area. Unfortunately it doesn't have as many implementations, and once you step outside the languages supported by the Google team, the bugs are very intimidating. Take for example the floating point and unsigned integer issues reported in this PHP implementation: https://github.com/drslump/Protobuf-PHP

- Cap’n Proto

A project created by one of the technical leads on the Google Protocol Buffers project. It claims to be faster than the competition, but doesn't provide any benchmarks and has very limited implementations.

- FlatBuffers

Another Google solution, designed for gaming. The list of implementations is quite short, supporting only C#, Go, Java and C++.

- SBE (Simple Binary Encoding)

A solution built for the financial markets, which I assume means it's 'incredibly fast'. Unfortunately, very limited implementations meant this was also a 'no go' for our APIs.

Getting Started with Apache Thrift

It appears that, at present, engineering inside Facebook still sees the Thrift project as a good thing. The recent talk by Nick Schrock on GraphQL indicated that Thrift is used in the APIs beneath their GraphQL layer; I suppose this is because the data model returned to their users is much more fluid, and is typically a combination of multiple APIs producing more complex models than those used on their internal APIs.

There's probably a big gap between learning the concepts behind these solutions and actually making something usable, so here's an example .thrift file that can be used to generate a user model:

filename: user.thrift

namespace php App.Models.Thrift

struct User {
  1: i32 id
  2: string name
  3: string email
  4: string password
  5: bool remember_token
  6: string created_at
  7: string updated_at
}

You'll then need to have Thrift installed on your machine. For OS X users this is as simple as "brew install thrift", but there are instructions for other platforms on the Thrift website: https://thrift.apache.org/docs/install/

Now if you go ahead and run the following two commands:

thrift -r --gen java user.thrift

thrift -r --gen php user.thrift

Thrift will then write the generated code into the gen-java and gen-php output directories.

Now when you investigate those files (User.java & Types.php) you should see the User class defined in each language. The PHP file isn't going to work with PSR autoloading, but a rename (and splitting the file into separate classes if more than one struct is defined) isn't too much of an ordeal. My main concern at this point is the complexity of the code that has been generated for such a simple model, and the best way to add more functionality to these objects from here.

It'd be great to hear what approaches people are using with Apache Thrift. Composition and inheritance are the first ideas that come to mind, since when a new .thrift file is published you really don't want to have to re-apply the changes you made to the previously generated models.
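As a sketch of the composition route (in Python for brevity; the User class below is a hand-written stand-in for what Thrift would generate, and UserPresenter is a name I've made up):

```python
# Stand-in for a Thrift-generated model. In practice this class comes out of
# the code generator and should never be edited by hand.
class User:
    def __init__(self, id=None, name=None, email=None):
        self.id = id
        self.name = name
        self.email = email


# Hand-written wrapper: composition keeps custom behaviour out of the
# generated file, so regenerating from an updated .thrift file is a
# drop-in replacement rather than a merge.
class UserPresenter:
    def __init__(self, user):
        self._user = user

    def display_name(self):
        return "{} <{}>".format(self._user.name or "anonymous", self._user.email)


print(UserPresenter(User(1, "Ada", "ada@example.com")).display_name())
# Ada <ada@example.com>
```

Inheritance would work too, but wrapping means the generated class's internals (which differ between Thrift versions) never leak into your own code.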

I'm currently investigating the best approach for hydrating these objects from the database, and the suitability of Thrift in projects that already use an ORM. The most limiting aspect of this technology seems to be how small its uptake has been; there's a great ecosystem inside the Apache Thrift world waiting to happen, but this is now an eight-year-old solution that isn't getting the attention it needs to bring first-class implementations to every major framework.
