Introduction to Apache Thrift
Recently I gave a presentation to our development team about Apache Thrift (https://thrift.apache.org/), a great software framework that allows applications written in different programming languages to communicate with each other. What differentiates it from other solutions is that Apache Thrift is relatively easy to implement and maintain.
I have written this article to preserve the presentation contents and supplement it with additional code examples.
Some history and background
Thrift was conceived at Facebook in 2007. The company's culture allows its developers to choose whichever programming language seems best for a particular problem. This, obviously, led to a multitude of applications written in different languages, and that's when the need arose for a tool that would allow those applications to communicate.
Facebook's search for the perfect solution led to the conclusion that there was no such thing. You can read about their findings, as well as the basics of the Thrift framework, in the whitepaper (https://thrift.apache.org/static/files/thrift-20070401.pdf) they published.
Naturally, they took up the challenge and developed a solution themselves, and that's how Thrift was born. Soon afterwards they open-sourced the code and handed the project over to the Apache Software Foundation, which is now responsible for its development. Today it is widely used not only by Facebook (where it is the main framework for inter-application communication), but also by many other companies (Evernote, Twitter and Netflix being some notable examples). Facebook engineers continue to work on their own fork, which now exists under the name FBThrift (https://github.com/facebook/fbthrift) and will hopefully be incorporated into Apache Thrift.
What exactly is Apache Thrift?
Imagine a situation where you have lots of applications written in different languages. In the most common scenario these are internal applications that perform various tasks and were written by separate development teams. How do you enable those applications to talk to each other? Sure, you could add some REST APIs. But in many cases, especially when you transfer binary data, this solution doesn't provide acceptable performance or maintainability.
How does it work?
First, let's look at Apache Thrift from a developer's point of view. The main concept of this framework is a service, which resembles a class in object-oriented programming languages. Every service has methods, which are defined in a familiar way using the data types implemented in Apache Thrift. These data types are mapped to their native counterparts in every language: simple ones, such as the integer type i32, map to an integer in every language, while more complex ones, such as set, become, for example, an array in PHP or a HashSet in Java. The services are defined in a so-called Apache Thrift document, written in Thrift's Interface Description Language (IDL) syntax (for details on the syntax, head to the official documentation: https://thrift.apache.org/docs/idl).
Then, from this file, you use the Apache Thrift compiler to generate server and client stubs. These pieces of code call the Apache Thrift library, and you use them to implement your server and clients: it's like filling in the blanks with the relevant code (e.g. creating objects, calling methods) to allow your applications to communicate with each other. The code generated for both client and server is embedded in your application.
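To make the type mapping concrete, here is a minimal sketch of how Thrift's base and container types correspond to native Python 3 types, together with a plain Python class playing the role of a service. The `THRIFT_TO_PYTHON` table, `check_arg` and `AddServiceInterface` are illustrative names of my own, not generated code; the Python code that Thrift generates is similar in spirit (an `Iface` class with one method per service method).

```python
from abc import ABC, abstractmethod

# Illustrative mapping of Thrift IDL types to the native Python 3 types
# used by the Python bindings (a hand-written approximation, not generated code).
THRIFT_TO_PYTHON = {
    "bool": bool,
    "byte": int,
    "i16": int,
    "i32": int,
    "i64": int,
    "double": float,
    "string": str,
    "binary": bytes,
    "list": list,
    "set": set,
    "map": dict,
}


class AddServiceInterface(ABC):
    """Hypothetical stand-in for a service: one abstract method per IDL method."""

    @abstractmethod
    def add(self, num1: int, num2: int) -> int:
        """Would correspond to an IDL method taking two i32 values and returning an i32."""


def check_arg(thrift_type: str, value) -> bool:
    """Return True if value has the Python type that thrift_type maps to."""
    expected = THRIFT_TO_PYTHON[thrift_type]
    # bool is a subclass of int in Python, so reject it explicitly for integer types.
    if expected is int and isinstance(value, bool):
        return False
    return isinstance(value, expected)
```

For example, `check_arg("i32", 42)` is true, while `check_arg("set", [1, 2])` is false because a Thrift set arrives in Python as a `set`, not a `list`.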
It is illustrated in the following image:
Figure 1. Source: “Learning Apache Thrift”, Krzysztof Rakowski, Packt Publishing, December 2015
Before we get to the example code, which will explain this concept, let’s have a quick look at the architecture of Apache Thrift. It is illustrated with the following simple image:
Figure 2. Source: “Learning Apache Thrift”, Krzysztof Rakowski, Packt Publishing, December 2015
The transport layer provides a way to read and write the payload from and to the medium you use (most commonly a network connection or a socket). The protocol layer is mostly independent of the transport and is responsible for encoding and decoding the data so it can be transmitted. The most popular protocols are binary, compact (a denser binary encoding defined by Thrift) and JSON. The processor, which reads incoming calls, dispatches them to your service implementation and writes the results back, is generated automatically by the Apache Thrift compiler.
These three layers are combined in the server and client code. When you want two applications to communicate with each other, both of them need to use the same transport and protocol for encoding and decoding the information; you will see this in the examples in a while.
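The split between transport and protocol is easier to see in code. Below is a minimal, stdlib-only sketch, assuming Thrift's strict binary protocol: the "transport" is just an in-memory `io.BytesIO` buffer, and the "protocol" functions encode a call to a method `add` with two i32 arguments. The helper names (`write_call` and friends) are my own; the real Thrift libraries provide this through classes such as `TBinaryProtocol` and `TMemoryBuffer`.

```python
import io
import struct

# Constants from Thrift's binary protocol (strict mode).
VERSION_1 = 0x80010000
CALL = 1          # message type for a method call
T_STOP = 0        # marks the end of a struct
T_I32 = 8         # type id of a 32-bit integer field


# --- protocol layer: turns values into bytes, knows nothing about the medium ---
def write_i32(out, value):
    out.write(struct.pack(">i", value))  # big-endian signed 32-bit


def write_string(out, text):
    data = text.encode("utf-8")
    write_i32(out, len(data))  # length prefix, then the raw bytes
    out.write(data)


def write_message_begin(out, name, msg_type, seqid):
    # Strict header: protocol version and message type packed into one 32-bit word.
    out.write(struct.pack(">I", VERSION_1 | msg_type))
    write_string(out, name)
    write_i32(out, seqid)


def write_i32_field(out, field_id, value):
    out.write(struct.pack(">bh", T_I32, field_id))  # field header: type + id
    write_i32(out, value)


def write_call(out, method, args, seqid=0):
    """Encode a call whose arguments are i32 fields numbered 1, 2, ..."""
    write_message_begin(out, method, CALL, seqid)
    for field_id, value in enumerate(args, start=1):
        write_i32_field(out, field_id, value)
    out.write(struct.pack(">b", T_STOP))  # end of the arguments struct


# --- transport layer: here just an in-memory buffer ---
transport = io.BytesIO()
write_call(transport, "add", [42, 13])
payload = transport.getvalue()
print(len(payload), payload.hex())
# prints: 30 8001000100000003616464000000000800010000002a0800020000000d00
```

Swapping `io.BytesIO` for a socket or an HTTP request body would not change a single byte of the payload, which is exactly why the protocol can stay independent of the transport.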
To start using Apache Thrift, you need to install the compiler and the libraries. To do so, you may use your Linux distribution's package manager (e.g. sudo apt-get install thrift-compiler on Debian or Ubuntu) or build from source (see https://thrift.apache.org/docs/BuildingFromSource). This example is very basic, just to show you the general idea. If you are interested in more advanced examples, feel free to experiment on your own with the help of the official documentation or a book (which I recommend at the end of the article).
Our service will be very simple, using only basic types and the standard building blocks. It will expose a single method, add, which accepts two integers and returns their sum. We will call a PHP server over HTTP from a Python client.
Note that this is just a simple example, with no error handling and no attention to performance or reliability. It's definitely not suitable for anything other than your own experiments and learning.
Let’s construct our Apache Thrift document in IDL:
As you can see, this basic document contains everything that is needed to create a service: some general information defining the environment, such as the namespaces for the target programming languages and the types used. Then there is the declaration of a service with a single method. The method is extremely simple: it takes two integers and returns another integer.
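Behind the scenes, the compiler turns each service method into two structs: one holding the arguments and one holding the result. Here is a rough Python sketch of that idea, with dataclasses standing in for the generated classes (in real generated Python code they are named `add_args` and `add_result`). The parameter names `num1` and `num2` are my assumption, since they depend on your IDL document; `FIELD_IDS` and `to_wire_fields` are hypothetical helpers.

```python
from dataclasses import dataclass, fields
from typing import List, Optional, Tuple


@dataclass
class AddArgs:
    """Arguments of add(): one field per IDL parameter (names are hypothetical)."""
    num1: int
    num2: int


@dataclass
class AddResult:
    """Return value of add(): Thrift stores it in a field named 'success' with id 0."""
    success: Optional[int] = None


# Field ids as they would be declared in the IDL (1:, 2:) plus the implicit 0 for the result.
FIELD_IDS = {
    AddArgs: {"num1": 1, "num2": 2},
    AddResult: {"success": 0},
}


def to_wire_fields(obj) -> List[Tuple[int, int]]:
    """List (field_id, value) pairs in declaration order, skipping unset fields."""
    ids = FIELD_IDS[type(obj)]
    return [(ids[f.name], getattr(obj, f.name))
            for f in fields(obj) if getattr(obj, f.name) is not None]
```

Only the numeric field ids travel over the wire, not the parameter names, which is why the ids in an IDL document should stay stable once clients are deployed.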
This description of the service is everything Apache Thrift needs to generate the client and server stubs we discussed earlier. To do that, run the following command:
Now you can examine the gen-php and gen-py directories and admire the amount of work that Apache Thrift did for you. The most interesting files for us are gen-py/adder/AddService.py and gen-php/adder/AddService.php, which contain the interfaces we will implement and the client classes. Don't be discouraged by their complexity: you won't edit them directly, nor do you need to fully understand what's going on. Thanks to object-oriented programming, we will only extend the classes provided by the generator :)
So, let's start by filling in the blanks for the server code. As mentioned before, our server will be written in PHP. I chose this language for the example because it is very easy to implement and run.
I have prepared the server code for you; let's have a look at it and then discuss it. Save the code below to a file in the main application directory, e.g. MyAddServer.php.
Looks overwhelming? Don't worry: most of the code is just boilerplate, which is similar in every application. I will start from the inside, with the code that really matters. Here's the implementation of the service, the only code that you need to write yourself:
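If you regenerate the stubs often, you might wrap the compiler call in a small script. This is a minimal sketch, assuming the compiler binary is called `thrift` and is on your PATH; it uses only the compiler's `--gen <language>` option, once per target language. The file name `adder.thrift` used below is hypothetical.

```python
import shutil
import subprocess
from typing import List, Sequence


def thrift_gen_command(idl_file: str, languages: Sequence[str]) -> List[str]:
    """Build the argv for `thrift --gen <lang> ... <idl_file>`."""
    cmd = ["thrift"]
    for lang in languages:
        cmd += ["--gen", lang]  # one --gen flag per target language
    cmd.append(idl_file)
    return cmd


def generate(idl_file: str, languages: Sequence[str]) -> None:
    """Run the compiler if it is installed; output lands in gen-<lang> directories."""
    if shutil.which("thrift") is None:
        raise FileNotFoundError("thrift compiler not found on PATH")
    # check=True raises CalledProcessError if the compiler reports an IDL error.
    subprocess.run(thrift_gen_command(idl_file, languages), check=True)
```

Calling `generate("adder.thrift", ["php", "py"])` from the directory containing the IDL document should produce the gen-php and gen-py directories described above.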
Even if you are a beginner programmer, you will understand this code. You need to implement the interface that was prepared for you, and then define the method that we declared previously in our IDL document. In this case the implementation is simple, but in more complex applications you may want to call some other methods, use external objects, etc. It's up to you. That's the important part.
Above this snippet are the statements that include the libraries required by your script. They are mostly the same across implementations; the differences lie in the namespace names, the protocols or transports used, etc. Here we use only the simplest ones, so you can leave them as they are.
Below our implementation is another piece of boilerplate that doesn't change much in simple examples. This code handles the incoming calls and prepares the protocol and transports. Note that the client will have to use the same parameters for communication.
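Our server is written in PHP, but the "fill in the blanks" idea is the same in every language. For comparison, here is a sketch of an equivalent handler in Python. It assumes the Python stubs were generated into gen-py/adder (so the module would be `adder.AddService` with an `Iface` class) and that gen-py is on `sys.path`; the fallback class exists only so the sketch runs without generated code, and the parameter names are hypothetical.

```python
try:
    # Available only after running the compiler with --gen py and adding gen-py to sys.path.
    from adder.AddService import Iface
except ImportError:
    class Iface:  # stand-in so this sketch runs without the generated code
        def add(self, num1, num2):
            raise NotImplementedError


class AddHandler(Iface):
    """Service implementation: the only part you write by hand."""

    def add(self, num1, num2):
        result = num1 + num2
        print(f"add({num1}, {num2}) -> {result}")  # simple server-side log line
        return result
```

Everything else (reading the request, calling `add`, writing the response) is left to the generated processor and the library.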
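To give an idea of what the generated processor does with each incoming request, here is a stdlib-only sketch. It assumes the strict binary protocol and a method whose arguments are all i32 fields; a real processor is generated per service and also handles every other type, declared exceptions and oneway calls. All function names here are my own.

```python
import io
import struct

VERSION_MASK = 0xFFFF0000
VERSION_1 = 0x80010000
CALL, REPLY = 1, 2
T_STOP, T_I32 = 0, 8
ALLOWED_METHODS = {"add"}  # only methods declared in the IDL may be called


def read_exact(buf, n):
    data = buf.read(n)
    if len(data) != n:
        raise EOFError("truncated message")
    return data


def read_i32(buf):
    return struct.unpack(">i", read_exact(buf, 4))[0]


def read_message_begin(buf):
    header = struct.unpack(">I", read_exact(buf, 4))[0]
    if (header & VERSION_MASK) != VERSION_1:
        raise ValueError("not a strict binary-protocol message")
    name = read_exact(buf, read_i32(buf)).decode("utf-8")
    return name, header & 0xFF, read_i32(buf)  # name, message type, seqid


def read_i32_fields(buf):
    """Read a struct made only of i32 fields; return {field_id: value}."""
    values = {}
    while True:
        (ftype,) = struct.unpack(">b", read_exact(buf, 1))
        if ftype == T_STOP:
            return values
        (fid,) = struct.unpack(">h", read_exact(buf, 2))
        if ftype != T_I32:
            raise ValueError(f"sketch only handles i32 fields, got type {ftype}")
        values[fid] = read_i32(buf)


def write_reply(out, name, seqid, value):
    data = name.encode("utf-8")
    out.write(struct.pack(">I", VERSION_1 | REPLY))
    out.write(struct.pack(">i", len(data)) + data + struct.pack(">i", seqid))
    # Result struct: the return value goes in field 0 ("success"), then STOP.
    out.write(struct.pack(">bhi", T_I32, 0, value))
    out.write(struct.pack(">b", T_STOP))


class AddHandler:
    def add(self, num1, num2):
        return num1 + num2


def process(handler, request: bytes) -> bytes:
    """Decode one CALL, dispatch it to the handler, encode the REPLY."""
    buf = io.BytesIO(request)
    name, msg_type, seqid = read_message_begin(buf)
    if msg_type != CALL:
        raise ValueError("expected a CALL message")
    if name not in ALLOWED_METHODS:
        raise ValueError(f"unknown method {name!r}")
    args = read_i32_fields(buf)
    result = getattr(handler, name)(*(args[i] for i in sorted(args)))
    out = io.BytesIO()
    write_reply(out, name, seqid, result)  # echo the seqid so the client can match it
    return out.getvalue()
```

This also shows why client and server must agree on the protocol: `read_message_begin` rejects anything that does not start with the binary protocol's version header.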
To run your code, you may need an HTTP server. If you don't have one installed on your machine, you can use this Python wrapper:
Save this wrapper in a file named runserver.py.
Now that our server is ready, it's time to write the client code in Python. We will use this language for the client because it is agile, compact and easy to run.
Take the following code and save it in a file, for example MyAddClient.py:
You will notice that the code is very similar in structure to the server code (although a lot simpler). We use the same transport and protocol.
At the top of the file we have some imports and the transport creation. Once the client is ready, we can call the remote methods as if they were local:
Running the server and client
Running the server is as simple as running any other PHP script. Just point your HTTP server to the file location, or run the wrapper you prepared previously:
That's it! Your server is running and ready to accept connections! (If you have any trouble, check whether you can open port 8080 on your machine.) Make sure the port numbers in the client and the server match, especially if you run the PHP script through your own HTTP server.
Now for the client, which is also very easy. Just run:
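If the client cannot connect, a quick way to check that something is actually listening on the port is a plain TCP connection attempt. This is a minimal sketch using only the standard library; port 8080 matches the one mentioned above, and `is_listening` is a hypothetical helper name. A successful connection only proves that a server is listening, not that it runs your PHP script correctly.

```python
import socket


def is_listening(host: str = "localhost", port: int = 8080, timeout: float = 1.0) -> bool:
    """Return True if something accepts TCP connections on host:port."""
    try:
        # create_connection tries every address the host name resolves to.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:  # covers refused connections, timeouts and resolution errors
        return False
```

With the wrapper running, `is_listening()` should return True; if it returns False, the server is either not running or listening on a different port than the client expects.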
The output that you get should be similar to:
42 + 13 = 55
That means you successfully called the remote add method. You can also watch the server console or log to see the incoming connections and log messages.
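To show what the generated client does on your behalf, here is a stdlib-only sketch of the same remote call. It assumes the server speaks Thrift's strict binary protocol over HTTP, with each call sent as an HTTP POST, which is how Thrift's HTTP client transport works. `SERVER_URL` is hypothetical: replace it with the URL under which your HTTP server exposes MyAddServer.php. An application exception from the server (message type 3) is reported as an error rather than decoded.

```python
import io
import struct
import urllib.request

VERSION_1 = 0x80010000
CALL, REPLY = 1, 2
T_STOP, T_I32 = 0, 8

# Hypothetical endpoint; adjust to match your HTTP server configuration.
SERVER_URL = "http://localhost:8080/MyAddServer.php"


def encode_add_call(num1, num2, seqid=0):
    """Encode add(num1, num2) as a CALL message with two i32 fields."""
    name = b"add"
    return (
        struct.pack(">I", VERSION_1 | CALL)
        + struct.pack(">i", len(name)) + name
        + struct.pack(">i", seqid)
        + struct.pack(">bhi", T_I32, 1, num1)  # field 1
        + struct.pack(">bhi", T_I32, 2, num2)  # field 2
        + struct.pack(">b", T_STOP)
    )


def decode_add_reply(data):
    """Extract the i32 result from a REPLY message."""
    buf = io.BytesIO(data)
    (header,) = struct.unpack(">I", buf.read(4))
    msg_type = header & 0xFF
    if msg_type != REPLY:
        raise RuntimeError(f"server did not send a normal reply (type {msg_type})")
    (name_len,) = struct.unpack(">i", buf.read(4))
    buf.read(name_len + 4)  # skip the method name and the sequence id
    (ftype,) = struct.unpack(">b", buf.read(1))
    if ftype != T_I32:
        raise RuntimeError("reply does not contain an i32 result")
    field_id, value = struct.unpack(">hi", buf.read(6))
    if field_id != 0:
        raise RuntimeError(f"unexpected field id {field_id}")
    return value


def call_add(num1, num2, url=SERVER_URL, timeout=5.0):
    """POST the encoded call and decode the server's answer."""
    request = urllib.request.Request(
        url,
        data=encode_add_call(num1, num2),
        headers={"Content-Type": "application/x-thrift"},
        method="POST",
    )
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return decode_add_reply(response.read())
```

With the server running at the assumed URL, `print("42 + 13 =", call_add(42, 13))` should reproduce the output above. The generated client does the same work, but hides it behind a method call and supports every Thrift type, which is why you rarely write this by hand.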
Summary of the example
As you can see above, preparing and running a server and a client that leverage the power of Apache Thrift is relatively simple. Of course, remember that preparing a production-ready solution is much more demanding in terms of architecture, performance, security and error handling.
When to use Apache Thrift
There are many different scenarios in which communication between applications is required. Apache Thrift is a good solution for some of them, but not all. When designing the architecture of your application, you should take all circumstances into consideration to choose the best tool for the job.
As a rule of thumb, you may consider Apache Thrift when you have a number of internal applications in different languages that need high-performance communication paired with flexibility and easy maintenance (especially when you need to transfer binary data). This is the case that led to the creation of Thrift at Facebook, where it is now the primary tool for cross-application communication. (Further reading on this topic includes the original whitepaper (https://thrift.apache.org/static/files/thrift-20070401.pdf) and a Facebook Code article (https://code.facebook.com/posts/1468950976659943/).)
Some companies, such as Evernote, use Apache Thrift to communicate with applications installed outside of their environment, on users' devices, and even expose a Thrift API. You can read about that on their blog (https://blog.evernote.com/tech/2011/05/26/evernote-and-thrift/).
When exposing a public API or working with preexisting tools, keep in mind that some approaches (for example RESTful APIs) are more popular and accessible than Apache Thrift and may be a better solution.
In this article you had the opportunity to learn the basics of Apache Thrift and run your own simple service. I hope that it gives you an overview of the framework and encourages you to consider it in your projects.
If you would like to expand your knowledge and learn about the details of Apache Thrift, as well as work through more advanced examples in a wider variety of languages, I would like to refer you to my book "Learning Apache Thrift", released as a paperback and ebook a few months ago. It is available from the publisher's website (https://www.packtpub.com/application-development/learning-apache-thrift) or from Amazon (https://www.amazon.com/Learning-Apache-Thrift-Krzysztof-Rakowski/dp/1785882740).