TrailDB: Java bindings

TrailDB is a library, implemented in C, which allows you to query series of events at blazing speed. The database is made of trails, each of which is a series of events ordered by timestamp. Each trail has a unique id and can represent, for example, a person, with each event being an action performed by that person at a given time.
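To make the data model concrete, here is a minimal plain-Java sketch of a trail and its events. It is only a conceptual model: the class names Event and Trail are hypothetical and belong neither to TrailDB nor to the bindings.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.UUID;

/** Conceptual model only: one event is a timestamp plus an action label. */
final class Event {
    final long timestamp;   // e.g. seconds since the epoch
    final String action;    // e.g. "clicked link xyz"

    Event(long timestamp, String action) {
        this.timestamp = timestamp;
        this.action = action;
    }
}

/** Conceptual model only: a trail is an id plus its events, ordered by timestamp. */
final class Trail {
    final UUID id;
    private final List<Event> events = new ArrayList<>();

    Trail(UUID id) {
        this.id = id;
    }

    /** Adds an event and keeps the list ordered by timestamp. */
    void add(Event e) {
        events.add(e);
        events.sort(Comparator.comparingLong((Event ev) -> ev.timestamp));
    }

    /** Returns the actions of this trail in chronological order. */
    List<String> actions() {
        List<String> out = new ArrayList<>();
        for (Event e : events) {
            out.add(e.action);
        }
        return out;
    }
}
```

Events can be added in any order; actions() always returns them sorted by timestamp, which is the property queries on a trail rely on.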

We wanted to try TrailDB for our project because we needed to store a list of events for each user. The example use case is tracking the actions performed by a user visiting an online shop, for example “clicked link xyz”, “bought product abc”, etc. From that, one could for example extract common patterns.

Since TrailDB is written in C, the team mainly codes in Java or Scala, and bindings didn’t exist at the time, we wrote them!

Java Native Interface

There are two main ways to call native C code from Java code:

  • the Java Native Interface (JNI)
  • the Java Native Access (JNA)

The difference is that JNA is community-developed and easier to use, but slower. Because native calls are already slow and we wanted to keep the good performance of TrailDB, we chose JNI.

Getting bindings working for the most basic functions of TrailDB didn’t take much time, but things then became more complicated, since it was my first experience with JNI. Because we have to code in C/C++ and also use JNI-specific functions, it is quite a change from the usual Java programming. One thing to think about constantly is performance: calls from native code to Java and vice versa are very expensive, so careless code can add significant overhead on top of the unavoidable overhead of JNI itself.

General remarks

A good tutorial covering a fair amount of JNI stuff can be found here.

Although this tutorial is pretty complete, there are some pitfalls to be careful of:

  • JNI has been part of the JDK since version 1.1, so there is nothing special to install to start using it.
  • One thing I didn’t understand at the beginning is that the goal is to create our own library file. Our Java code then loads and uses this library, which in turn calls the native C library.
  • Compiling our library is not trivial, and the compilation and linkage options have to be selected carefully.
  • Loading our library can be hard if one wants a more customized load.
  • Working with JNI can be hard at the beginning for an inexperienced C programmer, because aspects like compilation and linkage are hidden when working with Java. Also, C and JNI are pretty low level and require careful coding to avoid segmentation faults and so on.

Steps to create a binding

Here is a Maven project architecture example:

The role of each class is detailed below.

The following basic steps are required to create JNI bindings for a C library:

Step 1

Create a method with the “native” keyword, consisting of only its signature (no body), in the TrailDBNative Java class. Here is an example:
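A minimal sketch of such a declaration, assuming a hypothetical method named openDatabase (the package declaration io.sqooba.traildb is left out to keep the snippet self-contained):

```java
/**
 * Sketch of the class holding the native declarations.
 * In the project layout it would live in package io.sqooba.traildb.
 */
final class TrailDBNative {

    /**
     * Hypothetical native method: a signature only, no body.
     * Its implementation comes from the compiled C/C++ library.
     */
    native long openDatabase(String path);
}
```

With the class in package io.sqooba.traildb, JNI would look for a C function named Java_io_sqooba_traildb_TrailDBNative_openDatabase, which is the name the generated header declares in the next step.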

Step 2

Generate the .h file of the Java class containing the native method. For the architecture shown previously, you can run the following command from the src/main/java location:

It will generate a file named io_sqooba_traildb_TrailDBNative.h in src/main/java, containing a method header looking like this:

You can move this .h file to src/main/native to keep it separate from the Java classes.

Step 3

Create a C/C++ file with the same name as the .h but with a .c or .cpp extension. Place it where the header file is, for example in src/main/native. Copy the method signature from the .h and implement it:

Don’t forget to include the required headers in the .c/.cpp file, typically jni.h and the generated io_sqooba_traildb_TrailDBNative.h, along with your other custom dependencies.

What I suggest is using C++ and creating a .cpp file, because JNI is a bit less painful in C++ than in C. But if you do so, you have to wrap the #include of the C library you are binding in an extern "C" { } block, so that its functions keep their C linkage.

Step 4

Compile the C/C++ code. Compiling is the tricky part because you have to include all the files needed by both your library and the one you are calling. The command I use to compile on Linux and Mac can be found in the pom.xml of the traildb-java project.
Here is an example of the linkage options for Linux:

The linkage options are the most important, since they tell the linker where to find the libraries it needs.

And the compiler options:

Or, in command line:

On Linux your library has to have the prefix “lib”, for example “libtraildbjava.so”.

This command has to be launched from the directory containing the .cpp file, for example src/main/native. The library file is generated in the same place.

A more detailed explanation of the compilation process, as well as of dynamic versus static libraries, can be found in these notes.

I found it useful to learn more about linkage and compilation options rather than waste time trying options until something worked, because it all looks like black magic at first.

Also keep in mind that the library has to be compiled on the platform you intend to use it on. The file for Linux (.so) is totally different from the one for Mac (.dylib) or even Windows (.dll).

Step 5

To load our library in Java, we need a static{} code block so that the load is done once, when the class is initialized. There are two methods to load a library:

  • System.loadLibrary("mylib"), which searches java.library.path for the library by name.
  • System.load("/path/to/mylib"), which takes the absolute path of the library file.

I used the second one because it works with an absolute path, which allows more customisation. Below is a simple example to perform the load:
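A minimal sketch of a load with System.load, under a few assumptions: the library is called traildbjava, and its directory comes from a hypothetical system property traildbjava.dir that defaults to src/main/native. The class and method names are illustrative only.

```java
import java.nio.file.Path;
import java.nio.file.Paths;

/** Resolves the absolute path of a native library for the current platform. */
final class NativeLibraryLocator {

    /**
     * Builds "<directory>/<platform file name>" as an absolute path.
     * System.mapLibraryName adds the platform prefix and extension,
     * e.g. "traildbjava" becomes "libtraildbjava.so" on Linux.
     */
    static Path resolve(String directory, String libraryName) {
        return Paths.get(directory, System.mapLibraryName(libraryName)).toAbsolutePath();
    }
}

/** Class declaring native methods; its static block runs once, at class initialization. */
final class TrailDBBindingSketch {

    static {
        // Hypothetical property and default directory, chosen only for this sketch.
        String dir = System.getProperty("traildbjava.dir", "src/main/native");
        // System.load requires the absolute path of the library file.
        System.load(NativeLibraryLocator.resolve(dir, "traildbjava").toString());
    }

    /** Hypothetical native method, implemented in the loaded library. */
    static native String version();
}
```

Keeping the path resolution in its own class makes it easy to check without the library being present, since the static block only runs when TrailDBBindingSketch itself is first used.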

The name you provide to the loadLibrary method must have neither the “lib” prefix nor an extension.

One problem I encountered in this step was that I tried to load my compiled library directly from inside the jar package (where my whole project was). This is not possible, because the library has to be a regular file on disk, so I had to first copy the library out of the jar and then load it.
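The copy-then-load workaround can be sketched as follows. This is an illustration rather than the project's actual loader: the class name JarLibraryLoader and the resource path used in the comment are assumptions.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;

/** Copies a native library bundled in the jar to a temporary file, then loads it. */
final class JarLibraryLoader {

    /** Copies the stream to a temporary file with the given suffix and returns its absolute path. */
    static Path copyToTempFile(InputStream in, String suffix) {
        try {
            Path target = Files.createTempFile("nativelib", suffix);
            target.toFile().deleteOnExit();
            Files.copy(in, target, StandardCopyOption.REPLACE_EXISTING);
            return target.toAbsolutePath();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /** Loads a library stored as a classpath resource, e.g. "/libtraildbjava.so" (hypothetical path). */
    static void loadFromJar(String resourcePath) {
        try (InputStream in = JarLibraryLoader.class.getResourceAsStream(resourcePath)) {
            if (in == null) {
                throw new UnsatisfiedLinkError("Resource not found: " + resourcePath);
            }
            // Keep the original extension (.so, .dylib, ...) on the temporary file.
            int dot = resourcePath.lastIndexOf('.');
            String suffix = dot >= 0 ? resourcePath.substring(dot) : "";
            System.load(copyToTempFile(in, suffix).toString());
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

The temporary file is marked for deletion when the JVM exits, so repeated runs do not accumulate copies of the library.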

Step 6

Create a main Java class that calls the native method from Step 1 and launch it. Here are the complete classes:

Note: these example classes do not use an external C library (so they are not real bindings).

You can then launch it, either from the command line or from the IDE, and will almost certainly get the following error:

You then have to add the directory containing your library to java.library.path:
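To see what the JVM is actually searching, a small diagnostic like the one below can help. It is a sketch with an illustrative class name; it only relies on System.loadLibrary throwing UnsatisfiedLinkError when the library is not found.

```java
/** Small diagnostic around System.loadLibrary and java.library.path. */
final class LibraryPathCheck {

    /** Directories the JVM searches when System.loadLibrary is called. */
    static String searchPath() {
        return System.getProperty("java.library.path");
    }

    /** Tries to load a library by name; returns false instead of failing. */
    static boolean tryLoad(String name) {
        try {
            System.loadLibrary(name);
            return true;
        } catch (UnsatisfiedLinkError e) {
            // The message usually starts with "no <name> in java.library.path".
            System.err.println(e.getMessage() + " (searched: " + searchPath() + ")");
            return false;
        }
    }
}
```

Calling LibraryPathCheck.tryLoad("traildbjava") prints the searched directories when the load fails. The path itself is set when starting the JVM, for example with -Djava.library.path=src/main/native if the library was built there.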

JNI optimisations

It is very important to understand how JNI works, what it does when you make native calls, and how often you make them. This is critical for the performance of your program. Remember that JNI adds significant overhead, so the less often you cross the boundary between Java and native code, the better.

To optimise my code I used the following excellent article, which describes several JNI pitfalls and how to avoid them.
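One way to cross the boundary less often is to fetch many values per call instead of one. The sketch below illustrates the idea with a pure-Java stand-in: no native code is involved, each method call on the hypothetical CountingCursor simply models one JNI crossing, and none of these names come from the bindings.

```java
/** Stand-in for a native-backed cursor; each method call models one JNI crossing. */
final class CountingCursor {
    private final long[] timestamps;
    private int position = 0;
    int crossings = 0;

    CountingCursor(long[] timestamps) {
        this.timestamps = timestamps;
    }

    /** One crossing per event; returns -1 once the cursor is exhausted. */
    long next() {
        crossings++;
        return position < timestamps.length ? timestamps[position++] : -1L;
    }

    /** One crossing per batch; fills the buffer and returns how many events were written. */
    int nextBatch(long[] buffer) {
        crossings++;
        int n = Math.min(buffer.length, timestamps.length - position);
        System.arraycopy(timestamps, position, buffer, 0, n);
        position += n;
        return n;
    }
}

final class CrossingDemo {

    /** Reads every event with one call each and returns the number of crossings. */
    static int readOneByOne(long[] data) {
        CountingCursor cursor = new CountingCursor(data);
        while (cursor.next() != -1L) {
            // a real program would process the event here
        }
        return cursor.crossings;
    }

    /** Reads the events in batches and returns the number of crossings. */
    static int readInBatches(long[] data, int batchSize) {
        CountingCursor cursor = new CountingCursor(data);
        long[] buffer = new long[batchSize];
        while (cursor.nextBatch(buffer) > 0) {
            // a real program would process the filled part of the buffer here
        }
        return cursor.crossings;
    }
}
```

For 1000 non-negative timestamps, readOneByOne makes 1001 calls (the last one signals the end), while readInBatches with a buffer of 256 makes 5. With a real native method behind each call, that difference is where the saved overhead comes from.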

CI/CD with TravisCI

After managing to successfully compile and use the project on different platforms, one question quickly arose: how to easily share the project without having to clone it from a repo (which was private at that time) and then compile it? The answer is to make the project available as a Maven dependency, which is very convenient.

At first, I just made a single fat jar which also contained the compiled libraries for Linux and Mac. One could then add the jar to a project build path and use it directly. However, this was still not as good as a Maven dependency, which is why we headed towards a CI/CD pipeline.

Not only does a CI/CD pipeline provide a way to deliver and deploy our project, but the continuous integration part makes sure it always compiles, builds and passes the tests. Setting up a functional pipeline on TravisCI was a hard task and took a long time, because in the end we wanted a single jar containing both the Linux and Mac libraries, published to a Maven repository.

First, I began the pipeline by launching a build separately on both OSes. This was pretty easy because TravisCI provides a way to specify multiple OSes for a build, as explained here. But then the tricky part was to combine the two results, if successful, and deploy a single jar out of them.

The first version of the pipeline looked like this:

  • Run a Mac build and a Linux build in parallel
  • Deploy a single fat jar

However, since the virtual machines are wiped between builds, we cannot easily share information from one job to another. So we couldn’t get the built jars in the deploy job (which ran in another VM).

To achieve this information sharing, we decided to upload the library generated by the Mac build to an external SFTP server, so that it could be retrieved later by another job. From that point, we decided to use a beta feature of Travis called Build Stages, because there was an example doing almost what we wanted. The Build Stages feature also lets us run some jobs in parallel and launch jobs conditionally on the result of other jobs.

Below is a small diagram showing the Travis pipeline:

And here is the Travis file:

To avoid uploading and downloading the Mac library and deploying the jar artefact each time a branch update triggers a build, we also added an option which tells Travis to perform those actions only when there is a tagged git commit corresponding to the new release version.

To sum up the steps we have:

  1. Build on Mac and upload to the SFTP server
  2. Build on Linux
  3. Create the final jar by building again on Linux, incorporating the downloaded Mac library and deploy.

Notes: steps 1 and 2 are done in parallel. The Mac upload and step 3 are done only on releases.

Deployment consists of publishing the jar to a publicly accessible repository which supports Maven jars. As a repository we chose packagecloud.io because it is really easy to set up and use. The final goal would be to upload the jar to the Maven Central Repository, but this requires more work and validation of the jar against Maven standards.

While setting up this TravisCI pipeline, we encountered a problem where the Mac library was corrupted. We first thought it was due to bad compilation settings on the Travis virtual machine, so we upgraded its Xcode version, but that didn’t solve the problem. After a lot of searching, we discovered that it was a Maven plugin issue: we used the maven-resources-plugin to put the Mac library in the final jar artefact, but the plugin was also changing the file encoding, which made the library unreadable.

Conclusion

Creating Java bindings for TrailDB (traildb-java) was a very rewarding experience for me, as I learnt to use the JNI framework, which I had no clue about. JNI is an important tool if we want to make calls to native code, and I learnt how crucial it is to know what we are doing when coding with it, to avoid bad performance or segmentation faults. Because it relies on C/C++, it also refreshed my knowledge of programming languages lower level than Java.

Another great point of this project was that it was used as part of a bigger project, which allowed me to work and discuss actively with the team working on it. This gave me good experience of working with a team on a daily basis.

Written by Baptiste Sottas.