Static binaries for a C++ application

Max Neunhöffer
May 11, 2018


TL;DR: This article describes how to generate a completely static binary for a complex C++ application, which then runs on all variants of Linux without any library dependencies.

ArangoDB is a multi-model database written in C++. It is a sizable application with an executable size of 38MB (stripped) and quite a few library dependencies. We provide binary packages for Linux, Windows and MacOS, and for Linux we cover all major distributions and their different versions, which makes our build and delivery pipeline extremely cluttered and awkward. At the beginning of this story, we needed approximately 12 hours just to build and publish a release, provided everything went well. This is the beginning of a tale of how we attacked this problem.

Motivation and rationale

Motivated by what we see in the world of Go, where one can easily produce completely static binaries with no external dependencies that run on any Linux variant out there, we asked ourselves whether a similar feat could be pulled off for a C++ program as well. The benefits of such a universal Linux executable are manifold:

  • the same executable runs on any variant of Linux
  • there are no external dependencies on libraries
  • we only build the executable once and can wrap the same for multiple binary packages, which greatly speeds up the build and publication process
  • the build environment can be confined to a single Docker image
  • our customer support can much more easily reproduce issues because there are fewer variations in deployments
  • core files can be analyzed across Linux variants
  • we can provide smaller and more secure Docker images in the end
  • the whole process is more robust and breaks less often, because we no longer depend on numerous distributions refraining from breaking changes

There are a few disadvantages, which we should not fail to disclose:

  1. security upgrades in any library we use are not applied automatically, but we have to wrap a new release
  2. we might use fewer different compilers for testing and thus catch fewer warnings
  3. binaries are slightly larger
  4. processes running our executable cannot share physical RAM for library code with other processes

If you do not care about a discussion of these arguments, you can just jump to the next section to read about the technical details.

I do not want to spend much time discussing the benefits, because I find them quite compelling without further comment. So I just give the list of Linux distribution versions we currently support: Docker image, CentOS 6 (= RedHat 6), CentOS 7 (= RedHat 7), Fedora 25, Debian 8, Debian 9, Ubuntu 17.04 (= Ubuntu 17.10), Ubuntu 16.04, Ubuntu 14.04, Ubuntu 12.04, openSUSE 13.2. For all of them we build two binary packages (Community and Enterprise edition). That is 22 builds! And imagine the benefits for our support team, who know from “Version 3.3.3” *exactly* which binary is running, regardless of Linux distribution. Note that we only supply packages for the x86_64 (or amd64) architecture.

Anyway, a few words about the above-mentioned disadvantages are in order, and why I think they are insignificant for us in comparison to the benefits. Argument 1: rollout of security updates in any library. Yes, if there is an important security update of a library we are using (for example libssl), then we have to act, build a new version and release it. However, we release patch level upgrades much more frequently (approximately twice per month) than the libraries we use. Furthermore, if someone upgrades a library, it is not guaranteed that they restart all processes using it! In particular, a database server might remain running for a long time. So it is actually beneficial when we release an update and the automatic upgrade procedure catches it and restarts the database server with the new version. And releasing a new update is becoming much less painful and faster due to the static binaries. This covers argument 1.

Argument 2: compiler testing. If all developers always compiled their test binaries statically, all with the same Docker image and C++ compiler, then argument 2 would actually be an argument. It is beneficial to compile and test on a wide range of different compilers and versions to spot problems early. But the developers can continue to do so. I think it is a good idea to use a consistent build environment for all CI tests and the released packages, and leave it to the developers to try different compiler versions.

Argument 3: binaries are slightly larger. This actually turns out to be nearly irrelevant. Our main database server executable is some 38MB when linked against shared libraries. The static one is larger by less than one MB, which is negligible. For the smaller executables the difference is more prominent: some smaller tools use 4MB with shared libraries and 6MB when statically linked. I think what helps here is that libmusl is generally smaller, and that the rest of our code (due to RocksDB and V8) is much larger than the external libraries. You might argue now that we should link against shared libraries for RocksDB and V8, but this is near impossible due to the frequent changes in the API, at least for V8. It is much more robust to control the exact version we bundle. And anyway, who cares about a few MB of executable size these days, when we are messing around with multi-hundred-MB Docker images!

This brings me to Argument 4: processes running our executable cannot share physical RAM for library code with other processes, which is not very strong here, either. Since the shared libraries constitute only a small part of our executable size anyway, the RAM benefits from sharing pages with other processes are pretty slim. In particular for a database server process, which regularly uses hundreds of megabytes if not gigabytes of RAM, those few MBs of shared libraries do not actually play any significant role.

Obviously, depending on the type of program you want to deploy, your mileage may vary, but for us as a database manufacturer the case is pretty clear. A final argument might be that we will never make it into one of the prominent Linux distributions like Debian with this policy. Interestingly, for us as a small and agile team, we release updates so often that any version in a stable distribution would be outdated very quickly anyway.

glibc cannot be used, libmusl and Alpine Linux to the rescue

Interestingly, although one can build a completely static binary when linking against glibc, it is pretty pointless. The reason is that glibc loads certain modules dynamically at runtime in any case. This is to support the pluggable authentication modules (PAM) and the system-wide switches for host lookup (nsswitch.conf), which are used when calling functions like gethostbyname. Therefore, even if your executable is completely static, you still need the correct version of glibc installed on your system to run it successfully.
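
To see this in action, here is a minimal example of my own (not from the ArangoDB code base). With glibc, the linker itself warns you about the problem:

    /* lookup.c -- why a "static" glibc binary is not self-contained.
     * Building with:  gcc -static lookup.c
     * makes the glibc linker warn that using 'gethostbyname' in
     * statically linked applications requires, at runtime, the shared
     * libraries of the glibc version used for linking -- the NSS
     * modules configured in nsswitch.conf are loaded via dlopen. */
    #include <netdb.h>
    #include <stdio.h>

    int main(void) {
      struct hostent *he = gethostbyname("example.com");
      printf("%s\n", he != NULL ? he->h_name : "lookup failed");
      return 0;
    }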

This makes it necessary to use an alternative C library like libmusl. However, this is not as easy as it seems. You then need not only all the C++ libraries (STL and friends) built against that other C library, but also a complete tool chain with compiler, linker and binutils. Fortunately, there is a Linux distribution which does all the heavy lifting for us: Alpine Linux. So the basic idea is: simply build your executable on Alpine and add -static to the final link step for the executables. What could possibly go wrong?
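
As a minimal sketch (our actual build setup is more involved and not shown here), telling cmake to add the flag to the final link step could look like this:

    # append -static to the flags cmake uses when linking executables
    set(CMAKE_EXE_LINKER_FLAGS "${CMAKE_EXE_LINKER_FLAGS} -static")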

In fact, quite a lot…

Challenges when building on Alpine Linux with libmusl

One quickly creates a Docker image based on Alpine Linux, adds the first few obviously needed packages and hopes for the best. My initial package list was:
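
It was roughly the following (a reconstruction; the exact initial list is not preserved here):

    apk update
    apk add g++ make cmake linux-headers openssl-dev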

The following subsections describe the challenges and how I overcame them.

It does not compile

Compiling with a different C and C++ library should be seamless, but it is not. The prevalence of glibc on Linux has led to a situation in which one sometimes uses glibc-specific extensions of the various standards without even noticing. Compiling with an alternative library brings these cases to light. In this section I describe the concrete issues we found when compiling ArangoDB with libmusl on Alpine Linux.

The first issue was a case in which something which is a global variable in glibc turns out to be a macro in libmusl. Namely, we had code like this:
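
(a reconstructed sketch; with glibc, defining SYSLOG_NAMES before including the header makes <syslog.h> provide the facility table):

    /* ask <syslog.h> to define the CODE array 'facilitynames' */
    #define SYSLOG_NAMES
    #include <syslog.h>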

and then we were using facilitynames for a list of strings naming the syslog facilities. What is worse, we were using it in a way that broke when facilitynames turned out to be a macro in libmusl. This was easy to fix, but it already shows that trouble is ahead if one uses undocumented features, and then even makes assumptions about something being a macro or not being a macro.

The second issue was that we had used the mutex type PTHREAD_MUTEX_ERRORCHECK_NP, which is, as its suffix “_NP” suggests, not portable. And indeed it broke compilation under libmusl, since it does not exist there. This could also be fixed easily by just removing the corresponding call into the pthreads library; it was a rarely used debugging feature anyway, not worth the loss in portability. Second lesson: do not use non-portable code if at all possible.
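
For reference, POSIX actually standardizes an error-checking mutex type, so a portable variant would have been possible; a minimal sketch:

    #include <pthread.h>

    /* sketch: a mutex with the portable error-checking type, instead
     * of the glibc-only PTHREAD_MUTEX_ERRORCHECK_NP */
    static void initCheckedMutex(pthread_mutex_t* m) {
      pthread_mutexattr_t attr;
      pthread_mutexattr_init(&attr);
      pthread_mutexattr_settype(&attr, PTHREAD_MUTEX_ERRORCHECK);
      pthread_mutex_init(m, &attr);
      pthread_mutexattr_destroy(&attr);
    }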

The last compilation problem I found was the use of mallinfo, which is a glibc extension to obtain memory allocation information. As useful as mallinfo can be, it probably does not help if one uses an alternative allocation library like tcmalloc or jemalloc. Furthermore, it does not exist in libmusl and thus cannot be used in our static executable. The obvious idea here is to make its use dependent on the C library being used. Since we already cannot use it under Windows and MacOS, this is not a big deal. I fixed this by putting an #ifdef around its use, checking whether we are on Linux and using glibc (the macro __GLIBC__ is defined).
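
In code, the guard looks roughly like this (a sketch, not the literal ArangoDB source):

    #include <cstddef>
    #if defined(__linux__) && defined(__GLIBC__)
    #include <malloc.h>
    #endif

    // report the number of bytes currently allocated, where available
    static std::size_t allocatedBytes() {
    #if defined(__linux__) && defined(__GLIBC__)
      struct mallinfo info = mallinfo();  // glibc-specific
      return static_cast<std::size_t>(info.uordblks);
    #else
      return 0;  // no mallinfo on libmusl, Windows or MacOS
    #endif
    }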

Interestingly, there is intentionally no macro to test for libmusl! The idea is that libmusl does not contain any specific extensions but only standardized calls, and therefore it should never be necessary to have libmusl-specific code. I have to admit that I indeed did not need such a feature.

This fixed all the compiler errors. There are still a few compiler warnings coming from the version of libboost we compile with, but these can safely be ignored.

It does not link

After these compiler errors were fixed, there were problems in the link stage.

A rather trivial one was that we had to specify -lssl a second time in the final link stage. This happens with static libraries, since the order in which such library arguments are specified to the linker matters. So this was easily fixed. This might not even have anything to do with Alpine; I did not investigate the details.
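
For illustration (hypothetical libraries, not our actual link line): with static archives the linker scans the arguments from left to right and only pulls in objects that resolve symbols which are undefined so far, so a library sometimes has to be repeated:

    # hypothetical: libB.a (processed after the first -lssl) references
    # libssl symbols, so -lssl must be given a second time:
    g++ -static -o server server.o -lA -lssl -lB -lssl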

Slightly more difficult was that the version of libldap (OpenLDAP) bundled with Alpine 3.7 is linked against LibreSSL (instead of libssl from OpenSSL), but we needed OpenSSL's libssl for other reasons. So I had to remove the libressl and libldap packages from the system and compile OpenLDAP myself, linked against OpenSSL. I do this in the preparation of the build Docker image, so this does not increase the build time at all in the end. The lesson learned here is that different distributions sometimes use different libraries for the same purpose, and thus linking can be a challenge. Well, in general, linking seems to be a dark art.

The final problem at the link stage was our use of libraries to produce backtraces. I noticed that we used the glibc builtin functionality without proper cmake detection of the required libraries. Since libmusl needs a separate library (libexecinfo) for backtraces, the solution is simply to do what we should have done in the first place: use a cmake setup to detect a backtrace library and add the necessary libraries in the link stage. This has the additional benefit that I could remove a hack for Windows that we had. Again, the lesson is that one should use the proper tools to detect the necessary libraries.
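
A sketch of such a cmake setup, using the stock FindBacktrace module (the target name here is illustrative):

    # find out how to get backtrace(): built into glibc,
    # provided by libexecinfo on libmusl
    find_package(Backtrace)
    if(Backtrace_FOUND)
      target_include_directories(arangod PRIVATE ${Backtrace_INCLUDE_DIRS})
      target_link_libraries(arangod PRIVATE ${Backtrace_LIBRARIES})
    endif()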

It does not run

Finally, I had a completely static binary and thought that all was good and I was through. I could not have been more wrong! I fired up the executable, and it immediately crashed. What?

Using the debugger, I found out that it never reached my main function. It actually already crashed during the relocation phase!

The investigation that followed took me nearly two days, but finally I got to the bottom of this first problem: we have some hand-crafted assembler code for the x86_64 architecture to compute CRC32 checksums quickly using Intel's SSE4.2 extensions. When it first runs, it performs a runtime check (using the CPUID instruction) for the case that a very old processor does not support these extensions. The assembler code contained a few "absolute" addresses, for jump tables and the like. The assembler translates this into an object file which contains addresses relative to its beginning, together with relocation information. At runtime, even in a completely static executable, this relocation information is used to adjust the relative addresses to the actual absolute addresses, which depend on the position to which the executable is actually loaded.

This worked beautifully under Ubuntu, but failed miserably under Alpine, with a crash during the runtime relocation. It turned out that this has nothing to do with static linking itself. The only difference is that the gcc compiler in Alpine by default creates executables with -pie (position independent executable). This is good for security, because it allows address space randomization, but bad for us, since something in the generation of the relocation table for assembler code that uses absolute addresses does not work. I found two workarounds: one is to simply switch off -pie by giving -nopie to the final link stage of the executables; the other is to change the assembler code to only use relative addressing modes. I went for the latter, because it was easy and now allows -pie and address space randomization.
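
For illustration (a hypothetical snippet, not the actual CRC32 code), the change amounts to loading addresses RIP-relatively instead of absolutely:

    # before (sketch): absolute address, needs a relocation entry
    # that is patched when the executable is loaded
    #   movabsq $jump_table, %rax
    # after (sketch): RIP-relative address, no load-time fixup needed
        leaq    jump_table(%rip), %rax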

The second problem was that, for whatever reason, compiling against the jemalloc memory allocator did not work. I circumvented this by using the standard malloc/free implementation of libmusl. I might get back to this problem and see whether it can be fixed in a more satisfying way. But since we had several issues with jemalloc anyway, this might not even be a bad thing.

During my various experiments with static executables I also tried the clang compiler bundled with Alpine 3.7 (clang 5.0.0 at the time) but I never got it to produce working fully static executables, so I gave up on this.

The final problem at runtime was also very puzzling for a day or so. My executables finally ran, but whenever I opened the web-based UI of ArangoDB, the database server crashed with a segfault, somewhere deep in V8-executed JavaScript code, which is notoriously difficult to debug. I should have noticed the cause a lot sooner, but this is how open-ended investigations go: one investigates for a long time, only to find in the end that one could have figured out much more quickly what was going on.

It turned out that the default stack sizes in libmusl are way too small for our purposes. glibc by default allocates 8MB of stack for each thread. libmusl lowers this to 80kB per thread. This works for many programs, but ArangoDB embeds Google's V8 JavaScript engine, and when we call JavaScript code from within C++ we quite often end up with pretty deep stacks. The workaround was to add some code like this to our main thread initialization routine, basically as the first thing in main:
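
(a reconstructed sketch based on the description, not the literal code):

    #define _GNU_SOURCE
    #include <pthread.h>

    int main(int argc, char* argv[]) {
      /* raise the default stack size for all threads created from
       * now on to 8MB; pthread_setattr_default_np is a non-portable
       * extension, but both glibc and libmusl provide it */
      pthread_attr_t attr;
      pthread_attr_init(&attr);
      pthread_attr_setstacksize(&attr, 8 * 1024 * 1024);
      pthread_setattr_default_np(&attr);
      /* ... the rest of the startup ... */
    }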

This sets the default stack size for all threads created later to 8MB. We do not execute JavaScript in the main thread, so it is OK to leave its stack size small. This is of course another use of non-portable code (pthread_setattr_default_np), this time deployed to make ArangoDB as an application more portable. What an irony!

After this adjustment everything just worked fine and I had completely static executables.

Finally, I chose a sensible explicit setting for processor architecture specific optimizations. After all, the executable is supposed to run on as many systems as possible, so I do not want to use the latest and greatest Intel extensions. However, I do want SSE4.2 support, so I went for compiling with -march=nehalem, which seems to be a good compromise. Yes, there are older 64-bit processors out there which do not support this, but we still have the runtime detection for our special assembler code, and it is unlikely that the compiler will ever use SSE4.2-specific optimizations elsewhere. If it does, I can always compile a separate, completely generic binary for these processors.

Implementation with a Docker image

To make building convenient, I created a Docker image based on Alpine Linux. I ended up with the following package list, plus some convenience tools:
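
Roughly (again a sketch; the literal list lives in the repository linked below), it is the earlier packages plus the convenience tools mentioned in this article:

    apk update
    apk add g++ make cmake linux-headers openssl-dev ccache fish git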

I then compiled the openldap library libldap myself using the installed openssl library and installed it in the Docker image.

Furthermore, I added some convenience build scripts to run cmake, make and make install and my “static binary factory” is up and running.

I mount a work directory into the Docker container which contains the source code as well as another directory to keep the ccache compiler cache data, such that subsequent builds can benefit from earlier ones.

That is, I can now build with the following Docker command:
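
The command looks roughly like this; the image name and script path are illustrative, not the literal ones from my setup:

    docker run --rm -it \
      -v $(pwd)/work:/work -v $(pwd)/ccache:/ccache \
      -e UID=$(id -u) -e GID=$(id -g) \
      arangodb/alpinebuildimage /scripts/buildAlpine.fish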

This assumes you are running bash, and it contains one more trick: I hand my current user ID and group ID into the container via environment variables. The reason for this is that in the container, the compilation is executed as user root. Therefore, all the generated or touched files would end up owned by root, which is rather inconvenient. To fix this, my compilation scripts do a chown -R $UID:$GID on the work directory at the end, such that after the termination of the Docker container all files belong to me.

Note that I learned to use and love the fish shell recently, so all my build scripts are written in fish. Similar to the rest of the content of this article, this is probably rather controversial. My sole argument is: I can finally remember how to write shell programs with fish, contrary to my long experience with writing bash and related scripts.

Finally here is the part of the compile script which sets up the ccache in the Docker container:
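
(a reconstructed sketch in fish, not the literal script):

    # point ccache at the cache directory mounted into the container
    set -x CCACHE_DIR /ccache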

In fish, set -x sets an exported environment variable. Then all that is needed is to tell cmake to use /usr/lib/ccache/bin/g++ as the C++ compiler.

All the code (and a lot more for our test runs and so on) is public in this repository. The stuff to build the Alpine Docker image is in the buildAlpine.docker directory, and the actual build script is in scripts/buildAlpine.fish.

An unforeseen benefit: Windows Subsystem for Linux

After all this I got bold: I actually tried to run the static Linux executable on Windows with the Windows Subsystem for Linux. To my great surprise, this actually worked. There will be more information on this in a subsequent article.

Outlook

The next step is now to simplify our build and release process by attacking the binary packages. This is already ongoing and I will report on this in a later article. The plan is to create “universal” deb packages, which can be installed on *any* Debian based variant of Linux. Furthermore, I want to create a universal RPM package and a binary tar ball which can be executed wherever unpacked, as well as an Alpine Linux based relatively small Docker image.

There are challenges ahead, like for example the various startup systems (System V init and systemd), but so far things look good. As a teaser: I actually managed to install my universal Debian package on the Ubuntu variant of the Windows Subsystem for Linux!

Stay tuned, this will be a lot of fun and probably provoke a lively discussion.
