How we Applied umake to Accelerate our Build Time

DriveNets · The DriveNets TechBlog · Mar 25, 2020 · 11 min read

In my previous blog post I wrote about why we developed umake and how we are using it. In this post, I want to go into how umake improves the build process. Since umake does a lot of heavy lifting behind the scenes, I will break this process down into different sections, going in-depth into each topic.

Improving the Caching System

When looking into how to improve the performance of any process, you should always start with the following principle: the fastest way to achieve something is to not do anything at all. Applying this approach to the world of build systems means that if you can avoid building an artifact, then you should.

GCC is an amazing, heavily optimized compiler. Nevertheless, on large projects compilation can take a long time. It gets much worse with g++ and all those header-only libraries that have become so common in recent years.

So we wanted to avoid calling the compiler, or the build command in general, as much as we could, and to replace it with something faster, mainly because it would do less work. A quick Google search turns up ccache, the compiler cache, which does just that. It caches invocations of GCC and saves the results: it records how GCC was invoked, i.e. which flags were used, which files were accessed and more, and uses all these metrics as the key in its database. The value of that entry is the resulting file; in our case, the object file created by GCC.

So ccache is a great tool. It has been around for a while; according to the official site, version 2.4 was released as far back as 2004. The concept is not new and has a very good record of production usage. For a single developer working on a single machine, ccache works great. However, a regular software company has a lot more than a single developer, so we wanted to improve our caching even more.

Taking ccache One Step Further: sccache

For our challenge, it was time for sccache to shine. Developed by Mozilla, sccache takes ccache one step further and stores the cached files on a remote server. With sccache, if a single machine in the office was used to build a file, then all the other machines get it for free. This was a big step forward. One current drawback of sccache is that it always goes to a single cache: users choose whether they want the remote or the local cache, and only one is used at a time, not both. I am hopeful that at some point in the future this will be fixed in sccache.

Completing the build as fast as possible with umake

While ccache and sccache are great, we wanted to improve the process even more. ccache only sees the build of a single file and has no knowledge of the complete project. This makes integrating ccache into an existing build very simple, but it also means that project-wide metadata which could be put to use is ignored. This is where umake comes in.

Similar to ccache, umake computes a hash over the build command and all its input files. It first checks whether it has a local copy of the wanted artifact; if it exists, it is used. When there is no local copy, the remote cache is checked. If the file is not available there either, a local build is executed. Because umake knows all the files that are relevant to the project, it uses this metadata to skip even more steps in the build. For example, when downloading all files from the remote cache, we already have the relevant hashes by the time we reach the final stage, e.g. building the actual app, so we can perform far fewer hash computations. By using this metadata in many other ways, umake completes the build as fast as possible. (For more details on umake caching refer to the docs.)
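
To make the flow concrete, here is a minimal Python sketch of the lookup order described above: hash the command and its inputs, try the local cache, then the remote cache, and only then run the real build. The cache path and the fetch_from_remote helper are hypothetical; this illustrates the idea rather than umake's actual implementation.

import hashlib
import os
import shutil
import subprocess

LOCAL_CACHE = os.path.expanduser("~/.cache/example-build-cache")  # hypothetical location


def artifact_key(command, input_files):
    """Hash the build command together with the content of all its input files."""
    digest = hashlib.sha256(command.encode())
    for path in sorted(input_files):
        with open(path, "rb") as f:
            digest.update(f.read())
    return digest.hexdigest()


def fetch_from_remote(key, output):
    """Stub for a remote cache lookup (e.g. an object store); always misses here."""
    return False


def build(command, input_files, output):
    key = artifact_key(command, input_files)
    cached = os.path.join(LOCAL_CACHE, key)

    # 1. Local cache hit: copy the artifact into place and skip the build.
    if os.path.exists(cached):
        shutil.copy(cached, output)
        return

    # 2. Remote cache hit: download the artifact instead of building it.
    if fetch_from_remote(key, output):
        return

    # 3. Cache miss: run the real build command and populate the local cache.
    subprocess.run(command, shell=True, check=True)
    os.makedirs(LOCAL_CACHE, exist_ok=True)
    shutil.copy(output, cached)


# Example: build("gcc -c -o foo.o foo.c", ["foo.c", "foo.h"], "foo.o")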

Automatic dependencies detection

One of the key features of every build system is the ability to handle dependencies. Let's look at this by assuming you have the following C source file, called my_file.c:

#include <stdio.h>
#include "my_file.h"

int main(int argc, char **argv) {
    printf("foo %d\n", FOO);
    return 0;
}

Let's also define the header file my_file.h:

#ifndef _MY_FILE_H_
#define _MY_FILE_H_
#define FOO 10
#endif

This example file includes a header file, my_file.h, which means that if my_file.h is modified, then you need to build my_file.c again. Now comes the question — how do you let your build system know that this dependency exists? Let’s look into this further by using simple makefiles and a common approach:

my_file.o: my_file.c my_file.h
	gcc -c -o my_file.o my_file.c

my_app: my_file.o
	gcc -o my_app my_file.o

The make build tool uses a very simple approach: modification time. It checks whether any of the explicitly listed dependencies, in this case my_file.c and my_file.h, is newer than the target file, in this case my_file.o. If one is newer, the target is built again. While this works very well for small projects, it has a lot of problems. For example, if we edit my_file.c and add another #include directive, we also need to update the makefile. This is very hard to maintain on large code bases; it usually ends up broken and invokes the notorious make clean && make pattern. Since we don't know what to build, we end up building everything!
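
For illustration, the timestamp rule that make applies boils down to something like the following Python sketch (the decision logic only, not make's actual code):

import os


def needs_rebuild(target, dependencies):
    """Rebuild if the target is missing or any listed dependency is newer than it."""
    if not os.path.exists(target):
        return True
    target_mtime = os.path.getmtime(target)
    return any(os.path.getmtime(dep) > target_mtime for dep in dependencies)


# The rule "my_file.o: my_file.c my_file.h" from the makefile above:
if needs_rebuild("my_file.o", ["my_file.c", "my_file.h"]):
    print("rebuilding my_file.o")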

Support from GCC

The next step in the evolution of dependency handling for C/C++ builds was getting support from GCC. The GCC developers added the ability, in GCC itself, to generate dependency files with valid makefile content that can be included in the regular makefile. So taking the previous example, we modify our build to use the following command:

gcc -c -MT $@ -MMD -MP -MF $*.d -o my_file.o my_file.c

We get a file my_file.d with the following content:

my_file.o: my_file.c my_file.h
my_file.h:

By adding this technique into the makefiles, we had automatic dependencies for all our C files. (For more info on using GCC + Make for this approach refer to http://make.mad-scientist.net/papers/advanced-auto-dependency-generation/.)

Drawbacks of GCC dependency generation

While this approach was a great step forward, it comes with many drawbacks:

  • Generation is built into the compiler. If you want to use clang instead of GCC, you will most likely need other tools. And what if you want MSVC or other compilers?
  • It adds a lot of complexity to the makefiles. The generated dependency files are included by the very makefiles that trigger their generation, which is a hard flow to track and debug.
  • It creates very large makefiles. In large code bases the resulting makefiles get very large, which means make has to work harder to parse them, resulting in slower builds.
  • It only works for C/C++ files. If we want similar functionality for other languages, the relevant compiler has to support the same makefile generation technique.

To add a personal note here, many of the makefiles that I’ve seen don’t use this GCC feature properly, usually because it is not trivial to use. While the initial author of the makefile may have had a deep understanding of the feature and how to use it properly, the maintainers usually failed to do so. The makefiles grew more and more complex, and because of that people avoided changing the build as much as they could. This led to the creation of positions like the “build engineer”, a dedicated employee who mastered makefiles and build systems, just because the build got so complex.

Applying tup and umake

The next step was to apply the approach of tup and umake. The basic idea is very simple. Like in the original makefile example, we take a basic GCC command:

gcc -c -o my_file.o my_file.c

But this time, we run the build command, GCC in this case, in a controlled environment that can detect which files the command opens during the build. Tup uses FUSE and umake uses strace. Basically, each time GCC opens a file, like my_file.h in this example, the file access is registered in a database managed by the build tool. Thus, if GCC opens my_file.h, umake learns about it and marks my_file.h as a dependency of my_file.c on its own, without anything being added to the umakefile. (If you haven’t done so already, I highly recommend reading the documentation of the tup project.)
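
To give a feel for the strace approach, here is a rough Python sketch that runs a build command under strace and collects every file it successfully opened. It is only an illustration of the technique; umake's real implementation is more involved than this.

import os
import re
import subprocess
import tempfile

# Matches the path in successful open/openat calls from strace output, e.g.
#   openat(AT_FDCWD, "my_file.h", O_RDONLY) = 3
OPEN_RE = re.compile(r'open(?:at)?\(.*?"([^"]+)".*\)\s*=\s*\d+')


def discover_dependencies(command):
    """Run a build command under strace and return every file it opened."""
    fd, trace_path = tempfile.mkstemp(suffix=".trace")
    os.close(fd)
    try:
        subprocess.run(
            ["strace", "-f", "-e", "trace=open,openat", "-o", trace_path, *command],
            check=True,
        )
        deps = set()
        with open(trace_path) as trace:
            for line in trace:
                match = OPEN_RE.search(line)
                if match:
                    deps.add(match.group(1))
        return deps
    finally:
        os.remove(trace_path)


# Example: every header GCC touches becomes a recorded dependency of my_file.c.
print(discover_dependencies(["gcc", "-c", "-o", "my_file.o", "my_file.c"]))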

Drawback of tup and umake

This approach has one major drawback: build performance. Running the build command under a controlled environment that tracks file access slows it down, and this is noticeable in large code bases with a lot of files. But, and this is a big one, since we use strong caching in umake, the impact is negligible. So we enjoy the benefit of automatic dependencies at little to no cost. (For more info on automatic dependencies in umake refer to the docs.)

Using logical graphs to ensure a correct build

One of the most important jobs of any build system is to make sure that the resulting artifacts are valid. A very common issue with makefiles that are not written properly is a failure to handle parallel builds: the notorious make -jN, where N is the number of workers, fails a lot. The build system needs to know the exact order in which the artifacts should be built, and when this is managed manually it tends to break down.

In many makefile-based projects, this issue is handled by simply disabling parallel builds, allowing only a single build command, e.g. GCC, to run at a time. This means that large machines can’t be fully utilized for building the project.

This issue is solved in a very elegant way by tup. All the artifacts that should be built form a logical graph, in which each artifact is linked to the artifacts that rely on it. A topological sort of this graph yields an order in which to execute the build commands. Using the graph, we check whether all the dependencies of a file are fulfilled; if they are, we can safely build the artifact. This is true both for a sequential build and for a parallel build. umake works in a very similar way to tup. (For more info on the graphs and how they are used, I suggest you refer to the tup docs.)
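
As an illustration of the graph approach, the following Python sketch uses the standard library's graphlib to schedule the commands from the my_file.c example in dependency order; every artifact returned by get_ready() has all of its dependencies fulfilled, so each ready batch could also be handed to parallel workers. This is a sketch of the idea, not tup's or umake's actual code.

import subprocess
from graphlib import TopologicalSorter  # standard library, Python 3.9+

# Each artifact maps to the set of artifacts it depends on, mirroring the
# my_file.c example: my_app needs my_file.o, which needs the source and header.
graph = {
    "my_file.o": {"my_file.c", "my_file.h"},
    "my_app": {"my_file.o"},
}

# Build command per artifact; source files have no command and are just inputs.
commands = {
    "my_file.o": ["gcc", "-c", "-o", "my_file.o", "my_file.c"],
    "my_app": ["gcc", "-o", "my_app", "my_file.o"],
}

sorter = TopologicalSorter(graph)
sorter.prepare()
while sorter.is_active():
    # Everything returned here has all of its dependencies already built.
    for artifact in sorter.get_ready():
        if artifact in commands:
            subprocess.run(commands[artifact], check=True)
        sorter.done(artifact)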

Applying Benchmarks

Without a doubt, the most common question we get about umake is: how fast is it? This post has said a lot about umake being fast, but so far we haven’t provided a benchmark, so we decided to do just that. We selected DPDK, which is built using Meson & Ninja.

We selected DPDK for the following reasons:

  • It’s a real-world open source project, so anyone can download the benchmark and run it.
  • A lot of work was invested into the performance of the DPDK build. Initially it was built using regular makefiles, similar to the build of the Linux kernel. The build was then replaced with Meson & Ninja for better maintainability and build times.
  • It is a relatively large code base, with around 1600 files compiled as part of the build.
  • The build doesn’t use caching techniques at all, which is a very good way to show the huge performance impact of caching on builds.

Before I jump to the numbers, I want to lay down a clear set of rules: what we compared and how we compared it. The DPDK build is performed in two stages:

  • Run Meson to generate Ninja build files.
  • Run Ninja using the generated files to build the required artifacts.

While umake is capable of many things, it is not meant to be a full replacement for tools like Meson. Another important difference between umake and Ninja is that the umake DSL is expected to be written by humans, unlike Ninja files. We also assume that Meson doesn’t necessarily generate optimal Ninja build files; Meson, like most other generators that use Ninja, assumes that Ninja is simply fast enough to handle those build files even if they are not optimal.

After taking all that into consideration we decided on the following rules:

  • The umakefile used in the benchmark is derived from the Ninja files rather than written by hand, which also makes it very large.
  • The build time comparison shouldn’t include the Meson parsing & generation time.

Addressing the Caching Issue

We still had another element to address: caching. Since Ninja doesn’t support caching, we could only compare two modes.

  • Full (uncached) build, i.e. building from scratch without any previous artifacts.
  • Null build, i.e. running the build command when no files need to be built again.

The main strength of umake comes to light when caching is in play, both the local and the remote cache. It is important to remember that in a real-life scenario the cache is almost always warm: in a large code base, it is very likely that someone else on the team has already built a subset of the files in the project. This is especially true for CI builds, since developers usually build the artifacts locally before uploading to the server, so in CI builds umake almost always uses 100% cached files.

The Results: A Faster, Optimized Build

So now, without further ado, here are our results:

+---------------------------------+----------------+-------------------+----------+
| Compilation                     | Time (seconds) | Command           | Comments |
+---------------------------------+----------------+-------------------+----------+
| ninja                           | 160            | make ninja        |          |
| ninja null build                | 0.054          | make ninja        |          |
| umake - uncached                | 274            | make umake        | [1]      |
| umake null build                | 0.9            | make umake        |          |
| umake - local cache             | 9              | make umake-local  |          |
| umake - remote cache (over LAN) | 14             | make umake-remote |          |
+---------------------------------+----------------+-------------------+----------+

[1] strace has a huge performance penalty

We see that without caching, Ninja is faster, a lot faster. This is because Ninja is a well-optimized C++ project, while umake is written in Python; in the null build case, Ninja finishes the build even before umake has finished importing all of its modules. The true power of umake comes to light when the cache is in action: with a warm cache, we see more than a 10x improvement in build time with umake compared to Ninja.

(For the full benchmark refer to the official umake github repository at https://github.com/grisha85/umake/blob/master/doc/dpdk-build.md.)
