Build and Testing Environment at Celonis: Our Experiences Over Time

This article was previously published in the Celonis Dev Blog. Celonis, with its “Intelligent Business Cloud” (IBC), is the leader in Enterprise Performance Acceleration software, harnessing the power of Process Mining technology to help organisations remove operational friction.

For four years, Celonis has been developing its own Query Engine for answering Process Mining queries. The requirements for build and testing infrastructure have evolved greatly since it started as a small C++ project targeting multiple platforms. Build and testing tooling is not as established in C++ as it is in other programming languages like Java, so over time we tried a lot of different strategies. In the following, we share our experiences, grouped into five categories: Build System, Dependency Management, Building, Testing and Deployment.

Build System

In the beginning, we used CMake for UNIX builds (Linux & macOS), but for Windows we used the Visual Studio project files directly. There were multiple reasons for this:

Firstly, when generating Visual Studio projects with CMake, the directory structure for source and header files had to be recreated manually, because Visual Studio did not pick it up on its own; the usual result was a flat list of source and header files in the IDE. This only became easier with CMake 3.8.
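
With CMake 3.8 and newer, the `source_group(TREE …)` signature can mirror the on-disk layout in Visual Studio's Solution Explorer. A minimal sketch (the target name and paths are illustrative, not our real ones):

```cmake
# Collect sources; file(GLOB_RECURSE) yields absolute paths here.
file(GLOB_RECURSE ENGINE_SOURCES src/*.cpp src/*.h)
add_executable(query_engine ${ENGINE_SOURCES})

# Available since CMake 3.8: group files in the IDE by their location
# on disk instead of showing one flat list of sources and headers.
source_group(TREE ${CMAKE_CURRENT_SOURCE_DIR}/src
             PREFIX "Sources"
             FILES ${ENGINE_SOURCES})
```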

Secondly, importing existing VS project files into CMake is not straightforward. Many of the settings could not be translated easily into CMake instructions. An example is multi-core compilation, where it is not entirely clear what the settings in the Visual Studio UI actually do, how to set them in the project files, or how to invoke them later via MSBuild. Meticulously porting all of the compiler and dependency settings did not seem worth the effort.

When major changes in dependency and project structure required a ton of adjustments to the different build systems anyway, we finally merged both build system approaches to only use CMake.

Dependency Management

Initially, dependencies and prebuilt binaries were checked directly into the repository. To address the issues this caused pragmatically, we switched to cloning dependencies and prebuilt binaries via a shell script. With this approach we lost the automatic synchronicity of the checked-in variant, because the dependency versions at a specific commit were no longer strictly guaranteed to be the same.

To be reproducible, the shell script had to clone each dependency at a specific commit. Referencing branches or tags was dangerous in the sense that they could change after the fact, which would make reproducing older builds harder. Dependency management like this was very similar to git submodules: in both cases the developer has to execute an extra command (or at least adjust the clone command), but with git submodules the versioning of dependencies is handled strictly by git.
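
The pinning pattern such a script relied on can be sketched as follows; to keep the example self-contained it pins against a throwaway local repository instead of a real remote (all names and paths are illustrative):

```shell
set -e
# Stand-in for the remote dependency repository.
rm -rf /tmp/dep-demo && mkdir -p /tmp/dep-demo/dep
git -C /tmp/dep-demo/dep init -q
git -C /tmp/dep-demo/dep -c user.email=dev@example.com -c user.name=dev \
    commit -q --allow-empty -m "dependency v1"
PIN=$(git -C /tmp/dep-demo/dep rev-parse HEAD)   # the exact commit we depend on

# What the dependency script did: clone, then detach onto the pinned SHA —
# never onto a branch or tag, since those can move later.
git clone -q /tmp/dep-demo/dep /tmp/dep-demo/checkout
git -C /tmp/dep-demo/checkout checkout -q --detach "$PIN"
echo "pinned to $(git -C /tmp/dep-demo/checkout rev-parse HEAD)"
```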

This is why we quickly moved to git submodules. Adding a dependency as a git submodule pins it to the specific commit HEAD currently points at, even if HEAD is a symbolic ref. This way, commits of our code are linked to commits of the dependency, which in general makes it possible to reproduce older versions of the software.

For example:

$ git clone https://github.com/boostorg/boost.git
$ cd boost
$ git symbolic-ref HEAD
refs/heads/master
$ git show-ref refs/heads/master
3d189428991b0434aa1f2236d18dac1584e6ab84 refs/heads/master
$ cd ../test-repo
$ git submodule add https://github.com/boostorg/boost.git

If these changes now get committed and somebody else checks out the code at some other point they will see the following:

$ cd test-repo
$ git submodule update --init
$ cd boost
$ git symbolic-ref HEAD
fatal: ref HEAD is not a symbolic ref
$ git show-ref HEAD
3d189428991b0434aa1f2236d18dac1584e6ab84 refs/remotes/origin/HEAD

So we see that HEAD lost the information about the symbolic ref and only remembers commit 3d189428991b0434aa1f2236d18dac1584e6ab84.

While a great guarantee in theory, this mechanism was poorly adopted by developers. Updating dependencies was a process involving multiple steps and commits to different repositories. Smarter git commands appeared to be either poorly documented or only available in later versions of git.

Additionally, git submodules did not appear to be a very robust or popular feature at the time: we ran into problems in CI, and tooling support for submodules was lacking. Switching branches also often caused problems with submodules, leading to an overall poor developer experience. There was also a bottleneck in updating dependencies, because usually only a few people did it. Finally, submodules did not address integration with the build system, which had to be provided separately.

With all of these experiences in mind, it seemed like a good idea to strive for a dependency management solution that combines the handling of versions, artifacts and builds. To satisfy these requirements we switched to Conan, a Python-based C++ package manager that covers all three. Each package is described by a Python file containing all the metadata and the instructions to build, test and deploy it. Prebuilt packages can be deployed to an artifact repository so that developers and the continuous integration do not have to build them all the time. Conan also has a very mature CMake integration that imports packages as CMake targets very easily.
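
On the consuming side, the simplest setup needs only a `conanfile.txt` next to the project; a sketch using Conan 1.x syntax (the package name, version and generator are illustrative, not our real configuration):

```ini
[requires]
boost/1.75.0

[generators]
cmake
```

Running `conan install . --build=missing` then fetches prebuilt binaries from the configured artifact repository, builds locally whatever is missing, and generates CMake integration files (for this generator, `conanbuildinfo.cmake`) that the project's CMakeLists can include.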

Building

Continuously building relevant branches became more important over time. Because we used Atlassian products for repositories (Bitbucket Server) and ticket management (Jira), the obvious choice for a build server was the corresponding Atlassian product, Bamboo. This has the benefit that builds are connected to branches, and tickets are connected to builds (if branches are named correctly).

With the increasing heterogeneity of the build environments required for a rising number of projects on the Bamboo server, dockerized builds quickly became a necessity. We therefore began to use the Docker Runner, provided by Bamboo since version 6.4, extensively. While in the beginning only the main develop branch and the various release branches were built, we are continuously extending the scope of branches that get built. Depending on the project, this can be branches with open pull requests or all branches pushed to the remote repository.

Testing

Large parts of the Query Engine can be exercised by issuing queries against it, but not all portions can be targeted via queries very efficiently. The routines for data management, internal buffers or compression needed additional techniques to be tested thoroughly. We therefore introduced Catch (and later Catch2) as a C++ testing framework, specifically to target these internal but extremely important parts of our software. Additionally, we started to use coverage information from llvm-cov to determine the portion of code that our tests execute, which gives us an indication of where we need to expand testing.

Furthermore, the tests are executed with sanitizers enabled every night. This allows us to detect issues arising from undefined behavior, memory errors or problematic threading behavior.

Deployment

On Linux, each developer would simply build the executable locally and could then use the Query Engine. For Windows and macOS this was not really feasible: especially before we had a build server, setting up the build environments for these platforms was too much effort for individual developers.

We therefore opted to check the binaries into the repository. That way, developers who had to use the Query Engine on one of these two platforms in a different project only had to check out the latest code, provided a recent version of the binary had been checked in. This quickly resulted in a lot of binaries being committed to the repository, and the times for cloning and other repository operations went up considerably. Because the Query Engine is used within a Java project most of the time, we switched to publishing it as a Maven package, with the platform-specific C++ executables packaged separately.
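
Conceptually, each platform-specific executable can be attached to one Maven artifact under its own classifier, so a consuming Java project declares something like the following (the coordinates, classifier and packaging type are illustrative, not our real ones):

```xml
<dependency>
  <groupId>com.example.engine</groupId>
  <artifactId>query-engine</artifactId>
  <version>1.0.0</version>
  <!-- one classifier per platform, e.g. linux-x86_64, win-x86_64, macos-x86_64 -->
  <classifier>win-x86_64</classifier>
  <type>exe</type>
</dependency>
```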

Conclusion

  • You should almost certainly use CMake.
  • You should probably use Conan.
  • At a team size of at least three people, you should probably have automatic builds in place.
  • Failing tests should break the build.

In general, when we are changing things related to the build system, we try to keep two things in mind: Iterative changes and developer experience.

We only want to change things iteratively, to keep the builds as stable as possible at all times. This becomes more important the bigger the team and the more complex the software get.

Additionally, developer experience should be the main driver in decisions related to changes to the build system. Our experience with git submodules showed us that introducing technology with a poor developer experience takes a toll on the individual product ownership of developers as well as on the agility of the team as a whole.

Celonis Engineering

Thoughts, musings, and code snippets from the Celonis engineering team
