Supercharge Jenkins for monorepos: Multi-multibranch Pipelines

How we extended Jenkins to make using Multibranch Pipelines in monorepos a piece of cake.

Alexis Gauthiez
BlaBlaCar
6 min read · Sep 1, 2021


Jenkins is a central piece of BlaBlaCar’s Software Factory. On an average day, we trigger over a thousand builds to test and deliver dozens of services and applications, configure our infrastructure, and do much more.

The most trivial way to integrate Jenkins with a version control service is to use so-called organization folders with Pipelines and Jenkinsfiles. Jenkins then automatically discovers Git repositories and creates a Multibranch Pipeline project for each one, which, in turn, creates a Pipeline for each branch. Ultimately, a branch’s Jenkinsfile describes the Pipeline for that branch.

By default, there can be only one Jenkinsfile, and therefore one Pipeline, per Git repository branch. That can be limiting when dealing with large repositories, like monorepos, where Pipelines can become pretty bloated. Wouldn’t it be convenient to have multiple Jenkinsfiles per repository, allowing one to build each piece of the repository independently? At BlaBlaCar, we have developed an in-house solution to do just this by taking advantage of many built-in Jenkins features, some of which you may not know about.

Monorepos at BlaBlaCar

Monorepos can offer numerous benefits over traditional repositories in specific situations: they ease code reuse, standardization, consistency, collaboration, and more. These benefits motivated some of our teams to centralize most of their work in a single repository.

The same benefits apply to multi-tenant repositories. For instance, we manage all Terraform configuration in a single repository where each of our 350+ root modules is stored in a dedicated directory.

In both cases, testing, building, and deploying all of a repository’s artifacts when changes only touch a few sub-projects is inefficient. Improving a single Pipeline to conditionally skip specific steps is achievable, but at the cost of higher pipeline complexity and a cluttered, less relevant execution history. That is where a multi-pipeline pattern comes in handy.

From mono-multi to multi-multi

The idea is to organize multiple Multibranch Pipelines in a way that mirrors the repository file structure. First, let’s define Jenkinsfiles in our monorepo:

my-monorepo> tree -P 'Jenkinsfile' --prune
.
├── Jenkinsfile
└── foo
    ├── bar
    │   └── baz
    │       └── Jenkinsfile
    └── qux
        └── Jenkinsfile

For each non-root Jenkinsfile in a Git repository, the goal is to create a corresponding Multibranch Pipeline, using the path to the Jenkinsfile as the relative location of the item on Jenkins.

The Job DSL to the rescue

One can create Multibranch Pipelines through the Jenkins web UI, but that approach does not scale. Fortunately, there is a programmable interface for this: the Job DSL. Jenkins Job DSL offers a Groovy API to create and configure Jenkins items, such as Pipelines and folders. With some scripting effort, we can look for the repository’s Jenkinsfiles and provision Multibranch Pipelines accordingly.
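As an illustration, here is a minimal Job DSL sketch of the idea. The repository name, the Jenkinsfile list, and the remote URL are placeholders; a real script would discover the Jenkinsfiles from the checkout rather than hardcode them.

```groovy
// Hypothetical Job DSL script; names and the remote URL are placeholders.
def repo = 'my-monorepo'
def jenkinsfiles = ['foo/bar/baz/Jenkinsfile', 'foo/qux/Jenkinsfile']

folder('Generated')
folder("Generated/${repo}")

jenkinsfiles.each { path ->
    def dir = path.substring(0, path.lastIndexOf('/'))   // e.g. foo/bar/baz
    def segments = dir.tokenize('/')

    // Ancestor folders must exist before their children are created.
    def parent = "Generated/${repo}"
    segments[0..<segments.size() - 1].each { segment ->
        parent = "${parent}/${segment}"
        folder(parent)
    }

    // The Multibranch Pipeline is a leaf of the folder tree.
    multibranchPipelineJob("Generated/${repo}/${dir}") {
        branchSources {
            git {
                id(dir)
                remote("git@example.com:acme/${repo}.git")
            }
        }
        factory {
            workflowBranchProjectFactory {
                scriptPath(path)   // point the Pipeline at this Jenkinsfile
            }
        }
    }
}
```

The scriptPath setting of the branch project factory is what lets each Multibranch Pipeline use a Jenkinsfile other than the repository root one.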

A root folder, let’s name it “Generated”, serves as an entry point for all generated items. A child folder is created for the repository, with subfolders mirroring each Jenkinsfile’s location.

📁 Generated
└── 📁 my-monorepo
    └── 📁 foo
        ├── 📁 bar
        │   └── 🗃 baz
        │       ├── ⚙️ main
        │       └── ⚙️ feature-branch
        └── 🗃 qux
            ├── ⚙️ main
            └── ⚙️ feature-branch

Ancestor folders need to be provisioned before their children, otherwise the folder method of the Job DSL throws an exception.

Multibranch Pipelines are created as leaves of the folders tree.

The root Pipeline of the repository is used as a seed to execute the Job DSL script, so Pipelines are automatically created and configured as Jenkinsfiles are added to the repository.
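A sketch of what the seed’s root Jenkinsfile could look like, assuming the Job DSL script lives at a hypothetical jobs/pipelines.groovy path, with findFiles provided by the Pipeline Utility Steps plugin and jobDsl by the Job DSL plugin:

```groovy
// Hypothetical root Jenkinsfile acting as the seed build.
pipeline {
    agent any
    stages {
        stage('Provision Multibranch Pipelines') {
            steps {
                script {
                    // Collect every non-root Jenkinsfile in the checkout.
                    def paths = findFiles(glob: '**/Jenkinsfile')
                        .collect { it.path }
                        .findAll { it != 'Jenkinsfile' }
                    // Hand the list over to the Job DSL script.
                    jobDsl targets: 'jobs/pipelines.groovy',
                           additionalParameters: [jenkinsfilePaths: paths]
                }
            }
        }
    }
}
```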

And a few seconds later, tada! 🎉

Processing provided DSL script
Added items:
GeneratedJob{name='Generated'}
GeneratedJob{name='Generated/my-monorepo'}
GeneratedJob{name='Generated/my-monorepo/foo'}
GeneratedJob{name='Generated/my-monorepo/foo/bar'}
GeneratedJob{name='Generated/my-monorepo/foo/qux'}
GeneratedJob{name='Generated/my-monorepo/foo/bar/baz'}

The Job DSL is a powerful extension to Jenkins. We also use it to configure our organization folders via the Jenkins Configuration as Code plugin.

Detecting changes to ultimately trigger downstream builds

Now that Multibranch Pipelines are provisioned, let’s trigger downstream builds from our seed build.

The strategy is to trigger a build for each modified directory containing a Jenkinsfile. For instance, if foo/bar/baz/file is modified, we want to build Generated/my-monorepo/foo/bar/baz.
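As a sketch of this logic inside the seed Pipeline (the diff range and job paths are illustrative, and branch names containing slashes would additionally need URL-encoding):

```groovy
// Illustrative change detection in the seed build (scripted Pipeline context).
def changedFiles = sh(
    script: 'git diff --name-only HEAD~1 HEAD',
    returnStdout: true
).trim().split('\n')

// Map each changed file to its closest ancestor directory holding a Jenkinsfile.
def targets = changedFiles.collect { file ->
    def dir = file.contains('/') ? file.substring(0, file.lastIndexOf('/')) : ''
    while (dir && !fileExists("${dir}/Jenkinsfile")) {
        dir = dir.contains('/') ? dir.substring(0, dir.lastIndexOf('/')) : ''
    }
    dir
}.findAll { it }.unique()

// Trigger the matching downstream branch builds.
targets.each { dir ->
    build job: "/Generated/my-monorepo/${dir}/${env.BRANCH_NAME}",
          propagate: true, wait: true
}
```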

Wrapping everything up in a shared library

The functions introduced above can be declared and called directly in any Jenkinsfile, but they become much more presentable when distributed via a Shared Library. Such a library can expose a new monorepo step that performs all of the above.
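For illustration, a minimal sketch of such a step in a Shared Library’s vars/monorepo.groovy (the helper names are hypothetical):

```groovy
// vars/monorepo.groovy — hypothetical step exposed by the Shared Library.
def call() {
    // Look up every non-root Jenkinsfile in the current checkout.
    def jenkinsfiles = findFiles(glob: '**/Jenkinsfile')
        .collect { it.path }
        .findAll { it != 'Jenkinsfile' }

    // Provision folders and Multibranch Pipelines via the Job DSL,
    // then trigger downstream builds for the sub-projects that changed.
    provisionPipelines(jenkinsfiles)       // assumed library helper
    buildChangedPipelines(jenkinsfiles)    // assumed library helper
}
```

A monorepo’s root Jenkinsfile can then boil down to a single monorepo() call.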

At BlaBlaCar, we maintain our own private Shared Library, which contains more than 20 custom steps. It enables us to industrialize common pipeline patterns and make them mainstream. The library is tested with the Jenkins Pipeline Unit testing framework and implicitly loaded so that every pipeline can benefit from it.

Handy but not perfect

Creating items programmatically on Jenkins can get out of hand if not supervised correctly. As engineers adopt this technique, your Jenkins instance will manage more and more Pipelines. Since the number of items directly impacts Jenkins performance, it is important to keep it under control.

Don’t forget about orphans

Multibranch Pipelines automatically create Pipelines for individual Git branches, but they can also remove them according to the “orphan item strategy”. A sensible configuration removes orphans promptly, but it is only effective if unused branches are actually deleted, which is what turns their Pipelines into so-called orphans. Some version control systems (e.g. GitHub) can be configured to automatically remove head branches after a pull request is merged.
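For reference, such a strategy can be set in the Job DSL when declaring the Multibranch Pipeline (the retention values here are purely illustrative):

```groovy
// Illustrative orphaned item strategy for a generated Multibranch Pipeline.
multibranchPipelineJob('Generated/my-monorepo/foo/qux') {
    orphanedItemStrategy {
        discardOldItems {
            numToKeep(5)   // keep at most 5 orphaned branch Pipelines around
        }
    }
}
```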

The implementation presented here does not include any “garbage collection” mechanism that would remove items that are no longer referenced by the source repository. There are a couple of options here. You can rely on the jobDsl step’s removedJobAction option as long as you can cope with its limitations. Alternatively, generated Jenkins items can be removed periodically, which ensures only the ones that are actively being provisioned remain.

Take care of your Git repository source

As we moved our Git repositories over to GitHub, we discovered a bigger problem: we were exhausting our GitHub API request quota on a weekly basis.

Depending on your Pipelines’ branch source configuration, pushing a commit to a single Git repository leads Jenkins to dispatch the corresponding event to all of its Multibranch Pipelines. Upon receiving this event, each Multibranch Pipeline independently indexes branches, and this operation costs a few API requests per Multibranch Pipeline. With repositories containing hundreds of Jenkinsfiles, this becomes an issue.

If you are using GitHub, you can switch to an alternative integration method, e.g. a GitHub App, which allows up to 15,000 API requests per hour. We chose a more radical solution: we switched to the generic Git branch source so that Multibranch Pipelines do not handle Git events at all.

The generic Git branch source relies on a provider-agnostic SSH connection, so it is unaware of the actual Git hosting vendor. Downstream builds will therefore no longer notify your Git repository provider with build statuses. As long as the seed build triggers downstream builds with the propagate and wait options set to true, it can notify on their behalf, so this is not a blocker. Make sure the seed build also triggers downstream Multibranch Pipeline scans to replace the automatic branch indexing that would otherwise occur. This solution is essentially quota-free.

Test and monitor

With these considerations in mind, properly monitoring both your Jenkins instance and your Git repository management system is key. Using the Jenkins Metrics plugin to keep an eye on the number of Pipelines and automatically alerting on our GitHub rate limit status have proven useful on our side. Creating a staging environment to experiment with your configuration is also strongly advised.

“Finished: SUCCESS”

The multi-pipeline pattern is a neat approach to dealing with rather big Git repositories. We have been using it for about 3 years at BlaBlaCar, and it has helped us set up CI for our monorepos in a scalable and flexible fashion. All of this is made possible by Jenkins’ extensibility. Feel free to bootstrap your very own library by introducing a monorepo step tailored to your team’s needs!
