Simplify Your CI Pipeline Configuration with Jsonnet

Shing Lyu
Mar 1 · 9 min read

Most CI/CD (Continuous Integration/Continuous Delivery) tools nowadays support some form of configuration file so you can properly version control your pipelines. For example, Travis CI, GitLab CI, CircleCI and Drone CI use YAML files, while Jenkins uses its own DSL. These YAML-based configuration files are easy to read and edit, but they don’t scale very well when the file grows big. This problem can be solved with a nice data templating language called Jsonnet. In this post we’ll demonstrate with the Drone CI v1.0 configuration file format, but the idea can easily be applied to other CI tools.

The problem with YAML-based CI configuration files

The first problem is that pipelines become hard to reason about as you add more and more conditional builds. When we use git with CI pipelines, we usually end up with a separate pipeline for each scenario. For example, imagine we have a Node.js web service. When I push to a feature branch (i.e. a non-master branch), we want to trigger the build and unit test steps. When the pull request is approved and we merge the branch into master, we want it to build, unit test, deploy to our dev environment and then run the integration tests on it. Once we are done with testing in the dev environment, we can use Drone CI’s CLI to trigger a deployment to stage, which takes the build from the previous master branch build, deploys it to the ‘stage’ environment, and then runs the integration tests on it. The same can be applied to deployment to production, but we’ll leave it out to keep the example simple. So to summarize, we want the following pipelines: a push to a non-master branch runs build and unit test; a merge to master runs build, unit test, deploy to dev and integration test on dev; and a promotion to stage runs deploy to stage and integration test on stage.

Drone and many other CI solutions allow you to achieve this with conditions. A Drone config file for the above pipelines will look like this:

The event: promote condition in the deploy_stage step is triggered by the CLI call drone promote <repo/name> <build> <environment>. This is how a manual deployment is triggered in Drone. Don’t worry if you don’t understand how this works; it’s not critical to our discussion.

Now imagine you are new to the project and read this Drone pipeline. What would happen when you push a feature branch? First you’ll have to read through all the steps. For each step you’ll need to check whether its when condition matches the scenario you care about, and then write down all the steps that matched. Be careful: a step without any when condition runs in every situation. So you need to do a lot of processing in your head to see what runs when. It’s also very easy to add a new step with the wrong condition and have it run in an unexpected situation. The pipeline configuration we just created is basically a tree, and we apply conditions to it to pick out one branch.

It would be much simpler to read if we duplicated the build and test steps and enumerated every combination with when blocks. But this way we’d end up with 8 steps, each with a different when condition, while most of the code is duplicated. We’ll solve this with Jsonnet after we explain the second problem.

The second problem is code duplication. YAML provides anchors to cut down on repetition, but they only work at key-value granularity. A simple YAML anchor looks like this:

In this example, we define an anchor called &anchor_job, which contains two keys, job: programmer and duty: code and debug. In our employees list, we use <<: *anchor_job to inline it into the name: Alice object. The keys from &anchor_job are merged into the name: Alice object, which then has three keys: name: Alice, job: programmer and duty: code and debug.

However, this mechanism only works at the key-value level; you can’t parameterize part of a value. Let’s assume that we are going to deploy the imaginary service to multiple AWS regions for resilience, so we’ll have even more combinations. If we have 3 environments, dev, stage and prod (production), and 2 regions, ‘eu-central-1’ and ‘us-west-1’, then we’ll have 3 x 2 = 6 deployment combinations. Even if we use YAML anchors to avoid repeating the when part, we still repeat a lot of the code:

A YAML anchor

Notice that even though we reduce the repetition by reusing the when_deploy_stage anchor in both stage steps, we can’t abstract out the npm run deploy -- --env=<environment> --region=<region> line or the name, because we can’t parameterize the environment and region within a value. The good news is that Jsonnet can solve both problems we discussed. We’ll give a short introduction to Jsonnet and then explain how it solves them.

Jsonnet

Jsonnet is an open source templating language based on JSON. Its backbone is still native JSON, but it adds variables, conditionals, functions, arithmetic and more. It also has a nice linter, formatter, and IDE integrations. Its nicely designed standard library provides utilities for string manipulation, math, and functional tools like map and fold.
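
To make these features concrete, here is a small sketch that touches variables, functions, conditionals, arithmetic and two standard library helpers. All names and values (regions, isProd, replicas) are made up for illustration and do not come from a real pipeline.

```jsonnet
// Illustrative only: every name and value here is hypothetical.
local regions = ['eu-central-1', 'us-west-1'];  // a variable
local isProd(env) = env == 'prod';              // a function

{
  // Conditional plus arithmetic.
  replicas: if isProd('prod') then 2 * 2 else 1,
  // std.map applies a function to every element of an array.
  upperRegions: std.map(function(r) std.asciiUpper(r), regions),
  // std.foldl folds an array into a single value, here a total length.
  totalNameLength: std.foldl(function(acc, r) acc + std.length(r), regions, 0),
}
```

Running jsonnet on this file emits a plain JSON object with three fields; the locals disappear from the output.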

Jsonnet source code passes through the compiler, which emits JSON. On macOS you can easily install it with brew install jsonnet. Drone now supports Jsonnet natively, but since my team still runs an old version of Drone, we decided to compile Jsonnet to JSON and then use the json2yaml tool to convert it to YAML. We then commit both the Jsonnet source and the generated YAML file to our git repository.
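
As a side note, the Jsonnet standard library can also render YAML by itself through std.manifestYamlDoc. The sketch below is an alternative to the json2yaml step, not the setup described above, and the pipeline content is only a placeholder.

```jsonnet
// Placeholder pipeline, for illustration only.
local pipeline = {
  kind: 'pipeline',
  name: 'default',
  steps: [
    { name: 'build', image: 'node:10', commands: ['npm install'] },
  ],
};

// std.manifestYamlDoc turns a value into a YAML document string.
std.manifestYamlDoc(pipeline)
```

Because the result is a string, run it with jsonnet -S so the text is written as-is instead of as a JSON-quoted string.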

So let’s try to solve the repetition problem with Jsonnet functions. The moving parts in our deploy step are the environment and the region, so we can define a function that takes these two parameters and does string interpolation with them:
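
A minimal sketch of such a function follows. The function name deployStep, the node:10 image and the array comprehension that enumerates all six combinations are assumptions made for illustration; the deploy command is the one quoted earlier.

```jsonnet
// Hypothetical step builder: env and region are the only moving parts.
local deployStep(env, region) = {
  name: 'deploy_%(env)s_%(region)s' % { env: env, region: region },
  image: 'node:10',  // assumed image
  commands: [
    'npm run deploy -- --env=%(env)s --region=%(region)s' % { env: env, region: region },
  ],
};

{
  kind: 'pipeline',
  name: 'default',
  // 3 environments x 2 regions = 6 generated deploy steps.
  steps: [
    deployStep(env, region)
    for env in ['dev', 'stage', 'prod']
    for region in ['eu-central-1', 'us-west-1']
  ],
}
```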

Let’s take a closer look at the name field. Jsonnet supports Python-style string formatting with the % operator. In the template string, %(env)s looks up the env key in the object that follows the % operator. The s at the end of %(...)s means we want to format the value as a string.
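
The named form can be compared side by side with the positional form of the % operator. This is a small sketch with made-up values; both fields should evaluate to the same string.

```jsonnet
{
  // Named placeholders are looked up in the object on the right of %.
  named: 'deploy_%(env)s_%(region)s' % { env: 'dev', region: 'eu-central-1' },
  // Positional placeholders are filled from an array, in order.
  positional: 'deploy_%s_%s' % ['dev', 'eu-central-1'],
}
```

The named form is longer but keeps the template readable once it has more than one or two placeholders.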

If we run jsonnet demo1.jsonnet, this will be printed to STDOUT:

We generated 64 lines of json from just 23 lines of jsonnet, and it’s much easier to read!

The next question is how we can structure our Jsonnet code so that we can easily understand which steps are included in each scenario (e.g. push to non-master, merge to master etc.). We can first define the building blocks, the steps:
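
Here is a sketch of what such building blocks could look like. The step names follow the scenarios described earlier, while the image and the npm scripts are hypothetical.

```jsonnet
// Each step is a plain object without any when block yet.
// Image and commands are assumptions for illustration.
local build = {
  name: 'build',
  image: 'node:10',
  commands: ['npm install', 'npm run build'],
};

local unitTest = {
  name: 'unit_test',
  image: 'node:10',
  commands: ['npm run test:unit'],
};

local deployDev = {
  name: 'deploy_dev',
  image: 'node:10',
  commands: ['npm run deploy -- --env=dev'],
};

local integrationTestDev = {
  name: 'integration_test_dev',
  image: 'node:10',
  commands: ['npm run test:integration -- --env=dev'],
};

// Emit the blocks as a list so the file evaluates on its own.
[build, unitTest, deployDev, integrationTestDev]
```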

Then we can start composing our pipelines with these steps. First we define a list of steps we want when pushing to a non-master branch:

We want to restrict these steps to run only on a push to a non-master branch, so we use std.map to add the conditional block (i.e. the when block) to each step.
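
Put together, the step list and the std.map call could look like the following sketch. The step bodies are shortened placeholders, and the exact when values (a push event on any branch except master) are an assumption based on the scenario rather than a copy of a real config.

```jsonnet
// Placeholder steps; image and commands are hypothetical.
local build = { name: 'build', image: 'node:10', commands: ['npm run build'] };
local unitTest = { name: 'unit_test', image: 'node:10', commands: ['npm run test:unit'] };

// The scenario reads as a plain list of steps.
local commitToNonMasterSteps = [build, unitTest];

// Returns a copy of the step with a when block added.
local whenCommitToNonMaster(step) = step {
  when: {
    event: ['push'],
    branch: { exclude: ['master'] },
  },
};

{
  kind: 'pipeline',
  name: 'default',
  steps: std.map(whenCommitToNonMaster, commitToNonMasterSteps),
}
```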

The whenCommitToNonMaster function adds the when block to the step you pass in. The syntax step { when: ... } actually means “merge the step object with the { when: ... } object”. This function is then applied to every step using the std.map function. The same pattern can be applied to other scenarios, for example when we merge to master:
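
A sketch of the merge-to-master scenario, under the same assumptions as before: the step bodies are placeholders, and the names mergeToMasterSteps and whenMergeToMaster as well as the when values are illustrative.

```jsonnet
// Placeholder steps; image and commands are hypothetical.
local build = { name: 'build', image: 'node:10', commands: ['npm run build'] };
local unitTest = { name: 'unit_test', image: 'node:10', commands: ['npm run test:unit'] };
local deployDev = { name: 'deploy_dev', image: 'node:10', commands: ['npm run deploy -- --env=dev'] };
local integrationTestDev = {
  name: 'integration_test_dev',
  image: 'node:10',
  commands: ['npm run test:integration -- --env=dev'],
};

// build and unitTest are listed again on purpose, so the scenario is complete.
local mergeToMasterSteps = [build, unitTest, deployDev, integrationTestDev];

// step { ... } is shorthand for step + { ... }.
local whenMergeToMaster(step) = step {
  when: {
    event: ['push'],
    branch: ['master'],
  },
};

std.map(whenMergeToMaster, mergeToMasterSteps)
```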

We choose to repeat the build and unitTest steps here, so we can clearly see what is included in the “merge to master” pipeline. In the generated code there will be two copies of the build step, one with a when block for pushing to a non-master branch and another with a when block for merging to master; the same goes for the unitTest step. We can carry on defining other scenarios and their lists of steps. In the end we’ll have a list of scenarios, each containing a list of steps. We can then flatten all the lists into one giant list of all possible steps using the std.flattenArrays() function.
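
The whole structure can be sketched in one file. To keep it short, this version uses two hypothetical helpers, step and withWhen, instead of one named function per scenario; the images, commands and when values are assumptions as well.

```jsonnet
// Hypothetical helpers to keep the sketch short.
local step(name, command) = { name: name, image: 'node:10', commands: [command] };
local withWhen(condition, steps) = std.map(function(s) s { when: condition }, steps);

local build = step('build', 'npm run build');
local unitTest = step('unit_test', 'npm run test:unit');
local deployDev = step('deploy_dev', 'npm run deploy -- --env=dev');
local integrationTestDev = step('integration_test_dev', 'npm run test:integration -- --env=dev');
local deployStage = step('deploy_stage', 'npm run deploy -- --env=stage');
local integrationTestStage = step('integration_test_stage', 'npm run test:integration -- --env=stage');

local commitToNonMasterSteps = withWhen(
  { event: ['push'], branch: { exclude: ['master'] } },
  [build, unitTest]
);
local mergeToMasterSteps = withWhen(
  { event: ['push'], branch: ['master'] },
  [build, unitTest, deployDev, integrationTestDev]
);
local deployToStageSteps = withWhen(
  { event: ['promote'], target: ['stage'] },
  [deployStage, integrationTestStage]
);

// One entry per scenario; each entry is a list of steps.
local pipelines = [commitToNonMasterSteps, mergeToMasterSteps, deployToStageSteps];

{
  kind: 'pipeline',
  name: 'default',
  // 2 + 4 + 2 = 8 steps after flattening.
  steps: std.flattenArrays(pipelines),
}
```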

Using this architecture, anyone who reads the Drone configuration can clearly see the list of scenarios (the pipelines list). To see what steps are executed in each scenario, we can simply go to the definition of the scenario variable (e.g. commitToNonMasterSteps).

A side note about Jsonnet vs. JavaScript

You might wonder why we chose Jsonnet instead of JavaScript. You can easily achieve the same effect by building the config object in JavaScript and printing it with JSON.stringify(). One reason is that Jsonnet has been natively supported by Drone CI since v1.0, so it makes sense to use it directly. Another reason is that Jsonnet’s syntax is built around native JSON, with a relatively limited set of functions and operators, so it forces you to focus on the data rather than the algorithm. With JavaScript you might be tempted to pull in all sorts of NPM libraries and write complex algorithms that are hard to trace and debug. Jsonnet’s design also leads you to write functional rather than procedural code, so if you are into functional programming it will be a natural fit. But technically Jsonnet is no better than plain JavaScript, so feel free to choose whichever fits your existing pipeline and your team’s expertise.

Conclusions

We discussed the problems with writing a CI pipeline in plain YAML. The first problem is that if we use complex conditionals to control which step runs in which scenario, the pipeline quickly becomes hard to reason about. The second problem is that even with YAML anchors we still can’t eliminate all the repetition. Jsonnet solves both problems. We can eliminate the repetition using Jsonnet functions and string interpolation. To address the complex conditionals problem, we structure our Jsonnet code so that we enumerate all the steps under each scenario. Thanks to Jsonnet templating, we can be explicit while keeping our code concise and clean.

DAZN Engineering

Revolutionising the sport industry
