Dagger vs. the current state of CI/CD

Hannes De Smet
Published in datamindedbe · Aug 11, 2022

Ever been in the position of having to develop a continuous integration / continuous delivery (CI/CD) pipeline, most likely written in YAML, from scratch? Did you end up getting frustrated by the slow feedback loops, the limited composability, or YAML itself (damn you, indentation mistakes!)? You are not alone! But… there is an alternative on the horizon. A few of the Docker originators have created Dagger, a portable development kit for CI/CD pipelines, with the goal of solving these issues.

In this three-part blog series, we will first zoom out a bit and look at the what and why of CI systems and their current pain points (part 1), then see how Dagger tries to alleviate those pains (part 2), and lastly write some custom Dagger composite actions (part 3).

What is (remote) CI/CD and why you need it

As developers, automation is at the core of what we do. Automation allows us to iterate quickly on whatever we are building. It gives us more confidence in our ability to deploy our software to production in a smooth and reliable manner.

One of the main tools in our belt that enables automation is a continuous integration / continuous delivery (CI/CD) system. The simplest such system is a build server. Whenever a change occurs in your version control system (VCS, e.g. Git, Mercurial, SVN, …), hooks trigger a build on the server. There are several advantages to such a remote system:

  • There is a centralized set of credentials for interacting with third-party services; not every developer requires a copy on their machine, which limits the attack surface for credential theft.
  • If set up correctly, there are no hidden dependencies (the classic ‘it works on my machine’ excuse); your build environment should be ephemeral, your server immutable.
  • It is better suited for heavy lifting (try building Spark from source on your local machine).

Modern CI systems do all that and more, such as storing build artefacts, notifying you of any failures, or protecting branches from merges if certain conditions are not met. Better make those tests pass! A remote CI/CD system, however, comes with its own set of problems.

Struggles of current CI/CD platforms

Yet another YAML config file…

The most popular CI vendors, such as Azure DevOps, CircleCI, GitHub Actions and Travis CI, all use YAML as their configuration language. As such, it has become the de facto configuration standard. YAML certainly has its merits: it's easy to digest and doesn't have a steep learning curve. Non-programmers are often able to read and make changes to YAML files without much assistance from a developer. But this ease of use comes at a price: because whitespace is significant and the syntax only supports basic types, misconfiguration errors sporadically rear their ugly heads. These bugs are difficult to catch without a linter or some other means of validation. Moreover, YAML does not support basic control flow statements such as loops and conditionals. Some CI vendors offer custom workarounds that extend their YAML configurations with basic control flow, but they often feel awkward, and it's near impossible to add and debug more involved logic in your pipeline.
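To illustrate, here is a minimal (hypothetical) GitHub Actions workflow showing what such vendor-specific control flow looks like in practice: a strategy matrix stands in for a loop, and an `if:` expression stands in for a conditional:

```yaml
jobs:
  test:
    runs-on: ubuntu-latest
    strategy:
      # The matrix is the closest thing to a loop: one job per entry.
      matrix:
        python-version: ["3.9", "3.10"]
    steps:
      - uses: actions/checkout@v3
      - uses: actions/setup-python@v4
        with:
          python-version: ${{ matrix.python-version }}
      - name: Run tests
        run: pytest
      # Conditionals are stringly-typed expressions evaluated by the
      # vendor's runner; you cannot step through or unit-test them.
      - name: Deploy (hypothetical script)
        if: github.ref == 'refs/heads/main'
        run: ./deploy.sh
```

Anything more elaborate than this quickly turns into copy-pasted jobs or shell scripts embedded in YAML strings.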

Slow Feedback Loops

Having to test a pipeline remotely can be very time-consuming. You first have to commit and push your code to the CI system. The push triggers a pipeline run, which can take some time to spin up and get scheduled; if there is no queue this will be relatively fast, but if you're unlucky or have an on-premises system that does not scale elastically (or both), you might have to wait several minutes in the queue each time you want to test your pipeline. Imagine having to test your code that way: test-driven development (TDD) would be all but impossible.

Nevertheless, with some vendors it is possible to run CI pipelines locally: there's, for example, Act for GitHub Actions, the CircleCI local CLI, and GitLab Runner. These solutions can speed up local development drastically, but there is often still a discrepancy between the local and remote runtime environments. In addition, these tools are specific to each vendor, which leads us to the next pain point…
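As a sketch of what such a local run looks like, Act replays GitHub Actions workflows in local Docker containers (the job name below is hypothetical):

```sh
# List the jobs Act discovered in .github/workflows/
act -l

# Run the workflow as if a push event had occurred (Act's default event)
act push

# Run a single job, e.g. a hypothetical "build" job
act -j build
```

Even then, Act's runner images only approximate GitHub's hosted runners, which is exactly the local/remote discrepancy mentioned above.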

Vendor lock-in

Once you have several pipelines set up at a specific vendor, it is often hard to switch to another. The process of rewriting and refactoring your pipelines can be expensive, and might not be worth the set of features that triggered the migration effort in the first place. And, let's not kid ourselves, migrations do not spark joy. Everyone wants to build; no one wants to maintain.

Standardization is a powerful antidote to vendor lock-in. The Browser Wars resulted in Web Standards, and Kubernetes has rapidly become, for all intents and purposes, the standard for cloud-native applications. Standardization often accelerates innovation. Most vendors have some notion of a (customizable) unit of work: GitHub has actions, CircleCI has orbs, Azure DevOps has tasks, etc. There's a lot of reinventing the wheel there, which standards could prevent.

Lack of abstraction

Some CI platforms offer a form of abstraction, such as Azure DevOps pipeline templates or GitHub composite actions. These templates do introduce a degree of abstraction, and reusable components are very welcome. However, as your pipelines grow in size and complexity, so do your YAML templates, and managing many nested YAML templates can become very messy, very quickly, due to the inherent problems of YAML.
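As a small, hypothetical illustration of that nesting (file names and parameters are made up), an Azure DevOps pipeline might pull in a parameterized template, which can in turn include further templates:

```yaml
# azure-pipelines.yml
stages:
  - template: templates/build.yml
    parameters:
      imageName: my-service

# templates/build.yml
parameters:
  - name: imageName
    type: string
stages:
  - stage: Build
    jobs:
      - job: BuildImage
        steps:
          - script: docker build -t ${{ parameters.imageName }} .
```

Every extra level of nesting adds another layer of parameter plumbing and template-expression syntax, with no type checking or tooling to catch mistakes before the pipeline actually runs.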

If you recognize these pains and are intrigued by the promise of a better alternative, read on in part 2 of this series, where we take a closer look at Dagger, a portable development kit for CI/CD pipelines.

At Data Minded we try to keep up with the latest tools and technologies in the data world. Interested in learning more about the kind of work we do together with our clients? Head over to our website!

By Jan Vanbuel & Hannes De Smet
