Will Dagger revolutionize CI/CD?

Jan Vanbuel · Published in datamindedbe · 9 min read · Oct 6, 2022

In our previous blog post, we briefly pointed out some of the pain points of CI/CD systems. This second part of our series on Dagger vs. the current state of CI/CD introduces and explores Dagger, a new tool that promises to take away (part of) those pains.

What is Dagger?

Dagger is a CI/CD tool built on top of BuildKit that uses the CUE language for writing and validating CI/CD pipelines. It aims to improve the developer experience of writing such pipelines by closing the gap between local and remote CI runs: you can build powerful CI/CD pipelines quickly, run them anywhere, and test and debug them locally, all while avoiding CI vendor lock-in. The main enabler? BuildKit!

BuildKit?

BuildKit is a container build tool maintained by the Moby project and, at the time of writing, the default build engine of Docker. At the core of BuildKit lies a binary intermediate format called LLB*, short for "low-level builder". From the tl;dr of the official GitHub repo: "LLB is to Dockerfile what LLVM IR is to C." It defines the dependency graph for the different build steps, and the BuildKit engine then finds an optimal execution path. To avoid repeating the same steps over and over after an intermediate failure, it maintains a build cache. The result is the execution speed and caching that we're used to from Docker. The key insight from the Dagger creators is that BuildKit's graph execution model can be applied to, well, any directed acyclic graph (DAG). And what are CI/CD pipelines, if not DAGs?

Image courtesy of dagger.io

As the name suggests, LLB is quite a low-level language. We don't write our container build definitions in LLB; we write them in Dockerfiles. Docker translates your Dockerfile to LLB, which then gets executed by the BuildKit engine. Similarly, Dagger comes with its own front end that converts pipeline definitions to LLB. But instead of crafting their own configuration language, the Dagger team decided to use an existing one: CUE.

CUE?

Configure. Unify. Execute. CUE is a young configuration language aimed at constraining and validating data. It looks and feels a bit like a less-verbose, typed cousin of JSON, or a less error-prone, but still easy-to-read version of YAML. A detailed overview of the CUE language is outside the scope of this blog post, but a solid introduction can be found here or in the official docs. We’ll explain the bits of CUE used in this blog post as and when we need them.
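To give a first taste of the language, here's a small standalone snippet (hypothetical, not part of the pipeline we'll build below) that shows how types, constraints, and values mix freely in CUE:

// A standalone taste of CUE: types, constraints, and values mix freely.
#Port: int & >0 & <65536 // an int constrained to valid port numbers

server: {
    host: string | *"localhost" // a string with a default value
    port: #Port                 // must satisfy #Port
}

// Concrete data is merged in (and validated) via a second declaration:
server: port: 8080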

In short, Dagger CI/CD pipelines are DAGs defined in CUE, which get translated to LLB and executed by BuildKit. To make this a bit more tangible, let's have a look at a short example, shall we?

Taking it for a spin: Terraform example

We’re going to get our hands dirty by writing a Dagger pipeline to deploy infrastructure on AWS with Terraform. All the example code of this section can be found in our GitHub repository dagger-exploration. If you haven’t installed the dagger CLI yet, follow the instructions in the official Dagger docs. If you're on macOS or Linux and use the Homebrew package manager, you can simply run brew install dagger.

To start with a new Dagger pipeline, you first need to initialize a project in an empty directory by executing the following command in your favorite shell:

dagger project init

This generates a cue.mod folder, which will be mostly empty. Next, create a new file. You can name it anything you want, e.g. terraform.cue, but make sure that it has the .cue extension: although the CUE language does not mandate a particular extension, Dagger only interprets files with that one. Now copy-paste the following lines of CUE into your file:

// terraform.cue
package terraform

import (
    "dagger.io/dagger"
    "dagger.io/dagger/core"
    "universe.dagger.io/x/ezequiel@foncubierta.com/terraform"
)

This defines a package namespace terraform and the imports that are necessary for developing a Terraform deployment pipeline with Dagger**.

Next, run dagger project update. This saves all dependencies referred to in your pipeline definition in the cue.mod folder, under pkg. Below the import section, copy-paste the following lines:

// terraform.cue
dagger.#Plan & {
    actions: {
        tfSource: core.#Source & {
            path: "./infra"
        }
        init: terraform.#Init & {
            source: tfSource.output
        }
    }
}

Before delving a bit more into the syntax of this snippet of CUE, let's analyze the bits specific to Dagger. All Dagger pipelines require an instance of a dagger.#Plan (in a bit, we'll talk about the meaning of the # and & characters and other CUE syntax elements), and the plan struct should contain a set of actions. An action is some kind of abstraction built from other actions built from other actions… If you go down the rabbit hole, all Dagger actions evaluate (i.e., are joined together) into a stew of core actions, such as reading and writing files (including sockets), encrypting and decrypting secrets, building and running containers, and so on. It is precisely this set of "atomic" actions that the Dagger engine translates into LLB, ready to be gobbled down by BuildKit.
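For instance, writing a file is such a core action. The following sketch (field names taken from the dagger.io/dagger/core package as of Dagger 0.2; treat it as illustrative rather than authoritative) shows what one could look like inside the actions struct of our plan:

// Sketch of a core action: write a file into a filesystem.
hello: core.#WriteFile & {
    input:    tfSource.output // the filesystem produced by another action
    path:     "/hello.txt"
    contents: "hello from dagger"
    // hello.output is a new filesystem that includes the extra file
}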

Each action is a step in your pipeline that may depend on other steps. If you're familiar with Makefiles, you could also think of actions as 'targets', as they play the same role. You can run them locally by executing dagger do {action}. Our terraform.cue snippet contains two actions: tfSource and init. The first action is an instance of core.#Source and takes as input the local path ./infra. The second action, init, takes as input the output of the first action, i.e. the contents of that folder.

If you create an empty infra directory next to your CUE pipeline definition, you could already give it a try and run dagger do init. This won't do much, as we haven't written any Terraform configuration yet.

Intermezzo: CUE

Let’s talk a bit about CUE syntax. The # in CUE denotes a definition. A definition is used to validate some configuration input. You could think of a definition as a type, but that would be a half-truth, as there is no distinction between types and values in CUE. Informally, a definition is an underspecified value. A definition could for example be:

#MyDefinition: {
    a: int
    b: string
}

The & character is called a "join", and merges two values. You could think of a CUE definition as a blueprint (a bit like a class, or the builder pattern in object-oriented programming) or a function, that you can feed inputs via the join operator.

In the snippet of the previous subsection, there are three definitions: dagger.#Plan, core.#Source, and terraform.#Init. In case you didn't guess it yet, the namespace prefixes indeed indicate we're importing the definitions from the packages in the import section, i.e. dagger, core and terraform.

We all need validation

Let's continue with our Terraform pipeline example. First, we'll define some infrastructure in Terraform's HCL language by creating an s3.tf file in the infra directory with the contents:

// infra/s3.tf
resource "aws_s3_bucket" "dagger_bucket" {
    bucket = "example-dagger-bucket-xyz"
}

This creates an S3 bucket with the name "example-dagger-bucket-xyz". Since S3 bucket names must be globally unique, you'll want to change the bucket argument to a unique string of your choice for this code to work.

We can check our infrastructure snippet for mistakes by adding a validate action to our plan:

// terraform.cue
dagger.#Plan & {
    actions: {
        tfSource: core.#Source & {
            path: "./infra"
        }
        init: terraform.#Init & {
            source: tfSource.output
        }
        validate: terraform.#Validate & {
            source: init.output
        }
    }
}

You can check that our HCL code contains no mistakes by running dagger do validate. If there are no errors printed to stdout, we're good to go and deploy our infrastructure!

Confidential input

Now that we have defined our infrastructure, we need to pass a set of AWS credentials to our Dagger pipeline in order to authorize it to deploy infrastructure in our stead. More generally, most CI/CD pipelines need some sort of credentials to communicate with protected services that are included in the pipeline definition.

Dagger supports three ways of incorporating secrets: via environment variables, via files, and via the output of other actions. For the first two, Dagger has the notion of a client, which it uses to fetch those credentials. A client can be either remote or local and, besides files and environment variables, can also expose sockets (which are useful if you want to interact with, e.g., a Docker daemon).
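As a sketch of what such a client block can look like (based on the Dagger 0.2 client API; the file path and socket below are examples, adapt them to your setup):

// Sketch: a client exposing a secret file and the local Docker socket.
dagger.#Plan & {
    client: {
        // read the contents of a local file as a secret
        filesystem: "./token.txt": read: contents: dagger.#Secret
        // expose the local Docker daemon's socket to actions
        network: "unix:///var/run/docker.sock": connect: dagger.#Socket
    }
}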

Let's keep it simple and configure our AWS credentials as environment variables, AWS_ACCESS_KEY_ID and AWS_SECRET_ACCESS_KEY. To instruct Dagger to fetch these credentials, we define a client section as part of our plan:

// terraform.cue
dagger.#Plan & {
    client: env: {
        AWS_ACCESS_KEY_ID:     dagger.#Secret
        AWS_SECRET_ACCESS_KEY: dagger.#Secret
        AWS_REGION:            dagger.#Secret | *"eu-west-1"
    }

    ...
}

This tells Dagger that it should browse the client's environment variables for those three values, with AWS_REGION being optional: if it's not configured, Dagger will use the default value eu-west-1 (in CUE, the default branch of a disjunction is marked with a *). Note that the credentials are instances of dagger.#Secret. This means Dagger knows that these values should be treated with care, and they will not end up sprinkled around in the logs.

Finally, we configure a terraform.#Apply action with those environment variables:

// terraform.cue
dagger.#Plan & {
    client: env: {
        AWS_ACCESS_KEY_ID:     dagger.#Secret
        AWS_SECRET_ACCESS_KEY: dagger.#Secret
        AWS_REGION:            dagger.#Secret | *"eu-west-1"
    }
    actions: {
        tfSource: core.#Source & {
            path: "./infra"
        }
        init: terraform.#Init & {
            source: tfSource.output
        }
        validate: terraform.#Validate & {
            source: init.output
        }
        apply: terraform.#Apply & {
            source: validate.output
            env: {
                AWS_SECRET_ACCESS_KEY: client.env.AWS_SECRET_ACCESS_KEY
                AWS_ACCESS_KEY_ID:     client.env.AWS_ACCESS_KEY_ID
                AWS_REGION:            client.env.AWS_REGION
            }
        }
    }
}

Et voilà, that's it! Running dagger do apply locally should create an S3 bucket in our AWS account.

Integrating with your CI/CD provider of choice

After developing and testing your pipeline locally, you'll want to run it on a remote CI/CD platform. Dagger can run on any provider that supports containerized tasks, and the Dagger team itself has written copy-paste-ready configurations for the most popular CI/CD providers, such as GitHub, GitLab and CircleCI. It's quite easy to set up, and the result is a uniform set of pipelines across providers.

Will Dagger revolutionize CI/CD?

So far, we have avoided answering the question that is the click-bait title of our post (sorry 😬). And to be honest, it's probably still too early to give a definitive answer to this question (again, sorry 😬). But here's an overview of some of the things we like/don't like about Dagger:

Hell yes!

  • Dagger is fast. The caching feature allows you to iterate quickly and debug your pipelines in no time. And you can be confident that they work the same on your local machine and on a remote CI server.
  • We ❤ CUE. There's a lot to love about the language: it adds robustness to your configuration, giving you more confidence that your pipeline is configured correctly, while at the same time keeping it DRY. It allows you to create powerful abstractions which feel more elegant than existing JSON/YAML template-based solutions. In addition, there is now also a CUE language server, open-sourced by the Dagger team. Features such as autocomplete and jump-to-definition can be used in your IDE of choice, which further improves the developer experience of writing CI/CD pipelines.
  • Dagger is community-driven. A lot of CI vendors rely on the community to develop custom integrations for their platform. If Dagger succeeds in consolidating those efforts into a single ecosystem, a lot of development time can be put to more productive use.

Maybe not…

  • CUE is a relatively new and therefore not (yet) widely adopted language; there is certainly a learning curve to start using Dagger. For a lot of pipelines, it will be faster to write a Makefile that you can run both locally and remotely. Alternatively, you could use Earthfiles, which are still very readable (a mix between Make and Dockerfile syntax), and which also leverage the speed and caching capability of BuildKit. You just won't be able to create abstractions.
  • In your remote configuration, you typically run a single (or a few) action(s). These actions depend on other actions, but those dependencies are not specified in the remote configuration. This means that the visualization of your pipeline in the UI of your CI system gets kind of borked. You can "recover" this functionality by chopping up your pipeline into smaller pieces, but then you tie yourself again to a particular CI system. It is likely that Dagger will at some point come with its own CI offering that solves all of that and more.
  • You'll have to relinquish some of the features of your favorite CI platform. For example: with Azure DevOps, you can create the pipeline via a web UI. In this web UI, you start from a template, and you are presented with a list of auto-fetched parameters for configuring the template (e.g. name of a storage account, service principal, credential stores, …). With Dagger, you'll have to look up those values yourself.

Feel free to let us know if you think we missed some important arguments, either pro or contra. In the third part of this series, we'll dive a bit deeper into the Dagger pipeline spec and write a custom Dagger definition for a command-line interface (CLI) tool.

By Hannes De Smet and Jan Vanbuel

*If you want to learn more about LLB you can check out a post by Tonis Tiigi.

**If you think this looks a lot like Go code, you're not mistaken. The creator of CUE, Marcel van Lohuizen, was for years a member of the Go team at Google.
