Speeding up CI in Kubernetes with Docker and Buildkit

Matt Potter
VoucherCodes Tech Blog
Oct 30, 2023

On the VoucherCodes Platform Team we’ve recently been looking at how we can improve developer experience and all-round efficiency. One of the most frequent points of feedback was that CI (Continuous Integration) can be frustratingly slow, especially around Docker image builds.

How do we run CI at VoucherCodes?

Our CI infrastructure is fairly standard for a predominantly Kubernetes-based environment. We use Gitlab for our code repositories and CI, and we host our runners on Kubernetes.

The jobs themselves run as Kubernetes pods - with Docker-in-Docker running as a Gitlab CI service/sidecar - and we point the Docker CLI in our jobs to reference the sidecar container.

To reference Gitlab’s documentation, a pipeline definition would look something like this:

default:
  image: docker:24.0.5
  services:
    - docker:24.0.5-dind
  before_script:
    - docker info

variables:
  DOCKER_HOST: tcp://docker:2375
  DOCKER_TLS_CERTDIR: ""

build:
  stage: build
  script:
    - docker build -t my-docker-image .
    - docker run my-docker-image /script/to/run/tests

With the above setup in place, Docker commands are sent to a local Docker instance. This means we don’t need to do the somewhat unsafe things we had to do many years ago, such as mounting the Docker socket from the underlying host.

However, there are disadvantages to this approach. The Docker instance only exists for the duration of the CI job, so the default local build cache is completely empty. With no other configuration, every job would have to build images from scratch and depending on your application, language and other dependencies, this can be extremely slow.

There are ways to improve this, such as using an inline build cache where the cache is embedded in the resulting image for reuse by subsequent jobs, or a registry cache where you push your cache to a separate location. You can also pre-pull images to use as a cache. However, with these options, your cache is still effectively remote, which can lead to slow cache lookups and retrieval, as well as increased network costs.
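To illustrate, a registry-backed cache with Buildx looks something like the snippet below; the image and cache references are placeholders rather than our actual registry:

# Reuse a previously pushed cache and push an updated cache alongside the image.
# registry.example.com/my-app and its :cache tag are hypothetical names.
docker buildx build \
  --cache-from type=registry,ref=registry.example.com/my-app:cache \
  --cache-to type=registry,ref=registry.example.com/my-app:cache,mode=max \
  --tag registry.example.com/my-app:latest \
  --push .

# An inline cache instead embeds the cache metadata in the image itself:
# --cache-to type=inline --cache-from type=registry,ref=registry.example.com/my-app:latest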

We also run multi-architecture builds. To complete these quickly (i.e., without QEMU emulation), we execute these jobs on Kubernetes nodes matching the respective architecture. This incurs additional costs in both compute and general administrative overhead.

Quick wins for maximum benefit

We investigated various approaches to improve our build times and cache hit rates. However, due to the intricacies of our pipelines and the dependencies within, we were unable to make significant improvements by optimising the remote caching strategies we had available to us.

We are a small team here at VoucherCodes and it was simply not reasonable to undertake a full rewrite of our pipelines, so we needed to look for solutions that would give us the right balance of effort and benefit.

Investigating Buildkit

Buildkit is, as the name suggests, the component within the Docker engine that conducts the build itself. It manages the build cache, which as I mentioned previously will be completely empty in our ephemeral CI setup. However, it is possible to run a remote Buildkit instance and point your Docker CLI to use that, rather than the local builder.

With this in mind, we had the idea that we could run stateful, permanent instances of Buildkit within our Kubernetes cluster, and simply point our ephemeral Gitlab CI jobs to use these. The local build cache would not be lost on every job, and if sized correctly we could gain compute efficiencies by running fewer (but larger) Buildkit instances, allowing the actual Gitlab CI jobs to execute with fewer resources.

Deploying a Buildkit Farm

Deploying Buildkit to Kubernetes is not a particularly common use case, so documentation is somewhat sparse. However, it turned out to be quite a simple task and the Buildkit repository has fairly clear examples. Using those as a starter, we made only minor changes and additions to deploy our Buildkit farm.

First of all, we needed to create certificates for authentication between the Docker Buildx clients and Buildkit. We already use Cert Manager throughout our Kubernetes clusters, so creating the certificates with a self-signed issuer was a simple task (unimportant fields removed for brevity):

apiVersion: cert-manager.io/v1
kind: Certificate
metadata:
  name: buildkitd-tls
  ...
spec:
  secretName: buildkitd-tls
  ...
  dnsNames:
    - buildkitd-arm.buildkit
    - buildkitd-arm.buildkit.svc
    - buildkitd-arm.buildkit.svc.cluster.local
    - buildkitd-amd64.buildkit
    - buildkitd-amd64.buildkit.svc
    - buildkitd-amd64.buildkit.svc.cluster.local
  ...
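If you don’t already have a suitable issuer, one common pattern is to bootstrap a CA from a self-signed issuer and sign the Buildkit certificates with that. The following is a sketch rather than our exact configuration; the issuer and secret names are assumptions:

apiVersion: cert-manager.io/v1
kind: Issuer
metadata:
  name: selfsigned
  namespace: buildkit
spec:
  selfSigned: {}
---
# A CA certificate issued by the self-signed issuer above; its secret then backs a CA issuer
# that can sign both the buildkitd server certificate and the client certificates.
apiVersion: cert-manager.io/v1
kind: Certificate
metadata:
  name: buildkit-ca
  namespace: buildkit
spec:
  isCA: true
  commonName: buildkit-ca
  secretName: buildkit-ca
  issuerRef:
    name: selfsigned
    kind: Issuer
---
apiVersion: cert-manager.io/v1
kind: Issuer
metadata:
  name: buildkit-ca
  namespace: buildkit
spec:
  ca:
    secretName: buildkit-ca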

We then create a config to mount on the Buildkit instances:

apiVersion: v1
kind: ConfigMap
metadata:
  name: buildkitd-config
  namespace: buildkit
data:
  buildkitd.toml: |
    debug = true

    # config for build history API that stores information about completed build commands
    [history]
      # maxAge is the maximum age of history entries to keep, in seconds.
      maxAge = 345600
      # maxEntries is the maximum number of history entries to keep.
      maxEntries = 100

    [worker.oci]
      gc = true
      # gckeepstorage sets storage limit for default gc profile, in MB.
      gckeepstorage = 200000
      # maintain a pool of reusable CNI network namespaces to amortize the overhead
      # of allocating and releasing the namespaces
      cniPoolSize = 16

      [[worker.oci.gcpolicy]]
        keepBytes = "200GB"
        keepDuration = "96h"
        filters = [ "type==source.local", "type==exec.cachemount", "type==source.git.checkout" ]
      [[worker.oci.gcpolicy]]
        all = true
        keepBytes = "200GB"

    [worker.containerd]
      gc = true
      # gckeepstorage sets storage limit for default gc profile, in MB.
      gckeepstorage = 200000
      # maintain a pool of reusable CNI network namespaces to amortize the overhead
      # of allocating and releasing the namespaces
      cniPoolSize = 16

      [[worker.containerd.gcpolicy]]
        keepBytes = "200GB"
        keepDuration = "96h"
        filters = [ "type==source.local", "type==exec.cachemount", "type==source.git.checkout" ]
      [[worker.containerd.gcpolicy]]
        all = true
        keepBytes = "200GB"

Configuration is one of the areas where documentation is lacking, so we had to make some assumptions with regard to some of the meaning and expected behaviour.

We were not too sure of the ideal sizing, but storage is cheap and our build pipelines contain a lot of dependencies, so we erred on the side of caution and allocated 200GB to the cache before GC kicks in, with a retention window of 96 hours.

To deploy the actual instances, we created two StatefulSets, one for amd64 and the other for arm64, and selected appropriate nodes with a node selector (only one config shown):

apiVersion: apps/v1
kind: StatefulSet
metadata:
  labels:
    app: buildkitd-arm
  name: buildkitd-arm
spec:
  serviceName: buildkitd-arm
  podManagementPolicy: Parallel
  replicas: 2
  selector:
    matchLabels:
      app: buildkitd-arm
  template:
    metadata:
      labels:
        app: buildkitd-arm
      annotations:
        cluster-autoscaler.kubernetes.io/safe-to-evict: "false"
    spec:
      tolerations:
        - key: buildkit
          operator: Exists
          effect: NoSchedule
      nodeSelector:
        buildkit: exclusive
        beta.kubernetes.io/arch: arm64
      containers:
        - name: buildkitd-arm
          image: moby/buildkit:master
          args:
            - --addr
            - unix:///run/buildkit/buildkitd.sock
            - --addr
            - tcp://0.0.0.0:1234
            - --tlscacert
            - /certs/ca.crt
            - --tlscert
            - /certs/tls.crt
            - --tlskey
            - /certs/tls.key
          readinessProbe:
            exec:
              command:
                - buildctl
                - debug
                - workers
            initialDelaySeconds: 5
            periodSeconds: 30
          livenessProbe:
            exec:
              command:
                - buildctl
                - debug
                - workers
            initialDelaySeconds: 5
            periodSeconds: 30
          securityContext:
            privileged: true
          volumeMounts:
            - name: config
              readOnly: true
              mountPath: /etc/buildkit/buildkitd.toml
              subPath: buildkitd.toml
            - name: certs
              readOnly: true
              mountPath: /certs
            - name: cache
              mountPath: /var/lib/buildkit
          resources:
            requests:
              cpu: 3
              memory: 8Gi
            limits:
              cpu: 8
              memory: 30Gi
      volumes:
        - name: config
          configMap:
            name: buildkitd-config
            items:
              - key: buildkitd.toml
                path: buildkitd.toml
        - name: certs
          secret:
            secretName: buildkitd-tls
  volumeClaimTemplates:
    - metadata:
        name: cache
      spec:
        accessModes:
          - ReadWriteOnce
        storageClassName: "ebs-sc"
        resources:
          requests:
            storage: 250Gi

We then just need to expose them with an internal service (again, only one config shown):

apiVersion: v1
kind: Service
metadata:
  labels:
    app: buildkitd-arm
  name: buildkitd-arm
  namespace: buildkit
spec:
  ports:
    - port: 1234
      protocol: TCP
  selector:
    app: buildkitd-arm
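
Before wiring anything into CI, it’s worth checking that the instances are reachable from inside the cluster. One quick way to do that (assuming the client certificates are available at /certs) is to query the workers with buildctl:

# A successful response lists the workers and the platforms they support.
buildctl \
  --addr tcp://buildkitd-arm.buildkit:1234 \
  --tlscacert /certs/ca.crt \
  --tlscert /certs/tls.crt \
  --tlskey /certs/tls.key \
  debug workers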

Using the new Buildkit farm

To use the new Buildkit farm, you simply need to register the remote builders and enable them before running any docker build commands:

docker buildx create --name remote --driver remote --driver-opt cacert=/certs/ca.crt,cert=/certs/tls.crt,key=/certs/tls.key tcp://buildkitd-arm.buildkit:1234 --use
docker buildx create --name remote --driver remote --driver-opt cacert=/certs/ca.crt,cert=/certs/tls.crt,key=/certs/tls.key tcp://buildkitd-amd64.buildkit:1234 --append

These commands create an initial builder called remote (in this case pointing at the arm64 service) and set it as the default builder, then append an additional node (the amd64 service). Internally, Buildx records the capabilities of each node, namely the platforms each supports.
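
You can confirm what has been registered, including the platforms reported by each node, with a quick listing:

# Lists the configured builders, their endpoints and the platforms each node supports.
docker buildx ls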

If we now run the following command, Buildx will dispatch build jobs to the correct service for each architecture in parallel:

docker buildx build --platform=linux/amd64,linux/arm64 .
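
For context, wiring this into a Gitlab CI job might look something like the sketch below. This is a simplified illustration rather than our actual pipeline; the registry name and certificate paths are placeholders:

build:
  stage: build
  script:
    # Register both remote builders; /certs is wherever the client certificates are mounted in the job.
    - docker buildx create --name remote --driver remote --driver-opt cacert=/certs/ca.crt,cert=/certs/tls.crt,key=/certs/tls.key tcp://buildkitd-arm.buildkit:1234 --use
    - docker buildx create --name remote --append --driver remote --driver-opt cacert=/certs/ca.crt,cert=/certs/tls.crt,key=/certs/tls.key tcp://buildkitd-amd64.buildkit:1234
    # Build both architectures on the Buildkit farm and push the resulting manifest list.
    - docker buildx build --platform=linux/amd64,linux/arm64 -t registry.example.com/my-docker-image --push .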

Results

In the final architecture for our Buildkit farm, we deployed two instances of Buildkit per architecture, each with its own EBS volume, so we do not currently have a distributed cache. There is not much information on whether Buildkit could cope with, say, an NFS file system, and this was not something we wanted to investigate and potentially debug. For our use case, though, most builds are building largely the same things most of the time, so even with multiple standalone caches, each cache is populated very quickly and we expected a good cache hit rate. And indeed this has proven to be the case.

We’ve seen build times for comparable tasks reduced by around 80%, and cache hit rates are generally very good. And as Docker image builds are generally the most compute-intensive operations in our pipelines, we are now able to reduce the resources allocated to each ephemeral runner.

There are still many more optimisations to be made to our CI setup, but our experience with Buildkit has been excellent so far and we’re very happy with the outcome.

Have you heard? We’re hiring at VoucherCodes! Check out our careers page here.
