Speeding up CI in Kubernetes with Docker and Buildkit

Matt Potter
VoucherCodes Tech Blog
Oct 30, 2023

On the VoucherCodes Platform Team we’ve recently been looking at how we can improve developer experience and all-round efficiency. One of the most frequent points of feedback was that CI (Continuous Integration) can be frustratingly slow, especially around Docker image builds.

How do we run CI at VoucherCodes?

Our CI infrastructure is fairly standard for a predominantly Kubernetes-based environment. We use Gitlab for our code repositories and CI, and we host our runners on Kubernetes.

The jobs themselves run as Kubernetes pods - with Docker-in-Docker running as a Gitlab CI service/sidecar - and we point the Docker CLI in our jobs to reference the sidecar container.

To reference Gitlab’s documentation, a pipeline definition would look something like this:

default:
  image: docker:24.0.5
  services:
    - docker:24.0.5-dind
  before_script:
    - docker info

variables:
  DOCKER_HOST: tcp://docker:2375
  DOCKER_TLS_CERTDIR: ""

build:
  stage: build
  script:
    - docker build -t my-docker-image .
    - docker run my-docker-image /script/to/run/tests

With the above setup in place, Docker commands are sent to a local Docker instance. This means we don’t need to do the somewhat unsafe things we had to do many years ago, such as mounting the Docker socket from the underlying host.

However, there are disadvantages to this approach. The Docker instance only exists for the duration of the CI job, so the default local build cache is completely empty. With no other configuration, every job would have to build images from scratch and depending on your application, language and other dependencies, this can be extremely slow.

There are ways to improve this, such as using an inline build cache where the cache is embedded in the resulting image for reuse by subsequent jobs, or a registry cache where you push your cache to a separate location. You can also pre-pull images to use as a cache. However, with these options, your cache is still effectively remote, which can lead to slow cache lookups and retrieval, as well as increased network costs.
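To illustrate, a registry-backed cache with Buildx looks something like the snippet below; the image and cache references are placeholders rather than our actual registry:

# Reuse a previously pushed cache and push an updated cache alongside the image.
# registry.example.com/my-app and its :cache tag are hypothetical names.
docker buildx build \
  --cache-from type=registry,ref=registry.example.com/my-app:cache \
  --cache-to type=registry,ref=registry.example.com/my-app:cache,mode=max \
  --tag registry.example.com/my-app:latest \
  --push .

# An inline cache instead embeds the cache metadata in the image itself:
# --cache-to type=inline --cache-from type=registry,ref=registry.example.com/my-app:latest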

We also run multi-architecture builds. To complete these quickly (i.e., without QEMU emulation), we execute these jobs on Kubernetes nodes matching the respective architecture. This incurs additional costs in both compute and general administrative overhead.

Quick wins for maximum benefit

We investigated various approaches to improve our build times and cache hit rates. However, due to the intricacies of our pipelines and the dependencies within, we were unable to make significant improvements by optimising the remote caching strategies we had available to us.

We are a small team here at VoucherCodes and it was simply not reasonable to undertake a full rewrite of our pipelines, so we needed to look for solutions that would give us the right balance of effort and benefit.

Investigating Buildkit

Buildkit is, as the name suggests, the component within the Docker engine that conducts the build itself. It manages the build cache, which as I mentioned previously will be completely empty in our ephemeral CI setup. However, it is possible to run a remote Buildkit instance and point your Docker CLI to use that, rather than the local builder.

With this in mind, we had the idea that we could run stateful, permanent instances of Buildkit within our Kubernetes cluster, and simply point our ephemeral Gitlab CI jobs to use these. The local build cache would not be lost on every job, and if sized correctly we could gain compute efficiencies by running fewer (but larger) Buildkit instances, allowing the actual Gitlab CI jobs to execute with fewer resources.

Deploying a Buildkit Farm

Deploying Buildkit to Kubernetes is not a particularly common use case, so documentation is somewhat sparse. However, it turned out to be quite a simple task and the Buildkit repository has fairly clear examples. Using those as a starter, we made only minor changes and additions to deploy our Buildkit farm.

First of all, we needed to create certificates for authentication between the Docker Buildx clients and Buildkit. We already use Cert Manager throughout our Kubernetes clusters, so creating the certificates with a self-signed issuer was a simple task (unimportant fields removed for brevity):

apiVersion: cert-manager.io/v1
kind: Certificate
metadata:
  name: buildkitd-tls
  ...
spec:
  secretName: buildkitd-tls
  ...
  dnsNames:
    - buildkitd-arm.buildkit
    - buildkitd-arm.buildkit.svc
    - buildkitd-arm.buildkit.svc.cluster.local
    - buildkitd-amd64.buildkit
    - buildkitd-amd64.buildkit.svc
    - buildkitd-amd64.buildkit.svc.cluster.local
  ...
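If you don’t already have a suitable issuer, one common pattern is to bootstrap a CA from a self-signed issuer and sign the Buildkit certificates with that. The following is a sketch rather than our exact configuration; the issuer and secret names are assumptions:

apiVersion: cert-manager.io/v1
kind: Issuer
metadata:
  name: selfsigned
  namespace: buildkit
spec:
  selfSigned: {}
---
# A CA certificate issued by the self-signed issuer above; its secret then backs a CA issuer
# that can sign both the buildkitd server certificate and the client certificates.
apiVersion: cert-manager.io/v1
kind: Certificate
metadata:
  name: buildkit-ca
  namespace: buildkit
spec:
  isCA: true
  commonName: buildkit-ca
  secretName: buildkit-ca
  issuerRef:
    name: selfsigned
    kind: Issuer
---
apiVersion: cert-manager.io/v1
kind: Issuer
metadata:
  name: buildkit-ca
  namespace: buildkit
spec:
  ca:
    secretName: buildkit-ca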

We then create a config to mount on the Buildkit instances:

apiVersion: v1
kind: ConfigMap
metadata:
  name: buildkitd-config
  namespace: buildkit
data:
  buildkitd.toml: |
    debug = true

    # config for build history API that stores information about completed build commands
    [history]
      # maxAge is the maximum age of history entries to keep, in seconds.
      maxAge = 345600
      # maxEntries is the maximum number of history entries to keep.
      maxEntries = 100

    [worker.oci]
      gc = true
      # gckeepstorage sets storage limit for default gc profile, in MB.
      gckeepstorage = 200000
      # maintain a pool of reusable CNI network namespaces to amortize the overhead
      # of allocating and releasing the namespaces
      cniPoolSize = 16

      [[worker.oci.gcpolicy]]
        keepBytes = "200GB"
        keepDuration = "96h"
        filters = [ "type==source.local", "type==exec.cachemount", "type==source.git.checkout" ]
      [[worker.oci.gcpolicy]]
        all = true
        keepBytes = "200GB"

    [worker.containerd]
      gc = true
      # gckeepstorage sets storage limit for default gc profile, in MB.
      gckeepstorage = 200000
      # maintain a pool of reusable CNI network namespaces to amortize the overhead
      # of allocating and releasing the namespaces
      cniPoolSize = 16

      [[worker.containerd.gcpolicy]]
        keepBytes = "200GB"
        keepDuration = "96h"
        filters = [ "type==source.local", "type==exec.cachemount", "type==source.git.checkout" ]
      [[worker.containerd.gcpolicy]]
        all = true
        keepBytes = "200GB"

Configuration is one of the areas where documentation is lacking, so we had to make some assumptions with regard to some of the meaning and expected behaviour.

We were not too sure of the ideal sizing, but storage is cheap and our build pipelines contain a lot of dependencies, so we erred on the side of caution and allocated 200GB to the cache before GC kicks in, with a retention window of 96 hours.

To deploy the actual instances, we created two StatefulSets, one for amd64 and the other for arm64, and selected appropriate nodes with a node selector (only one config shown):

apiVersion: apps/v1
kind: StatefulSet
metadata:
  labels:
    app: buildkitd-arm
  name: buildkitd-arm
spec:
  serviceName: buildkitd-arm
  podManagementPolicy: Parallel
  replicas: 2
  selector:
    matchLabels:
      app: buildkitd-arm
  template:
    metadata:
      labels:
        app: buildkitd-arm
      annotations:
        cluster-autoscaler.kubernetes.io/safe-to-evict: "false"
    spec:
      tolerations:
        - key: buildkit
          operator: Exists
          effect: NoSchedule
      nodeSelector:
        buildkit: exclusive
        beta.kubernetes.io/arch: arm64
      containers:
        - name: buildkitd-arm
          image: moby/buildkit:master
          args:
            - --addr
            - unix:///run/buildkit/buildkitd.sock
            - --addr
            - tcp://0.0.0.0:1234
            - --tlscacert
            - /certs/ca.crt
            - --tlscert
            - /certs/tls.crt
            - --tlskey
            - /certs/tls.key
          readinessProbe:
            exec:
              command:
                - buildctl
                - debug
                - workers
            initialDelaySeconds: 5
            periodSeconds: 30
          livenessProbe:
            exec:
              command:
                - buildctl
                - debug
                - workers
            initialDelaySeconds: 5
            periodSeconds: 30
          securityContext:
            privileged: true
          volumeMounts:
            - name: config
              readOnly: true
              mountPath: /etc/buildkit/buildkitd.toml
              subPath: buildkitd.toml
            - name: certs
              readOnly: true
              mountPath: /certs
            - name: cache
              mountPath: /var/lib/buildkit
          resources:
            requests:
              cpu: 3
              memory: 8Gi
            limits:
              cpu: 8
              memory: 30Gi
      volumes:
        - name: config
          configMap:
            name: buildkitd-config
            items:
              - key: buildkitd.toml
                path: buildkitd.toml
        - name: certs
          secret:
            secretName: buildkitd-tls
  volumeClaimTemplates:
    - metadata:
        name: cache
      spec:
        accessModes:
          - ReadWriteOnce
        storageClassName: "ebs-sc"
        resources:
          requests:
            storage: 250Gi

We then just need to expose them with an internal service (again, only one config shown):

apiVersion: v1
kind: Service
metadata:
  labels:
    app: buildkitd-arm
  name: buildkitd-arm
  namespace: buildkit
spec:
  ports:
    - port: 1234
      protocol: TCP
  selector:
    app: buildkitd-arm
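
Before wiring anything into CI, it’s worth checking that the instances are reachable from inside the cluster. One quick way to do that (assuming the client certificates are available at /certs) is to query the workers with buildctl:

# A successful response lists the workers and the platforms they support.
buildctl \
  --addr tcp://buildkitd-arm.buildkit:1234 \
  --tlscacert /certs/ca.crt \
  --tlscert /certs/tls.crt \
  --tlskey /certs/tls.key \
  debug workers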

Using the new Buildkit farm

To use the new Buildkit farm, you simply need to register the remote builders and enable them before running any docker build commands:

docker buildx create --name remote --driver remote --driver-opt cacert=/certs/ca.crt,cert=/certs/tls.crt,key=/certs/tls.key tcp://buildkitd-arm.buildkit:1234 --use
docker buildx create --name remote --driver remote --driver-opt cacert=/certs/ca.crt,cert=/certs/tls.crt,key=/certs/tls.key tcp://buildkitd-amd64.buildkit:1234 --append

These commands create an initial builder called remote (in this case pointing at the arm64 service) and set it as the default builder, then append an additional node (the amd64 service). Internally, Buildx records the capabilities of each node, namely the platforms each supports.
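
You can confirm what has been registered, including the platforms reported by each node, with a quick listing:

# Lists the configured builders, their endpoints and the platforms each node supports.
docker buildx ls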

If we now run the following command, Buildx will dispatch build jobs to the correct service for each architecture in parallel:

docker buildx build --platform=linux/amd64,linux/arm64 .
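
For context, wiring this into a Gitlab CI job might look something like the sketch below. This is a simplified illustration rather than our actual pipeline; the registry name and certificate paths are placeholders:

build:
  stage: build
  script:
    # Register both remote builders; /certs is wherever the client certificates are mounted in the job.
    - docker buildx create --name remote --driver remote --driver-opt cacert=/certs/ca.crt,cert=/certs/tls.crt,key=/certs/tls.key tcp://buildkitd-arm.buildkit:1234 --use
    - docker buildx create --name remote --append --driver remote --driver-opt cacert=/certs/ca.crt,cert=/certs/tls.crt,key=/certs/tls.key tcp://buildkitd-amd64.buildkit:1234
    # Build both architectures on the Buildkit farm and push the resulting manifest list.
    - docker buildx build --platform=linux/amd64,linux/arm64 -t registry.example.com/my-docker-image --push .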

Results

In the final architecture for our Buildkit farm, we deployed two instances of Buildkit per architecture, each with its own EBS volume, so we do not currently have a distributed cache. There is not much information on whether Buildkit could cope with, say, an NFS file system, and this was not something we wanted to investigate and potentially debug. For our use case, though, most builds are building largely the same things most of the time, so even with multiple standalone caches, each cache is populated very quickly and we expected a good cache hit rate. And indeed this has proven to be the case.

We’ve seen build times for comparable tasks reduced by around 80%, and cache hit rates are generally very good. And as Docker image builds are generally the most compute-intensive operations in our pipelines, we are now able to reduce the resources allocated to each ephemeral runner.

There are still many more optimisations to be made to our CI setup, but our experience with Buildkit has been excellent so far and we’re very happy with the outcome.

Have you heard? We’re hiring at VoucherCodes! Check out our careers page here.
