Docker build cache sharing across multiple hosts with BuildKit and buildx

Jiang Huan
Jul 10

In this post, we share how BuildKit is used along with buildx to speed up our image build jobs across multiple hosts, such as shared GitLab CI runners.

At the time of writing, we are using the pre-release version of Docker Community Edition 19.03 with BuildKit support, and buildx release v0.2.2.


Currently in Titansoft, we are on a journey of infrastructure transformation with Docker and Kubernetes. We believe adopting Kubernetes will benefit our customers with better time-to-market, reduced infrastructure up-front cost, and more efficient resource utilisation.

To support this infrastructure transformation across product teams, we have created standardised CI pipelines with GitLab CI to build, test and deploy our containerized applications.

In this article, we want to focus on how the BuildKit build cache export/import feature helps us speed up image build jobs across multiple hosts, as we find the build cache of the default builder difficult to use in our setup.

Why does the default build cache not work for us?

➜ docker build -t "hello:local" . 
Sending build context to Docker daemon 2.042MB
Step 1/15 : FROM microsoft/dotnet:2.2-sdk AS builder
[...]
Step 6/15 : RUN dotnet restore
---> Using cache
---> 15fceb67915a
Step 7/15 : COPY . .
---> Using cache
---> 4cd4a0db0fdd
Step 8/15 : RUN dotnet publish -o out
---> Using cache
---> 45d90ad676e6
[...]
Successfully built 52932ee37d22
Successfully tagged hello:local

The builder skips long-running commands such as RUN dotnet restore and uses the cached intermediate layers directly if there are no changes in the preceding layers.

However, a few things prevent us from leveraging this cache in our setup.

  1. We have a scheduler that regularly runs docker system prune -af --volumes --filter="until=72h" to clean up job left-overs on the runners, including build caches. This is necessary because of the limited disk space on the shared runners.
  2. The Docker build cache only works on the same host. We have a group of 10 runners shared across all projects, so there is a low chance of consecutive build jobs landing on the same host.
  3. Using the docker build --cache-from argument seems like an option. However, it turned out not to work with multi-stage builds, because the intermediate layers of the builder stage are missing from the history of the final image (see the sketch after this list).
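
For illustration, here is a minimal sketch of that classic --cache-from workflow (the image names are hypothetical):

# Pull the previously pushed image to use as a cache source
docker pull registry.example.com/hello:latest || true
docker build -t registry.example.com/hello:new --cache-from registry.example.com/hello:latest .
# With a multi-stage Dockerfile, hello:latest contains only the runtime stage's layers,
# so builder-stage steps such as RUN dotnet restore are rebuilt from scratch anyway.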

As a result, when no build cache is available, building an image for a normal-sized application takes 6 to 10 minutes, as the slowest step is usually downloading dependencies from package managers.

Takes 6 mins to build an image

This does not satisfy our product teams: in urgent situations, we aim for code-to-production within 5 minutes for our applications.

BuildKit to the rescue

BuildKit is a new project under the Moby umbrella for building and packaging software using containers. It’s a new codebase meant to replace the internals of the current build features in the Moby Engine.

BuildKit improves the image building process overall, with gains in performance, storage management, feature functionality, and security.

To leverage its build cache import/export feature, the Docker CLI plugin docker/buildx helps manage BuildKit daemons and provides an interface to BuildKit similar to the docker build command.

buildx: Docker CLI plugin for extended build capabilities with BuildKit

The following is an example setup of BuildKit and buildx on an Ubuntu 16.04 host:

Installing Docker CE 19.03-rc

In the future, you may check out the official installation guide once Docker Engine 19.03 is released to the stable channel.

sudo apt-get install apt-transport-https ca-certificates curl gnupg-agent software-properties-common -y
curl -fsSL https://download.docker.com/linux/ubuntu/gpg | sudo apt-key add -
sudo add-apt-repository "deb [arch=amd64] https://download.docker.com/linux/ubuntu $(lsb_release -cs) test"
sudo apt-get update
sudo apt-get install docker-ce docker-ce-cli containerd.io -y

Installation can be verified with the docker -v command. For example:

$ docker -v
Docker version 19.03.0-rc3, build 27fcb77

Installing buildx

wget -q https://github.com/docker/buildx/releases/download/v0.2.2/buildx-v0.2.2.linux-amd64
sudo chmod 777 buildx-v0.2.2.linux-amd64
mkdir -p ~/.docker/cli-plugins/
mv buildx-v0.2.2.linux-amd64 ~/.docker/cli-plugins/docker-buildx
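
To check that the plugin is picked up, the version subcommand can be used. Note that on Docker 19.03, buildx may still require the CLI's experimental features to be enabled:

# enable experimental CLI features if the buildx subcommand is not recognised
export DOCKER_CLI_EXPERIMENTAL=enabled
docker buildx version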

After installation, buildx uses the default builder, which is of the docker driver type.

$ docker buildx ls
NAME/NODE DRIVER/ENDPOINT STATUS PLATFORMS
default * docker
default default running linux/amd64

We need to create a builder of the docker-container driver type to make use of the registry build cache exporter.

docker buildx create --name mybuilder --use

Then list the builders again, and you will see something similar to this:

$ docker buildx ls
NAME/NODE DRIVER/ENDPOINT STATUS PLATFORMS
mybuilder * docker-container
mybuilder0 unix:///var/run/docker.sock running linux/amd64
default docker
default default running linux/amd64
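
As an optional step, the BuildKit container behind this new builder can be booted ahead of time rather than lazily on the first build (you will see a booting buildkit step in the build output later); assuming the builder created above:

docker buildx inspect --bootstrap mybuilder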

Building an image

# Builder stage
FROM microsoft/dotnet:2.2-sdk AS builder

WORKDIR /app
COPY *.sln nuget.config ./
COPY */*.csproj ./
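# The COPY above flattens all .csproj files into /app, so recreate each project's
# directory from the file name and move the file back in; this lets dotnet restore
# run with project files only, keeping its layer cached when only code changes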
RUN for file in $(ls *.csproj); do mkdir -p ${file%.*} && mv $file ${file%.*}/; done
RUN dotnet restore
COPY . .
RUN dotnet publish -o out

# Runtime stage
FROM microsoft/dotnet:2.2-aspnetcore-runtime AS runtime
WORKDIR /app
COPY --from=builder /app/HelloService/out .
EXPOSE 5000
ENTRYPOINT [ "dotnet", "HelloService.dll" ]

It is a multi-stage build with a builder stage and a runtime stage. The build job script in the CI pipeline is similar to this:

IMAGE_TAG=<IMAGE_REPO>:<CI_COMMIT_HASH>
CACHE_TAG=<CACHE_REPO>:<CI_PROJECT_ID>-<CI_BRANCH_NAME>
docker buildx build \
-t $IMAGE_TAG \
-f ./Dockerfile \
--cache-from=type=registry,ref=$CACHE_TAG \
--cache-to=type=registry,ref=$CACHE_TAG,mode=max \
--push \
--progress=plain \
.

With this docker buildx build command, the builder will:

  1. Try to retrieve a build cache from the registry by CACHE_TAG
  2. Build the image and push the final image to the registry by IMAGE_TAG
  3. Push the build cache to the registry by CACHE_TAG
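
For reference, the <IMAGE_REPO>, <CACHE_REPO> and tag placeholders can be composed from GitLab CI's predefined variables; a hypothetical example (the repository names are made up):

IMAGE_TAG="registry.example.com/hello:${CI_COMMIT_SHA}"
CACHE_TAG="docker.io/myorg/build-cache:${CI_PROJECT_ID}-${CI_COMMIT_REF_SLUG}"

Below is the output of the first build on a fresh runner: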
$ docker buildx build -t $IMAGE_TAG -f ./Dockerfile --cache-from=type=registry,ref=$CACHE_TAG --cache-to=type=registry,ref=$CACHE_TAG,mode=max --push --progress=plain .

#1 [internal] booting buildkit
[...]
#1 DONE 47.8s
#3 importing cache manifest from <CACHE_TAG>
#3 ERROR: <CACHE_TAG> not found
[...]
#10 [builder 1/5] FROM docker.io/microsoft/dotnet:2.2-sdk@sha256:06dd42427ad...
[...]
#7 [runtime 1/2] FROM docker.io/microsoft/dotnet:2.2-aspnetcore-runtime@sha...
[...]
#15 [builder 5/5] RUN dotnet publish -o out
[...]
#16 [runtime 2/2] COPY --from=builder /app/HelloService/out .
[...]
#17 exporting to image
[...]
#17 DONE 20.4s
#18 exporting cache
[...]
#18 DONE 91.1s

Great! We have built and pushed an image with BuildKit, and a build cache has been saved to the container registry! (The <CACHE_TAG> not found error is expected on this very first run, since no cache has been pushed yet; subsequent builds will import it.)

As a side note, we use Docker Hub as the cache repository instead of Google Container Registry, which stores our application images. At the time of writing, Google Container Registry does not seem to support the cache manifest format application/vnd.buildkit.cacheconfig.v0 and returns 400 Bad Request when we try to push a build cache. So we fell back to a private repo on Docker Hub for now, and it works perfectly.

Final Results

We compared build times for three kinds of changes:

  • Dependency change means an update to the NuGet package references, so the RUN dotnet restore layer cache is invalidated and almost all layers need a full rebuild.
  • Code change means any change in the build context other than a dependency change. In this case, the RUN dotnet restore step is skipped, but RUN dotnet publish still has to run to build the application artifacts.
  • No change means nothing in the build context changed. All layers are cached.

We made each job land on a new runner to make sure no local build cache is available.

This is a comparison of time consumption:

That is a great result! For our use case, most commits (90%) by the team are code changes only, so we can save half of their build time!


Bear in mind that our solution is a combination of experimental features, pre-release-versioned software, and tools in tech preview.

So, use at your own risk 😛


Written by Jiang Huan
Software Engineer. Backend and Infrastructure. https://billjh.github.io/
