Docker build cache sharing on multi-hosts with BuildKit and buildx

Jiang Huan
Jul 10, 2019 · 7 min read

This article shares how BuildKit is used along with buildx to speed up our image build jobs across multiple hosts, such as shared GitLab CI runners.

At the time of writing, we are using the pre-release version of docker community edition 19.03, with BuildKit support and buildx release v0.2.2.

Photo by chuttersnap on Unsplash

Currently in Titansoft, we are on a journey of infrastructure transformation with Docker and Kubernetes. We believe adopting Kubernetes will benefit our customers with better time-to-market, reduced infrastructure up-front cost, and more efficient resource utilisation.

To support this infrastructure transformation across product teams, we have created standardised CI pipelines with GitLab CI to build, test and deploy our containerised applications.


We have found the default build cache difficult to use in this setup, and in this article we will focus on how BuildKit's build cache export/import feature helps us speed up image build jobs across multiple hosts.

Why does the default build cache not work for us?

When using the docker build command to build a Docker image multiple times, we noticed that subsequent runs are very fast. Almost abnormally so. The reason is that Docker caches the layers created from the instructions specified in the Dockerfile.

➜ docker build -t "hello:local" . 
Sending build context to Docker daemon 2.042MB
Step 1/15 : FROM microsoft/dotnet:2.2-sdk AS builder
Step 6/15 : RUN dotnet restore
---> Using cache
---> 15fceb67915a
Step 7/15 : COPY . .
---> Using cache
---> 4cd4a0db0fdd
Step 8/15 : RUN dotnet publish -o out
---> Using cache
---> 45d90ad676e6
Successfully built 52932ee37d22
Successfully tagged hello:local

The builder skips long-running commands such as RUN dotnet restore and uses the cached intermediate layers directly if there are no changes in the preceding layers.

However, a few things prevent us from leveraging this cache in our setup.

  1. We have a scheduler that regularly runs docker system prune -af --volumes --filter="until=72h" to clean up left-over job data on the runners, including build caches. This is necessary because of the limited disk space on the shared runners.
  2. The Docker build cache only works on the same host. We have a group of 10 runners shared across all projects, so there is a low chance for consecutive build jobs to land on the same host.
  3. Using the docker build --cache-from argument seems like an option. However, it turns out that it does not work with multi-stage builds, because the intermediate layers of the builder stage are missing from the history of the final image.
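The cleanup in point 1 is driven by a scheduler; a cron-style sketch of it might look like the following. Only the prune command itself comes from our setup; the 03:00 schedule and the file path are illustrative assumptions.

```shell
# Sketch of a scheduled cleanup entry for a shared runner.
# The prune command is the one from our setup; the schedule and
# the crontab file location are hypothetical.
echo '0 3 * * * docker system prune -af --volumes --filter "until=72h"' \
  > /tmp/runner-prune.crontab
cat /tmp/runner-prune.crontab
```

The `until=72h` filter keeps anything younger than 72 hours; everything older, including local build caches, is removed.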

As a result, when no build cache is available, building an image for a normally sized application takes 6-10 minutes; the slowest step is usually downloading the dependencies from package managers.

[Image: a build job takes 6 minutes to build an image]

This does not satisfy our product teams, as we aim for code-to-production within 5 minutes for our applications during urgent situations.

BuildKit to the rescue

What is BuildKit and how does it solve our problem? A brief introduction taken from this article simply explains that:

BuildKit is a new project under the Moby umbrella for building and packaging software using containers. It’s a new codebase meant to replace the internals of the current build features in the Moby Engine.

BuildKit provides overall improvements for the image building process with a set of improvements on performance, storage management, feature functionality, and security.

To leverage its build cache import/export feature, the docker-cli plugin docker/buildx helps manage BuildKit daemons and provides an interface to BuildKit that is similar to the docker build command.

buildx: Docker CLI plugin for extended build capabilities with BuildKit

The following is an example setup of BuildKit and buildx on an Ubuntu 16.04 host:

Installing docker CE 19.03-rc

On a fresh copy of Ubuntu 16.04, install the docker community edition 19.03 from the pre-release channel (at the time of writing, 19.03 is the only release candidate).

In the future, you may check out the official installation guide when docker engine 19.03 is released in the stable channel.

sudo apt-get install apt-transport-https ca-certificates curl gnupg-agent software-properties-common -y
curl -fsSL https://download.docker.com/linux/ubuntu/gpg | sudo apt-key add -
sudo add-apt-repository "deb [arch=amd64] https://download.docker.com/linux/ubuntu $(lsb_release -cs) test"
sudo apt-get update
sudo apt-get install docker-ce docker-ce-cli -y

Installation can be verified by the docker -v command. For example,

$ docker -v
Docker version 19.03.0-rc3, build 27fcb77

Installing buildx

Next, download and install buildx (at the time of writing, buildx has been released with version 0.2.2).

wget -q https://github.com/docker/buildx/releases/download/v0.2.2/buildx-v0.2.2.linux-amd64
chmod a+x buildx-v0.2.2.linux-amd64
mkdir -p ~/.docker/cli-plugins/
mv buildx-v0.2.2.linux-amd64 ~/.docker/cli-plugins/docker-buildx

After installation, buildx uses the default builder, which is of the docker driver type.

$ docker buildx ls
NAME/NODE   DRIVER/ENDPOINT   STATUS    PLATFORMS
default *   docker
  default   default           running   linux/amd64

We need to create a builder of the docker-container driver type to make use of the registry build cache exporter.

docker buildx create --name mybuilder --use

Then list the builders again, and you will see something similar to this:

$ docker buildx ls
NAME/NODE      DRIVER/ENDPOINT               STATUS    PLATFORMS
mybuilder *    docker-container
  mybuilder0   unix:///var/run/docker.sock   running   linux/amd64
default        docker
  default      default                       running   linux/amd64

Building some image

With all of this set up and ready to go, let us build an image! We have a template ASP.NET Core project named “HelloService”. This is what its Dockerfile looks like:

# Builder stage
FROM microsoft/dotnet:2.2-sdk AS builder
WORKDIR /app

COPY *.sln nuget.config ./
COPY */*.csproj ./
RUN for file in $(ls *.csproj); do mkdir -p ${file%.*} && mv $file ${file%.*}/; done
RUN dotnet restore
COPY . .
RUN dotnet publish -o out

# Runtime stage
FROM microsoft/dotnet:2.2-aspnetcore-runtime AS runtime
WORKDIR /app
COPY --from=builder /app/HelloService/out .
ENTRYPOINT [ "dotnet", "HelloService.dll" ]
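The RUN line that shuffles the .csproj files is what makes the restore layer cacheable: it recreates the per-project directory layout from only the project files, so RUN dotnet restore can run before COPY . . invalidates the cache. A standalone sketch of what that loop does, with made-up project names for illustration:

```shell
# Recreate the builder-stage loop outside Docker.
# Project file names here are illustrative, not from the article.
tmp=$(mktemp -d) && cd "$tmp"
touch HelloService.csproj HelloService.Tests.csproj

# Same loop as in the Dockerfile: move each Foo.csproj into Foo/
for file in $(ls *.csproj); do mkdir -p "${file%.*}" && mv "$file" "${file%.*}/"; done

# Each project file now sits in its own directory, matching the
# layout that `dotnet restore` expects.
ls */*.csproj
```

Because only the .sln, nuget.config and .csproj files are copied before the restore step, a change elsewhere in the source tree leaves the restore layer untouched.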

It is a multi-stage build with a builder stage and a runtime stage. The build job script in our CI pipeline is similar to this:

docker buildx build \
  -t $IMAGE_TAG \
  -f ./Dockerfile \
  --cache-from=type=registry,ref=$CACHE_TAG \
  --cache-to=type=registry,ref=$CACHE_TAG,mode=max \
  --push \
  --progress=plain \
  .
With this docker buildx build command, the builder will:

  1. Try to retrieve a build cache from the registry by CACHE_TAG
  2. Build the image and push the final image to the registry by IMAGE_TAG
  3. Push the build cache to the registry by CACHE_TAG
$ docker buildx build -t $IMAGE_TAG -f ./Dockerfile --cache-from=type=registry,ref=$CACHE_TAG --cache-to=type=registry,ref=$CACHE_TAG,mode=max --push --progress=plain .
#1 [internal] booting buildkit
#1 DONE 47.8s
#3 importing cache manifest from <CACHE_TAG>
#3 ERROR: <CACHE_TAG> not found
#10 [builder 1/5] FROM microsoft/dotnet:2.2-sdk
#7 [runtime 1/2] FROM microsoft/dotnet:2.2-aspnetcore-runtime
#15 [builder 5/5] RUN dotnet publish -o out
#16 [runtime 2/2] COPY --from=builder /app/HelloService/out .
#17 exporting to image
#17 DONE 20.4s
#18 exporting cache
#18 DONE 91.1s

We have built and pushed an image with BuildKit, and a build cache has been saved to the container registry! Great!
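For reference, IMAGE_TAG and CACHE_TAG are simply two different references, possibly in different registries. Something along these lines is what a CI job might export; the registry hosts and repository names below are entirely hypothetical, not our real ones:

```shell
# Hypothetical tag construction for the build job.
# CI_COMMIT_SHORT_SHA is a variable GitLab CI provides; everything
# else (hosts, repo names) is illustrative.
COMMIT="${CI_COMMIT_SHORT_SHA:-local}"
IMAGE_TAG="gcr.io/example-project/hello-service:${COMMIT}"
CACHE_TAG="docker.io/example-team/hello-service-cache:buildcache"
echo "$IMAGE_TAG"
echo "$CACHE_TAG"
```

The image tag changes per commit, while the cache tag stays stable, so every build job can find and update the same cache.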

As a side note, we use DockerHub as the cache repository instead of Google Container Registry, which stores our application images. This is because, at the time of writing, Google Container Registry does not seem to support the cache manifest media type application/vnd.buildkit.cacheconfig.v0 and returns 400 Bad Request when we try to push a build cache. So we fell back to a private repository on DockerHub for now, and it works perfectly.

Final Results

We did some experiments, building our HelloService several times under different conditions: a dependency change, a code change, or no change at all.

  • Dependency change means an update to NuGet package references, such that the RUN dotnet restore layer cache is invalidated and almost all layers need a full rebuild.
  • Code change means any change in the build context other than a dependency change. In this case the RUN dotnet restore step is skipped, but RUN dotnet publish has to execute to rebuild the application artifacts.
  • No change means there is nothing changed in the build context. All layers are cached.

We made each job land on a new runner to ensure no local build cache was available.

This is a comparison of time consumption:

[Image: table comparing build times with and without the registry cache]

That is a great result! For our use case, most commits (90%) by the team are code-only changes, so we can cut their build time in half!

Bear in mind that our solution uses a combination of experimental features, pre-release-versioned software, and tools in tech preview.

So, use at your own risk! 😛

Titansoft Engineering Blog

Stories from our engineering teams.
