[DockerCon 2023] Reproducible builds with BuildKit for software supply chain security

Akihiro Suda
Published in nttlabs, Oct 23, 2023

This is a recap of my talk “Reproducible builds with BuildKit for software supply chain security” at DockerCon (October 5th, 2023).

Slide 1

This talk was similar to my previous talk at FOSDEM in February, but the toolchain has been simplified since then.

Background

Security assessment of third-party Docker images has been a long-standing challenge, due to the lack of verifiability in the software supply chain.

Images maintained by a reputable organization or individual are often considered trustworthy. However, it is hard to rule out the possibility that the maintainers have silently injected malicious code that is not present in the source repo. Even if they have no malicious intent, their images can still be compromised through an accidental leak of registry credentials.

Reproducible builds reduce this concern. The technique ensures that a bit-for-bit identical image can be reproduced from its source code, by anybody, at any time. When multiple independent actors can attest to an image’s reproducibility, it indicates that the image contains no code of a secret origin.
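
As a minimal sketch of what "bit-for-bit identical" means in practice, two parties can each export their build result to a file and compare SHA-256 digests. The helper name `same_digest` is hypothetical, and the sketch assumes the `sha256sum` command from GNU coreutils.

```shell
# same_digest FILE_A FILE_B
# Exit 0 only if both files have the same SHA-256 digest.
# Exit 2 if either file cannot be read.
same_digest() {
  [ -r "$1" ] && [ -r "$2" ] || return 2
  digest_a=$(sha256sum < "$1" | cut -d' ' -f1)
  digest_b=$(sha256sum < "$2" | cut -d' ' -f1)
  [ "$digest_a" = "$digest_b" ]
}
```

For example, `same_digest mine.tar theirs.tar && echo reproducible` prints `reproducible` only when the two files match. The file-based form is only meant to illustrate the idea; for images, the comparison is normally made on image digests.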

Slide 3

Are Docker Hub images actually reproducible?

Most of them are not. You can run docker build https://github.com/docker-library/... to rebuild an image on Docker Hub yourself, and use my diffoci (diff for Open Container Initiative images) tool <https://github.com/reproducible-containers/diffoci> to see why it is not reproducible:

docker pull golang:1.21.1-alpine@sha256:96634e55b363cb93d39f78fb18aa64abc7f96d372c176660d7b8b6118939d97b

# DOCKER_BUILDKIT=0 with Docker 20.10.23 corresponds to the current Docker Hub image (Will change in the future)
export DOCKER_BUILDKIT=0
docker build -t my-golang "https://github.com/docker-library/golang.git#585c8c1e705a7a458455f0629922a4f90628ce08:1.21/alpine3.18"

go install github.com/reproducible-containers/diffoci/cmd/diffoci@latest

diffoci diff docker://golang:1.21.1-alpine docker://my-golang

The diffoci result for golang:1.21.1-alpine contains more than 14,000 lines of diffs, but most of them are just timestamp differences:

$ diffoci diff docker://golang:1.21.1-alpine docker://my-golang
TYPE NAME INPUT-0 INPUT-1
Desc application/vnd.docker.distribution.manifest.v2+json b25862... 3c4eca0...
...
File etc/ssl/certs/3e45d192.0 2023-08-09 03:36:47 +0000 UTC 2023-09-21 08:35:31 +0000 UTC
...
(More than 14,000 lines)
...
File go/ 2023-09-06 18:31:40 +0000 UTC 2023-09-21 08:35:45 +0000 UTC
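
With this many lines, it can help to first count the differences by type. The following is a small sketch, not a diffoci feature. It assumes the whitespace-separated column layout shown above, with a header line first and the type in the first column. The function name `summarize_diff` is hypothetical.

```shell
# summarize_diff: read diffoci-style output on stdin and print the
# number of lines per TYPE (first column), skipping the header line
# and any empty lines.
summarize_diff() {
  awk 'NR > 1 && NF > 0 { count[$1]++ }
       END { for (t in count) print t, count[t] }' | sort
}
```

It could be used as `diffoci diff docker://golang:1.21.1-alpine docker://my-golang | summarize_diff`.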

The --semantic flag can be used to ignore such “boring” differences:

$ diffoci --semantic diff docker://golang:1.21.1-alpine docker://my-golang
TYPE NAME INPUT-0 INPUT-1
Layer ctx:/layers-1/layer length mismatch (457 vs 454)
Layer ctx:/layers-1/layer name "usr/local/share/ca-certificates/.wh..wh..opq" only appears in input 0
Layer ctx:/layers-1/layer name "etc/ca-certificates/.wh..wh..opq" only appears in input 0
Layer ctx:/layers-1/layer name "usr/share/ca-certificates/.wh..wh..opq" only appears in input 0
File lib/apk/db/scripts.tar eef110e... e9bfe18...
Layer ctx:/layers-2/layer length mismatch (13939 vs 13938)
Layer ctx:/layers-2/layer name "usr/local/go/.wh..wh..opq" only appears in input 0
File lib/apk/db/scripts.tar 60e22bb... 67f2648...
Layer ctx:/layers-3/layer length mismatch (4 vs 3)
Layer ctx:/layers-3/layer name "go/.wh..wh..opq" only appears in input 0

The remaining differences are:

  • .wh..wh..opq (AUFS whiteouts) are missing in the local build due to the filesystem difference
  • lib/apk/db/scripts.tar differs due to the timestamp information inside itself (the --semantic flag isn’t yet clever enough to ignore timestamps inside nested tar archives)
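
The second point can be reproduced outside of any image. The sketch below assumes GNU tar (1.28 or later, for `--sort`); the function name `repro_tar` and the default epoch of 0 are illustrative choices. It creates an archive with entry order, ownership, and modification times normalized, so that the same file contents always produce the same bytes.

```shell
# repro_tar DIR OUT [EPOCH]
# Archive the contents of DIR into OUT with sorted entries, numeric
# root ownership, and every mtime set to EPOCH (default: 0).
repro_tar() {
  tar --sort=name --mtime="@${3:-0}" \
      --owner=0 --group=0 --numeric-owner \
      -cf "$2" -C "$1" .
}
```

A plain `tar -cf` of the same directory, taken before and after the files are touched, yields two different archives even though no file content changed. That is the kind of difference reported for scripts.tar above.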

How to make images reproducible

Timestamps

Timestamps are one of the obvious obstacles to achieving reproducibility. Docker/OCI (Open Container Initiative) images have timestamps in:

  1. the created property in the OCI Image Config (shown in docker image ls)
  2. the history property in the OCI Image Config (shown in docker image history)
  3. the org.opencontainers.image.created annotation in the OCI Image Index
  4. the timestamps of the files in the image layers

BuildKit v0.11 added support for rewriting the timestamps for 1, 2, and 3 to reduce non-reproducibility.
This feature was extended in BuildKit v0.13 (beta) to cover 4 as well.

# Configure buildx to use BuildKit v0.13 beta1
docker buildx create --use --driver-opt image=moby/buildkit:v0.13.0-beta1

# Rewrite the timestamps in the image to the timestamp of the latest git commit
docker buildx build --build-arg SOURCE_DATE_EPOCH="$(git log -1 --pretty=%ct)" \
--output type=image,name=example.com/image,push=true,rewrite-timestamp=true .

SOURCE_DATE_EPOCH (uint64; seconds since 1970-01-01 00:00:00 UTC) here is an environment variable standardized by <https://reproducible-builds.org/>. This environment variable is also recognized by gcc, clang, cmake, etc. to make application binaries reproducible too. See <https://reproducible-builds.org/docs/source-date-epoch/> for the details.
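
A small sketch of how a build script might choose the value: it keeps a SOURCE_DATE_EPOCH that is already set, otherwise uses the latest git commit time as in the command above, and finally falls back to 0. The helper names `resolve_source_date_epoch` and `epoch_to_utc` are hypothetical, the fallback to 0 is an arbitrary choice for this sketch, and `date -d @N` assumes GNU date.

```shell
# Print the SOURCE_DATE_EPOCH to use for this build.
resolve_source_date_epoch() {
  if [ -n "${SOURCE_DATE_EPOCH:-}" ]; then
    # Respect a value provided by the caller or the CI system.
    echo "$SOURCE_DATE_EPOCH"
  elif epoch=$(git log -1 --pretty=%ct 2>/dev/null) && [ -n "$epoch" ]; then
    # Commit time of the latest git commit.
    echo "$epoch"
  else
    # Not in a git repository: arbitrary fixed fallback.
    echo 0
  fi
}

# Render an epoch as a UTC timestamp, for checking the value by eye.
epoch_to_utc() {
  date -u -d "@$1" +%Y-%m-%dT%H:%M:%SZ
}
```

For instance, `epoch_to_utc 1693785600` prints `2023-09-04T00:00:00Z`. The resolved value can then be passed as `--build-arg SOURCE_DATE_EPOCH="$(resolve_source_date_epoch)"`.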

Pinning packages

The base image for a Dockerfile can be pinned with tags like FROM debian:bookworm-20230904-slim. However, this is not enough for reproducing apt-get results, as apt-get installs the packages from the latest repos, not from the snapshot on 2023-09-04.

To install packages from a past snapshot, you have to configure the package manager to use a past snapshot explicitly. For Debian, /etc/apt/sources.list can be configured to use snapshot.debian.org/archive/debian/20230904T000000Z as follows:

FROM debian:bookworm-20230904-slim
ENV DEBIAN_FRONTEND=noninteractive
RUN rm -rf /etc/apt/sources.list* && \
echo 'deb [check-valid-until=no] http://snapshot.debian.org/archive/debian/20230904T000000Z bookworm main' \
>/etc/apt/sources.list && \
echo 'deb [check-valid-until=no] http://snapshot.debian.org/archive/debian-security/20230904T000000Z bookworm-security main' \
>>/etc/apt/sources.list && \
echo 'deb [check-valid-until=no] http://snapshot.debian.org/archive/debian/20230904T000000Z bookworm-updates main' \
>>/etc/apt/sources.list && \
apt-get update && \
apt-get install -y gcc
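
The same three entries can be generated from a snapshot timestamp and a release codename. This is only an illustrative sketch: `snapshot_sources` is a hypothetical helper and does not describe how any existing tool works. Its output mirrors the echo lines in the Dockerfile above.

```shell
# snapshot_sources TIMESTAMP CODENAME
# Print sources.list entries that pin apt to snapshot.debian.org
# at TIMESTAMP (format: YYYYMMDDTHHMMSSZ).
snapshot_sources() {
  ts=$1
  codename=$2
  base=http://snapshot.debian.org/archive
  opts='[check-valid-until=no]'
  echo "deb $opts $base/debian/$ts $codename main"
  echo "deb $opts $base/debian-security/$ts $codename-security main"
  echo "deb $opts $base/debian/$ts $codename-updates main"
}
```

For example, `snapshot_sources 20230904T000000Z bookworm > /etc/apt/sources.list` writes the same configuration as the RUN instruction above.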

I wrote a script <https://github.com/reproducible-containers/repro-sources-list.sh> to simplify setting up /etc/apt/sources.list and enabling the cache for /var/cache/apt:

FROM debian:bookworm-20230904-slim
ADD --chmod=0755 \
https://raw.githubusercontent.com/reproducible-containers/repro-sources-list.sh/v0.1.0/repro-sources-list.sh \
/usr/local/bin/repro-sources-list.sh
ENV DEBIAN_FRONTEND=noninteractive
RUN --mount=type=cache,target=/var/cache/apt \
repro-sources-list.sh && \
apt-get update && \
apt-get install -y gcc

Caching /var/cache/apt is optional, but highly recommended, as the snapshot server isn’t as fast as regular apt-get servers. The cache for /var/cache/apt can be saved on GitHub Actions using <https://github.com/reproducible-containers/buildkit-cache-dance>:

steps:
  - uses: actions/cache@v3
    with:
      path: var-cache-apt
      key: var-cache-apt-${{ hashFiles('Dockerfile') }}
  - uses: reproducible-containers/buildkit-cache-dance@v2.1.2
    with:
      cache-source: var-cache-apt
      cache-target: /var/cache/apt
The techniques above work for Ubuntu (snapshot.ubuntu.com) and Arch Linux (archive.archlinux.org) too.

However, this is still challenging for Alpine Linux, Rocky Linux, AlmaLinux, etc., as they do not have snapshot servers. A workaround for these distros is to preserve /etc/apk/cache, /var/cache/dnf, etc. by yourself: <https://github.com/reproducible-containers/repro-pkg-cache>.
In the long term, BuildKit frontends may have a built-in feature to help with this: <https://github.com/moby/buildkit/issues/4259>.

Future work

After the general availability of BuildKit v0.13, I’ll submit PRs to make well-known images reproducible.

We also need a “single-click” platform for attesting reproducibility and sharing the result. This will probably need help from registry service providers.

NTT is hiring!

We at NTT are looking for engineers who work in Open Source communities like Docker/Moby, BuildKit, and their relevant projects. Visit <https://www.rd.ntt/e/sic/recruit/> to see how to join us.

Japanese-language recruiting page: <https://www.rd.ntt/sic/recruit/>

Links

Tools and examples: <https://github.com/reproducible-containers>

BuildKit docs: <https://github.com/moby/buildkit/blob/master/docs/build-repro.md>
