How Docker BuildKit and GitLab Runner fill up storage in Kubernetes

Published in Geek Culture · 8 min read · Nov 30, 2022

Photo by amirali mirhashemian on Unsplash

Docker layers are like the ingredients of a hamburger.
You don’t want to make a mess of it, and you always look for a delightful balance of flavors.
Do you want to pull out an ingredient from the middle of your hamburger?
You start removing layers from the top, you drop the peppers, and you rebuild the hamburger, wasting the layers you removed.

After this brief parenthesis on how to make a good hamburger, it is time to cover how you may fill up the disk using Docker and the BuildKit builder on your laptop or on the worker nodes of a Kubernetes cluster.

I will explain how to identify and solve the problem.
There is also a temporary workaround, and I will dig into the Docker source code to find out how often the Docker garbage collection runs.

Worker node disk usage constantly growing

Disclaimer:
Docker-in-Docker comes with security issues, and there are better tools available, e.g. Kaniko.

Any clap, follow, or comment is highly appreciated!

Am I using docker BuildKit somewhere?

To verify whether you are using Docker with BuildKit enabled, check that you have:

Docker ≥ 18.09
$ docker version

And at least one of the below options:

You configure the environment variable DOCKER_BUILDKIT=1
You run docker buildx build to build your Dockerfile
You have BuildKit enabled by default in the daemon configuration /etc/docker/daemon.json

{
  "features": {
    "buildkit" : true
  }
}
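As a rough illustration, the first and third options above can be checked with a few lines of Python. This is a minimal sketch and not part of Docker: the helper name `buildkit_enabled` is made up for this example, and it assumes the daemon configuration is plain JSON in the shape shown above. It cannot detect the second option, because `docker buildx build` is chosen per invocation.

```python
import json
import tempfile
from pathlib import Path


def buildkit_enabled(env, daemon_json_path):
    """Return True if BuildKit is enabled via DOCKER_BUILDKIT or daemon.json.

    Hypothetical helper for illustration; it only inspects the two settings
    described in the text.
    """
    # Option 1: DOCKER_BUILDKIT=1 in the environment of the docker CLI.
    if env.get("DOCKER_BUILDKIT") == "1":
        return True
    # Option 3: "features": {"buildkit": true} in the daemon configuration.
    path = Path(daemon_json_path)
    if not path.is_file():
        return False
    config = json.loads(path.read_text())
    return config.get("features", {}).get("buildkit") is True


# Demo against a throwaway file instead of the real /etc/docker/daemon.json.
with tempfile.TemporaryDirectory() as tmp:
    demo_path = Path(tmp) / "daemon.json"
    demo_path.write_text('{"features": {"buildkit": true}}')
    print(buildkit_enabled({}, demo_path))                                # True
    print(buildkit_enabled({"DOCKER_BUILDKIT": "1"}, Path(tmp) / "none"))  # True
    print(buildkit_enabled({}, Path(tmp) / "none"))                       # False
```

On a real node you would pass `os.environ` and `/etc/docker/daemon.json` instead of the demo values.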

Run docker system df and check the SIZE and RECLAIMABLE columns of the last row, Build Cache:

❯ docker system df
TYPE            TOTAL     ACTIVE    SIZE      RECLAIMABLE
Images          177       3         46.54GB   45.69GB (98%)
Containers      13        0         39.39MB   39.39MB (100%)
Local Volumes   0         0         0B        0B
Build Cache     660       0         177GB     130GB

Ref: https://docs.docker.com/build/buildkit/#getting-started
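If you want to track the Build Cache row over time, the table above is easy to parse. The following is a small sketch under two assumptions: the output has the column layout shown above, and sizes use decimal units (powers of 1000). The function names are invented for this example.

```python
# Decimal units, assuming docker prints sizes in powers of 1000.
UNITS = {"B": 1, "kB": 10**3, "MB": 10**6, "GB": 10**9, "TB": 10**12}

SAMPLE_OUTPUT = """\
TYPE            TOTAL     ACTIVE    SIZE      RECLAIMABLE
Images          177       3         46.54GB   45.69GB (98%)
Containers      13        0         39.39MB   39.39MB (100%)
Local Volumes   0         0         0B        0B
Build Cache     660       0         177GB     130GB
"""


def parse_size(text):
    """Convert a size such as '177GB' or '0B' to a number of bytes."""
    # Try the two-letter units first so that 'GB' is not mistaken for 'B'.
    for unit in sorted(UNITS, key=len, reverse=True):
        if text.endswith(unit):
            return float(text[: -len(unit)]) * UNITS[unit]
    raise ValueError(f"unrecognized size: {text!r}")


def build_cache_usage(df_output):
    """Return (size, reclaimable) in bytes for the Build Cache row."""
    for line in df_output.splitlines():
        if line.startswith("Build Cache"):
            # Remaining fields: TOTAL, ACTIVE, SIZE, RECLAIMABLE
            fields = line[len("Build Cache"):].split()
            return parse_size(fields[2]), parse_size(fields[3])
    raise ValueError("no Build Cache row found")


size, reclaimable = build_cache_usage(SAMPLE_OUTPUT)
print(f"{reclaimable / size:.0%} of the build cache is reclaimable")  # 73%
```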

What is the difference between buildx and BuildKit?

BuildKit is a kind of image-building engine
buildx is the command to control the engine.

So BuildKit is the engine, buildx is the steering wheel ⎈.

How does GitLab Runner interact with a Kubernetes Cluster?

The diagram below gives a high-level view of how GitLab Runner works with the Kubernetes executor. I simplified it on purpose because I’m not covering this part extensively and GitLab has plenty of documentation.
Ref: https://docs.gitlab.com/runner/#runner-execution-flow.

The standard steps are:

The Kubernetes Executor is deployed in Kubernetes as a Deployment.
The executor pod polls at a configured interval to check whether there are pending jobs.
If there is a pending job, it schedules a new pod based on Toleration/NodeSelector/AffinityRules previously configured.

Let’s say that the jobs run docker build and the environment variable DOCKER_BUILDKIT=1 is set. You can enable it in GitLab Runner by configuring the environment variable inside config.toml as below:

config.toml
...
     [[runners]]
      name = "Kubernetes Builder"
      executor = "kubernetes"
      environment = ["DOCKER_DRIVER=overlay2","DOCKER_BUILDKIT=1"]
...

With this configuration, we are going to use the new BuildKit builder.

How to verify the disk usage of a Worker Node?

Unless you configure a specific policy, the Docker garbage collection does not take care of cleaning up the Build Cache.

To identify what is consuming the space, run a pod with this manifest here; the worker node filesystem will be mounted under /host.
Depending on how full the disk is, try the following commands:

apk add ncdu, then ncdu (it doesn’t work if you have a lot of files/folders)

df -hi shows inode usage per filesystem in human-readable form (use df -h for disk space).
du -shc /host/var/lib/docker/overlay2/*/diff | tee disk_usage.log

docker buildx du | head -n 50 — be patient, it requires some time

docker system df — be patient, it requires some time

In the output of docker system df the row Build Cache refers to the space used by BuildKit.

At this point, I started digging into the source code of BuildKit and Docker to verify how Build Cache is evaluated.

In the source code of moby/docker, the public function DiskUsage uses controlapi.DiskUsageRequest{}, which is part of the BuildKit package.

BuildKit is filling up the disk

The disk is filling up because neither the kubelet (aka the captain of the worker nodes) nor dockerd is cleaning the BuildKit cache.

In the kubelet you can configure two flags to clean up the Docker images based on a threshold:

--image-gc-high-threshold=60
--image-gc-low-threshold=50

But this is not going to clean up the BuildKit cache for you and by default, the BuildKit Garbage Collection is disabled.
For this reason, dockerd is not going to clean up the cache automatically.
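To make the two kubelet thresholds above concrete: image garbage collection starts once disk usage reaches the high threshold, and it frees images until usage is back at the low threshold. The sketch below is a simplified model of that arithmetic with made-up names; it is not the kubelet's code, and the 60/50 defaults are simply the values used in this article.

```python
GB = 10**9


def image_gc_bytes_to_free(capacity, used, high_percent=60, low_percent=50):
    """Return how many bytes of images would need to be freed.

    Simplified model: nothing happens below the high threshold; at or above
    it, enough is freed to bring usage down to the low threshold.
    """
    usage_percent = used * 100 // capacity
    if usage_percent < high_percent:
        return 0
    target_used = capacity * low_percent // 100
    return max(0, used - target_used)


print(image_gc_bytes_to_free(100 * GB, 70 * GB) // GB)  # 20
print(image_gc_bytes_to_free(100 * GB, 55 * GB) // GB)  # 0
```

Only images count toward what this mechanism can free, which is why a disk dominated by build cache keeps growing.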

How to configure BuildKit Garbage Collection

The following enables the default GC on your docker daemon:

"builder": {
    "gc": {
      "enabled": true
    }
  }

You have to add this part in the /etc/docker/daemon.json as below:

{
  ...
  "builder": {
    "gc": {
      "enabled": true
    }
  },
  "features": {
    "buildkit": true
  }
}

Ref: https://docs.docker.com/build/building/cache/garbage-collection/
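Hand-editing JSON is error-prone (a stray trailing comma is enough to make the file invalid), so one option is to apply the change programmatically. The sketch below is only an illustration: `enable_builder_gc` is a made-up helper, and it is demonstrated on a temporary file rather than the real /etc/docker/daemon.json.

```python
import json
import tempfile
from pathlib import Path


def enable_builder_gc(daemon_json_path):
    """Set builder.gc.enabled = true while keeping every other key."""
    path = Path(daemon_json_path)
    # Start from the existing configuration, or an empty one if there is none.
    config = json.loads(path.read_text()) if path.is_file() else {}
    gc_section = config.setdefault("builder", {}).setdefault("gc", {})
    gc_section["enabled"] = True
    path.write_text(json.dumps(config, indent=2) + "\n")
    return config


# Demo on a throwaway copy of a daemon.json that only enables BuildKit.
with tempfile.TemporaryDirectory() as tmp:
    demo_path = Path(tmp) / "daemon.json"
    demo_path.write_text('{"features": {"buildkit": true}}')
    enable_builder_gc(demo_path)
    print(demo_path.read_text())
```

Because the file is parsed and re-serialized, the result is always valid JSON.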

If you want to see more advanced policies, I recommend taking a look at this pull request, which will be released soon.

For an advanced policy example:

{
  ...
  "builder": {
    "gc": {
      "enabled": true,
      "policy": [
        {"keepStorage": "20GB", "filter": ["unused-for=168h"]},
        {"keepStorage": "50GB", "all": true}
      ]
    }
  }
}

The configuration above shows that the garbage collection is on, and that it follows two rules.

If the build cache is larger than 20GB, delete the build cache that has been unused for more than 7 days (168h converted to days).
If the first rule is not enough to bring the cache down to 20GB, it moves on to the next rule, which removes build cache data of any kind until the cache is down to the 50GB keep storage.

For every rule, once its condition is met, the cleanup stops and does not move on to the next rule.

Ref: garbagecollection_config.md
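To see how two rules like these interact, here is a toy simulation. It is a simplified model of my reading of the policy, not BuildKit's implementation: the record type, the field names, and the choice to evict the least recently used records first are all assumptions made for the example, and the second rule is modeled simply as "no age filter".

```python
from dataclasses import dataclass

GB = 10**9


@dataclass
class CacheRecord:
    name: str
    size: int          # bytes
    unused_hours: int  # hours since the record was last used


# The two rules from the example policy, in order.
RULES = [
    {"keep_bytes": 20 * GB, "min_unused_hours": 168},
    {"keep_bytes": 50 * GB, "min_unused_hours": None},  # no age filter
]


def apply_policy(records, rules):
    """Return the records that survive after applying each rule in order."""
    kept = list(records)
    for rule in rules:
        limit = rule["min_unused_hours"]
        # Records this rule may delete, least recently used first.
        candidates = sorted(
            (r for r in kept if limit is None or r.unused_hours > limit),
            key=lambda r: r.unused_hours,
            reverse=True,
        )
        for record in candidates:
            if sum(r.size for r in kept) <= rule["keep_bytes"]:
                break  # this rule's storage target is reached
            kept.remove(record)
    return kept


cache = [
    CacheRecord("old-layer", 30 * GB, 200),
    CacheRecord("recent-layer", 25 * GB, 100),
    CacheRecord("hot-layer", 10 * GB, 1),
]
print([r.name for r in apply_policy(cache, RULES)])  # ['recent-layer', 'hot-layer']
```

In this run the first rule removes only the record older than 7 days; the remaining 35GB is already under the 50GB limit, so the second rule has nothing to do.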

Solve the problem in Kubernetes/Laptop/VM:

On your worker nodes, you need to set builder.gc.enabled = true in /etc/docker/daemon.json as below:

{
  "bridge": "xxx",
  "log-driver": "xxx",
  "log-opts": {
    "xxxx": "xxx"
  },
  "builder": {
    "gc": {
      "enabled": true
    }
  },
  "live-restore": true
}

Unfortunately, this parameter requires a Docker daemon restart because it is not among the options that can be reloaded at runtime, which are listed here:

https://docs.docker.com/engine/reference/commandline/dockerd/#daemon-configuration-file

So the options are:

Restart the Docker daemon, making sure that live-restore is true.
This could be disruptive, so it was not an option for me.
Create/override the worker node image, providing the new /etc/docker/daemon.json.
An EKS example is here: https://aws.amazon.com/premiumsupport/knowledge-center/eks-worker-nodes-image-cache/

Given that the second option requires a bit of time, at the end of the article I provide a fast workaround to mitigate the problem.

Do you know how often the garbage collector runs in Docker?

1 hour?
1 minute?

Let’s find out the frequency

To find the frequency of the garbage collector, I enabled the dockerd debug logs and set defaultKeepStorage to 1MB (a dockerd restart is required to pick these up):

{
  "builder": {
    "gc": {
      "enabled": true,
      "defaultKeepStorage": "1MB"
    }
  },
  "features": {
    "buildkit": true
  },
  "debug": true
}

We build any docker image with a Dockerfile to create a cache bigger than 1 MB.

$ docker buildx build .
$ docker system df
TYPE            TOTAL     ACTIVE    SIZE      RECLAIMABLE
Images          177       3         46.54GB   45.69GB (98%)
Containers      13        0         39.39MB   39.39MB (100%)
Local Volumes   0         0         0B        0B
Build Cache     661       0         65.23MB   65.23MB

In this example, I have a Build Cache of 65MB.

We open the dockerd logs, which are stored in a path that depends on the operating system (see https://docs.docker.com/config/daemon/#read-the-logs), and after at most 1 minute we should see the log below:

time="2022-11-23T21:37:05.XXXX" level=debug msg="gc cleaned up 65220352 bytes"

Now we can build again and wait for the next log line to compute the difference:

time="2022-11-23T21:38:05.XXXX" level=debug msg="gc cleaned up 65220352 bytes"

Comparing the two logs, the garbage collection runs every minute.
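The comparison can be done by hand, but with many log lines a short script helps. This sketch pulls the timestamp out of two "gc cleaned up" lines and subtracts them; the regular expression and function name are my own, and the fractional seconds are ignored because they are masked in the logs above.

```python
import re
from datetime import datetime

# Capture the timestamp up to whole seconds from a dockerd log line.
TIMESTAMP = re.compile(r'time="(\d{4}-\d{2}-\d{2}T\d{2}:\d{2}:\d{2})')


def seconds_between(first_line, second_line):
    """Return the number of seconds between two dockerd log lines."""
    stamps = []
    for line in (first_line, second_line):
        match = TIMESTAMP.search(line)
        if match is None:
            raise ValueError(f"no timestamp in: {line!r}")
        stamps.append(datetime.strptime(match.group(1), "%Y-%m-%dT%H:%M:%S"))
    return (stamps[1] - stamps[0]).total_seconds()


first = 'time="2022-11-23T21:37:05.XXXX" level=debug msg="gc cleaned up 65220352 bytes"'
second = 'time="2022-11-23T21:38:05.XXXX" level=debug msg="gc cleaned up 65220352 bytes"'
print(seconds_between(first, second))  # 60.0
```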

But why 1 minute?

Below is the moby/docker source code that implements the controller that triggers the GC.
On line #75 there is a call to throttle.After(time.Minute, c.gc).

This call returns a function guarded by a Mutex that performs the actual sleep.

c.throttledGC is started by the defer statement (#77), which delays the execution of a function or statement until the surrounding function returns.
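The throttling idea itself is easy to reproduce. Below is a simplified Python sketch of the general pattern (a lock, a pending flag, and a sleep after each run); it is not a translation of the moby/BuildKit code, and the name `after` merely echoes throttle.After. Triggers that arrive while the worker is sleeping collapse into a single extra run.

```python
import threading
import time


def after(delay, fn):
    """Return a trigger that runs fn, then waits `delay` seconds before the next run."""
    lock = threading.Lock()
    state = {"pending": False, "running": False}

    def worker():
        while True:
            with lock:
                if not state["pending"]:
                    state["running"] = False
                    return
                state["pending"] = False
            fn()
            time.sleep(delay)  # the delay is added after the function has run

    def trigger():
        with lock:
            state["pending"] = True
            if not state["running"]:
                state["running"] = True
                threading.Thread(target=worker, daemon=True).start()

    return trigger


runs = []
throttled_gc = after(0.3, lambda: runs.append(time.monotonic()))

throttled_gc()       # first trigger: the function runs right away
time.sleep(0.1)
for _ in range(3):   # three triggers during the delay collapse into one run
    throttled_gc()
time.sleep(0.9)
print(len(runs), "runs for 4 triggers")
```

With a one-minute delay, as in throttle.After(time.Minute, c.gc), this pattern matches the behavior observed in the logs: no matter how often a cleanup is requested, it happens at most once per minute.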

— Temporary workaround —

The workaround is to create a CronJob or a DaemonSet that runs the docker buildx prune command on every node.

docker buildx prune --filter until=168h --verbose --force

The command above cleans the cache that is older than 7 days and prints every deleted layer.

To run the command above in the cluster, there is a Helm chart here http://bit.ly/3V43DVQ that creates a DaemonSet.
Every pod that completes the docker buildx prune command then sleeps forever, so that the DaemonSet controller does not restart it in an infinite loop.

The idea is that you enable the DaemonSet in your CI, wait for 1h, and then disable it. This temporary workaround allowed me to delete 10 terabytes of cache in 1 hour.