How Docker BuildKit and GitLab Runner fill up storage in Kubernetes
Docker layers are like the ingredients of a hamburger.
You don't want to make a mess of it, and you always look for a delightful balance of flavors.
Do you want to pull out an ingredient from the middle of your hamburger?
You start removing layers from the top, you drop the peppers, and you reassemble the hamburger, wasting the removed layers.
After this brief digression on how to make a good hamburger, it's time to cover how you may fill up the disk when using Docker with the BuildKit builder, whether on your laptop or on the worker nodes of a Kubernetes cluster.
I will explain how to identify and solve the problem.
There is also a temporary workaround, and I will dig into the Docker source code to find out how the Docker garbage collection frequency works.
Disclaimer:
Docker-in-Docker comes with security issues, and there are better tools available, e.g. Kaniko.
Any clap, follow, or comment is highly appreciated!
Am I using docker BuildKit somewhere?
To verify whether you are using Docker with BuildKit enabled, you need:

- Docker ≥ 18.09 (check with `docker version`)

and at least one of the options below:

- The environment variable `DOCKER_BUILDKIT=1` is set.
- You build your Dockerfile with `docker buildx build`.
- BuildKit is enabled by default in the daemon configuration `/etc/docker/daemon.json`:

```json
{
  "features": {
    "buildkit": true
  }
}
```

To see how much space the cache is taking, run `docker system df`, look at the last row, `Build Cache`, and check the Size/Reclaimable columns:

```
❯ docker system df
TYPE            TOTAL   ACTIVE  SIZE     RECLAIMABLE
Images          177     3       46.54GB  45.69GB (98%)
Containers      13      0       39.39MB  39.39MB (100%)
Local Volumes   0       0       0B       0B
Build Cache     660     0       177GB    130GB
```
Ref: https://docs.docker.com/build/buildkit/#getting-started
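If you want to make this check scriptable, here is a small sketch. The sample output above is hard-coded into the snippet; the column positions are an assumption about the tabular format of `docker system df`, and in practice you would capture the output with `subprocess` instead:

```python
# Parse the tabular output of `docker system df` and pull out the
# Build Cache row. SAMPLE is the output shown above, hard-coded.
SAMPLE = """\
TYPE            TOTAL   ACTIVE  SIZE     RECLAIMABLE
Images          177     3       46.54GB  45.69GB (98%)
Containers      13      0       39.39MB  39.39MB (100%)
Local Volumes   0       0       0B       0B
Build Cache     660     0       177GB    130GB
"""

def build_cache_usage(output: str) -> tuple:
    """Return (size, reclaimable) from the Build Cache row."""
    for line in output.splitlines():
        if line.startswith("Build Cache"):
            fields = line.split()
            # fields: ["Build", "Cache", total, active, size, reclaimable]
            return fields[4], fields[5]
    raise ValueError("Build Cache row not found")

size, reclaimable = build_cache_usage(SAMPLE)
print(f"Build Cache size={size} reclaimable={reclaimable}")
```

A wrapper like this is handy in a monitoring sidecar that alerts when the reclaimable cache crosses a threshold.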
What is the difference between `buildx` and `BuildKit`?
BuildKit is a kind of image-building engine; `buildx` is the command to control the engine.
So BuildKit is the engine, and `buildx` is the steering wheel ⎈.
How does Gitlab Runner interact with a Kubernetes Cluster?
The diagram below gives a high-level view of how the GitLab Runner with the Kubernetes executor works. I simplified it on purpose, because I'm not covering this part extensively and GitLab's documentation covers it in depth.
Ref: https://docs.gitlab.com/runner/#runner-execution-flow.
The standard steps are:
- The Kubernetes Executor is deployed in Kubernetes as a Deployment.
- The executor pod polls at a configured interval to check whether there are pending jobs.
- If there is a pending job, it schedules a new pod based on the Tolerations/NodeSelector/Affinity rules previously configured.
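The polling loop above can be sketched roughly as follows. The function names, the job payload, and the polling interval are all made up for illustration; the real executor logic lives in the GitLab Runner codebase:

```python
POLL_INTERVAL_SECONDS = 3  # hypothetical polling period

def fetch_pending_job():
    """Stand-in for the runner asking GitLab whether a job is pending."""
    return {"id": 42, "image": "docker:latest"}  # pretend one is pending

def schedule_build_pod(job):
    """Stand-in for creating a build pod that honors the configured
    tolerations / nodeSelector / affinity rules."""
    return f"runner-job-{job['id']}"

def run_once():
    job = fetch_pending_job()
    if job is None:
        return None
    return schedule_build_pod(job)

print(run_once())
# A real executor loops forever, sleeping POLL_INTERVAL_SECONDS between polls.
```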
Let's say that the jobs run `docker build` and the environment variable `DOCKER_BUILDKIT=1` is set. You can enable it in the GitLab Runner by configuring an environment variable inside `config.toml` as below:
config.toml

```toml
...
[[runners]]
  name = "Kubernetes Builder"
  executor = "kubernetes"
  environment = ["DOCKER_DRIVER=overlay2","DOCKER_BUILDKIT=1"]
...
```
With this configuration, we are going to use the new BuildKit builder.
How to verify the disk usage of a Worker Node?
If you don't configure a specific policy, the Docker garbage collection does not take care of cleaning up the Build Cache.
To identify what is consuming the space, you have to run a pod with this manifest here; the worker node filesystem will then be mounted under `/host`.
Depending on how full the disk is, try the following commands:

- `apk add ncdu` → `ncdu` (if you have a lot of files/folders it doesn't work)
- `df -hi` shows the high-level filesystem usage
- `du -shc /host/var/lib/docker/overlay2/*/diff | tee disk_usage.log`
- `docker buildx du | head -n 50` (be patient, it requires some time)
- `docker system df` (be patient, it requires some time)
In the output of `docker system df`, the row `Build Cache` refers to the space used by BuildKit.
At this point, I started digging into the source code of BuildKit and Docker to verify how the `Build Cache` is evaluated.
In the moby/docker source code, the public function `DiskUsage` uses `controlapi.DiskUsageRequest{}`, which is part of the BuildKit package.
BuildKit is filling up the disk
The disk is filling up because neither the `kubelet` (aka the captain of the worker nodes) nor `dockerd` is cleaning the BuildKit cache.
In the `kubelet` you can configure two flags to clean up Docker images based on a threshold:

```
--image-gc-high-threshold=60
--image-gc-low-threshold=50
```
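If you configure the kubelet through a configuration file instead of flags, the equivalent fields (to the best of my knowledge, in the `v1beta1` KubeletConfiguration) are:

```yaml
apiVersion: kubelet.config.k8s.io/v1beta1
kind: KubeletConfiguration
imageGCHighThresholdPercent: 60
imageGCLowThresholdPercent: 50
```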
But this is not going to clean up the BuildKit cache for you, and by default the BuildKit garbage collection is disabled.
For this reason, `dockerd` is not going to clean up the cache automatically.
How to configure BuildKit Garbage Collection
The following enables the default GC on your docker daemon:
"builder": {
"gc": {
"enabled": true
}
}
You have to add this part to `/etc/docker/daemon.json` as below:

```json
{
  ...
  "builder": {
    "gc": {
      "enabled": true
    }
  },
  "features": {
    "buildkit": true
  }
}
```
Ref: https://docs.docker.com/build/building/cache/garbage-collection/
If you want to see more advanced policies, I recommend taking a look at this pull request, which will be released soon.
For an advanced policy example:
```json
{
  ...
  "builder": {
    "gc": {
      "enabled": true,
      "policy": [
        { "keepStorage": "20GB", "filter": ["unused-for=168h"] },
        { "keepStorage": "50GB", "all": true }
      ]
    }
  }
}
```
The configuration above turns garbage collection on, and it follows two rules:

- If the build cache grows beyond 20GB, delete every unused build-cache entry that is more than 7 days old (168h).
- If the first rule is not enough to bring the cache down to 20GB, the next rule kicks in and removes any build-cache data until the cache is below 50GB.

The rules are evaluated in order: once a rule's condition is met, evaluation terminates and does not move on to the next rule.
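To make the rule ordering concrete, here is a toy simulation of the two rules above. This is my own simplified model of the pruning logic, not BuildKit's actual implementation; sizes are in GB and ages in hours:

```python
def prune(records, rules):
    """records: list of (size_gb, unused_hours) cache entries.
    rules: list of (keep_storage_gb, min_unused_hours) applied in order;
    min_unused_hours=0 models '"all": true'.
    Each rule deletes matching entries until total size <= keep_storage."""
    for keep_gb, min_unused in rules:
        total = sum(size for size, _ in records)
        kept = []
        for size, unused in records:
            if total > keep_gb and unused >= min_unused:
                total -= size  # prune this entry
            else:
                kept.append((size, unused))
        records = kept
    return records

# 30 GB of week-old cache plus 40 GB of fresh cache = 70 GB total.
cache = [(30, 200), (40, 1)]
rules = [(20, 168),   # rule 1: unused-for=168h, keepStorage 20GB
         (50, 0)]     # rule 2: all entries, keepStorage 50GB
print(prune(cache, rules))
```

Running this, rule 1 drops the 30 GB of week-old cache; the remaining 40 GB is both too fresh for rule 1 and under rule 2's 50 GB threshold, so it survives.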
Ref: garbagecollection_config.md
Solve the problem in Kubernetes/Laptop/VM:
In your worker nodes, you need to set `builder.gc.enabled = true` in `/etc/docker/daemon.json` as below:

```json
{
  "bridge": "xxx",
  "log-driver": "xxx",
  "log-opts": {
    "xxxx": "xxx"
  },
  "builder": {
    "gc": {
      "enabled": true
    }
  },
  "live-restore": true
}
```
Unfortunately, this parameter requires a Docker daemon restart, because it is not in the list of options that can be reloaded without restarting the daemon.
So the options are:
- Restart the Docker daemon, being careful that `live-restore` is true. This could be disruptive, so it is not an option for me.
- Create/override the worker node image, providing the new `/etc/docker/daemon.json`.
An EKS example here: https://aws.amazon.com/premiumsupport/knowledge-center/eks-worker-nodes-image-cache/
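For the second option, a minimal sketch of patching an existing `daemon.json` without clobbering the keys already there (run here against an in-memory copy; on a real node you would read and write `/etc/docker/daemon.json` and then restart `dockerd`):

```python
import json

# Existing daemon.json content (example values).
existing = '{"log-driver": "json-file", "live-restore": true}'

config = json.loads(existing)
# Merge in the GC settings without touching unrelated keys.
config.setdefault("builder", {}).setdefault("gc", {})["enabled"] = True
config.setdefault("features", {})["buildkit"] = True

patched = json.dumps(config, indent=2, sort_keys=True)
print(patched)
```

Doing the merge with a JSON parser, rather than appending text, guarantees the result stays valid JSON, which matters because `dockerd` refuses to start on a malformed `daemon.json`.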
Given that the second option requires a bit of time, at the end of the article I provide a fast workaround to mitigate the problem.
Do you know how often the Garbage Collector runs in Docker?
- 1 hour?
- 1 minute?
Let's find out the frequency
To find the frequency of the Garbage Collector, I enabled the `dockerd` debug logs and set `defaultKeepStorage=1MB` (a `dockerd` restart is required to pick those up):
```json
{
  "builder": {
    "gc": {
      "enabled": true,
      "defaultKeepStorage": "1MB"
    }
  },
  "features": {
    "buildkit": true
  },
  "debug": true
}
```
We build any docker image with a Dockerfile to create a cache bigger than 1 MB.
```
$ docker buildx build .
$ docker system df
TYPE            TOTAL   ACTIVE  SIZE     RECLAIMABLE
Images          177     3       46.54GB  45.69GB (98%)
Containers      13      0       39.39MB  39.39MB (100%)
Local Volumes   0       0       0B       0B
Build Cache     661     0       65.23MB  65.23MB
```
In this example, I have a `Build Cache` of 65MB.
We open the `dockerd` logs, stored in a path that depends on the operating system (https://docs.docker.com/config/daemon/#read-the-logs), and after at most 1 minute we should see the log below:
time="2022–11–23T21:37:05.XXXX" level=debug msg="gc cleaned up 65220352 bytes"
Now we can build again, and wait for the log again to compute the difference:
time="2022-11-23T21:38:05.XXXX" level=debug msg="gc cleaned up 65220352 bytes
Comparing the two logs, the Garbage Collection is done every 1 minute.
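You can compute the interval from the two log lines directly. A quick sketch, with the timestamps taken from the logs above (fractional seconds stripped for simplicity):

```python
from datetime import datetime

first  = 'time="2022-11-23T21:37:05" level=debug msg="gc cleaned up 65220352 bytes"'
second = 'time="2022-11-23T21:38:05" level=debug msg="gc cleaned up 65220352 bytes"'

def timestamp(line: str) -> datetime:
    # Extract the value of the time="..." field (first quoted token).
    raw = line.split('"')[1]
    return datetime.strptime(raw, "%Y-%m-%dT%H:%M:%S")

delta = timestamp(second) - timestamp(first)
print(delta.total_seconds())  # → 60.0, i.e. one GC run per minute
```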
But why 1 minute?
Below is the source code of moby/docker that implements the controller wiring up the GC.
In row #75 there is a method call, `throttle.After(time.Minute, c.gc)`.
This method returns a function with a mutex and the actual sleep.
`c.throttledGC` starts thanks to the `defer` (#77), which delays the execution of a function or statement until the surrounding function returns.
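The throttle pattern itself is easy to mimic. Here is a rough Python analogue of the idea behind `throttle.After(time.Minute, c.gc)` (this is my own sketch: a lock plus a sleep so that repeated triggers collapse into at most one run per interval, not the actual moby code):

```python
import threading
import time

def throttle_after(interval: float, fn):
    """Return a function that runs fn at most ~once per `interval`:
    triggers arriving while a run is in flight are coalesced."""
    lock = threading.Lock()
    pending = False

    def throttled():
        nonlocal pending
        with lock:
            if pending:
                return          # a run is already in flight, coalesce
            pending = True
        fn()                    # do the actual work (the GC)
        time.sleep(interval)    # enforce the minimum spacing
        with lock:
            pending = False

    return throttled

runs = []
gc = throttle_after(0.01, lambda: runs.append("gc"))
gc()
print(runs)  # the first trigger ran the GC exactly once
```

This explains the 1-minute cadence observed in the logs: every build triggers the throttled GC function, but the throttle guarantees at most one real collection per minute.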
— Temporary workaround —
The workaround is to create a CronJob or a DaemonSet that runs the `docker buildx prune` command on every node.

```
docker buildx prune --filter until=168h --verbose --force
```

The command above cleans the cache that is older than 7 days and prints every deleted layer.
To run the command above in the cluster, here http://bit.ly/3V43DVQ there is a helm chart that creates a DaemonSet.
Every pod, after completing the `docker buildx prune` command, sleeps forever to avoid infinite restarts by the DaemonSet controller.
The idea is that you enable the DaemonSet in your CI, wait for one hour, then disable it. This temporary workaround allowed me to delete 10 terabytes of cache in 1 hour.