Faster Docker builds on K8s with Makisu and Redis

Adam Rempter
inganalytics.com/inganalytics
Sep 23, 2020 · 7 min read

Recently, during our experiment week, we have been playing with tools that could improve our CI/CD pipelines. At the moment we run pipelines that build multiple docker images per day. Some images are quite large and certainly exceed the size suggested by best practices.

We use GitLab pipelines with runners that build docker images and push them to the registry.

When building multiple docker images per day, the speed of running the pipeline is an obvious concern, but security is in scope as well.

Initially, we tried to improve those builds by switching from the standard Docker build to Kaniko, but we found that the build process was slower and lacked a caching option for built images.

Makisu

Another available tool that we tried is Makisu, an open-source builder released by Uber as their response to the scalability and security issues they faced.

A good introduction is given in Uber's blog post here.

To quickly summarize, Makisu can:

  • generate layers without elevated privileges. Makisu performs file system scans and keeps an in-memory representation of the file system, whereas Docker generates layers using a copy-on-write (CoW) file system, which requires privileges
  • compress layers faster than Docker, which uses Go’s default gzip library
  • quite importantly, use a distributed cache (Redis) to store the produced layers
  • let the user control which layers to create with #!COMMIT directives; those layers are stored in the cache (a minimal sketch follows this list)
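
With Makisu’s explicit commit mode (the --commit=explicit flag described in the Makisu README), only steps annotated with #!COMMIT are committed as separate, cacheable layers. A minimal, purely illustrative Dockerfile:

FROM openjdk:8-jre
# dependencies change rarely, so commit them as their own cacheable layer
RUN apt-get update && apt-get install -y curl #!COMMIT
COPY . /app
CMD ["java", "-jar", "/app/app.jar"]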

If it’s already available, why not test it and reuse it if it works well?

For this test, I’ve created a repository with a multi-stage Dockerfile and a Scala Hello World application.

Environment

Makisu can run and build Docker images locally, but I am more interested in running a Makisu build on K8s with a Redis cache.

My environment has K8s already installed. I will also run Redis in the default Kubernetes namespace; the Makisu build will point (via the job definition) to this Redis instance to cache layers between builds.
The source repository with the Dockerfile is hosted on GitHub, and once the build succeeds, the image will be pushed to Docker Hub.
One thing I will skip is the definition of an automated pipeline. Instead, I will just use the kubectl client to submit the build job.

If you do not have a K8s cluster, it should also be possible to run this test using Minikube.

The Makisu GitHub repository contains all the files required to define and run a build on K8s. They just need to be slightly updated for this test.

Prepare Redis and Repository Secret

Assuming that the K8s cluster is already running and the source Dockerfile is in some repository, the steps below are required.

Run Redis (if not already running). I just customized the definition found in the Makisu repository by pointing to the correct Redis docker image and adding development environment settings. The service definition is the same.

apiVersion: v1
kind: Pod
metadata:
  name: redis
  labels:
    redis: "true"
spec:
  containers:
  - name: main
    image: bitnami/redis:latest
    env:
    - name: MASTER
      value: "true"
    - name: ALLOW_EMPTY_PASSWORD
      value: "yes"
    ports:
    - containerPort: 6379

Next, I run kubectl to create the Redis pod

$ kubectl create -f redis.yaml
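
The matching Service definition (unchanged from the Makisu repository) exposes that pod under the DNS name redis on port 6379, which is the address the build job will point at later. Roughly:

apiVersion: v1
kind: Service
metadata:
  name: redis
spec:
  selector:
    redis: "true"   # matches the label on the Redis pod above
  ports:
  - port: 6379
    targetPort: 6379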

And once done, in the next step, a secret for the docker image registry needs to be created. This secret will be used later by the job to log in to Docker Hub and push the created images. More examples can be found in the configuration readme file

index.docker.io:
  SOME_REPO:
    security:
      tls:
        client:
          disabled: false
      basic:
        username: "your_user"
        password: "password"

Having defined the above registry configuration, the following command creates the secret from it

$ kubectl create secret generic docker-registry-config --from-file=./registry.yaml
secret/docker-registry-config created

kubectl can additionally be used to check if the secret has been created

$ kubectl get secret
NAME                     TYPE     DATA   AGE
docker-registry-config   Opaque   1      83m

Let’s run some builds

Finally, I can define the job template to start the Docker build process. I modified the template provided in the Makisu repository. The updated job definition is located here.

It does two main tasks:

  • An init container clones the source repository containing the Docker build context to a shared volume in the Kubernetes pod
  • The second step defines the Makisu build container and its arguments. Here I updated the push argument, defined the Docker image tag, and added a flag pointing to Redis
args:
- build
- --push=index.docker.io
- --modifyfs=true
- -t=arempter/makisu-example:0.1
- --registry-config=/registry-config/registry.yaml
- --redis-cache-addr=redis:6379
- /makisu-context
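
Putting the pieces together, a trimmed-down version of such a job looks roughly like the sketch below. The Makisu image tag and the init-container details are illustrative; the complete definition is in the repository linked above:

apiVersion: batch/v1
kind: Job
metadata:
  name: scala-build-example
spec:
  template:
    spec:
      restartPolicy: Never
      initContainers:
      - name: provisioner
        # clone the repository with the Dockerfile into the shared build context
        image: alpine/git
        args: ["clone", "https://github.com/arempter/makisu-example.git", "/makisu-context"]
        volumeMounts:
        - name: context
          mountPath: /makisu-context
      containers:
      - name: makisu
        image: gcr.io/uber-container-tools/makisu:latest   # tag illustrative
        args:
        - build
        - --push=index.docker.io
        - --modifyfs=true
        - -t=arempter/makisu-example:0.1
        - --registry-config=/registry-config/registry.yaml
        - --redis-cache-addr=redis:6379
        - /makisu-context
        volumeMounts:
        - name: context
          mountPath: /makisu-context
        - name: registry-config
          # the docker-registry-config secret created earlier
          mountPath: /registry-config
      volumes:
      - name: context
        emptyDir: {}
      - name: registry-config
        secret:
          secretName: docker-registry-config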

Once a job is defined, it can be submitted to K8s cluster using kubectl client

$ kubectl create -f build_scala_app_job.yaml

Build job logs

OK, the build job has been defined and created in the K8s cluster. So what can be found in the job logs?

The first run should show the full build process…

Indeed, the build process reports that it is using the Redis cache, and since this is the first build attempt, the logs show "cache get" failure messages

{"level":"info","ts":1600695144.1617439,"msg":"Using redis at redis:6379 for cacheID storage"}
Failed to fetch intermediate layer with cache ID 88f43ac8: find layer 88f43ac8: layer not found in cache

Once the job successfully runs, the log lines show cache key/value entries being added

{"level":"info","ts":1600695210.532459,"msg":"Stored cacheID mapping to KVStore: 88f43ac8 => 862b63e5afa3588581eb4831a6e3fc59bf6659db2d948361e922a302e6bb1bee,9edf87ec4f3a419ce86dc9d219231fd2a1e6ed8ad958a4076fea38695b87fdd4"}
{"level":"info","ts":1600695210.5330272,"msg":"Computed total image size 68116735","total_image_size":68116735}
{"level":"info","ts":1600695210.5330527,"msg":"Successfully built image arempter/makisu-example:0.1"}

and in the last stage, the job pushes the successfully built image to Docker Hub

{"level":"info","ts":1600695219.7675886,"msg":"* Pushed image index.docker.io/arempter/makisu-example:0.1","duration":9.234456742}
{"level":"info","ts":1600695219.7676253,"msg":"Successfully pushed index.docker.io/arempter/makisu-example:0.1 to index.docker.io"}
{"level":"info","ts":1600695219.7676308,"msg":"Finished building arempter/makisu-example:0.1"}

A subsequent run is able to find the cache entries stored by previous build jobs

{"level":"info","ts":1600780261.9083533,"msg":"Found mapping in cacheID kv store: 5fa8683f => c9ed45cdcc03aba995b5e84b5f4466b8d7d46d3665edf45843d348ff238f8481,1756cf2dce7bd9b64b87a7ab85bbd92f09515dd3cb6372514af4fcc5fc9fa72f"}Applying cache layer 1756cf2dce7bd9b64b87a7ab85bbd92f09515dd3cb6372514af4fcc5fc9fa72f (unpack=true)"}{"level":"info","ts":1600780312.338002,"msg":"* Skipping execution; cache was applied *"}

and the whole build process should be faster, as some (or most) steps will be skipped and taken from the cache.

Build Times

The first run, where nothing was in the Redis cache and no previous image had been pushed to Docker Hub, took around 9 minutes

NAME                    COMPLETIONS   DURATION   AGE
scala-build-example-1   1/1           9m26s      10m

The second run, served fully from cache (as no changes to the project were made), took roughly a third of the initial build time, less than 4 minutes

NAME                    COMPLETIONS   DURATION   AGE
scala-build-example-2   1/1           3m41s      3m45s

Subsequent runs showed similar build times. This is of course not a real performance test: it was conducted on a toy K8s cluster (running on KVM virtual machines) with a single-node Redis. Even in this virtual environment, it is quite obvious that caching speeds up subsequent builds. It is also almost certain that build times will differ between jobs depending on the exact Dockerfile definition.

Failed build

During those tests, I wanted to check what the build time difference would be if I just updated the Scala source code.
I updated the code, ran the build, and found the following errors

"level":"info","ts":1600772119.2609763,"msg":"* Moving directories [/makisu-example/target/universal/makisu_example-0.1.1.zip] to /makisu-storage/sandbox/sandbox030734441/stages/c3RhZ2UtYXBw"}
{"level":"error","ts":1600772128.7237525,"msg":"failed to execute build plan: execute stage: checkpoint stage stage-app: stat /makisu-example/target/universal/makisu_example-0.1.1.zip: stat /makisu-example/target/universal/makisu_example-0.1.1.zip: no such file or directory"}

OK, since I just used the master branch, I did not change the part of the Dockerfile that would recompile my code. So, as the previous layer was extracted from the cache, the new application version was not there.

I guess in production, where application release cycles are followed, this would not happen. For this example, it is enough to check out a tag or branch in the Dockerfile

RUN git clone https://github.com/arempter/makisu-example.git -b app_ver_x

So the subsequent builds would trigger application source code compilation.

Registry issues

Testing with the Docker Hub registry went smoothly. But once we started testing against Artifactory, we noticed that the cache was not used by the build jobs. We also found some errors

Failed to fetch intermediate layer with cache ID XXXX 
layer digest did not match

Then we found the following in the Makisu documentation

If you encounter these errors when pushing your image to a registry, try to use the push_chunk: -1 option (some registries, despite implementing registry v2 do not support chunked upload, ECR and GCR being one example).

So even though we did not see errors like BLOB_UPLOAD_INVALID or BLOB_UPLOAD_UNKNOWN, setting push_chunk to -1 (disabling chunked upload) solved the above errors.
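
Per the configuration readme, the option goes at the repository level of registry.yaml, next to the security block; a sketch, with SOME_REPO again standing in for your repository:

index.docker.io:
  SOME_REPO:
    push_chunk: -1
    security:
      tls:
        client:
          disabled: false
      basic:
        username: "your_user"
        password: "password"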

Summary

Setting up and using Makisu during those tests was a smooth experience. I encountered some small issues while configuring the target Docker registry and authentication, but nothing special.

I guess the natural next step would be to run this tool in a more production-like environment, with real Dockerfiles (especially the large ones) and a multi-node Redis cluster. CI/CD pipelines should also be used to submit the build jobs to K8s.

In general, I think using Makisu can help:

  • in cases where you run pipelines that frequently build docker images
  • simplify your pipeline by eliminating the docker daemon needed to build an image; instead, just use the kubectl client to submit a job definition (see the sketch after this list)
  • speed up build times by retrieving unchanged layers from the cache
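
On that second point, a GitLab CI job that submits the Makisu build can be as small as the sketch below. The image, job name, and the kubeconfig/RBAC setup are illustrative assumptions, not taken from our actual pipeline:

build_image:
  stage: build
  image:
    name: bitnami/kubectl:latest   # the runner only needs kubectl, no docker daemon
    entrypoint: [""]
  script:
    - kubectl create -f build_scala_app_job.yaml
    - kubectl wait --for=condition=complete --timeout=30m job/scala-build-example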

That’s all, thanks for reading…

The full code for this article can be found in the following repository on GitHub.
