Nifty Docker tricks for your CI (vol. 1)

Paweł Lipski
Feb 18

If you run dockerized jobs in your CI (or are considering a migration to a Docker-based flow), it’s very likely that some, if not most, of the techniques outlined in this blog post will prove useful to you.

We’ll take a closer look at the CI process for an open source tool, git-machete, that is actively developed at VirtusLab. Having started as a simple tool for rebase automation, it has since grown into a full-fledged Git repository organizer. It even acquired its own logo, stylized as the original Git logo with extra forks, slashed in half.

The purpose of git-machete’s CI is to ensure that its basic functions work correctly under the wide array of Git and Python versions that users might have on their machines. In this blog post, we’re going to create a dockerized environment that allows us to run such functional tests both locally and on a CI server. This particular project uses Travis CI, but the entire configuration can be migrated to any other modern CI with minimal effort.

This article assumes that you’re familiar with concepts like Dockerfiles and docker-compose.

High-level overview of the setup

Let’s start with the project layout (also available on GitHub):

These are the files that are particularly relevant to us:

Reducing image size: keep each layer small

The central part of the entire setup is the Dockerfile. Let’s first have a look at the part responsible for Git installation:

We’ll discuss the parts that have been skipped in the second part of this post when dealing with non-root user setup.

The purpose of these commands is to install a specific version of Git. The non-obvious step here is the very long chain of &&-ed shell commands under RUN, some of which, surprisingly, relate to removing rather than installing software (apk del, rm). This prompts two questions: why combine so many commands into a single RUN rather than split them into multiple RUNs; and why even remove any software at all?

Docker stores the image contents in layers that correspond to Dockerfile instructions. If an instruction (such as RUN or COPY) adds data to the underlying file system (which, by the way, is usually OverlayFS nowadays), these data, even if removed in a subsequent layer, will remain part of the intermediate layer that corresponds to the instruction, and will thus make their way to the final image.
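To make the layer accounting concrete, here is a toy model in plain shell; it does not call Docker at all. Each directory stands for one layer, and the image size is taken to be the sum of the bytes stored in all layers. The file name toolchain.bin, the 1 MiB size and the directory layout are made up for illustration; the .wh. prefix mimics the whiteout markers that image layers use to record deletions.

```shell
#!/bin/sh
# Toy model, not real Docker: every directory below is one "layer" and the
# "image size" is the sum of the bytes stored in all layers.

# Print the total size in bytes of all regular files under the given path.
total_bytes() {
  echo $(( $(find "$1" -type f -exec cat {} + | wc -c) ))
}

work=$(mktemp -d)

# Variant A: one layer adds a 1 MiB file, the next layer deletes it.
mkdir -p "$work/a/layer1" "$work/a/layer2"
dd if=/dev/zero of="$work/a/layer1/toolchain.bin" bs=1024 count=1024 2>/dev/null
# The deletion is recorded as an empty whiteout marker in layer 2;
# layer 1 itself is immutable and still carries the file.
: > "$work/a/layer2/.wh.toolchain.bin"

# Variant B: the file is added and deleted within the same layer,
# so nothing is left to store once the layer is committed.
mkdir -p "$work/b/layer1"
dd if=/dev/zero of="$work/b/layer1/toolchain.bin" bs=1024 count=1024 2>/dev/null
rm "$work/b/layer1/toolchain.bin"

size_a=$(total_bytes "$work/a")
size_b=$(total_bytes "$work/b")
rm -r "$work"

echo "delete in a later layer:  $size_a bytes"
echo "delete in the same layer: $size_b bytes"
```

Variant A still accounts for the full 1048576 bytes, while variant B accounts for 0, which is the effect that motivates cleaning up within the same instruction.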

If a piece of software (like alpine-sdk) is only needed for building the image but not for running the container, then leaving it installed is an utter waste of space. A reasonable way to prevent the resulting image from bloating is to remove unnecessary files in the very same layer in which they were added. Hence, the first RUN instruction installs all the compile-time dependencies of Git (alpine-sdk autoconf gettext wget zlib-dev), only to remove them (apk del) later in the same shell script. What remains in the resulting layer is just the Git installation that we care for, but not the toolchain it was built with (which would be useless in the final image).
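The install, build and remove sequence can be sketched as a shell function; in a Dockerfile the same &&-chain would sit under a single RUN so that everything happens in one layer. This is a sketch and not the project's actual Dockerfile: the function name, the download URL and the exact build steps are assumptions, and only the package list is taken from the text above.

```shell
#!/bin/sh
# Sketch: build Git from source on Alpine and drop the toolchain again,
# all inside one command chain (one layer when placed under a single RUN).
# Usage: install_git_single_layer <git-version>
install_git_single_layer() {
  git_version=$1
  # Compile-time dependencies named in the article.
  build_deps="alpine-sdk autoconf gettext wget zlib-dev"

  # $build_deps is intentionally unquoted so that it splits into package names.
  # The download URL below is an assumption made for this example.
  apk add --no-cache $build_deps \
    && wget -q "https://github.com/git/git/archive/v${git_version}.tar.gz" \
    && tar xzf "v${git_version}.tar.gz" \
    && rm "v${git_version}.tar.gz" \
    && ( cd "git-${git_version}" \
         && make configure \
         && ./configure \
         && make \
         && make install ) \
    && rm -r "git-${git_version}" \
    && apk del $build_deps
}
```

Because every step is joined with &&, a failure anywhere aborts the chain, and the toolchain, the tarball and the source tree are all gone by the time the layer would be committed.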

A more naïve version of this Dockerfile, in which all the dependencies are installed at the very beginning and never removed, yields an almost 800 MB behemoth:

After including the apk del and rm commands, and squeezing the installations and removals into the same layer, the resulting image shrinks to around 150-250 MB, depending on the exact versions of Git and Python. This makes the image caches far less space-consuming.

As a side note, if you’re curious how I figured out which files (git-fast-import, git-http-backend etc.) can be removed from /usr/local/libexec/git-core/, take a look at dive, an excellent tool for inspecting files that reside within each layer of a Docker image.
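Two ways of looking at an image layer by layer, wrapped in small shell helpers. docker history is part of the standard Docker CLI, while dive has to be installed separately; the helper names and the image tag in the usage comment are hypothetical.

```shell
#!/bin/sh
# Print the size of each layer next to the instruction that created it.
layer_sizes() {
  [ $# -eq 1 ] || { echo "usage: layer_sizes IMAGE" >&2; return 2; }
  docker history --no-trunc --format '{{.Size}}: {{.CreatedBy}}' "$1"
}

# Browse the files added, changed or removed in each layer interactively.
browse_layers() {
  [ $# -eq 1 ] || { echo "usage: browse_layers IMAGE" >&2; return 2; }
  dive "$1"
}

# Example (hypothetical tag):
#   layer_sizes git-machete-ci:latest
#   browse_layers git-machete-ci:latest
```

The first helper is enough to spot which instruction contributes the most bytes; the second is what helps decide which individual files inside that layer are safe to delete.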

Making the image reusable: mount a volume instead of COPY

It would be very handy if the same image could be used to test multiple versions of the code without having to rebuild the image. To achieve that, the Dockerfile doesn’t bake the entire project directory into the image with a COPY command (only the entrypoint script is directly copied). Instead, the codebase is mounted as a volume within the container. Let’s take a closer look at ci/tox/docker-compose.yml, which describes how to build the image and how to run the container.
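The project configures the mount through docker-compose; as a rough command-line equivalent, the same idea can be sketched with docker run. This is not the project's configuration: the function name and its parameters are hypothetical, and the mount point /home/ci-user/git-machete is the one the project uses inside the container.

```shell
#!/bin/sh
# Run the test image against whatever checkout lives in REPO_DIR.
# The code is bind-mounted at run time, so the image needs no rebuild
# when the code changes.
# Usage: run_tests IMAGE REPO_DIR
run_tests() {
  [ $# -eq 2 ] || { echo "usage: run_tests IMAGE REPO_DIR" >&2; return 2; }
  # A bind mount needs an absolute host path; resolve it first.
  repo_dir=$(cd "$2" && pwd) || return 1
  docker run --rm -v "$repo_dir:/home/ci-user/git-machete" "$1"
}
```

Switching to another branch or commit on the host and calling the function again tests the new code with the very same image.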

We’ll return to the image: section and explain the origin of DIRECTORY_HASH later.

As the volumes: section shows, the entire codebase of git-machete is mounted under /home/ci-user/git-machete/ inside the container. The variables PYTHON_VERSION and GIT_VERSION, which correspond to python_version and git_version build args, are provided by Travis based on the configuration in .travis.yml, here redacted for brevity:

(Yes, we still keep Python 2 support… but nevertheless, if you still use Python 2, please upgrade your software!)
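To show how the two environment variables could end up as build args without docker-compose in between, here is a sketch using plain docker build. The function name and its two parameters (image tag and build context directory) are hypothetical; the variable names and build-arg names are the ones from the text.

```shell
#!/bin/sh
# Build the test image for one Python/Git combination.
# Expects PYTHON_VERSION and GIT_VERSION in the environment, as a CI
# build matrix would provide them.
# Usage: build_image IMAGE_TAG CONTEXT_DIR
build_image() {
  # Abort with a message if either variable is unset or empty.
  : "${PYTHON_VERSION:?PYTHON_VERSION must be set}"
  : "${GIT_VERSION:?GIT_VERSION must be set}"
  docker build \
    --build-arg "python_version=$PYTHON_VERSION" \
    --build-arg "git_version=$GIT_VERSION" \
    -t "$1" "$2"
}
```

Each matrix entry then produces its own image, and the only thing that differs between builds is the pair of values passed in.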

The part of the pipeline that actually uses the contents of the mounted volume is defined in the ci/tox/build-context/entrypoint.sh script that is COPY-ed into the image:

This script first checks whether the git-machete repo has really been mounted under the current working directory, then fires the all-encompassing tox command that runs the code style checks, tests, etc.
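The check-then-run logic can be sketched as follows. This is not the project's actual entrypoint.sh: the function names are hypothetical, and using tox.ini as the marker that the repository is present is an assumption made for this example.

```shell
#!/bin/sh
# Sketch of an entrypoint that refuses to run unless the repository is
# mounted under the working directory. Using tox.ini as the marker file
# is an assumption made for this example.

# Succeed if DIR (default: current directory) looks like the repo root.
repo_is_mounted() {
  [ -f "${1:-.}/tox.ini" ]
}

main() {
  if ! repo_is_mounted .; then
    echo "Repository is not mounted under $(pwd); mount it as a volume." >&2
    return 1
  fi
  # Run everything configured in tox.ini, passing extra arguments through.
  tox "$@"
}

# A real entrypoint file would end with:  main "$@"
```

Failing early with a clear message is friendlier than letting tox complain about a missing configuration file when someone starts the container without the volume.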

In the second part of the series, we will cover a technique for caching the images with great efficiency. We will also ensure that the files created by the running container inside the volume are not owned by root on the host machine.
