Docker-in-Docker in Gitlab Runners
Recently, I’ve been setting up a Gitlab instance for a client. Using Gitlab has been a mostly painless — indeed pleasant — process, but finding a reasonable balance between build-cleanliness and performance for CI instances was particularly finicky.
After reading many forum posts and feature requests in the Gitlab issue tracker (particularly this comment), I adapted a solution that works for me. Be aware, however, that this solution is only reasonable in trusted environments building trusted code. Anything running Docker-in-Docker is potentially subject to privilege escalation. Be careful.
A desirable configuration
Before we jump into the details, it’s worth looking at what I wanted to achieve in this setup:
- The ability to build and test projects that build Docker images
- For those builds to be fast, where possible
- For those builds to occur in a clean environment every time
- For those builds to be isolated from their host system
First, the ability to build and test Docker-based projects implies that docker needs to be accessible to the build process. There are multiple ways to achieve this: the build process can have root privileges itself, or it can have access to another privileged Docker instance, either via a mounted /var/run/docker.sock or over a TCP socket.
Second, to make those builds fast, I wanted to lean on Docker layer caching across builds. This implies that whatever Docker instance is being used to build containers, it needs to be long-lived — longer than any individual build.
Third, to achieve a clean build environment, I decided that the build environment itself should be running inside of a Docker container that can be summarily discarded post-build.
Fourth, and finally, I wanted to ensure that these build containers were relatively isolated from the host system. For my purposes, this meant that build containers should not conflict with other containers running on the host. This means that whatever Docker instance they’re using should be reserved entirely for their use. (See caveats, below, for some limitations on my approach regarding cross-build contamination.)
Then I added one final kink to make this a bit more fun: I wanted the Gitlab runner itself to run inside of a Docker container. So it was to be Docker-in-Docker all the way down.
Part 1: Setting up the server
First up, I needed to set up my server. I opted for Ubuntu 16.04. The first step was installing Docker 17.06. Since Ubuntu 16.04 comes with a recent 4.x kernel, I can use the overlay2 storage driver for speed. Indeed, without this, everything that follows becomes unbearably slow. That meant adding the overlay module to the kernel and configuring the Docker daemon to use it.
You can tell if your kernel has the overlay module loaded by checking (as root):
# lsmod | grep overlay
overlay 49152 3
If it doesn’t appear, you will have to add it:
# modprobe overlay
And you’ll probably want to add it to your /etc/modules file so that it’s loaded on boot:
# grep '^overlay$' /etc/modules || echo overlay >>/etc/modules
Next, you’ll need to configure Docker to actually utilize the overlay module. Recent versions of Docker come with the overlay2 storage driver, which is faster than the original overlay driver. Ensure that your /etc/docker/daemon.json file (you may have to create it) looks something like this:
{
  "storage-driver": "overlay2"
}
Note: Switching storage drivers will leave any existing images and containers inaccessible, since each driver stores its data separately. With that in mind, restart Docker to load the new configuration:
# systemctl restart docker
Great! Now we have a foundation to build the rest of our CI environment.
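Before moving on, it’s worth a quick sanity check that the daemon actually picked up the new driver. A minimal check, wrapped in a small helper function for convenience:

```shell
# Quick sanity check (as root): is the daemon now on overlay2?
# "docker info" prints a "Storage Driver:" line; grep for it.
check_storage_driver() {
    docker info 2>/dev/null | grep -i 'storage driver' || true
}
check_storage_driver
```

If everything worked, this prints "Storage Driver: overlay2". If you still see aufs or devicemapper, re-check /etc/docker/daemon.json and the restart.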
Part 2: Installing Docker-in-Docker
I wanted all builds to be running isolated from the host Docker environment. This doesn’t mean secure, simply that the namespace used by these builds is isolated to these builds themselves. This means I want a so-called “Docker-in-Docker” environment (with all of its well-documented problems and benefits). This is the long-lived Docker instance that the builds will use instead of the host’s Docker instance, allowing for layer-caching.
To get started, let’s create a network for it to exist in:
# docker network create gitlab-runner-net
Now let’s create our Docker-in-Docker instance within it:
# docker run -d \
    --name gitlab-dind \
    --privileged \
    --restart always \
    --network gitlab-runner-net \
    -v /var/lib/docker \
    docker:17.06.0-ce-dind \
    --storage-driver=overlay2
This creates a Docker container named gitlab-dind running in privileged mode (so that it can create its own containers), auto-restarting on failure, with its /var/lib/docker folder in an anonymous volume, running an instance of Docker’s official Docker-in-Docker dind image at the same version of Docker as the host (17.06). Finally, any containers created within gitlab-dind will also use the overlay2 storage driver (the trailing --storage-driver option is an argument to the entrypoint of the dind image, not to the docker command running on the host).
It’s important to note that gitlab-dind has its own Docker socket internally at /var/run/docker.sock, and also exposes that socket over TCP on port 2375. We will use this momentarily.
Part 3: Installing the Gitlab-runner
We could run the Gitlab-runner directly on the host machine. But since we’re so Docker-centric, why not run it in its own container? First we’ll create an easy place to access its configuration:
# mkdir -p /srv/gitlab-runner
# touch /srv/gitlab-runner/config.toml
Then we can spin up the runner itself:
# docker run -d \
    --name gitlab-runner \
    --restart always \
    --network gitlab-runner-net \
    -v /srv/gitlab-runner/config.toml:/etc/gitlab-runner/config.toml \
    -e DOCKER_HOST=tcp://gitlab-dind:2375 \
    gitlab/gitlab-runner:alpine
Of particular interest is the environment variable DOCKER_HOST that we pass through to gitlab-runner. It points at the exposed Docker TCP port on the gitlab-dind container we created previously. Whenever the runner spins up build containers, it will use gitlab-dind to do so. This means that the runner itself does not need to run in privileged mode.
Note: Many other guides I found used --link gitlab-dind:docker. This implicitly injects a DOCKER_HOST environment variable into the gitlab-runner container, which is what makes their magic work. I chose not to use --link because it is now deprecated, so we need to be more explicit. Also, if you use the default Docker network, the runner will be unable to resolve gitlab-dind by name. Using the gitlab-runner-net network remedies this problem by providing automatic DNS resolution.
Part 4: Registering the runner
Go ahead and log in as the administrator of your Gitlab installation. We’re going to add a shared runner, so find and copy the shared-runner registration token from the admin area’s Runners page. We can then register the runner like so:
# docker run -it --rm \
    -v /srv/gitlab-runner/config.toml:/etc/gitlab-runner/config.toml \
    gitlab/gitlab-runner:alpine \
    register \
    --executor docker \
    --docker-image docker:17.06.0-ce \
    --docker-volumes /var/run/docker.sock:/var/run/docker.sock
Running in system-mode.

Please enter the gitlab-ci coordinator URL (e.g. https://gitlab.com/):
https://gitlab.yourhost.example
Please enter the gitlab-ci token for this runner:
YOUR_RUNNER_TOKEN
Please enter the gitlab-ci description for this runner:
[abc123def456]: Docker Runner
Please enter the gitlab-ci tags for this runner (comma separated):
Whether to lock Runner to current project [true/false]:
[false]:
Registering runner... succeeded                     runner=xxx
Please enter the executor: virtualbox, docker+machine, docker-ssh+machine, ssh, docker-ssh, parallels, shell, kubernetes, docker:
[docker]: docker
Please enter the default Docker image (e.g. ruby:2.1):
[docker:17.06.0-ce]: docker:17.06.0-ce
Runner registered successfully. Feel free to start it, but if it's running already the config should be automatically reloaded!
A few things to note here: First, we’re mounting the same config.toml file so that we can write to it. This seems to work fine, and the gitlab-runner container should automatically pick up the changes.
Second, we’ve specified a --docker-volumes option that mounts the “host’s” Docker socket into every build container that’s created. Since the “host” in this case is gitlab-dind, this gives every build container access to gitlab-dind’s Docker environment.
Lastly, the default --docker-image we’re using is docker:17.06.0-ce, which is the same version as gitlab-dind and, ultimately, the host Ubuntu system.
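If you are scripting your server setup, registration can also be done without the interactive prompts. A sketch, wrapped in a function; the flag names come from gitlab-runner register --help and may vary slightly by version:

```shell
# Non-interactive equivalent of the registration dialog above.
# YOUR_RUNNER_TOKEN and the URL are placeholders; fill them in first.
register_runner() {
    docker run -it --rm \
        -v /srv/gitlab-runner/config.toml:/etc/gitlab-runner/config.toml \
        gitlab/gitlab-runner:alpine \
        register \
        --non-interactive \
        --url https://gitlab.yourhost.example \
        --registration-token YOUR_RUNNER_TOKEN \
        --description "Docker Runner" \
        --executor docker \
        --docker-image docker:17.06.0-ce \
        --docker-volumes /var/run/docker.sock:/var/run/docker.sock
}
```

Run register_runner once per runner; the resulting config.toml is identical to the interactive flow’s.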
If you take a look at your /srv/gitlab-runner/config.toml file, it should look roughly like:
# cat /srv/gitlab-runner/config.toml
concurrent = 1
check_interval = 0

[[runners]]
  name = "Docker Runner"
  url = "https://gitlab.yourhost.example"
  token = "PER_RUNNER_TOKEN"
  executor = "docker"
  [runners.docker]
    host = "tcp://gitlab-dind:2375"
    tls_verify = false
    image = "docker:17.06.0-ce"
    privileged = false
    disable_cache = false
    volumes = ["/var/run/docker.sock:/var/run/docker.sock", "/cache"]
    shm_size = 0
  [runners.cache]
Part 5: Testing the setup
I won’t step you through this process, but it’s worth testing the setup and making sure that it all works. A .gitlab-ci.yml file, roughly like the following, in a new repo should do the trick:
test:
  script:
    - docker info
    - docker run --rm hello-world
If you run this twice, you should see that the first time it pulls down the hello-world image, and the second time it uses a cache. Great! Builds should do the same. Try it out.
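To exercise the layer cache more directly, a job that builds an image works too. A sketch: the demo tag and the Dockerfile it assumes are placeholders, and CI_BUILD_REF (the commit SHA in Gitlab of this era) keeps tags unique per build:

```yaml
build:
  script:
    # Layers from previous builds are served by gitlab-dind's cache,
    # so unchanged Dockerfile steps should be nearly instant.
    - docker build -t "demo:${CI_BUILD_REF}" .
    - docker run --rm "demo:${CI_BUILD_REF}" echo "image works"
    # Remove the per-build tag; the underlying layers stay cached.
    - docker rmi "demo:${CI_BUILD_REF}"
```

On a second run with no Dockerfile changes, every build step should report that it is using the cache.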
Caveats and Future Work
Although this is a nice setup, it’s not perfect. Indeed, solutions based around docker-machine and auto-scaled runners, as documented here, might be better overall. For now, this works for me.
One of the biggest potential gotchas is namespace clashing. Although build containers use a Docker namespace distinct from the host Ubuntu machine’s, they still share that namespace with each other. The Gitlab runner makes sure the build containers’ own names don’t conflict, but any docker builds inside of the build containers could easily conflict, and must be managed manually. This is currently an unavoidable trade-off for gaining build layer caching.
It takes a bit of work, but with consistent .gitlab-ci.yml hygiene you should be able to avoid problems. Any use of docker inside of the build containers should either create auto-named containers, or manually utilize ${CI_BUILD_REF} to disambiguate them. This won’t work for docker-compose use, however, since all potentially simultaneous builds of a project will have the same COMPOSE_PROJECT_NAME. You can either set this manually, or automatically for all Docker runners by editing your /srv/gitlab-runner/config.toml file:
...
[[runners]]
  ...
  environment = ["COMPOSE_PROJECT_NAME=${CI_BUILD_REF}"]
  ...
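With that variable in place, a compose-based job gets a unique project name per build. A sketch, assuming the repo ships a docker-compose.yml with an app service and a run-tests script (both placeholders):

```yaml
test:
  script:
    - docker-compose up -d
    - docker-compose run --rm app ./run-tests
    # Tear everything down, including volumes, to limit dangling state.
    - docker-compose down -v
```

Because COMPOSE_PROJECT_NAME differs per build, simultaneous pipelines won’t stomp on each other’s containers or networks.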
Even with all of this work, there is a further problem: dangling builds. The Gitlab runner itself comes with a script at /usr/share/gitlab-runner/clear-docker-cache that can clean up dangling runner builds. It looks like this:
#!/bin/sh
set -e

docker version >/dev/null 2>/dev/null

echo Clearing docker cache...

CONTAINERS=$(docker ps -a -q \
    --filter=status=exited \
    --filter=status=dead \
    --filter=label=com.gitlab.gitlab-runner.type=cache)

if [ -n "${CONTAINERS}" ]; then
    docker rm -v ${CONTAINERS}
fi
Two problems: First, this script is not available in the gitlab-runner image (it wouldn’t run there anyway, since there is no docker command). Second, I don’t believe it would capture containers that the Gitlab runner did not create directly.
We’re left with a few options:
- Have impeccable container hygiene everywhere, always remembering to destroy everything we create. (Hah.)
- Recreate the gitlab-dind container from scratch occasionally, flushing the entire Docker cache.
- Use the newish docker system prune command with --filter 'until=7d', or similar. It seems that it will clear away images pulled from the Docker Hub more than a week ago, but their layers will persist if another container references them. This should make re-pulls relatively painless. Edit: Despite what the documents say, it appears this filter isn’t yet functional for system prune, but can be pieced together via manually pruning containers, images, networks, and volumes.
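That last option can be sketched as a small cleanup script to run against gitlab-dind on a cron schedule. A sketch: the 168h cutoff is arbitrary, the flags are the prune commands’ documented ones, and note that docker volume prune takes no time-based filter:

```shell
#!/bin/sh
# Piecemeal stand-in for "docker system prune --filter until=7d".
# Point DOCKER_HOST at the dind instance before calling, e.g.:
#   DOCKER_HOST=tcp://gitlab-dind:2375 prune_dind
prune_dind() {
    # Stopped containers, unused images, and unused networks older
    # than a week (168h). Layers still referenced by a remaining
    # container survive the image prune.
    docker container prune --force --filter 'until=168h'
    docker image prune --all --force --filter 'until=168h'
    docker network prune --force --filter 'until=168h'
    # No "until" filter here: this removes all currently unused volumes.
    docker volume prune --force
}
```

Dropping a call to prune_dind into a weekly cron job keeps the cache warm for recent images while reclaiming the rest.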
I have a few other ideas, but not the time to implement them. In the future, it might be possible to wrap the docker command so that every container that’s built is automatically given a build label. That, coupled with something like a post_build_script, or even an occasional cron cleanup script, could mitigate a lot of dangling images.
There are other options that work on other levels: using a local proxying registry so that image pulls don’t hit the main Docker Hub registry most of the time, and potentially proxy-caching outgoing requests to APT, PyPI, NPM/Yarn, etc. This wouldn’t reduce build times, but it could reduce pull times.
Conclusion
Getting a working CI environment up can be quite the task. Hopefully this helps you and saves you uncountable hours and frustration. Thanks for reading!