Docker-in-Docker in Gitlab Runners
Recently, I’ve been setting up a Gitlab instance for a client. Using Gitlab has been a mostly painless — indeed pleasant — process, but finding a reasonable balance between build-cleanliness and performance for CI instances was particularly finicky.
After reading many forum posts and feature requests in the Gitlab issue tracker (particularly this comment), I adapted a solution that works for me. Be aware, however, that this solution is only reasonable in trusted environments building trusted code. Anything running Docker-in-Docker is potentially subject to privilege escalation. Be careful.
A desirable configuration
Before we jump into the details, it’s worth looking at what I wanted to achieve in this setup:
- The ability to build and test projects that build Docker images
- For those builds to be fast, where possible
- For those builds to occur in a clean environment every time
- For those builds to be isolated from their host system
First, the ability to build and test Docker-based projects implies that docker needs to be accessible to the build process. There are multiple ways to achieve this: the build process can have root privileges itself, or it can have access to another privileged Docker instance, either via a mounted /var/run/docker.sock or over a TCP socket.
Second, to make those builds fast, I wanted to lean on Docker layer caching across builds. This implies that whatever Docker instance is being used to build containers, it needs to be long-lived — longer than any individual build.
Third, to achieve a clean build environment, I decided that the build environment itself should be running inside of a Docker container that can be summarily discarded post-build.
Fourth, and finally, I wanted to ensure that these build containers were relatively isolated from the host system. For my purposes, this meant that build containers should not conflict with other containers running on the host. This means that whatever Docker instance they’re using should be reserved entirely for their use. (See caveats, below, for some limitations on my approach regarding cross-build contamination.)
Then I added one final kink to make this a bit more fun: I wanted the Gitlab runner itself to run inside of a Docker container. So it was to be Docker-in-Docker all the way down.
Part 1: Setting up the server
First up, I needed to set up my server. I opted for Ubuntu 16.04. The first step was installing Docker 17.06. Since Ubuntu 16.04 comes with a recent 4.x kernel, I can use the overlay2 storage driver for speed. Indeed, without this, everything that follows becomes unbearably slow. That meant adding the overlay module to the kernel and configuring the Docker daemon to use it.
You can tell if your kernel has the overlay module loaded by checking (as root):
# lsmod | grep overlay
overlay 49152 3
If it doesn’t appear, you will have to add it:
# modprobe overlay
And you’ll probably want to add it to your /etc/modules file so that it’s loaded on boot:
# grep '^overlay$' /etc/modules || echo overlay >>/etc/modules
Next, you’ll need to configure Docker to actually utilize the overlay module. Recent versions of Docker come with the overlay2 storage driver, which is faster than the original overlay driver. Ensure that your /etc/docker/daemon.json file (you may have to create it) looks something like this:
{
  "storage-driver": "overlay2"
}
Note: Switching storage drivers will leave any existing images and containers inaccessible, since each driver stores its data separately. With that in mind, restart Docker to load the new configuration:
# systemctl restart docker
Great! Now we have a foundation to build the rest of our CI environment.
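Before moving on, it’s worth a quick sanity check that the daemon actually picked up the new driver. A minimal check, wrapped in a small helper function for convenience:

```shell
# Quick sanity check (as root): is the daemon now on overlay2?
# "docker info" prints a "Storage Driver:" line; grep for it.
check_storage_driver() {
    docker info 2>/dev/null | grep -i 'storage driver' || true
}
check_storage_driver
```

If everything worked, this prints "Storage Driver: overlay2". If you still see aufs or devicemapper, re-check /etc/docker/daemon.json and the restart.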
Part 2: Installing Docker-in-Docker
I wanted all builds to be running isolated from the host Docker environment. This doesn’t mean secure, simply that the namespace used by these builds is isolated to these builds themselves. This means I want a so-called “Docker-in-Docker” environment (with all of its well-documented problems and benefits). This is the long-lived Docker instance that the builds will use instead of the host’s Docker instance, allowing for layer-caching.
To get started, let’s create a network for it to exist in:
# docker network create gitlab-runner-net
Now let’s create our Docker-in-Docker instance within it:
# docker run -d \
    --name gitlab-dind \
    --privileged \
    --restart always \
    --network gitlab-runner-net \
    -v /var/lib/docker \
    docker:17.06.0-ce-dind \
    --storage-driver=overlay2
This creates a Docker container named gitlab-dind running in privileged mode (so that it can create its own containers), auto-restarting on failure, with its /var/lib/docker folder in an anonymous volume, running an instance of Docker’s official Docker-in-Docker dind image at the same version of Docker as the host (17.06). Finally, any containers created within gitlab-dind will also use the overlay2 storage driver (the trailing --storage-driver option is an argument to the entrypoint of the dind image, not to the docker command running on the host).
It’s important to note that gitlab-dind has its own Docker socket internally at /var/run/docker.sock, and also exposes that socket over TCP on port 2375. We will use this momentarily.
Part 3: Installing the Gitlab-runner
We could run the Gitlab-runner directly on the host machine. But since we’re so Docker-centric, why not run it in its own container? First we’ll create an easy place to access its configuration:
# mkdir -p /srv/gitlab-runner
# touch /srv/gitlab-runner/config.toml
Then we can spin up the runner itself:
# docker run -d \
    --name gitlab-runner \
    --restart always \
    --network gitlab-runner-net \
    -v /srv/gitlab-runner/config.toml:/etc/gitlab-runner/config.toml \
    -e DOCKER_HOST=tcp://gitlab-dind:2375 \
    gitlab/gitlab-runner:alpine
Of particular interest is the environment variable DOCKER_HOST that we pass through to gitlab-runner. It points at the exposed Docker TCP port on the gitlab-dind container we created previously. Whenever the runner spins up build containers, it will use gitlab-dind to do so. This means that the runner itself does not need to run in privileged mode.
Note: Many other guides I found used --link gitlab-dind:docker. This implicitly injects a DOCKER_HOST environment variable into the gitlab-runner container, which is what makes their magic work. I chose not to use --link because it is now deprecated, so we need to be more explicit. Also, if you use the default Docker network, the runner will be unable to resolve gitlab-dind by name. Using the gitlab-runner-net network remedies this problem by providing automatic DNS resolution.
Part 4: Registering the runner
Go ahead and log in as the administrator of your Gitlab installation. We’re going to add a shared runner, so find and copy the shared-runner registration token from the admin area’s Runners page. We can then register the runner like so:
# docker run -it --rm \
    -v /srv/gitlab-runner/config.toml:/etc/gitlab-runner/config.toml \
    gitlab/gitlab-runner:alpine \
    register \
    --executor docker \
    --docker-image docker:17.06.0-ce \
    --docker-volumes /var/run/docker.sock:/var/run/docker.sock
Running in system-mode.

Please enter the gitlab-ci coordinator URL (e.g. https://gitlab.com/):
https://gitlab.yourhost.example
Please enter the gitlab-ci token for this runner:
YOUR_RUNNER_TOKEN
Please enter the gitlab-ci description for this runner:
[abc123def456]: Docker Runner
Please enter the gitlab-ci tags for this runner (comma separated):
Whether to lock Runner to current project [true/false]:
[false]:
Registering runner... succeeded                     runner=xxx
Please enter the executor: virtualbox, docker+machine, docker-ssh+machine, ssh, docker-ssh, parallels, shell, kubernetes, docker:
[docker]: docker
Please enter the default Docker image (e.g. ruby:2.1):
[docker:17.06.0-ce]: docker:17.06.0-ce
Runner registered successfully. Feel free to start it, but if it's running already the config should be automatically reloaded!
A few things to note here: First, we’re mounting the same config.toml file so that we can write to it. This seems to work fine, and the gitlab-runner container should automatically pick up the changes.
Second, we’ve specified a --docker-volumes option that mounts the “host’s” Docker socket into every build container that’s created. Since the “host” in this case is gitlab-dind, this gives every build container access to gitlab-dind’s Docker environment.
Lastly, the default --docker-image we’re using is docker:17.06.0-ce, which is the same version as gitlab-dind and, ultimately, the host Ubuntu system.
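If you are scripting your server setup, registration can also be done without the interactive prompts. A sketch, wrapped in a function; the flag names come from gitlab-runner register --help and may vary slightly by version:

```shell
# Non-interactive equivalent of the registration dialog above.
# YOUR_RUNNER_TOKEN and the URL are placeholders; fill them in first.
register_runner() {
    docker run -it --rm \
        -v /srv/gitlab-runner/config.toml:/etc/gitlab-runner/config.toml \
        gitlab/gitlab-runner:alpine \
        register \
        --non-interactive \
        --url https://gitlab.yourhost.example \
        --registration-token YOUR_RUNNER_TOKEN \
        --description "Docker Runner" \
        --executor docker \
        --docker-image docker:17.06.0-ce \
        --docker-volumes /var/run/docker.sock:/var/run/docker.sock
}
```

Run register_runner once per runner; the resulting config.toml is identical to the interactive flow’s.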
If you take a look at your /srv/gitlab-runner/config.toml file, it should look roughly like:
# cat /srv/gitlab-runner/config.toml
concurrent = 1
check_interval = 0

[[runners]]
  name = "Docker Runner"
  url = "https://gitlab.yourhost.example"
  token = "PER_RUNNER_TOKEN"
  executor = "docker"
  [runners.docker]
    host = "tcp://gitlab-dind:2375"
    tls_verify = false
    image = "docker:17.06.0-ce"
    privileged = false
    disable_cache = false
    volumes = ["/var/run/docker.sock:/var/run/docker.sock", "/cache"]
    shm_size = 0
  [runners.cache]
Part 5: Testing the setup
I won’t step you through this process, but it’s worth testing the setup and making sure that it all works. A .gitlab-ci.yml file, roughly like the following, in a new repo should do the trick:
test:
  script:
    - docker info
    - docker run --rm hello-world
If you run this twice, you should see that the first time it pulls down the hello-world image, and the second time it uses a cache. Great! Builds should do the same. Try it out.
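To exercise the layer cache more directly, a job that builds an image works too. A sketch: the demo tag and the Dockerfile it assumes are placeholders, and CI_BUILD_REF (the commit SHA in Gitlab of this era) keeps tags unique per build:

```yaml
build:
  script:
    # Layers from previous builds are served by gitlab-dind's cache,
    # so unchanged Dockerfile steps should be nearly instant.
    - docker build -t "demo:${CI_BUILD_REF}" .
    - docker run --rm "demo:${CI_BUILD_REF}" echo "image works"
    # Remove the per-build tag; the underlying layers stay cached.
    - docker rmi "demo:${CI_BUILD_REF}"
```

On a second run with no Dockerfile changes, every build step should report that it is using the cache.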
Caveats and Future Work
Although this is a nice setup, it’s not perfect. Indeed, solutions based around docker-machine and auto-scaled runners, as documented here, might be better overall. For now, this works for me.
One of the biggest potential gotchas is namespace clashing. Although build containers use a Docker namespace distinct from the host Ubuntu machine’s, they still share that namespace with each other. The Gitlab runner makes sure the build containers’ own names don’t conflict, but any docker builds inside of the build containers could easily conflict, and must be managed manually. This is currently an unavoidable trade-off for gaining build layer caching.
It takes a bit of work, but with consistent .gitlab-ci.yml hygiene you should be able to avoid problems. Any use of docker inside of the build containers should either create auto-named containers, or manually utilize ${CI_BUILD_REF} to disambiguate them. This won’t work for docker-compose use, however, since all potentially simultaneous builds of a project will have the same COMPOSE_PROJECT_NAME. You can either set this manually, or automatically for all Docker runners by editing your /srv/gitlab-runner/config.toml file:
...
[[runners]]
  ...
  environment = ["COMPOSE_PROJECT_NAME=${CI_BUILD_REF}"]
  ...
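With that variable in place, a compose-based job gets a unique project name per build. A sketch, assuming the repo ships a docker-compose.yml with an app service and a run-tests script (both placeholders):

```yaml
test:
  script:
    - docker-compose up -d
    - docker-compose run --rm app ./run-tests
    # Tear everything down, including volumes, to limit dangling state.
    - docker-compose down -v
```

Because COMPOSE_PROJECT_NAME differs per build, simultaneous pipelines won’t stomp on each other’s containers or networks.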
Even with all of this work, there is a further problem: dangling builds. The Gitlab runner itself comes with a script at /usr/share/gitlab-runner/clear-docker-cache that can clean up dangling runner builds. It looks like this:
#!/bin/sh
set -e

docker version >/dev/null 2>/dev/null

echo Clearing docker cache...

CONTAINERS=$(docker ps -a -q \
    --filter=status=exited \
    --filter=status=dead \
    --filter=label=com.gitlab.gitlab-runner.type=cache)

if [ -n "${CONTAINERS}" ]; then
    docker rm -v ${CONTAINERS}
fi
Two problems: First, this script is not available in the gitlab-runner image (it wouldn’t run there anyway, since there is no docker command). Second, I don’t believe it would capture containers that the Gitlab runner did not create directly.
We’re left with a few options:
- Have impeccable container hygiene everywhere, always remembering to destroy everything we create. (Hah.)
- Recreate the gitlab-dind container from scratch occasionally, flushing the entire Docker cache.
- Use the newish docker system prune command with --filter 'until=7d', or similar. It seems that it will clear away images pulled from the Docker Hub more than a week ago, but their layers will persist if another container references them. This should make re-pulls relatively painless. Edit: Despite what the documents say, it appears this filter isn’t yet functional for system prune, but can be pieced together via manually pruning containers, images, networks, and volumes.
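That last option can be sketched as a small cleanup script to run against gitlab-dind on a cron schedule. A sketch: the 168h cutoff is arbitrary, the flags are the prune commands’ documented ones, and note that docker volume prune takes no time-based filter:

```shell
#!/bin/sh
# Piecemeal stand-in for "docker system prune --filter until=7d".
# Point DOCKER_HOST at the dind instance before calling, e.g.:
#   DOCKER_HOST=tcp://gitlab-dind:2375 prune_dind
prune_dind() {
    # Stopped containers, unused images, and unused networks older
    # than a week (168h). Layers still referenced by a remaining
    # container survive the image prune.
    docker container prune --force --filter 'until=168h'
    docker image prune --all --force --filter 'until=168h'
    docker network prune --force --filter 'until=168h'
    # No "until" filter here: this removes all currently unused volumes.
    docker volume prune --force
}
```

Dropping a call to prune_dind into a weekly cron job keeps the cache warm for recent images while reclaiming the rest.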
I have a few other ideas, but not the time to implement them. In the future, it might be possible to wrap the docker command so that every container that’s built is automatically given a build label. That, coupled with something like a post_build_script, or even an occasional cron cleanup script, could mitigate a lot of dangling images.
There are other options that work on other levels: using a local proxying registry so that image pulls don’t hit the main Docker Hub registry most of the time, and potentially proxy-caching outgoing requests to APT, PyPI, NPM/Yarn, etc. This wouldn’t reduce build times, but it could reduce pull times.
Conclusion
Getting a working CI environment up can be quite the task. Hopefully this helps you and saves you uncountable hours and frustration. Thanks for reading!