How to access private Git repositories during a Docker image build
A complete guide to building images that require access to SSH keys during the build process.
A Docker image is a great solution for packaging an application and all of its dependencies. Building a Docker image that only requires public resources is pretty simple as long as the host machine building the image has internet access: just add RUN commands to the Dockerfile with apt-get, curl, pip install, etc. to install dependencies. But what if you require a dependency or data that is not publicly accessible? In the case of a private git repository, you need an SSH private key to authenticate yourself and gain access. But using COPY to copy your private key into the image is very bad practice! The key ends up in an image layer, where anyone with access to the image can extract it, even if a later step deletes the file.
The best approach at the moment is using the --ssh flag implemented in BuildKit. The official documentation on the feature can be found here. It’s a bit bare bones, so I hope to flesh this out a bit in this article.
The basic steps to follow
1. Install an ssh client in the Docker image
To get started you need to at least have an SSH client installed in your docker image. On Debian/Ubuntu based images add
RUN apt-get update && \
    apt-get install --yes --no-install-recommends \
    openssh-client \
    git \
    && apt-get clean && \
    rm -rf /var/lib/apt/lists/*
On alpine based images add
# Install ssh client and git
RUN apk add --no-cache openssh-client git
2. Get the necessary public keys
Suppose you need to access repositories on github.com and bitbucket.org. In the Dockerfile add
# 0700 rather than 0600: a directory needs the execute bit to be entered
RUN mkdir -p -m 0700 ~/.ssh && \
    ssh-keyscan -H github.com bitbucket.org >> ~/.ssh/known_hosts
The -H flag is optional; it hashes the hostname and address in the output so that someone snooping around in the container can’t read them in plaintext. The known_hosts file serves to indicate to the SSH client in the container that these hosts can be trusted and connected to.
3. Instruct Docker which commands require SSH
Now simply modify all RUN commands in your Dockerfile that require an SSH connection to
RUN --mount=type=ssh ...
The addition gives the builder access to the required SSH credentials on the host, but only for the duration of these build steps. For example, to download a private git repository you might add something like
RUN --mount=type=ssh \
git clone git@<host>:<organization>/<private_repo>.git
4. Build with Buildkit and flags
Finally you instruct the builder to build with buildkit enabled. Either set the environment variable
$ export DOCKER_BUILDKIT=1
or set it for a single invocation by prefixing the docker build command with DOCKER_BUILDKIT=1. If you are working with an SSH agent and have loaded the correct keys, run
$ docker build --ssh default .
Those are the basic steps. Now the relevant RUN commands should have SSH access to the same private resources as the host on which you build the Docker image.
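Putting the four steps together, a minimal sketch might look like the script below. It writes a small Dockerfile that combines steps 1 to 3 and leaves the build command of step 4 as a comment. The base image debian:bookworm-slim, the file name Dockerfile.ssh-example and the repository git@github.com:my-org/my-private-repo.git are placeholders I picked for illustration; swap in your own.

```shell
#!/bin/sh
# Sketch: write a Dockerfile that combines steps 1-3.
# Assumptions (hypothetical): base image, output file name, repository URL.
cat > Dockerfile.ssh-example <<'EOF'
# syntax=docker/dockerfile:1
FROM debian:bookworm-slim

# Step 1: SSH client and git
RUN apt-get update && \
    apt-get install --yes --no-install-recommends openssh-client git && \
    apt-get clean && \
    rm -rf /var/lib/apt/lists/*

# Step 2: trust the host keys of the git server
RUN mkdir -p -m 0700 ~/.ssh && \
    ssh-keyscan -H github.com >> ~/.ssh/known_hosts

# Step 3: only this step gets access to the forwarded SSH agent
RUN --mount=type=ssh \
    git clone git@github.com:my-org/my-private-repo.git /opt/my-private-repo
EOF

# Step 4 (run by hand; needs Docker with BuildKit and an SSH agent with keys loaded):
#   DOCKER_BUILDKIT=1 docker build --ssh default -f Dockerfile.ssh-example .
```

Only the git clone step carries the SSH mount, so the other layers never see the agent socket.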
Troubleshooting: understanding the SSH agent
If you still do not have access to the private resources and the build errors out with an access denied, it’s highly likely that there is a misconfiguration of your SSH agent.
An SSH agent is a small program running in the background that holds your private keys in memory, already unlocked if they are passphrase protected. If you run macOS on your host, an SSH agent is already running in the background. You can interact with it using the ssh-add command. On Linux, an SSH agent is probably not active when you boot and start a shell. You can start it for that session with
$ eval ssh-agent $SHELL
To create a systemd compatible daemon out of the agent that starts up automatically on login, follow the instructions in this Stack Overflow answer.
If you followed steps 1-4 above, BuildKit will mount your SSH agent’s socket into the container and define an environment variable SSH_AUTH_SOCK pointing to it. The SSH client in the container will pick up on this environment variable and communicate with the agent through it. A socket is a “file” on Unix based systems that allows for direct communication between processes. If an SSH agent is not running on the host, the following error will be given
could not parse ssh: [default]: stat <path>/ssh-agent.socket: no such file or directory
To get a successful build, the SSH agent not only needs to be running on the host but the right keys should also be loaded. To check which SSH keys are loaded by the agent run
$ ssh-add -l
If you do not see the keys you need to use during the building process, run
$ ssh-add <path/to/private-key>
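Since a missing agent and an empty agent fail in different ways, it can help to check the agent before starting a build. The sketch below relies on the exit status of ssh-add -l as documented in its man page: 0 when keys are listed, 1 when the agent is reachable but holds no keys, and 2 when no agent can be contacted. The function name agent_status is my own invention, not part of Docker or OpenSSH.

```shell
#!/bin/sh
# Sketch: report the state of the host's SSH agent before running docker build.
# agent_status is a hypothetical helper name, not part of Docker or OpenSSH.
agent_status() {
    if ! command -v ssh-add >/dev/null 2>&1; then
        echo "ssh-add not found on PATH"
        return 3
    fi
    rc=0
    ssh-add -l >/dev/null 2>&1 || rc=$?
    case $rc in
        0) echo "agent running, keys loaded" ;;
        1) echo "agent running, no keys loaded: run ssh-add <path/to/private-key>"
           return 1 ;;
        *) echo "cannot reach an SSH agent: is SSH_AUTH_SOCK set?"
           return 2 ;;
    esac
}

# Possible usage (run by hand):
#   agent_status && docker build --ssh default .
```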
You can check inside the container whether the right keys are loaded and whether the SSH_AUTH_SOCK socket is accessible by adding the step below. The step itself needs the SSH mount, otherwise no socket is available to it:
RUN --mount=type=ssh echo $(ssh-add -l) && echo $SSH_AUTH_SOCK
Note that the path of the socket in the container will differ from the one on the host; it is just a mount point in the container.
When it still doesn’t work
This is an issue you may run into if you have multiple accounts with the same service, for example a personal and a corporate GitHub/Bitbucket account. If you have multiple keys loaded in the SSH agent, and one of the keys allows you to connect to your personal account while another connects to your corporate account, you can get conflicts that will block you from certain resources. Why?
When git tries to connect via SSH, the keys loaded in the SSH agent are offered in the same order as listed by the ssh-add -l command. If a key is not recognized by the service, it is rejected and the next key is tried. However, if a key is valid for one of your accounts but not for the account that is authorized to access certain resources, the connection will be established but access to the resource refused. Because authentication itself succeeded, no further keys are tried and the build simply errors out. So in this specific scenario, the order in which the keys are loaded in the SSH agent is important.
To resolve this issue, you can remove conflicting keys with ssh-add -d path/to/key. To troubleshoot more git+SSH issues during the build, you can set
ENV GIT_SSH_COMMAND="ssh -vvv"
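If removing keys one by one is tedious, another option is to empty the agent and load only the key the build needs. A minimal sketch, where use_only_key is a hypothetical helper name. Be aware that ssh-add -D removes all identities from the agent, so you will have to re-add your other keys afterwards.

```shell
#!/bin/sh
# Sketch: leave exactly one key in the agent so git cannot pick the wrong account.
# use_only_key is a hypothetical helper; ssh-add -D deletes ALL identities.
use_only_key() {
    key=$1
    if [ ! -f "$key" ]; then
        echo "no such key file: $key" >&2
        return 1
    fi
    # Empty the agent, load the one key, then list what is loaded as a check.
    ssh-add -D && ssh-add "$key" && ssh-add -l
}

# Possible usage (run by hand):
#   use_only_key path/to/key && docker build --ssh default .
```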
Bypassing the host’s SSH agent
An SSH agent running on the host is actually not strictly necessary. In step 4, a specific identity can also be passed to the build directly
$ docker build --ssh default=$HOME/.ssh/<private key> .
You must use the $HOME variable; expansion of ~ does not work. This will spin up an SSH agent inside the container with the relevant key loaded.
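Because ~ is not expanded after default=, a small wrapper can build the flag from an absolute path and fail early when the key file is missing. This is a sketch under the assumption that the key path is passed as the first argument; ssh_flag is a hypothetical helper name, and the key file name in the usage comment is only an example.

```shell
#!/bin/sh
# Sketch: print a "--ssh default=<absolute path>" argument for docker build.
# ssh_flag is a hypothetical helper, not a Docker feature.
ssh_flag() {
    key=$1
    # Expand a leading "~/" by hand, since the shell will not do it after "default=".
    case $key in
        "~/"*) key="$HOME/${key#??}" ;;   # ${key#??} drops the first two characters
    esac
    if [ ! -f "$key" ]; then
        echo "key file not found: $key" >&2
        return 1
    fi
    printf '%s\n' "--ssh default=$key"
}

# Possible usage (run by hand; breaks if the path contains spaces):
#   docker build $(ssh_flag '~/.ssh/id_ed25519') .
```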
Using multiple identities
Suppose one RUN step needs to use one key, while another RUN step needs to use another key. You can get very fine grained control by specifying the id parameter each command should use.
You can specify different ids in the build command using
$ docker build --ssh id1=path/to/key1 --ssh id2=path/to/key2 .
The names id1 and id2 are arbitrary. Each id needs to be paired with its own --ssh flag. If a RUN command should use id1, this should be indicated in the Dockerfile as
RUN --mount=type=ssh,id=id1 ...
During this build step only the specified id will be loaded into the container’s SSH agent.
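As a sketch of how the two pieces fit together, the script below writes a Dockerfile in which each RUN step mounts a different id. The ids id1 and id2 match the build command above; the base image, the file name Dockerfile.multi-id and both repository URLs are placeholders chosen for illustration.

```shell
#!/bin/sh
# Sketch: a Dockerfile where two RUN steps each use a different SSH identity.
# Assumptions (hypothetical): base image, output file name, repository URLs.
cat > Dockerfile.multi-id <<'EOF'
# syntax=docker/dockerfile:1
FROM alpine:3.19
RUN apk add --no-cache openssh-client git && \
    mkdir -p -m 0700 ~/.ssh && \
    ssh-keyscan -H github.com bitbucket.org >> ~/.ssh/known_hosts

# Uses the key passed as --ssh id1=path/to/key1
RUN --mount=type=ssh,id=id1 \
    git clone git@github.com:my-org/repo-one.git /opt/repo-one

# Uses the key passed as --ssh id2=path/to/key2
RUN --mount=type=ssh,id=id2 \
    git clone git@bitbucket.org:my-team/repo-two.git /opt/repo-two
EOF

# Build by hand:
#   docker build --ssh id1=path/to/key1 --ssh id2=path/to/key2 -f Dockerfile.multi-id .
```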
A downside of bypassing the host SSH agent
Sometimes SSH keys are protected with a passphrase. The host’s SSH agent keeps such keys unlocked after you have entered the passphrase once. As far as I know, it is not possible to pass passphrase protected identities directly to the docker build command.
Summary
In this post I went over how to securely use SSH credentials stored on the host to access private resources during a docker build process. I also discussed the SSH agent and various SSH connection issues that could come up, as well as how to troubleshoot them.
I work at Data Minded, an independent data engineering and data analytics consultancy based in Leuven. Contact us if you’d be interested in working together!