How to access private Git repositories during a Docker image build

A complete guide to building images that require access to SSH keys during the build process.

Niels Cautaerts
datamindedbe
6 min read · Dec 6, 2021


A Docker image is a great solution for packaging an application and all of its dependencies. Building a Docker image that only requires public resources is pretty simple as long as the host machine building the image has internet access: just add RUN commands with apt-get, curl, pip install, etc. to the Dockerfile to install dependencies. But what if you require a dependency or data that is not publicly accessible? In the case of a private Git repository, you need an SSH private key to authenticate yourself and gain access. But using COPY to copy your private key into the image is very bad practice: the key ends up in an image layer, and anyone who can pull the image can extract it, even if a later step deletes the file.

Image by qimono on Pixabay

The best approach at the moment is the --ssh flag implemented in BuildKit. The feature is covered in the official Docker documentation, but that description is a bit bare bones, so I hope to flesh it out in this article.

The basic steps to follow

1. Install an ssh client in the Docker image

To get started, you need at least an SSH client installed in your Docker image, plus git if you want to clone repositories. On Debian/Ubuntu-based images, add

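# Install ssh client and git, then remove the apt package lists to keep the layer small
RUN apt-get update && \
    apt-get install --yes --no-install-recommends \
        openssh-client \
        git \
    && apt-get clean && \
    rm -rf /var/lib/apt/lists/*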

On Alpine-based images, add

# Install ssh client and git
RUN apk add --no-cache openssh-client git

2. Get the necessary public keys

Suppose you need to access repositories on github.com and bitbucket.org. In the Dockerfile add

RUN mkdir -p -m 0700 ~/.ssh && \
    ssh-keyscan -H github.com bitbucket.org >> ~/.ssh/known_hosts

The -H flag is optional; it hashes the hostnames and addresses in the output so that someone snooping around in the container can’t read them in plaintext. The known_hosts file tells the SSH client in the container that these hosts can be trusted. Without it, the client would ask you to confirm each host’s fingerprint, which fails in a non-interactive build with a host key verification error.

3. Instruct Docker which commands require SSH

Now modify every RUN command in your Dockerfile that needs an SSH connection so that it starts with

RUN --mount=type=ssh ...

This gives the build step access to the host’s SSH credentials, but only for the duration of that step; nothing is written into the image. For example, to download a private Git repository you might add something like

RUN --mount=type=ssh \
git clone git@<host>:<organization>/<private_repo>.git
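Putting steps 1 to 3 together, a complete Dockerfile might look like the sketch below. It is wrapped in write_dockerfile, a hypothetical shell helper that only prints the file so it can be redirected to disk. The debian:bookworm-slim base image, the REPO_URL build argument and the /src target directory are illustrative assumptions, and github.com stands in for whichever Git host you use. The first line asks BuildKit for a Dockerfile frontend that supports RUN --mount.

```shell
# Hypothetical helper: print a Dockerfile that combines steps 1 to 3.
# The quoted 'EOF' delimiter stops the shell from expanding $REPO_URL here;
# it is expanded later, inside the RUN step, from the build argument.
write_dockerfile() {
cat <<'EOF'
# syntax=docker/dockerfile:1
FROM debian:bookworm-slim

# Step 1: SSH client and git (assumes a Debian-based image)
RUN apt-get update && \
    apt-get install --yes --no-install-recommends openssh-client git && \
    apt-get clean && \
    rm -rf /var/lib/apt/lists/*

# Step 2: trust the Git host's public keys
RUN mkdir -p -m 0700 ~/.ssh && \
    ssh-keyscan -H github.com >> ~/.ssh/known_hosts

# Step 3: only this RUN step can use the forwarded SSH agent
ARG REPO_URL
RUN --mount=type=ssh git clone "$REPO_URL" /src
EOF
}
```

For example, run write_dockerfile > Dockerfile and then DOCKER_BUILDKIT=1 docker build --ssh default --build-arg REPO_URL=git@<host>:<organization>/<private_repo>.git . to clone the repository during the build.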

4. Build with BuildKit and flags

Finally, instruct Docker to build with BuildKit enabled. Either set the environment variable

$ export DOCKER_BUILDKIT=1

or set it for the docker build command alone by putting DOCKER_BUILDKIT=1 in front of it. If you are working with an SSH agent and have loaded the correct keys, run

$ docker build --ssh default .

Those are the basic steps. The relevant RUN commands now have SSH access to the same private resources as the host on which you build the image.

Troubleshooting: understanding the SSH agent

If you still cannot access the private resources and the build fails with an access-denied error, your SSH agent is most likely misconfigured.

An SSH agent is a small program running in the background that holds your private keys in memory, so you only have to enter a key’s passphrase (if it has one) once. On macOS, an SSH agent is already running in the background, and you can interact with it using the ssh-add command. On Linux, an SSH agent may not be active when you boot and start a shell, especially on servers. You can start one for the current session with

$ eval "$(ssh-agent -s)"

To run the agent as a systemd service that starts automatically on login, follow the instructions in this Stack Overflow answer.
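If you would rather not start a new agent every time you open a shell, a small function in your shell profile can start one only when none is reachable. This is a sketch, not a replacement for the systemd setup: it relies on ssh-add -l exiting with status 2 when it cannot contact an agent, as documented in the ssh-add manual page, and the name ensure_agent is hypothetical.

```shell
# Hypothetical helper: start an ssh-agent for this shell only if none is reachable.
ensure_agent() {
    # ssh-add -l exits with 0 (keys listed), 1 (no keys) or 2 (no agent reachable)
    agent_status=0
    ssh-add -l >/dev/null 2>&1 || agent_status=$?
    if [ "$agent_status" -eq 2 ]; then
        # ssh-agent -s prints Bourne-shell commands that export SSH_AUTH_SOCK
        # and SSH_AGENT_PID; eval runs them in the current shell.
        eval "$(ssh-agent -s)" >/dev/null
    fi
}
```

Call ensure_agent from your shell profile, then load your keys with ssh-add as usual.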

If you followed steps 1 to 4 above, BuildKit mounts your SSH agent’s socket into the build container and sets the environment variable SSH_AUTH_SOCK to point to it. The SSH client in the container picks up this environment variable and communicates with the agent through it. (A socket is a “file” on Unix-based systems that allows direct communication between processes.) If no SSH agent is running on the host, the build fails with the following error

could not parse ssh: [default]: stat <path>/ssh-agent.socket: no such file or directory

To get a successful build, the SSH agent not only needs to be running on the host, it also needs to have the right keys loaded. To check which keys the agent holds, run

$ ssh-add -l

If you do not see the keys you need to use during the building process, run

$ ssh-add <path/to/private-key>
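These host-side checks can be bundled into a pre-flight function to run before docker build --ssh default. The sketch relies on ssh-add -l exiting with 0 when keys are listed, 1 when the agent holds no keys and 2 when no agent can be contacted; the name check_ssh_agent and its messages are hypothetical.

```shell
# Hypothetical pre-flight check to run before "docker build --ssh default ."
check_ssh_agent() {
    # The host agent is found through SSH_AUTH_SOCK
    if [ -z "${SSH_AUTH_SOCK:-}" ]; then
        echo "SSH_AUTH_SOCK is not set: no SSH agent for this shell" >&2
        return 1
    fi
    # The variable must point to an existing Unix socket
    if [ ! -S "$SSH_AUTH_SOCK" ]; then
        echo "SSH_AUTH_SOCK=$SSH_AUTH_SOCK is not a socket; is the agent running?" >&2
        return 1
    fi
    # At least one key must be loaded in the agent
    agent_status=0
    ssh-add -l >/dev/null 2>&1 || agent_status=$?
    case "$agent_status" in
        0) return 0 ;;
        1) echo "The agent holds no keys; load one with ssh-add <path/to/private-key>" >&2 ;;
        *) echo "Could not contact the agent at $SSH_AUTH_SOCK" >&2 ;;
    esac
    return 1
}
```

A typical use is check_ssh_agent && docker build --ssh default . so that a misconfigured agent is reported before the build starts.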

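You can check inside the build container whether the right keys are loaded and whether SSH_AUTH_SOCK is set with a debugging step such as

RUN --mount=type=ssh ssh-add -l && echo "$SSH_AUTH_SOCK"

The --mount=type=ssh part is required: without it, the step gets no agent socket and SSH_AUTH_SOCK stays empty. Build with --progress=plain to see the step’s output in full. Note that the path of the socket in the container differs from the one on the host; it is just a mount point in the container.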

When it still doesn’t work

This is an issue you may run into if you have multiple accounts with the same service, for example a personal and a corporate GitHub/Bitbucket account. If you have multiple keys loaded in the SSH agent, where one key connects you to your personal account and another to your corporate account, you can get conflicts that block you from certain resources. Why?

When git connects via SSH, the keys loaded in the SSH agent are tried in the order listed by ssh-add -l. If the service does not recognize a key, authentication with that key fails and the next key is tried. However, if a key is valid for one of your accounts but not for the account that is authorized to access the resource, authentication succeeds and only afterwards is access to the resource refused. Because authentication succeeded, no further keys are tried and the build simply errors out. So in this specific scenario, the order in which the keys are loaded in the SSH agent matters.

To resolve this issue, you can remove conflicting keys with ssh-add -d path/to/key. To troubleshoot other git-over-SSH issues during the build, you can make SSH log verbosely by setting

ENV GIT_SSH_COMMAND="ssh -vvv"
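Since the order matters, an alternative to deleting keys is to empty the agent and reload the keys so that the key for the account that owns the private repository comes first. A minimal sketch, assuming (as described above) that keys are tried in the order ssh-add -l lists them; reload_keys_in_order is a hypothetical helper built on ssh-add -D, which removes all identities from the agent, and the key paths in the usage comment are hypothetical too.

```shell
# Hypothetical helper: reload agent keys so they are offered in the given order.
# Usage (hypothetical key paths): reload_keys_in_order ~/.ssh/id_work ~/.ssh/id_personal
reload_keys_in_order() {
    # Remove every identity currently held by the agent
    ssh-add -D || return 1
    # Add the keys back one by one; earlier arguments are tried first
    for key in "$@"; do
        ssh-add "$key" || return 1
    done
}
```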

Bypassing the host’s SSH agent

An SSH agent running on the host is actually not strictly necessary. In step 4, you can also pass a specific private key to the build:

$ docker build --ssh default=$HOME/.ssh/<private key> .

Use the $HOME variable rather than ~: depending on your shell, ~ is not expanded after the =. Docker then loads the key into an agent of its own and forwards that to the build, so no agent needs to run on the host.

Using multiple identities

Suppose one RUN step needs one key while another RUN step needs a different key. You can get very fine-grained control by specifying which id each RUN command should use.

You can define the different ids in the build command using

$ docker build --ssh id1=path/to/key1 --ssh id2=path/to/key2 .

The names id1 and id2 are arbitrary. Each id needs its own --ssh flag. If a RUN command should use id1, indicate this in the Dockerfile as

RUN --mount=type=ssh,id=id1 ...

During this build step, only the key registered under that id is available to the SSH client in the container.
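When several ids are involved, a small wrapper can assemble the --ssh flags and also take care of the ~ problem mentioned earlier. A sketch under stated assumptions: build_with_keys is a hypothetical name, each argument has the form id=path/to/key, and a literal leading ~/ in the path is replaced with $HOME/ before the path is handed to docker build.

```shell
# Hypothetical wrapper: build_with_keys id1=path/to/key1 id2=path/to/key2
build_with_keys() {
    # Rotate through the arguments once, replacing each id=path pair
    # with the two words "--ssh" and "id=path".
    remaining=$#
    while [ "$remaining" -gt 0 ]; do
        spec=$1
        shift
        remaining=$((remaining - 1))
        id=${spec%%=*}    # text before the first "="
        key=${spec#*=}    # text after the first "="
        case $key in
            # Expand a literal leading ~/ that the shell left alone
            "~/"*) key="$HOME/${key#??}" ;;
        esac
        set -- "$@" --ssh "$id=$key"
    done
    DOCKER_BUILDKIT=1 docker build "$@" .
}
```

For example, build_with_keys 'id1=~/.ssh/key1' 'id2=~/.ssh/key2' pairs with RUN --mount=type=ssh,id=id1 and RUN --mount=type=ssh,id=id2 in the Dockerfile. Rebuilding the argument list with set -- rather than a string keeps paths containing spaces intact.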

A downside of bypassing the host SSH agent

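Sometimes SSH keys are protected with a passphrase. A host SSH agent only needs the passphrase once, when the key is loaded with ssh-add, and then keeps the unlocked key in memory. As far as I know, it is not possible to pass passphrase-protected keys directly to the docker build command, so for those keys you have to go through the host’s agent.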

Summary

In this post I went over how to securely use SSH credentials stored on the host to access private resources during a docker build process. I also discussed the SSH agent and various SSH connection issues that could come up, as well as how to troubleshoot them.

I work at Data Minded, an independent data engineering and data analytics consultancy based in Leuven. Contact us if you’d be interested in working together!
