Getting your Docker containers to talk to each other in your Google Cloud Build CI/CD pipeline

Anne-Marie Pritchard · Published in Kudos Engineering · Jun 26, 2023

Golden metal pipes connect softly glowing lightbulbs in a complex, angular network.
Photo by Dane Deaner on Unsplash

At Kudos we use Google Cloud Build (GCB) to create build pipelines for our apps and microservices. GCB pipelines are made up of Docker containers (referred to as ‘build steps’) that can be used to build, test, and deploy applications as part of a continuous integration/continuous delivery (CI/CD) process. This post demonstrates how to create a GCB pipeline with multiple concurrently running build steps that can talk to each other on a shared Docker network. We’ll also learn why an awareness of Docker execution models (‘Docker outside of Docker’ vs ‘Docker inside of Docker’) is important for reasoning about how GCB pipelines behave.

This example shows a simple Go server being run as a build step, and a later build step making HTTP requests to it to verify that it’s receiving traffic. The Go server (inspired by Alex Edwards’s post) has one registered route, ‘/health’, which returns ‘OK’ and the current time. It has a Dockerfile so that it can be run as a container in the build pipeline. (By the way, Docker has a great guide on building and running Go containers).

Real world application

Although the pipeline used here is simplified for the purposes of demonstration, in the real world a similar setup can be used to allow end-to-end testing. For example, you might have an application running as a build step that makes external HTTP service calls as part of its end-to-end test suite, and another build step running API stubs (such as Mountebank stubs, for example) which would need to receive these requests and return fixed responses for testing.
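As a rough sketch of what such a stub could look like, here is a hypothetical Mountebank imposter configuration that returns a fixed 200 response for one path (the port, path, and body are all illustrative assumptions, not taken from our real setup):

```json
{
  "port": 4545,
  "protocol": "http",
  "stubs": [
    {
      "predicates": [{ "equals": { "method": "GET", "path": "/external/health" } }],
      "responses": [{ "is": { "statusCode": 200, "body": "OK" } }]
    }
  ]
}
```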

Constructing the Cloud Build pipeline

In GCB pipelines, you can run a Docker image as a container directly as a build step, or you can use a Cloud Builder image. GCB’s documentation defines Cloud Builder images as ‘Cloud Build/developer community provided container images with common languages and tools installed in them’.

Pre-built images (ones that are publicly available, such as Curl) are usually ideal for running directly as a build step. We’ll be building the Go server image within the pipeline however, so we’ll need the help of a Cloud Builder image. The best one for this purpose is the Docker Cloud Builder image, which can build and run other Docker images using the Docker CLI.

If the first build step below succeeds, the built Go server image will be run directly as the next build step using the image tag name we gave it:

steps:
- name: 'gcr.io/cloud-builders/docker'
  id: 'build-server-image'
  args:
  - 'build'
  - '--tag=simple-go-server'
  - '.'

- name: 'simple-go-server'
  id: 'run-server-image'
  waitFor:
  - 'build-server-image'

When we run the Cloud Build pipeline locally, this is the output:

Finished Step #0 - "build-server-image"
2023/06/08 14:41:12 Step Step #0 - "build-server-image" finished
Starting Step #1 - "run-server-image"
Step #1 - "run-server-image": Already have image: simple-go-server
Step #1 - "run-server-image": 2023/06/08 13:41:13 Listening...

It seems as though the server is up and running, which is good, but now we want to make sure it’s receiving traffic. In a later build step, a Curl container makes a request to the server’s health endpoint until it gets a success response or exceeds the number of allowed retries. The final build step in the pipeline stops the Go server container so that it isn’t left running when the pipeline finishes.

- id: 'server-ready'
  name: 'curlimages/curl:latest'
  waitFor: ['build-server-image']
  args: [
    '--retry', '10',
    '--retry-delay', '5',
    '--retry-all-errors',
    'server:8080/health'
  ]

- id: 'stop-server'
  name: 'gcr.io/cloud-builders/docker'
  waitFor: ['server-ready']
  args: ['stop', 'server']

This doesn’t work, but that’s not unexpected. In the output below we can see that the Curl container was unable to resolve the Go server’s hostname:

Step #2 - "server-ready": Warning: Problem : timeout. Will retry in 5 seconds. 1 retries left.
0 0 0 0 0 0 0 0 --:--:-- --:--:-- --:--:-- 0
Step #2 - "server-ready": curl: (6) Could not resolve host: server
Finished Step #2 - "server-ready"
2023/06/08 14:50:56 Step Step #2 - "server-ready" finished
Finished Step #1 - "run-server-image"
2023/06/08 14:50:56 Step Step #1 - "run-server-image" finished
2023/06/08 14:50:56 status changed to "ERROR"
ERROR
ERROR: build step 2 "curlimages/curl:latest" failed: exit status 6
2023/06/08 14:50:57 Build finished with ERROR status

When GCB runs a build step directly it uses a fixed network configuration for the container. To make sure our server is accessible on the network, we’ll need to:

  • Give it a name so that other containers can resolve it via DNS
  • Publish the server’s ports so that they’re available outside the container

Both of these things are possible if we use a Docker Cloud Builder, instead of running the container directly as a build step, because this is the only way we can provide certain Docker run flags to the Docker CLI. This is the updated ‘run-server-image’ step:

- name: 'gcr.io/cloud-builders/docker'
  id: 'run-server-image'
  waitFor:
  - 'build-server-image'
  args: [
    'run',
    '--rm',
    # provide 'name' and 'publish' flags
    '--name', 'server',
    '--publish', '8080:8080',
    'simple-go-server'
  ]

Despite the updated Go server configuration, when the pipeline is run again we hit the same problem. What’s going on?

Reviewing our mental model of how GCB pipelines work

When we imagine how Docker Cloud builders work within a Cloud Build pipeline, we intuitively think of a Docker container (in this case the Go server) running inside of another Docker container (the Docker Cloud Builder), something like:

A diagram showing the Cloud Builder container running on the host machine and the Go server container running inside the Cloud Builder container.

The problem makes a lot more sense when we realise that GCB works using a model called ‘Docker outside of Docker (DooD)’ rather than ‘Docker in Docker (DinD)’. Let’s revise our mental model to look like the following:

A diagram showing the host machine’s docker daemon running the Cloud Builder container and the Go server container as sibling containers.

In the DooD model, the Docker Cloud builder container isn’t running the Go server container within itself using its own docker daemon. Instead it makes a call to the host’s docker daemon and tells it to run the Go server container. As it’s the host’s docker daemon that is responsible for spinning up the Go server container, the Go server is not actually running as a build step but is running as a result of a build step.

When GCB runs a container as a build step, the fixed network configuration adds it to a network called ‘cloudbuild’. When we run a container using the Cloud Builder container (and the Docker CLI), it gets added to the default ‘bridge’ network. This explains why the other containers can’t talk to the Go server over the cloudbuild network; it’s attached to a completely separate network:

A diagram showing the Docker Cloud builder container and the Curl container running on the cloudbuild network and the Go server container running on the default network. The Curl container is unsuccessfully trying to talk to the Go server.
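If you want to confirm this from inside a pipeline, a hypothetical debug step (not part of the original pipeline) can ask the host’s daemon which containers are attached to each network, since the Docker Cloud Builder talks to that daemon directly:

```yaml
# Hypothetical debug step: list the containers attached to the
# 'cloudbuild' and default 'bridge' networks.
- name: 'gcr.io/cloud-builders/docker'
  id: 'debug-networks'
  entrypoint: 'sh'
  args:
  - '-c'
  - 'docker network inspect cloudbuild bridge --format "{{.Name}}: {{range .Containers}}{{.Name}} {{end}}"'
```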

Fortunately, the workaround for this is straightforward. We can use the Docker run network flag to tell the host’s docker daemon to attach the Go server container to the cloudbuild network instead of the bridge network:

- name: 'gcr.io/cloud-builders/docker'
  id: 'run-server-image'
  waitFor:
  - 'build-server-image'
  args: [
    'run',
    '--rm',
    '--name', 'server',
    '--publish', '8080:8080',
    '--network', 'cloudbuild',
    'simple-go-server'
  ]

When the build pipeline is run again, we can see the Curl build step (called ‘server-ready’) is able to resolve the Go server’s hostname and receives a success response when calling the ‘health’ endpoint.

Step #2 - "server-ready": OK: Thu, 08 Jun 2023 15:31:44 UTC
Finished Step #2 - "server-ready"
2023/06/08 16:31:44 Step Step #2 - "server-ready" finished
Starting Step #3 - "stop-server"
Step #3 - "stop-server": Already have image (with digest): gcr.io/cloud-builders/docker
Step #3 - "stop-server": server
Finished Step #3 - "stop-server"

This seems obvious once you know it, but it can trip you up if you don’t know that GCB uses the DooD model. The final diagram illustrates the Curl build step being able to communicate with the Go server container, now that it’s running on the cloudbuild network.

A diagram showing the Docker Cloud builder container, the Curl container, and the Go server container all running on the cloudbuild network. The Curl container is making a request to the Go server and getting a success response.
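Putting the pieces together, the complete working pipeline looks something like this (assembled from the steps shown above):

```yaml
steps:
- name: 'gcr.io/cloud-builders/docker'
  id: 'build-server-image'
  args: ['build', '--tag=simple-go-server', '.']

- name: 'gcr.io/cloud-builders/docker'
  id: 'run-server-image'
  waitFor: ['build-server-image']
  args: [
    'run', '--rm',
    '--name', 'server',
    '--publish', '8080:8080',
    '--network', 'cloudbuild',
    'simple-go-server'
  ]

- id: 'server-ready'
  name: 'curlimages/curl:latest'
  waitFor: ['build-server-image']
  args: [
    '--retry', '10',
    '--retry-delay', '5',
    '--retry-all-errors',
    'server:8080/health'
  ]

- id: 'stop-server'
  name: 'gcr.io/cloud-builders/docker'
  waitFor: ['server-ready']
  args: ['stop', 'server']
```

Note that ‘server-ready’ waits only for ‘build-server-image’, so it runs concurrently with the long-running ‘run-server-image’ step, and ‘stop-server’ is what finally brings the server step to an end.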

Conclusion

  • Exposing ports and applying other custom network configuration requires running containers manually using the Docker CLI
  • When we do that, we don’t benefit from the fixed network configuration used by GCB for containers running as build steps
  • We can fix the issue by manually adding our container to the correct network
  • Having the right mental model about where the container is running and why helps us to understand and fix issues with sophisticated build pipelines

It’s also worth noting that the technical issues involved when using DinD within CI/CD pipelines have been discussed and documented over the years (see Jérôme Petazzoni’s article). What we’ve been referring to in this post as ‘Docker outside of Docker’ is referred to as ‘Docker socket binding’ or ‘the socket solution’ elsewhere.

As Jérôme Petazzoni’s article explains, this is because it’s possible to achieve DooD manually by exposing the host’s Docker socket to a container running Docker. As that container has access to the host’s Docker socket it will be able to start up ‘sibling’ containers running outside of itself, and indeed, this is what GCB is doing for us behind the scenes when we use Docker Cloud builders.
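As a rough illustration of the manual version of this trick (the image and container names here are illustrative, and this assumes a local Docker daemon):

```shell
# Mount the host's Docker socket into a container that has the Docker CLI.
# Commands run inside it are executed by the *host's* daemon, so any
# containers it starts are siblings of it, not children.
docker run --rm \
  -v /var/run/docker.sock:/var/run/docker.sock \
  docker:cli \
  docker run --rm --name sibling alpine echo "started by the host daemon"
```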

Of course not everyone uses GCB, and CI/CD pipeline tools differ in the ways they use Docker execution models. Some give you the flexibility to choose between DinD and DooD, whereas others run all their build steps in a single virtual machine, rather than as Docker containers. As we’ve seen, being aware of your CI/CD tool’s execution model can be a great help when debugging build pipeline issues.
