Multi-Stage Docker Builds to the Rescue

With version 17.05, Docker introduced a new feature called multi-stage builds. This feature makes it easier to achieve slim and clean Docker images.

In this post, I’ll briefly describe this feature and show some use cases.

Introduction

Multi-stage Docker builds enable you to assemble a build pipeline within a single Dockerfile. In practice, it looks like you’re just merging the content of several Dockerfiles into a single one, but it’s more than that: the real power lies in the fact that each stage can access the artefacts of previous ones (think of stages as ephemeral layers).

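Building slim Docker images can be a challenge. Each RUN, COPY and ADD instruction in your Dockerfile creates a new filesystem layer that adds to the final image size, and files deleted by a later instruction still take up space in the layer where they were created. The usual approach to build an efficient Dockerfile is as follows: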

  1. Install build/run dependencies (e.g. additional OS packages, build tools, web/application servers);
  2. Copy the application source code into the image;
  3. Install application dependencies (and build, if needed);
  4. Purge build dependencies, caches and unused files.

Managing steps #1 and #4 can be a challenging task if you want an efficient Dockerfile. In the end, we just want to be able to run our application, with an image as small as possible. So what if we could discard all previous steps, without needing to “manually” purge the build dependencies from our final image? This is precisely what multi-stage builds aim to tackle.
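Why is step #4 so easy to get wrong? The sketch below is a toy model, not Docker’s actual implementation. It assumes that an image’s size is the sum of what each layer stores, and that deleting a file in a later layer only records a whiteout that hides it. The layer contents and sizes (150MB of build tools, a 5MB app) are hypothetical.

```go
package main

import "fmt"

// layer is a toy model of one image layer: the files it stores
// (name -> size in MB) and the lower-layer files it hides via whiteouts.
type layer struct {
	adds     map[string]float64
	whiteout []string
}

// imageSize sums what every layer stores. Whiteouts hide files but never
// shrink the layers underneath them.
func imageSize(layers []layer) float64 {
	total := 0.0
	for _, l := range layers {
		for _, size := range l.adds {
			total += size
		}
	}
	return total
}

// visibleFiles stacks the layers bottom-up and returns what a container sees.
func visibleFiles(layers []layer) map[string]float64 {
	fs := map[string]float64{}
	for _, l := range layers {
		for _, name := range l.whiteout {
			delete(fs, name) // hide files coming from lower layers
		}
		for name, size := range l.adds {
			fs[name] = size
		}
	}
	return fs
}

// separateSteps: install tools, build and purge in three RUN instructions.
func separateSteps() []layer {
	return []layer{
		{adds: map[string]float64{"build-tools": 150}}, // step #1
		{adds: map[string]float64{"app": 5}},           // step #3
		{whiteout: []string{"build-tools"}},            // step #4
	}
}

// singleStep: the same work chained in one RUN instruction, so the build
// tools are removed before the layer is committed and are never stored.
func singleStep() []layer {
	return []layer{
		{adds: map[string]float64{"app": 5}},
	}
}

func main() {
	cases := []struct {
		name   string
		layers []layer
	}{
		{"separate RUN instructions", separateSteps()},
		{"single chained RUN", singleStep()},
	}
	for _, c := range cases {
		fmt.Printf("%s: %.0f MB stored, visible files: %v\n",
			c.name, imageSize(c.layers), visibleFiles(c.layers))
	}
}
```

Both variants leave the same files visible to the container, but only the chained version avoids storing the build tools. A multi-stage build sidesteps the problem altogether, because the stage holding the build tools never becomes part of the final image.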

When to use?

If you’re shipping backend applications written in interpreted programming languages (e.g. Ruby, Python, PHP, etc.), then multi-stage builds will most probably bring no additional benefit for you.

However, if you have applications written in compiled programming languages (Go, Rust, Java, etc.) or you’re shipping code that requires static files to be processed before being served, then a multi-stage build is your new best friend.

Compiled Programming Languages

The first use case we’re going to approach is the creation of Docker images for applications written in compiled programming languages. As an example, we’re going to build a Dockerfile for a “Hello World” application written in Go:

// hello.go
package main

import "fmt"

func main() {
	fmt.Println("Hello World")
}

Go applications are compiled to self-contained, statically linked binaries that are free of external dependencies (at least when, as here, cgo is not involved). This means that once we have the compiled binary, the smallest Docker base image is all we need to run it.

Single-Stage Build

Without multi-stage builds, a Dockerfile to deploy our Go “Hello World” application looks like this:

# I am a huge fan of Alpine Linux (https://alpinelinux.org/)
FROM golang:alpine

# Set workdir and copy source files
WORKDIR /src/app
COPY ./hello.go .
# Build app binary (naming the source file, so no go.mod is required)
RUN go build -o hello-world hello.go
# Set startup command
CMD ["./hello-world"]

So, let’s build, run and inspect the size of our single-stage Docker image:

$ docker build -t hello-world-go .
$ docker run --rm hello-world-go
Hello World
$ docker images | grep hello-world-go | awk '{print $NF}'
272MB

Our hello-world-go image weighs 272MB. Since we have only one stage, we have to use a base image with Go and all its dependencies installed both to build and to run our application, and the golang:alpine base image alone weighs 270MB.

If we inspect the image history (docker history hello-world-go), we can see the layers that compose the image and their individual sizes:

The topmost four layers are the ones we added to the image, one per Dockerfile instruction (WORKDIR, COPY, RUN and CMD). All the layers below them belong to the golang:alpine base image.

Multi-Stage Build

With a multi-stage build, our Dockerfile looks like this:

#
# Build Stage
#

FROM golang:alpine AS build

# Set workdir and copy source files
WORKDIR /src/app
COPY ./hello.go .
# Build app binary (naming the source file, so no go.mod is required)
RUN go build -o hello-world hello.go
#
# Final Stage
#

FROM alpine

# Set workdir and copy the binary file from build stage!
WORKDIR /src/app
COPY --from=build /src/app/hello-world ./
# Set startup command
CMD ["./hello-world"]

Our Dockerfile now has two stages:

  1. Build: This stage does the same as our single-stage Dockerfile (without the CMD instruction). Note how we can name stages by appending AS <name> to the FROM instruction;
  2. Final: This is the stage that builds our final image. We start from the simplest alpine base image (with the binary already compiled, that’s all we need), copy our binary from the previous build stage and define the startup command.

Build, run and inspect the size of our multi-stage Docker image:

$ docker build -t hello-world-go-multi .
$ docker run --rm hello-world-go-multi
Hello World
$ docker images | grep hello-world-go-multi | awk '{print $NF}'
5.82MB

Wow, just 5.82MB?

Since we used a two-stage build, our final image requires nothing more than the simplest alpine base image (which weighs 3.97MB) to run. This is possible because all artefacts/layers from previous stages are left behind and have no impact on the final image (as I said before, think of stages as ephemeral layers).

A saving of 266.18MB compared to our single-stage Dockerfile!
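These numbers are easy to double-check. The small sketch below simply plugs in the sizes reported in this post (272MB, 5.82MB and the 3.97MB alpine base). It also derives roughly how much the copied binary adds on top of alpine. This is plain arithmetic, not a measurement.

```go
package main

import "fmt"

// Image sizes in MB, as reported by docker images in this post.
const (
	singleStageMB = 272.0 // hello-world-go
	multiStageMB  = 5.82  // hello-world-go-multi
	alpineBaseMB  = 3.97  // alpine base image
)

// savingMB is how much smaller the multi-stage image is.
func savingMB(before, after float64) float64 {
	return before - after
}

// reductionPercent expresses the saving relative to the original size.
func reductionPercent(before, after float64) float64 {
	return (before - after) / before * 100
}

func main() {
	fmt.Printf("saving: %.2f MB\n", savingMB(singleStageMB, multiStageMB))
	fmt.Printf("reduction: %.1f%%\n", reductionPercent(singleStageMB, multiStageMB))
	// Everything above the alpine base comes from the layers we added.
	fmt.Printf("added on top of alpine: %.2f MB\n", multiStageMB-alpineBaseMB)
}
```

Running it prints a saving of 266.18 MB, a reduction of about 97.9% and roughly 1.85 MB added on top of alpine. That is consistent with the binary being the only sizeable layer we add.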

If we inspect the image history (docker history hello-world-go-multi), we can see the layers that compose the image and their individual sizes:

The top three layers (WORKDIR, COPY and CMD) are the ones we added with our Dockerfile. CMD and WORKDIR instructions cost nothing, so the only thing increasing the base image size is the binary copied from the build stage.

Single-Page Applications (SPAs)

SPAs are another great use case for multi-stage builds. The source code of a SPA is composed of a set of JavaScript, CSS and HTML files that must be bundled together into static files. The resulting static files can then be served by a standard web server like Nginx.

To demonstrate the use of single-stage and multi-stage builds in the context of SPAs, we are going to create a React “Hello World” SPA (the following instructions require Node.js and npm to be installed):

$ npm install -g create-react-app 
$ create-react-app hello-world-react
$ cd hello-world-react

Now we have a React “Hello World” SPA ready to be deployed. Let’s see how we can do it with both single-stage and multi-stage approaches.

Single-Stage Build

A single-stage Dockerfile for a React SPA looks like this:

FROM nginx:alpine

# Install Node.js and npm (npm is a separate package on recent Alpine releases)
RUN apk --no-cache add nodejs npm
# Set workdir and copy source files
WORKDIR /src/app
COPY . .
# Install app dependencies and build static files
RUN npm install --production \
&& npm run build --production
# Copy static files to the Nginx directory and remove source files
RUN cp -r ./build/. /usr/share/nginx/html/ \
&& rm -rf /src/app
# Set workdir to the Nginx directory
WORKDIR /usr/share/nginx/html
# Remove Node.js and npm
RUN apk del nodejs npm

Note: We need to exclude the node_modules folder from the files that we’ll copy into the image. For this reason, we need to create a .dockerignore file in the root of the project directory as follows:

$ echo 'node_modules' > .dockerignore
$ cat .dockerignore
node_modules

Let’s build, run and inspect the size of our single-stage Docker image:

$ docker build -t hello-world-react . 
$ docker run -d -p 8181:80 hello-world-react # open localhost:8181
$ docker images | grep hello-world-react | awk '{print $NF}'
229MB

Our hello-world-react image weighs 229MB, while the nginx:alpine base image weighs only 15.5MB.

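Now we can inspect the individual layers (docker history hello-world-react):

The RUN instruction that installs the npm dependencies and builds the static files has a significant impact on the resulting image, with the corresponding layer f56be85e0f32 weighing 183MB. The later RUN instructions that remove the source files and uninstall Node.js can’t reclaim that space: they only hide files in new layers instead of shrinking the layers created before them.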

Multi-Stage Build

With a multi-stage build, our Dockerfile looks like this:

#
# Build stage
#

FROM node:alpine AS build

# Set workdir and copy source files
WORKDIR /src/app
COPY . .
# Install app dependencies and build static files
RUN npm install --production \
&& npm run build --production
#
# Final stage
#

FROM nginx:alpine

# Set workdir and copy static files from build stage!
WORKDIR /usr/share/nginx/html
COPY --from=build /src/app/build ./

Now our Dockerfile has two stages:

  1. Build: In this stage we use the node:alpine base image, copy the source code and build our static files;
  2. Final: This is the stage that builds our final image. We start from the nginx:alpine base image (the static files are ready to be served, so we no longer need Node.js) and copy our static files from the previous build stage to the Nginx directory.

Build, run and inspect the size of our multi-stage Docker image:

$ docker build -t hello-world-react-multi .
$ docker run -d -p 8181:80 hello-world-react-multi # open localhost:8181
$ docker images | grep hello-world-react-multi | awk '{print $NF}'
17.5MB

That’s it, only 17.5MB! Since we used a two-stage build, our final image requires nothing more than the simple nginx:alpine base image (which has 15.5MB) to run.

A saving of 211.5MB compared to our single-stage Dockerfile!

Let’s take a look at the individual layers:

As expected, the only thing increasing the base image size is the set of static assets copied from the build stage, with the corresponding layer 24670e7da23a weighing 1.95MB.

Conclusions

Multi-stage builds are a must-have for applications written in compiled programming languages. Here we have seen an example for Go, but the same applies to other compiled languages too.

For example, for a Java application, you would use a base image with Maven to compile and build a WAR file (build stage) and then a base image with just the application server (e.g. Tomcat, Jetty, etc.) to serve your application (final stage).

As with applications written in compiled programming languages, multi-stage builds are also a must-have for SPAs. The same applies to any other kind of application that requires static file processing (e.g. static site generators like Hugo and Jekyll).

Edit: This post was featured in the official Docker Weekly #202 Newsletter.
