Speed up CI builds with multi-stage Dockerfiles
Leveraging the Docker build cache has been a great way to speed up Docker builds for years now:
> When building an image, Docker steps through the instructions in your `Dockerfile`, executing each in the order specified. As each instruction is examined, Docker looks for an existing image in its cache that it can reuse, rather than creating a new (duplicate) image.
>
> For the `ADD` and `COPY` instructions, the contents of the file(s) in the image are examined and a checksum is calculated for each file.
>
> For the `RUN` instructions, the command string itself is used to find a match.
>
> Once the cache is invalidated, all subsequent `Dockerfile` commands generate new images and the cache is not used.
The `Dockerfile` for a simple Ruby service may look something like:
```dockerfile
FROM alpine

WORKDIR /app
RUN apk add --update ruby ruby-bundler

COPY Gemfile* ./
RUN bundle install --path vendor/bundle

COPY . ./
```
This works great since the `RUN bundle install` step is only executed if:

- new `alpine:latest` changes are pulled in
- `apk` system dependencies change (not very often)
- the contents of `Gemfile` or `Gemfile.lock` change
Otherwise the build cache is used and the step completes immediately:
```
Step 5/6 : RUN bundle install --path vendor/bundle
 ---> Using cache
 ---> 17ec830a9b5b
```
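One way to see the cache in action is to rebuild after touching only application code, leaving `Gemfile*` untouched. The image tag and file name below are made up for illustration:

```shell
# First build: every step executes.
docker build -t my-ruby-service .

# Change application code only (Gemfile and Gemfile.lock untouched)...
touch app.rb

# ...then rebuild: the apk and bundle install layers are served from
# the cache, so only the layers from the final COPY onward are rebuilt.
docker build -t my-ruby-service .
```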
Nowadays it’s common for Ruby services to use other dependencies as well, e.g. Node for handling misc tasks like asset compilation via `webpack`:
```dockerfile
FROM alpine

WORKDIR /app
RUN apk add --update ruby ruby-bundler nodejs nodejs-npm

COPY Gemfile* ./
RUN bundle install --path vendor/bundle

COPY package.json ./
RUN npm install

COPY webpack.config.js ./
COPY app/assets ./app/assets
RUN npm run webpack

COPY . ./
```
But here’s where things get a little tricky:
- Now whenever the `Gemfile` or `Gemfile.lock` changes, it invalidates all of the cached steps below it
- That means all NPM packages must be reinstalled and all `webpack` assets must be recompiled, even though gems have nothing to do with those steps
- The steps could be reordered so that gems are installed after the Node-related steps, but that results in the same problem: changes to `package.json` or `webpack` assets require a full reinstallation of gems
- It’s pretty common for these Node dependencies to only be used for asset compilation at build time. In these cases Node, NPM, and the entire `node_modules` directory aren’t actually needed when the service runs
- For smaller services this usually isn’t a big deal, but for services that rely on a large number of gem or NPM dependencies this results in much slower build times
Now Docker 17.05+ multi-stage builds can be used to solve this problem! This feature allows a `Dockerfile` to contain multiple `FROM` steps which can generate intermediate images and utilize the build cache more efficiently. These intermediate images are eventually combined into one final image:
```dockerfile
FROM alpine AS node
WORKDIR /app
RUN apk add --update nodejs nodejs-npm

COPY package.json ./
RUN npm install

COPY webpack.config.js ./
COPY app/assets app/assets
RUN npm run webpack

FROM alpine
WORKDIR /app
RUN apk add --update ruby ruby-bundler

COPY Gemfile* ./
RUN bundle install --path vendor/bundle

COPY --from=node /app/assets assets
COPY . ./
```
Now gem changes no longer impact the NPM or `webpack` steps, and vice-versa!
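A nice side effect of naming the first stage with `AS node` is that it can be built and inspected on its own via `docker build --target` (also available since 17.05). The tag name here is hypothetical:

```shell
# Build only the Node stage; the build stops after that stage's last step.
docker build --target node -t my-service-assets .

# Poke around the intermediate image, e.g. to verify the webpack output.
docker run --rm my-service-assets ls /app
```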
This was a simple example but these multi-stage builds are very useful for:
- Services that have many dependencies so reinstallation is slow
- Services that have many independent build steps for misc tasks like generating static content e.g. sitemaps, marketing pages, assets, etc.
Note that even though the Ruby and Node related steps no longer invalidate each other’s build cache, all of the intermediate Docker images generated within the `Dockerfile` are still built one at a time, step by step! On a fresh build, the Ruby related steps won’t be evaluated until the Node related steps complete, even though they do not depend on each other.
Keep an eye on BuildKit, the concurrent, cache-efficient, and Dockerfile-agnostic builder toolkit! One of its upcoming experimental features allows the intermediate images in multi-stage builds to be built in parallel!
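For the curious, BuildKit can already be tried out on Docker releases that bundle it, by opting in with an environment variable; the same `Dockerfile` works unchanged:

```shell
# Opt in to the BuildKit builder for a single build
# (requires a Docker release that ships BuildKit support).
DOCKER_BUILDKIT=1 docker build -t my-ruby-service .
```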
Our engineering team is still growing! We’re hiring engineers in our San Francisco and Pittsburgh offices. Check out our careers page to learn more. We look forward to hearing from you!