[Docker] Utilize Docker cache to speed up image building process
Introduction
Since Docker images are frequently rebuilt, this article aims to share insights from Docker’s documentation on optimizing the image building process.
Understanding Docker Image Layers
Think of the image as a stack in which each instruction in the Dockerfile contributes a layer. If a layer changes, every layer above it must be rebuilt, even if those layers are unrelated to the change. For example, FROM ubuntu:latest sits at the bottom of the stack and RUN make build sits at the top.
Images and Layers
As mentioned, each instruction in a Dockerfile builds a layer. These layers are read-only. When you start a container from an image, a writable layer is created on top of the image’s layers. When the container is deleted, the writable layer is also removed. With these fundamentals in mind, let’s move on to how to make use of the layer cache mechanism.
Docker Layer Caching
The caching mechanism mainly concerns the RUN, COPY, and ADD instructions.
The RUN Command
CMD runs when the container is launched, while RUN runs during the build step and produces a layer.
In the Dockerfile below, if package.json and package-lock.json have not changed, Docker reuses the cached layers for COPY package*.json ./ and RUN npm install. This significantly speeds up the build because the dependencies are reinstalled only when the package files change.
FROM node:14
WORKDIR /app
COPY package*.json ./
RUN npm install
COPY . .
CMD ["npm", "start"]
The COPY Command
The COPY command copies files from the build context into the image.
If none of the files referenced by a COPY instruction have changed, its cached layer is reused, and so are the layers that follow it, up to the next COPY or ADD instruction whose files did change. Therefore, steps that change more frequently should be placed near the end of the Dockerfile, as shown in the example provided below under the "Reorder Your Layers" section.
The ADD Command
ADD works like COPY but can also fetch files from remote URLs.
As with the COPY command, you should carefully order the steps of your Dockerfile.
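As a rough illustration of the difference, the sketch below uses COPY for a local file and ADD for a remote one. The file name and the URL are hypothetical placeholders, not taken from any real project.

```dockerfile
FROM ubuntu:latest
WORKDIR /app
# COPY takes files from the build context (config.json is a placeholder name)
COPY config.json ./
# ADD also accepts a URL as its source (example.com/tool.tar.gz is hypothetical)
ADD https://example.com/tool.tar.gz /tmp/
```

Both instructions add files to the image, so the same ordering advice applies to each: place them after the steps that change less often.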
Methods
Reorder Your Layers
Consider two scenarios that produce identical results but behave differently when the image is rebuilt.
- The first scenario:
# syntax=docker/dockerfile:1
FROM node
WORKDIR /app
# Copy over all files in the current directory
COPY . .
# Install dependencies
RUN npm install
# Run build
RUN npm run build
- The second scenario:
FROM node
WORKDIR /app
# Copy package management files
COPY package.json package-lock.json ./
# Install dependencies
RUN npm install
# Copy over project files
COPY . .
# Run build
RUN npm run build
The first example is inefficient. If you modify the project code without altering the dependencies, the cache is invalidated at the COPY . . instruction, so every instruction after it runs again and the dependencies are reinstalled on every build. The second example avoids this by placing COPY package.json package-lock.json ./ and RUN npm install ahead of the COPY . . instruction. When only the project code changes, the build resumes from the second COPY and the cached dependency layer is reused.
Keep Layers Small
Include only what’s necessary in your images. For instance:
- Instead of using COPY . . to copy all files and folders, carefully select the essential files to include. Avoid including unnecessary files or directories in the root directory.
- Use a .dockerignore file. It is similar to .gitignore: where .gitignore keeps files out of version control, .dockerignore keeps files out of the build context. This is useful for excluding large files or directories and for preventing sensitive files from being included accidentally. The example below excludes files and directories whose names start with logs in any immediate subdirectory of the root. For example, /somedir/logs-db/temp.txt is excluded.
# comment
*/logs*
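To make the first point concrete, here is a minimal sketch of a Dockerfile that copies only selected paths instead of the whole directory. It assumes a hypothetical Node project whose source code lives in src/; adjust the paths to your own layout.

```dockerfile
FROM node
WORKDIR /app
# Dependency manifests first, so the install layer stays cacheable
COPY package*.json ./
RUN npm install
# Copy only the source directory (src/ is an assumed layout) rather than COPY . .
COPY src/ ./src/
CMD ["npm", "start"]
```

Selective copying and .dockerignore complement each other: the former controls what lands in the image, the latter what is sent to the builder in the first place.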
Use multi-stage builds
Consider these two Dockerfiles for a Go application.
- Multi-stage Dockerfile
The final image includes only the executable binary.
# Build Stage
FROM golang:1.17 AS build
WORKDIR /app
COPY . .
# Disable cgo so the binary is statically linked and can run on Alpine
RUN CGO_ENABLED=0 go build -o myapp
# Final Stage
FROM alpine
WORKDIR /app
COPY --from=build /app/myapp .
CMD ["./myapp"]
- Single-stage Dockerfile
Both build and runtime dependencies are incorporated.
FROM golang:1.17
WORKDIR /app
# Copy the application source code
COPY . .
# Build the application
RUN go build -o myapp
# Set up the runtime environment
CMD ["./myapp"]
- Add AS to name your stage
Naming your stages with AS is recommended. By default, stages are referred to by index, starting at 0. A name keeps references stable when stages are reordered or new ones are added: if the final stage uses COPY --from=build rather than COPY --from=0, the instruction isn't affected.
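The difference can be sketched with the same Go example. The commented-out COPY refers to the build stage by index, while the active one refers to it by name. Disabling cgo is an assumption of this sketch, made so the binary runs on Alpine.

```dockerfile
FROM golang:1.17 AS build
WORKDIR /app
COPY . .
# Assumption of this sketch: cgo is disabled so the binary runs on Alpine
RUN CGO_ENABLED=0 go build -o myapp

FROM alpine
WORKDIR /app
# By index: would point at a different stage if one were inserted above "build"
# COPY --from=0 /app/myapp .
# By name: unaffected by reordering
COPY --from=build /app/myapp .
CMD ["./myapp"]
```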
- Stop at specific build stage
For the multi-stage Dockerfile above, the following command stops the build at the stage named build, which is the first stage. The resulting intermediate image can be used for various purposes, including debugging or testing a specific stage.
docker build --target build -t YOUR_IMAGE_NAME .
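The layer-ordering advice from earlier also applies inside a build stage. As a sketch, assuming the project uses Go modules with go.mod and go.sum in its root (an assumption, since the original example does not show them), the dependency download can be cached separately from the source code:

```dockerfile
# Build stage
FROM golang:1.17 AS build
WORKDIR /app
# Module files change rarely, so copy them first (assumes go.mod and go.sum exist)
COPY go.mod go.sum ./
RUN go mod download
# Source changes often; only the layers from here on are rebuilt
COPY . .
RUN CGO_ENABLED=0 go build -o myapp

# Final stage
FROM alpine
WORKDIR /app
COPY --from=build /app/myapp .
CMD ["./myapp"]
```

With this layout, docker build --target build still stops after the build stage, and a change that touches only the source code reuses the cached go mod download layer.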
Conclusions
Given the frequent nature of Docker image builds, it’s essential to optimize the process to ensure efficiency. Two key strategies stand out: reordering your Dockerfile and utilizing multi-stage builds. By strategically placing instructions and dependencies, you can prevent unnecessary rebuilds and expedite the image creation process. Embracing multi-stage builds further enhances efficiency by producing smaller, more streamlined images.
Source
https://docs.docker.com/build/building/multi-stage/
https://docs.docker.com/build/cache/#how-does-the-build-cache-work
https://docs.semaphoreci.com/ci-cd-environment/docker-layer-caching/