[Docker] Utilize Docker cache to speed up image building process

Tiffany Hsu
Aug 31, 2023

Introduction

Since Docker images are frequently rebuilt, this article aims to share insights from Docker’s documentation on optimizing the image building process.

Understanding Docker Image Layers

Think of an image as a stack in which each instruction in the Dockerfile contributes a layer. If a layer changes, every layer above it must be rebuilt as well, even if the change is unrelated to those layers. For example, in a Dockerfile that begins with FROM ubuntu:latest and ends with RUN make build, FROM ubuntu:latest is the bottom of the stack and RUN make build is the top.
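
As a rough illustration of that stack, here is a minimal Dockerfile sketch in the spirit of the example in Docker’s documentation. The file names main.c and Makefile are placeholders, and the sketch assumes the Makefile defines a build target.

```dockerfile
# Bottom of the stack: the base image
FROM ubuntu:latest
# Install the compiler toolchain; cached unless this line or a layer below it changes
RUN apt-get update && apt-get install -y build-essential
# Copy the (hypothetical) source files; editing either file invalidates this layer
COPY main.c Makefile /src/
WORKDIR /src/
# Top of the stack: rebuilt whenever any layer below it is rebuilt
RUN make build
```

If main.c changes, the build can reuse the cache for the first two instructions and reruns everything from COPY onward.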

Images and Layers

As mentioned, each instruction in a Dockerfile builds a layer. These layers are read-only. When you launch a container from an image, a writable layer is created on top of the image’s layers. When the container is deleted, the writable layer is removed as well. With these fundamentals in mind, let’s move on to how to make use of the layer cache.

Docker Layer Caching

The caching mechanism mainly works with the RUN, COPY, and ADD instructions.

The RUN Command

CMD is used when the container is launched, while RUN is used during the build step to construct the layer.

If nothing has changed in package.json and package-lock.json, Docker reuses the cached layers for COPY package*.json ./ and RUN npm install. This significantly speeds up the build because the dependencies are not reinstalled unless the package files change.

FROM node:14

WORKDIR /app

COPY package*.json ./

RUN npm install

COPY . .

CMD ["npm", "start"]

The COPY Command

The COPY command copies files from the build context into the Docker image.

If none of the files referenced by a COPY instruction have changed, the cached layer is reused, and the following layers stay cached until the build reaches an instruction whose inputs have changed. Therefore, steps that change more frequently should be placed near the end of the Dockerfile, as shown in the example provided below under the "Reorder Your Layers" section.

The ADD command

ADD works like COPY, but it can also fetch remote files from URLs.

Similar to the COPY command, you should carefully order the steps of your Dockerfile.
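
A minimal sketch of the two instructions side by side may help. The file config.json and the URL on example.com are placeholders, not real project files or endpoints.

```dockerfile
FROM alpine
WORKDIR /app
# COPY takes files from the local build context (config.json is a placeholder)
COPY config.json ./
# ADD can take a URL as its source; example.com stands in for a real host
ADD https://example.com/data.txt ./data.txt
```

For plain local files, COPY is generally the more predictable choice, which leaves ADD for the cases that need its extra behavior.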

Methods

Reorder Your Layers

Consider two Dockerfiles that produce identical results. Beneath the surface, however, they differ in how well they use the cache.

  • The first scenario:
# syntax=docker/dockerfile:1
FROM node
WORKDIR /app
# Copy over all files in the current directory
COPY . .
# Install dependencies
RUN npm install
# Run the build script defined in package.json
RUN npm run build
  • The second scenario:
FROM node
WORKDIR /app
# Copy package management files
COPY package.json package-lock.json ./
# Install dependencies
RUN npm install
# Copy over project files
COPY . .
# Run the build script defined in package.json
RUN npm run build

The first example is inefficient. If you modify the project code without touching the dependencies, the cache is invalidated at the COPY . . instruction, so every instruction after it runs again and the dependencies are reinstalled on every build. To avoid this, place COPY package.json package-lock.json ./ and RUN npm install before COPY . . as in the second example. When only the project code changes, the build resumes from COPY . . and the cached dependency layer is reused.

Keep Layers Small

Include only what’s necessary in your images. For instance:

  • Instead of copying all files and folders using COPY . ., carefully select the essential files to include. Avoid including unnecessary files or directories in the root directory.
  • Use a .dockerignore file. It works like .gitignore, except that .gitignore keeps files out of version control while .dockerignore keeps files out of the build context when an image is built. Use it to exclude large files or directories, or sensitive files that might otherwise be included accidentally. The example below excludes files and directories whose names start with logs in any immediate subdirectory of the root, for example the directory /somedir/logs-db and with it /somedir/logs-db/temp.txt.
# comment
*/logs*
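
To make the first point concrete, here is a sketch of a Dockerfile that copies only what the image needs instead of the whole directory. The src/ directory and the start script are assumptions about the project layout, not something Docker requires.

```dockerfile
FROM node:14
WORKDIR /app
# Dependency manifests first, so the install layer can be cached
COPY package*.json ./
RUN npm install
# Only the (hypothetical) source directory, not tests, docs, or local logs
COPY src/ ./src/
CMD ["npm", "start"]
```

Selective COPY instructions and a .dockerignore file complement each other: the first limits what goes into the image, the second limits what is sent to the builder at all.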

Use multi-stage builds

Consider these two Dockerfiles for a Go application.

  • Multi-stage Dockerfile

The final image includes only the executable binary.

# Build Stage
FROM golang:1.17 AS build
WORKDIR /app
COPY . .
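# Disable cgo so the binary is statically linked and runs on the musl-based alpine image
RUN CGO_ENABLED=0 go build -o myapp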

# Final Stage
FROM alpine
WORKDIR /app
COPY --from=build /app/myapp .
CMD ["./myapp"]
  • Single-stage Dockerfile

Both build-time and runtime dependencies end up in the image.

FROM golang:1.17

WORKDIR /app

# Copy the application source code
COPY . .

# Build the application
RUN go build -o myapp

# Run the application when the container starts
CMD ["./myapp"]
  • Add AS to name your stage

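Naming your stages with AS is recommended. By default, stages are unnamed and are referenced by their integer index, starting at 0 for the first FROM instruction. A name keeps references stable: if the stages are later reordered, a COPY --from=build instruction in the final stage still points at the right stage, whereas an index such as --from=0 might not.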
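
The two ways of referencing a stage can be sketched as follows, reusing the Go example. The stage name build and the binary name myapp come from that example; the commented-out line shows the index-based form for comparison.

```dockerfile
# Stage 0, also addressable by the name "build"
FROM golang:1.17 AS build
WORKDIR /app
COPY . .
# CGO_ENABLED=0 yields a static binary that can run on alpine
RUN CGO_ENABLED=0 go build -o myapp

FROM alpine
WORKDIR /app
# Index-based reference: points at a different stage if another FROM is inserted above
# COPY --from=0 /app/myapp .
# Name-based reference: keeps working when stages are reordered
COPY --from=build /app/myapp .
CMD ["./myapp"]
```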

  • Stop at specific build stage

With the multi-stage Dockerfile from the previous example, the following command stops the build at the stage named build, which is the first stage. The resulting intermediate image can be used for various purposes, including debugging or testing specific stages.

docker build --target build -t YOUR_IMAGE_NAME .
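
One way to use --target for testing is to add a dedicated stage. The sketch below extends the Go example with a hypothetical stage named test; it assumes the project contains Go tests that go test ./... can run.

```dockerfile
FROM golang:1.17 AS build
WORKDIR /app
COPY . .
RUN CGO_ENABLED=0 go build -o myapp

# Hypothetical test stage: starts from the build stage and runs the test suite
FROM build AS test
RUN go test ./...

# Final stage: only the binary from the build stage
FROM alpine
WORKDIR /app
COPY --from=build /app/myapp .
CMD ["./myapp"]
```

Running docker build --target test . would then build the build and test stages and stop there. When the final stage is built with BuildKit, stages it does not depend on, such as test here, are skipped.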

Conclusions

Given the frequent nature of Docker image builds, it’s essential to optimize the process to ensure efficiency. Two key strategies stand out: reordering your Dockerfile and utilizing multi-stage builds. By strategically placing instructions and dependencies, you can prevent unnecessary rebuilds and expedite the image creation process. Embracing multi-stage builds further enhances efficiency by producing smaller, more streamlined images.

Source

https://docs.docker.com/build/building/multi-stage/

https://docs.docker.com/build/cache/#how-does-the-build-cache-work

https://docs.semaphoreci.com/ci-cd-environment/docker-layer-caching/
