Building Lean Containers: Advanced Techniques for Docker Image Optimization

Kartheek Gottipati
Jul 11, 2023


Docker has revolutionized how we develop and deploy applications. It provides a standardized way to package an application along with its runtime dependencies so that it can run anywhere, regardless of the environment. However, as with any tool, it can be used effectively or ineffectively. Large Docker images can consume significant storage and network resources, slow down deployments, and increase security risks. This blog post explores the optimization of Docker images for faster, leaner deployments.

Note: This guide assumes that you are familiar with Docker and Dockerfile basics.

Start with the Right Base Image

The base image is the foundation on which your Docker image is built. It’s crucial to choose a small, secure base image. A minimal base image like alpine is a good place to start as it's only about 5MB.

FROM alpine:latest

The trade-off is that alpine images use musl libc instead of glibc, which can cause compatibility issues with some software. If you need a glibc-based image, consider one of Debian’s slim variants, such as debian:bookworm-slim, instead.

Remember to pin your base images to a specific tag so your build doesn’t break when the latest tag starts pointing at a new release.
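For example, a pinned base image might look like this (the digest form is shown with a placeholder, not a real Alpine digest):

```dockerfile
# Pin to a specific release instead of the moving "latest" tag
FROM alpine:3.18

# Or, for fully reproducible builds, pin to an image digest
# (the digest below is a placeholder, not a real one)
# FROM alpine:3.18@sha256:<digest>
```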

Multi-Stage Builds

Multi-stage builds are a Docker feature that lets you reduce the size of your final image by discarding build-time artifacts and dependencies. Let’s see this in action with a Python application:

# Stage 1: build and test the application
FROM python:3.9-slim AS builder
WORKDIR /app

COPY requirements.txt .
RUN pip install --no-cache-dir -r requirements.txt

COPY . .
RUN python -m unittest discover

# Stage 2: create a clean image
FROM python:3.9-slim
# Copy the installed dependencies as well as the application code;
# packages installed by pip live in site-packages, not in /app
COPY --from=builder /usr/local/lib/python3.9/site-packages /usr/local/lib/python3.9/site-packages
COPY --from=builder /app /app

WORKDIR /app
CMD ["python", "app.py"]

In this multi-stage build process, the final Docker image contains only the Python runtime, the application code, and its installed dependencies; pip’s download cache and any build-time tooling from the first stage are left behind.

Consolidating Docker Layers

Each RUN, COPY, and ADD instruction in a Dockerfile creates a new image layer (other instructions only add metadata). Combining related shell commands into a single RUN cuts down the layer count, and chaining apt-get update with apt-get install also prevents a stale cached package index from being reused.

Instead of having separate RUN commands as shown below:

RUN apt-get update
RUN apt-get install -y curl
RUN apt-get install -y vim
RUN apt-get clean

You can consolidate these commands into a single layer. Note that running apt-get clean in its own RUN does nothing for image size: files deleted in a later layer still exist in the earlier layers, so the cleanup must happen in the same layer that created the files.

RUN apt-get update && apt-get install -y \
    curl \
    vim \
    && apt-get clean

Cleaning Up Post-Installation

Package managers often leave behind cache files, which unnecessarily inflate the Docker image size. You can remove these files within the same RUN command:

RUN apt-get update && apt-get install -y \
    curl \
    vim \
    && rm -rf /var/lib/apt/lists/*

Here, rm -rf /var/lib/apt/lists/* removes the package index files downloaded by apt-get update, which are no longer needed once the packages are installed.
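Another apt-get size saver worth combining with the cleanup above is skipping Debian’s “recommended” packages, which often pull in surprisingly large dependency trees:

```dockerfile
# --no-install-recommends installs only hard dependencies,
# skipping the optional "recommended" packages
RUN apt-get update && apt-get install -y --no-install-recommends \
    curl \
    vim \
    && rm -rf /var/lib/apt/lists/*
```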

Exclude Unnecessary Files

A .dockerignore file keeps unwanted files out of the build context, so they are never sent to the Docker daemon or copied into your image. The format of .dockerignore is similar to .gitignore.

Here is a sample .dockerignore file:

.git
Dockerfile
README.md
*.log
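For tighter control you can invert the approach with an allow-list: exclude everything, then re-include only what the build needs using the ! negation syntax. The filenames below are hypothetical, assuming a minimal Python app:

```text
# Exclude the entire build context by default...
*
# ...then re-include only the files the image actually needs
!app.py
!requirements.txt
```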

Making the Most of Docker’s Build Cache

Docker caches intermediate layers to speed up subsequent builds. A layer’s cache is reused only if the instruction and every step before it are unchanged; for COPY and ADD, Docker also checksums the files being copied. Hence, it’s advisable to order Dockerfile instructions from the least frequently changed to the most frequently changed.

# These steps don't change often
FROM node:14
WORKDIR /app
COPY package.json yarn.lock ./

# Install dependencies
RUN yarn install

# These steps change frequently
COPY . .
RUN yarn build

In this case, code changes won’t invalidate the yarn install cache, improving the build time.
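These techniques compose. Below is a sketch that combines cache-friendly ordering with a multi-stage build for the same Node application; the dist/ output directory and dist/index.js entry point are assumptions, not part of the original example:

```dockerfile
# Stage 1: install dependencies and build
FROM node:14 AS builder
WORKDIR /app
# Dependency manifests change rarely: copy them first so the
# yarn install layer stays cached across code changes
COPY package.json yarn.lock ./
RUN yarn install --frozen-lockfile
# Application code changes often: copy it last
COPY . .
RUN yarn build

# Stage 2: ship only the runtime artifacts
FROM node:14-slim
WORKDIR /app
COPY --from=builder /app/dist ./dist
COPY --from=builder /app/node_modules ./node_modules
CMD ["node", "dist/index.js"]
```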

Conclusion

Optimizing Docker images is a crucial aspect of leveraging Docker’s full potential. By implementing the strategies outlined in this guide — selecting the correct base image, using multi-stage builds, minimizing Docker layers, cleaning up post-installation, excluding unnecessary files, and effectively using Docker’s build cache — you can significantly reduce the size and build time of your Docker images.

Optimized Docker images lead to faster, more secure deployments, reduced resource consumption, and a more efficient development workflow. Remember, creating efficient Docker images isn’t a one-off task but an ongoing practice that you should embed in your CI/CD pipeline. Keep refining, and enjoy your journey with Docker!
