How I Cut Docker Image Size by Switching to a Distroless Base Image
Introduction
One of the main challenges of updating multiple Node.js projects, sometimes upgrading from version 14 to 22, was adapting their Dockerfiles to ensure compatibility while minimizing image size. This optimization was essential for improving security, reducing vulnerabilities, and speeding up build and deployment times. In this article, I’m going to share some of the best practices I applied to reach these goals.
Understanding Dockerfiles
A Dockerfile is a script containing a set of instructions to build a Docker image. It defines the base image, application dependencies, environment settings, and commands to execute within the container.
Let’s have a look at a typical Dockerfile. Most of them look something like this:
FROM node:14.21.2
WORKDIR /var/www
COPY node_modules ./node_modules
COPY build ./build
ENV COGNITO_POOL_ID=xxxx
ENV COGNITO_CLIENT_ID=xxxx
ENV ENVOY_GRPC_JWT_EXT_AUTHZ_PORT=xxxx
CMD ["sh", "-c", "node ./build/index.js"]
Docker Image Layers
Docker images are built in layers, where each command in the Dockerfile creates a new layer. These layers are stacked together to form the final image. Layers allow Docker to efficiently reuse existing image data and optimize storage.
Key aspects of Docker image layers:
- Base Layers: The foundational layers that come from the base image (e.g., node:22-alpine).
- Intermediate Layers: Created from each RUN, COPY, or ADD instruction in the Dockerfile.
- Final Layer: The topmost layer, which, together with the image configuration (for example CMD or ENTRYPOINT), defines the container’s main process.
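You can inspect these layers for any image with docker history, which lists each layer alongside the instruction that created it and its size (my-app is a placeholder tag):
docker history my-app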
Why Optimize Dockerfiles?
Optimizing a Dockerfile is not just about reducing image size. It also helps to:
- Improve security by limiting the attack surface and reducing the noise of irrelevant CVEs inherited from the base image
- Speed up builds and deployments by reducing image weight
- Optimize resource usage (storage, memory, CPU)
- Enhance maintainability with clean and structured images
Analyzing your Image
Find all the vulnerabilities linked to your image with Trivy
Before deploying your image, it’s crucial to check for vulnerabilities. Trivy is a simple and comprehensive vulnerability scanner for container images, file systems, and Git repositories. It scans for known vulnerabilities in both operating system packages and application dependencies. Optimizing an image can also eliminate some of these vulnerabilities, for example by removing unused packages or layers.
To use Trivy, install it and run the following command to scan your image:
trivy image <image_name>
This will provide you with a detailed report on any vulnerabilities found in your Docker image. The report highlights security risks, outdated dependencies, and potential threats. By identifying and addressing these vulnerabilities early on, you can ensure your image is secure and reduce the chances of exploits in production environments.
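If the full report is too noisy, Trivy can narrow it down for you. For example, the following invocation only reports high and critical issues and skips vulnerabilities that don’t have a fix yet (flags available in recent Trivy versions, worth checking against the one you have installed):
trivy image --severity HIGH,CRITICAL --ignore-unfixed <image_name>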
Take a look at your image layers with Dive
Once the vulnerabilities are addressed, it’s time to optimize the image size. Dive is a tool that helps analyze Docker images by visualizing their layers and sizes. It allows you to see which layers contribute most to the image size and identify areas for optimization.
To use dive, install it and run:
dive <image_name>
This allows you to explore which layers contribute most to the image size and optimize them accordingly.
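Dive can also run non-interactively, which is handy in a CI pipeline. A minimal sketch, assuming a .dive-ci file at the root of the repository (the exact rule names and thresholds are worth double-checking in the dive documentation):
# .dive-ci
rules:
  lowestEfficiency: 0.95
  highestUserWastedPercent: 0.10
Then run dive in CI mode so the build fails whenever a rule is violated:
CI=true dive <image_name>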
Some of the Best Practices
Using Distroless Images
Distroless images are minimal container images that do not include a traditional operating system package manager or shell. Instead, they contain only the necessary runtime dependencies for an application to run. The concept was introduced by Google to enhance security, reduce attack surfaces, and optimize performance.
Google introduced the Distroless concept as part of its internal container security practices and later open-sourced it through the gcr.io/distroless project. These images are widely used in Kubernetes workloads and cloud environments where security and efficiency are top priorities.
Using a distroless image significantly reduces the attack surface since it eliminates unnecessary utilities like package managers, shells, and debugging tools, making it harder for attackers to exploit vulnerabilities. This makes distroless images a great choice for production environments where minimizing potential security risks is crucial.
However, while distroless images offer clear advantages in terms of security and performance, they also come with disadvantages. For example, they lack debugging tools and package managers, making troubleshooting harder. Additionally, without a shell or package manager, adding new dependencies or performing updates inside the container can be more complex and require more careful planning during the build process.
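One practical workaround for the lack of a shell: the Distroless project also publishes debug variants of its images, which bundle a minimal busybox shell. A sketch of how you might use this while troubleshooting locally (the exact tag is worth verifying against the distroless documentation, and the debug variant should never ship to production):
# Temporarily swap the base image for its debug variant while troubleshooting
# FROM gcr.io/distroless/nodejs22-debian12:nonroot
FROM gcr.io/distroless/nodejs22-debian12:debug-nonroot
With the debug variant you can docker exec or kubectl exec into the running container and get a shell; just remember to switch back to the shell-less tag before releasing.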
Using npm ci
Instead of npm install, using npm ci ensures a clean and deterministic installation of dependencies. This command installs packages exactly as defined in package-lock.json, avoiding version mismatches and potential inconsistencies.
Additionally, npm ci is faster because it skips dependency resolution and directly installs the versions specified in the lock file. This optimization speeds up build times and ensures a reproducible environment.
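For reference, these are the two npm ci variants used in the multi-stage build below (the extra flags are optional hardening: --ignore-scripts skips lifecycle scripts and --no-fund silences funding messages):
# Full install, exactly as pinned in package-lock.json
npm ci --ignore-scripts --no-fund
# Production-only install, without devDependencies
npm ci --omit=dev --ignore-scripts --no-fund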
Multistage build
A multi-stage build is a technique used to create smaller and more efficient Docker images by separating the build environment from the runtime environment. The key advantage is that unnecessary dependencies and build tools are excluded from the final image, reducing its size and enhancing security.
In a multi-stage build, each stage produces its own intermediate image, and only what you explicitly copy into the final stage ends up in the image you ship. The other stages, including those with build tools or development dependencies, are discarded once the build is complete.
# ---- Full Dependency and Build Stage ----
FROM node:22-alpine AS build
WORKDIR /src
COPY package*.json ./
RUN npm ci --ignore-scripts --no-fund
COPY . .
RUN npm run build:ci
# ---- Production Dependencies Stage ----
FROM node:22-alpine AS prod-deps
WORKDIR /src
COPY package*.json ./
RUN npm ci --omit=dev --ignore-scripts --no-fund
In this approach, the first stage (build) compiles the application and prepares everything needed for production, while the prod-deps stage installs only the production dependencies. The final stage, shown later in this article, then copies just the required files (the production node_modules and the built artifacts) into a fresh image. This ensures that the final image is as small and clean as possible, containing only what’s necessary at runtime, without the build dependencies or tools from the earlier stages.
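A nice side effect of naming your stages is that you can build them individually with --target, which is useful for debugging a single stage without producing the final image (the my-app tags are placeholders):
# Build only the build stage defined above
docker build --target build -t my-app:build .
# Build only the production dependencies stage
docker build --target prod-deps -t my-app:deps .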
Using Tini or Dumb-Init
By default, Docker containers don’t include an init system, which can lead to zombie processes or improper signal handling in some scenarios. While Docker now provides the --init flag to include a minimal init process automatically, this only works when you control how containers are run (e.g., using docker run --init). In orchestrated environments like Kubernetes, where you don’t always have access to runtime flags, manually including an init system like Tini or dumb-init in your image ensures consistent behavior across all environments.
One of the key advantages of using Tini directly in your image is that it allows you to verify the signature of the binary, ensuring its authenticity and integrity before use. Additionally, Tini is lightweight: only its binary needs to be copied into the final image, keeping the image size minimal.
Here’s how you can add Tini to your Dockerfile:
# ---- Tini Stage ----
FROM alpine:latest AS tini
ENV TINI_VERSION=v0.19.0
# Install gnupg to verify the tini signature
RUN apk add --no-cache gnupg
# Add tini binary and signature file
ARG TARGETARCH
ADD https://github.com/krallin/tini/releases/download/${TINI_VERSION}/tini-${TARGETARCH} /tini
ADD https://github.com/krallin/tini/releases/download/${TINI_VERSION}/tini-${TARGETARCH}.asc /tini.asc
# Import tini's public key and verify the binary
RUN gpg --batch --keyserver hkp://keyserver.ubuntu.com:80 --recv-keys 595E85A6B1B4779EA4DAAEC70B588DFF0527A9B7 \
&& gpg --batch --verify /tini.asc /tini \
&& chmod +x /tini
This method allows you to ensure that Tini is securely fetched and verified before being used. The signature verification process confirms that the binary has not been tampered with. After verification, only the Tini binary is included in the final image, meaning you can discard the unnecessary dependencies (like gnupg and signature files) in the production image, keeping it lightweight and secure.
Running as a Non-Root User
By default, Docker containers run as the root user, which introduces security risks. Running applications as a non-root user significantly improves security by limiting potential exploits and reducing privileges within the container.
In this example, the Dockerfile ensures the application runs as a non-root user, but it also carefully manages which files are copied over to the final image and incorporates Tini to handle process signals correctly.
Here’s how you can implement it:
# ---- Final Production Stage ----
FROM gcr.io/distroless/nodejs22-debian12:nonroot
ENV NODE_ENV=xxxx
ENV COGNITO_POOL_ID=xxxx
ENV COGNITO_CLIENT_ID=xxxx
ENV ENVOY_GRPC_JWT_EXT_AUTHZ_PORT=xxxx
WORKDIR /src
COPY --from=prod-deps /src/node_modules ./node_modules
COPY --from=build /src/build ./build
COPY --from=tini /tini /tini
USER nonroot
ENTRYPOINT ["/tini", "--"]
CMD ["/nodejs/bin/node", "./build/index.js"]
But beyond that, you can take container isolation even further by enabling user namespaces.
User namespaces allow container users (even root inside the container) to be mapped to non-privileged users on the host. This means that even if an attacker escapes the container and gains access as “root” inside it, they will have only limited privileges on the host system.
This feature has actually been supported in Docker for quite some time, and can be enabled through the Docker daemon configuration. However, it’s often disabled by default and requires additional setup, such as managing UID/GID mappings and volume permissions.
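As a minimal sketch, enabling the default remapping on a Docker host means adding the following to /etc/docker/daemon.json and restarting the daemon (the exact setup, including subordinate UID/GID ranges, depends on your distribution):
{
  "userns-remap": "default"
}
After restarting Docker, containers run under a remapped user range on the host, and existing volumes may need their permissions revisited.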
Why it matters:
- Adds a security boundary between the container and the host
- Limits the potential impact of container breakout vulnerabilities
- Works well in environments with strict isolation requirements
Kubernetes and User Namespaces
While user namespaces haven’t always been widely used in Kubernetes, support is maturing. Kubernetes is progressively adding support for user namespaces for Pods, allowing clusters to automatically remap container users to less privileged host users, making deployments more secure by default.
To start using this feature, you can:
- Enable user namespaces in your container runtime.
- Configure your workloads (on Kubernetes versions that support this feature) to opt in to user namespaces, for example through the pod-level hostUsers: false setting.
Using a non-root user in the Dockerfile is a great step but enabling user namespaces adds another layer of defense. As Kubernetes evolves, this will become a standard part of secure container configurations.
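As an illustrative sketch (the pod and image names are placeholders, and the feature’s availability depends on your Kubernetes version), opting a pod into a user namespace looks like this:
apiVersion: v1
kind: Pod
metadata:
  name: my-app
spec:
  hostUsers: false        # run the pod in its own user namespace
  containers:
    - name: app
      image: my-app:latest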
Results
With all the modifications applied, we were able to achieve significant improvements in both the size and performance of the Docker image. By upgrading from Node.js version 14 to 22 and applying best practices like multi-stage builds and the use of a distroless image, we reduced our image size from a hefty 380 MB to just 60 MB. The key factors behind this optimization were the elimination of unnecessary build dependencies in the final image and the choice of a minimal base image that includes only what’s strictly required at runtime.
This size reduction has several key benefits:
1. Faster Pull Times: With the image now being significantly smaller, the time it takes to pull the container from the registry is much faster. This is particularly beneficial in environments where quick deployment and scalability are crucial.
2. Improved Security and Cleanliness: The new image is cleaner, more secure, and free of unnecessary build dependencies. The use of a non-root user and Tini for signal handling helps ensure that the container operates more securely and predictably.
3. Optimized Build and Deployment: The optimizations also contribute to a more efficient build and deployment pipeline, making the entire process faster and more reliable.
Additionally, just by bumping Node.js from version 14 to version 22, we removed hundreds of vulnerabilities. It’s crucial to regularly update your versions to keep your containers secure, as each new Node.js release addresses known vulnerabilities and improves overall security.
Overall, the shift to a smaller, more efficient image not only improves security and performance but also contributes to a smoother and faster development and deployment experience.
Final Dockerfile
You can find the full Dockerfile in this GitHub Gist.
Resources
Here are some resources if you are interested in the topic:
I want to sincerely thank Sébastien Boulet and Benjamin DAVY for encouraging me to write this article. Huge thanks as well to Vincent Therry, Gregory Rome, Ronan Drouglazet and Fabien Bernard for their invaluable help. I’m also grateful to Hugo Debreyne, Nicolas Terendij, Sébastien Gineste, and Omri Bar-Zik for their insightful advice and feedback. Big shoutout to Lucas for creating the thumbnail! And finally, a very special thank you to Laetitia Bellanger for giving me the opportunity to publish this article.
Thanks for reading!
If you have any feedback or suggestions, feel free to reach out — I’d love to hear your thoughts on what I’ve written.