Super-Slim Docker Containers

A guide to reducing the size of your Docker images

Nassos Michas

Have you ever wondered why your single-app Docker container grows to 400 MB? Or why an app binary of just a few tens of MBs ends up in a Docker image of several hundred MBs?

In this piece, we’ll review some of the major factors contributing to the fattening up of your containers, as well as best practices and tips to end up with super-slim Docker containers for your project.


Docker Image Layers

A simplified view of UnionFS (Image by the author)

Docker images are composed of layers, stacked on top of each other by a union file system (UnionFS; Docker supports quite a few different implementations via pluggable storage drivers). The final file system view presented to us has the total size of all the layers it comprises. When Docker creates a container from an image, it uses all the image’s layers in a read-only format, adding a thin read-write layer on top of them. This thin read-write layer is what allows us to actually modify files in a running Docker container:

A running container adds a read-write layer on top of an image’s read-only layers. (Image by the author)

What happens if a file is deleted in Layer 4 above? Although the deleted file won’t appear in the observed file system anymore, the size it originally occupied will still be part of the container’s footprint as the file was included in a lower, read-only layer.
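To see how much each layer contributes to an image’s size, you can list the layer history of any local image with docker history (alpine:3.10.3 below is just an example tag):

docker history alpine:3.10.3

Each row shows the instruction that created a layer and the space that layer adds; a file added in one layer and deleted in a later one still counts towards the totals shown here.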

It’s relatively easy to start with a small app binary and end up with a fat container image. In the following sections, we’ll explore different methods to keep the size of our images as thin as possible.


Beware of Your Build Path

docker build .

The . in the above command tells Docker to treat the current working folder as the root file system path of the build process, i.e. the build context.

To better understand what really happens when the above command is issued, we should keep in mind that a Docker build is a client-server process. The Docker CLI (client), from which we execute the docker build command, uses the underlying Docker engine (server) to build a container image. To restrict access to the client’s underlying file system, the build process needs to know what the virtual file system root is. It is under this exact path that any command in your Dockerfile tries to find file resources that can potentially end up within the image being built.

Let’s consider for a moment the location where we usually place our Dockerfile. In the root of the project, maybe? Well, combine a Dockerfile in the root of the project with a docker build . and we have effectively added the complete project folder as potential file resources for the build. This may result in multiple MBs and thousands of files being unnecessarily added to the build context. If we carelessly define an ADD/COPY command in the Dockerfile, all those files can become part of the final image. Most of the time, this is not what we need, as only a few selected project artefacts should be included in the final container image.

Always check that you provide an appropriate build path to docker build and that your Dockerfile doesn’t add unnecessary files to your image. If for any reason you really need to define the root of your project as the build context, you can selectively include/exclude files via .dockerignore.
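If, for example, your Dockerfile lives in a docker/ folder and your build artefacts in target/ (both paths, and the myapp tag, are hypothetical), you can narrow the context like this:

docker build -f docker/Dockerfile -t myapp:latest target/

Alternatively, keep the project root as the context and list what to exclude in a .dockerignore file next to the Dockerfile, for example:

.git
node_modules
*.log
docs/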


Normalise Your Image Layers

As discussed in the section on Docker image layers, above, because of UnionFS, whatever file resource goes into a layer stays in the layer even if you rm that file in a subsequent layer. Let’s see that with a sample Dockerfile:

FROM alpine
RUN wget http://xcal1.vodafone.co.uk/10MB.zip -P /tmp
RUN rm /tmp/10MB.zip

Build the above image:

Building a sample image with wasted space (Image by the author)

And inspect it with dive:

Image is only 34% efficient (Image by the author)

An efficiency of 34% signifies that there is quite a lot of space wasted in our image. This results in longer image fetch times, additional consumed bandwidth, and slower startup times.

How can we get rid of this wasted space?

Commands merge

By merging commands, we essentially create a single layer out of the result of one long, combined command. Since there are no intermediate layers where files are added only to be removed again later, the final layer will not waste any space on such ghost files. Let’s see that by modifying the above Dockerfile:

FROM alpine
RUN wget http://xcal1.vodafone.co.uk/10MB.zip -P /tmp && rm /tmp/10MB.zip

Now we have an optimised image:

A 100% optimised image with commands merge (Image by the author)

When you finish building your Dockerfile, inspect it to see if you can merge commands to reduce possible wasted space.

Squashing the image

Unless you’re on a very old Docker version (<1.13), Docker allows us to squash all our layers into a single layer, effectively removing all ghost resources. We can still use the original, unchanged Dockerfile with the many individual commands, but this time we execute the build passing the --squash option:

docker build --squash .

The resulting image is, again, 100% optimised:

A 100% optimised image with image squash (Image by the author)

An interesting point to notice here is that since our Dockerfile created a layer to add a file and then created another layer to remove that file, squash is clever enough to realise that no layers need to be created (we only have the 9ccd9… layer from the base image we’re using). Extra kudos to squash then. However, take into account that squashing your layers may prevent you or the users of your image from taking advantage of previously cached layers.

Note: When working with a third-party Dockerfile that you don’t want to change, a quick and easy way to minimise any possible wasted space is to build it with --squash. You can use a tool such as dive to check the final efficiency of the image.
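As a rough sketch (myimage is a hypothetical tag, and --squash requires the Docker daemon’s experimental features to be enabled):

docker build --squash -t myimage:latest .
dive myimage:latest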


Delete Caches

Package managers try to save us time and bandwidth when we install packages by caching previously fetched packages. To keep the size of our resulting Docker image as small as possible, we don’t need to keep package manager caches. After all, if we ever need a different image for our containers, we can always rebuild the image with an updated Dockerfile.

To delete the caches of three popular package managers (apk, yum, and apt), we can add the following at the end of our aggregated (i.e., commands merge) command, for example:

APK: ... && rm -rf /etc/apk/cache
YUM: ... && rm -rf /var/cache/yum
APT: ... && rm -rf /var/cache/apt
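Putting this together for Alpine, an installation layer that cleans up after itself could look like the sketch below (curl is just an example package; apk add --no-cache achieves a similar result by never populating the cache in the first place):

FROM alpine
RUN apk add curl && rm -rf /etc/apk/cache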

Note: Before finalising your Docker image, don’t forget to remove any caches that were used during the build as well as any other temporary files that aren’t necessary for your container to run properly.


Choose a Base Image

As noted in the Docker documentation:

“The FROM instruction initializes a new build stage and sets the Base Image for subsequent instructions. As such, a valid Dockerfile must start with a FROM instruction. The image can be any valid image — it is especially easy to start by pulling an image from the Public Repositories.”

Obviously, there are tons of different base images to choose from, each one with its own perks and features. Choosing an image that provides just enough of the tools and the environment you need for your application to run is of paramount importance when it comes to the final size of your own Docker image.

The size of different popular base images varies considerably, as you’d expect:

Popular Docker base images size (Image by the author)

Effectively, containerising your application using an Ubuntu 19.10 base image will add a minimum of 73 MB, whereas the exact same application using an Alpine 3.10.3 base image will only increase the size by an extra 6 MB. As Docker caches image layers, the download/bandwidth penalty is applicable only the first time you’re about to start a container with that image (or simply, when pulling the image). However, the increased size is still there.
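You can verify these numbers yourself by pulling candidate base images and comparing the SIZE column (the tags below are the versions mentioned above):

docker pull ubuntu:19.10
docker pull alpine:3.10.3
docker images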

At this point, you may have arrived at the following (pretty logical) conclusion: “I will always use Alpine then!”. If only things were that clear in software.

You see, the people behind Alpine Linux haven’t discovered a special secret sauce that the Ubuntu or Debian maintainers are still looking for. To create a Docker image an order of magnitude smaller than (for instance) Debian, they had to make decisions about what to include and what to leave out (for example, Alpine is built on musl libc and BusyBox rather than glibc and the GNU coreutils). Before choosing Alpine as your default base image, you should check whether it provides the full environment you need. Also, even though Alpine comes with a package manager, you may find that a specific package or package version you’re using in your (for instance) Ubuntu-based development environment isn’t available in Alpine. These are trade-offs you should be aware of and test before you choose the most appropriate base image for your project.

Finally, if you really need to use one of the fatter base images, you could use an image minimisation tool, such as the free and open source DockerSlim, to still reduce the size of your final image.

Note: Choosing an appropriate base image for your own image is important when trying to keep the size down. Evaluate your options and choose an image that provides the tools you need for the size you can afford.


Choose No Base Image at All

For this case, Docker provides a reserved, minimal base image called scratch, which is described as:

“An explicitly empty image, especially for building images “FROM scratch”. This image is most useful in the context of building base images (such as debian and busybox) or super minimal images (that contain only a single binary and whatever it requires, such as hello-world). FROM scratch is a no-op in the Dockerfile, and will not create an extra layer in your image.”

Note: If your application consists of self-contained executables that can operate in a standalone fashion, choosing the scratch base image allows you to minimise the footprint of your container as much as possible.
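As a minimal sketch, assuming your build produces a statically linked, self-contained binary called myapp (a hypothetical name) in the build context:

FROM scratch
COPY myapp /myapp
ENTRYPOINT ["/myapp"]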


Multi-Stage Builds

In high-level terms, you can think of a multi-stage build as merging multiple Dockerfiles together, or simply a Dockerfile with multiple FROMs.

Before multi-stage builds, if you wanted to build the artefact of your project and distribute it in a container using a Dockerfile, you probably had to follow a build process ending up with a container like the one depicted below:

Building and distributing your application without multi-stage builds (Image by the author)

Although there is nothing technically wrong with the above process, the final image and the resulting container are bloated with layers created while building/preparing the project artefact that are not necessary for the project’s runtime environment.

Multi-stage builds allow you to separate the creation/preparation phases from the runtime environment:

Multi-stage builds, separating creation/preparation from runtime (Image by the author)

You can still have a single Dockerfile to define your complete build workflow. However, you can copy artefacts from one stage to another while discarding the data in layers you don’t need.
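A minimal sketch of such a Dockerfile, assuming a Go application in the build context (the Go version, module layout, and image tags are illustrative):

# Stage 1: build the artefact in a full build environment.
FROM golang:1.13-alpine AS builder
WORKDIR /src
COPY . .
RUN CGO_ENABLED=0 go build -o /app .

# Stage 2: copy only the binary into a slim runtime image.
FROM alpine:3.10.3
COPY --from=builder /app /app
ENTRYPOINT ["/app"]

Only the layers of the final stage end up in the resulting image; everything created in the builder stage (the Go toolchain, sources, build caches) is discarded.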

Note: Multi-stage builds allow you to create cross-platform, repeatable builds without using OS-specific, custom build scripts. The final size of your image can be kept to a minimum by selectively including artefacts generated in previous phases of your build.


Conclusion

In this piece, we reviewed several methods and tips to minimise the final size of a Docker image. By carefully crafting a Dockerfile including only necessary artefacts, choosing an appropriate base image, and using multi-stage builds, the final size of a Docker image can be reduced considerably.
