Super-Slim Docker Containers
A guide to reducing the size of your Docker images
Have you ever wondered why your single-app Docker container grows to 400 MB? Or why a single app binary of a few tens of MBs ends up in a Docker image of hundreds of MBs?
In this piece, we’ll review some of the major factors contributing to the fattening up of your containers, as well as best practices and tips to end up with super-slim Docker containers for your project.
Docker Image Layers
A Docker container image is essentially piled-up files to be instantiated later on as a running container. Docker utilises the Union File System (UnionFS) design, in which files are grouped together in layers. Each layer may contain one or more files, and every layer is positioned on top of the previous layer. It is the virtual runtime merge of all the content of all layers that, as end users, we experience as a unified file system:
The final file system view presented to us by the underlying implementation of UnionFS (Docker supports quite a few different ones via pluggable storage drivers) has the total size of all the layers it comprises. When Docker creates a container for an image, it uses all the image’s layers in a read-only format, adding a thin read-write layer on top of them. This thin read-write layer is what allows us to actually modify files in a running Docker container:
What happens if a file is deleted in Layer 4 above? Although the deleted file won’t appear in the observed file system anymore, the size it originally occupied will still be part of the container’s footprint as the file was included in a lower, read-only layer.
It’s relatively easy to start with a small app binary and end up with a fat container image. In the following sections, we’ll explore different methods to keep the size of our images as thin as possible.
Beware of Your Build Path
What’s the most common way in which we build our Docker images?
docker build .
The . in the above command tells Docker to treat the current working folder as the root file system path of the build process, also known as the build context.
To better understand what really happens when the above command is issued, we should keep in mind that a Docker build is a client-server process. The Docker CLI (client), from which we execute the docker build command, uses the underlying Docker engine (server) to build a container image. To restrict access to the underlying file system of the client, the build process needs to know what the virtual file system root is. It is under this exact path that any command in your Dockerfile tries to find file resources that can potentially end up within the image being built.
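As an illustrative sketch, you can narrow the build context by passing a subfolder instead of the project root, and point Docker at the Dockerfile explicitly with -f (the docker/Dockerfile and app/ paths here are hypothetical, purely for demonstration):

```shell
# Use only the app/ subfolder as the build context,
# while keeping the Dockerfile elsewhere in the repository
docker build -f docker/Dockerfile -t my-app app/
```

Only files under app/ are now visible to COPY and ADD commands, regardless of how large the rest of the project is.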
Let’s consider for a moment the location where we usually place our Dockerfile. In the root of the project, maybe? Well, combine a Dockerfile in the root of the project with a docker build . and we have effectively added the complete project folder as potential file resources for the build. This may result in multiple MBs and thousands of files unnecessarily being added to the build context. If we carelessly define a COPY command in the Dockerfile, all those files can become part of the final image. Most of the time this is not what we need, as only a few selected project artefacts should be included in the final container image.
Always check that you provide an appropriate build path to docker build and that your Dockerfile doesn’t add unnecessary files to your image. If for any reason you really need to define the root of your project as the build context, you can selectively include/exclude files via a .dockerignore file.
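For instance, a minimal .dockerignore at the root of the build context might look like the following (the entries are illustrative; adapt them to whatever your project actually contains):

```
# Exclude version control data and local dependencies from the build context
.git
node_modules
# Exclude logs and documentation
*.log
docs/
```

Anything matched here never reaches the Docker engine, so it can neither bloat the build context nor sneak into the image via a broad COPY.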
Normalise Your Image Layers
The maximum number of layers an image can have is 127, provided your underlying storage driver supports it. This limit can be increased if really needed, but then you narrow your choices of where this image can be built (i.e., you need a Docker engine running on a similarly modified underlying kernel).
As discussed in the section on Docker image layers above, because of UnionFS, whatever file resource goes into a layer stays in that layer, even if you rm the file in a subsequent layer. Let’s see that with a sample Dockerfile:
FROM alpine
RUN wget http://xcal1.vodafone.co.uk/10MB.zip -P /tmp
RUN rm /tmp/10MB.zip
Build the above image:
And inspect it with dive:
An efficiency of 34% signifies that there is quite a lot of space wasted in our image. This results in longer image fetch times, additional consumed bandwidth, and slower startup times.
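To reproduce this inspection yourself, you can build the image and point dive at the resulting tag (the wasteful tag is a made-up name for this example, and dive needs to be installed separately):

```shell
# Build the sample image and analyse its layers with dive
docker build -t wasteful .
dive wasteful
```

dive walks through each layer, flags files that are added and later removed, and reports an overall efficiency score for the image.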
How can we get rid of this wasted space?
Have you ever seen a Dockerfile with an extremely long RUN directive, where multiple shell commands are aggregated with &&? That’s commands merge.
By merging commands, we essentially create a single layer out of the result of this single long command. Since no intermediate layers exist where files are added and later removed in another layer, the final layer will not use any space for such ghost files. Let’s see that by modifying the above Dockerfile:
FROM alpine
RUN wget http://xcal1.vodafone.co.uk/10MB.zip -P /tmp && rm /tmp/10MB.zip
Now we have an optimised image:
When you finish writing your Dockerfile, inspect the resulting image to see if you can merge commands to reduce possible wasted space.
Squashing the Image
An alternative approach to commands merge, especially when using someone else’s Dockerfile that you don’t want or can’t modify, is to build your image with Docker’s --squash flag.
Unless you’re on a very old Docker version (<1.13), Docker allows us to squash all our layers into a single layer, effectively removing all ghost resources. Note that --squash is an experimental feature, so the Docker daemon needs experimental features enabled for it to work. We can still use the original, unchanged Dockerfile with the many individual commands, but this time we execute the build passing the --squash flag:
docker build --squash .
The resulting image is, again, 100% optimised:
An interesting point to notice here is that since our Dockerfile created a layer to add a file and then created another layer to remove that file, squash is clever enough to realise that no new layers need to be created at all (we only have the 9ccd9… layer from the base image we’re using). Extra kudos to squash then. However, take into account that squashing your layers may prevent you or the users of your image from taking advantage of previously cached layers.
Note: When working with a third-party Dockerfile that you don’t want to change, a quick and easy way to minimise any possible wasted space is to build it with --squash. You may then use a tool like dive to check the final efficiency of the image.
Clean Up Package Manager Caches
Very often when we containerise an application, we need to make extra tools, libraries, or utilities available on the image we build by using a package manager such as apk, yum, or apt.
Package managers try to save us time and bandwidth when we install packages by caching previously fetched packages. To keep the size of our resulting Docker image as small as possible, we don’t need to keep package manager caches. After all, if we ever need a different image for our containers, we can always rebuild the image with an updated Dockerfile.
To delete package manager caches for the three popular package managers above, we can add the following command at the end of our aggregated (i.e., commands merge) command, for example:
APK: ... && rm -rf /etc/apk/cache
YUM: ... && rm -rf /var/cache/yum
APT: ... && rm -rf /var/cache/apt
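Putting this together for an Alpine-based image, a minimal sketch might look like the following (curl is just a stand-in for whatever package your application needs):

```dockerfile
FROM alpine
# Install the package and delete the package manager cache
# in the same layer, so no ghost cache files survive
RUN apk add curl && rm -rf /etc/apk/cache

# Alternatively, apk offers a --no-cache flag that avoids
# writing a local cache in the first place:
# RUN apk add --no-cache curl
```

Because the install and the cleanup happen in a single RUN directive, no intermediate layer ever contains the cache files.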
Note: Before finalising your Docker image, don’t forget to remove any caches that were used during the build as well as any other temporary files that aren’t necessary for your container to run properly.
Choose a Base Image
Every Dockerfile starts with a FROM directive. This is where we define the base image upon which our own image will be created.
As noted in the Docker documentation:
“The FROM instruction initializes a new build stage and sets the Base Image for subsequent instructions. As such, a valid Dockerfile must start with a FROM instruction. The image can be any valid image — it is especially easy to start by pulling an image from the Public Repositories.”
Obviously, there is a ton of different base images to choose from, each one with its own perks and features. Choosing an image that provides just enough of the tools and the environment you need for your application to run is of paramount importance when it comes to the final size of your own Docker image.
The size of different popular base images varies considerably, as you’d expect:
Effectively, containerising your application using an Ubuntu 19.10 base image will add a minimum of 73 MB, whereas the exact same application using an Alpine 3.10.3 base image will only increase the size by an extra 6 MB. As Docker caches image layers, the download/bandwidth penalty is applicable only the first time you’re about to start a container with that image (or simply, when pulling the image). However, the increased size is still there.
At this point, you may have arrived at the following (pretty logical) conclusion: “I will always use Alpine then!”. If only things were that clear in software.
You see, the guys behind Alpine Linux haven’t discovered a special secret sauce that Ubuntu or Debian guys are still looking for. To be able to create a Docker image an order of magnitude smaller than (for instance) Debian, they had to make some decisions regarding what to include and what not to include in their Alpine image. Before choosing Alpine as your default base image, you should check if it provides all the environment you need. Also, even though Alpine comes with a package manager, you may find that a specific package or package version you’re using in your (for instance) Ubuntu-based development environment isn’t available in Alpine. These are tradeoffs you should be aware of and test before you choose the most appropriate base image for your project.
Finally, if you really need to use one of the fatter base images, you could use an image minimisation tool, such as the free and open source DockerSlim, to still reduce the size of your final image.
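As an illustration, DockerSlim’s basic usage takes an existing image and produces a minimised variant (my-app is a hypothetical image name here; consult the DockerSlim documentation for the flags your version supports, since the tool analyses your container at runtime and the defaults may not suit every application):

```shell
# Produce a slimmed-down version of an existing image
docker-slim build my-app
```

The slimmed image is created alongside the original, so you can compare sizes and verify behaviour before switching over.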
Note: Choosing an appropriate base image for your own image is important when trying to keep the size down. Evaluate your options and choose an image that provides the tools you need for the size you can afford.
Choose No Base Image at All
If you have an application that can run without any additional environment provided by a base image, you can opt not to use a base image at all. Of course, since FROM is mandatory in a Dockerfile, you must still have it and point it to something. What should you use in that case?
scratch, which is:
“An explicitly empty image, especially for building images “FROM scratch”. This image is most useful in the context of building base images (such as debian and busybox) or super minimal images (that contain only a single binary and whatever it requires, such as hello-world). FROM scratch is a no-op in the Dockerfile, and will not create an extra layer in your image.”
Note: If your application consists of self-contained executables that can operate in a standalone fashion, choosing the scratch base image allows you to minimise the footprint of your container as much as possible.
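As a sketch of what this can look like, a scratch-based Dockerfile contains little more than the binary itself (the myapp name is an assumption; any statically linked, self-contained executable works):

```dockerfile
# Start from the explicitly empty image
FROM scratch
# Copy in the pre-built, statically linked binary
COPY myapp /myapp
ENTRYPOINT ["/myapp"]
```

The resulting image is essentially the size of the binary, since scratch contributes no layers of its own.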
Use Multi-Stage Builds
Multi-stage builds were the centre of attention back when Docker 17.05 became available. A long-awaited feature, a multi-stage build allows image builders to leave custom image build scripts behind and integrate everything into the well-known Dockerfile.
In high-level terms, you can think of a multi-stage build as merging multiple Dockerfiles together, or simply a Dockerfile with multiple FROM directives.
Before multi-stage builds, if you wanted to build the artefact of your project and distribute it in a container using a Dockerfile, you probably had to follow a build process ending up with a container like the one depicted below:
Although there is nothing technically wrong with the above process, the final image and the resulting container are bloated with layers created while building/preparing the project artefact that are not necessary for the project’s runtime environment.
Multi-stage builds allow you to separate the creation/preparation phases from the runtime environment:
You can still have a single Dockerfile to define your complete build workflow. However, you can copy artefacts from one stage to another while discarding the data in layers you don’t need.
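A hedged sketch of a two-stage build for a Go application might look like this (the image tags, file names, and build flags are assumptions for illustration; the same pattern applies to any compiled language):

```dockerfile
# Stage 1: build the artefact using the full toolchain image
FROM golang:1.21 AS builder
WORKDIR /src
COPY . .
RUN CGO_ENABLED=0 go build -o /out/myapp .

# Stage 2: copy only the final binary into a minimal runtime image;
# all the layers of the builder stage are discarded
FROM alpine:3.19
COPY --from=builder /out/myapp /usr/local/bin/myapp
ENTRYPOINT ["myapp"]
```

Only the layers of the final stage end up in the shipped image; the source code, compiler, and intermediate build files stay behind in the builder stage.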
Note: Multi-stage builds allow you to create cross-platform, repeatable builds without using OS-specific, custom build scripts. The final size of your image can be kept to a minimum by selectively including artefacts generated in previous phases of your build.
Creating Docker images for containers is a process that modern software engineers have to deal with very often. There are plenty of online resources and examples showing you how to create a Dockerfile; however, you should also keep an eye on the size of your resulting image.
In this piece, we reviewed several methods and tips to minimise the final size of a Docker image. By carefully crafting a Dockerfile that includes only necessary artefacts, choosing an appropriate base image, and using multi-stage builds, the final size of a Docker image can be reduced considerably.