How to build the perfect Docker image — Part I — Keep it lightweight

Fernando Pereiro

Jul 25, 2019

While preparing for the CKA certification (Certified Kubernetes Administrator) I was refreshing my Docker knowledge, so I decided to get the Docker Certified Associate certification (passed!) and write this article.

I’ll probably write an article about it, but for me getting certified is the perfect way to challenge myself and push myself to study, investigate and play with tech.

At first this was a short article, but I kept adding things that I could not ignore. To cover everything that I consider necessary, I have turned it into a series of two articles about how to build the perfect Docker image.

In order to have a perfect Docker image we will make it lightweight and secure. In this first article we will talk about how to keep it lightweight.

Why is it important to optimize the image size?

Size optimization has many benefits, such as:

Faster CI/CD pipeline: The smaller the pieces you use inside your CI/CD pipeline, the faster your build, test and publish steps will be.
More portable images: A heavy image takes longer to go from point A to point B, and it may also hit size limits on some cloud services, so you won’t be able to run it there.
Cheaper execution: If you are using cloud services this one is very important, because you will pay more for a heavier and more complex container.
Reduced attack surface: This one is related to the second article, but basically, the smaller the image is, the fewer ways the bad people have to hurt your container.

Use a lightweight base image

To build a lightweight image you need to start from a lightweight base image, and there is no better image for this purpose than the scratch image. It is an empty image, meant even for creating your own base images (it is the ancestor of all other images). The scratch image is perfect: it is light and secure! But it is tricky, because you can only use it for executables without dependencies (self-contained executables that need no additional libraries to run); anything you need beyond that, you have to add yourself.

An example: the hello-world image.

The hello-world image has just one layer, but how is that possible? I mean, there is a base image and an executable! Well, that’s because the scratch image is empty, so it doesn’t add a layer to your image.
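If you want to check this yourself, here is a minimal sketch, assuming the Docker CLI is installed and the hello-world image can be pulled from Docker Hub. The scratch-demo directory and its Dockerfile are hypothetical: they only mirror the idea of an image that contains nothing but one self-contained executable (a statically linked binary named hello, which is not provided here).

```shell
# Hypothetical Dockerfile for an image that is only "scratch + one binary".
# Assumes a statically linked executable named "hello" exists in scratch-demo/.
mkdir -p scratch-demo
cat > scratch-demo/Dockerfile <<'EOF'
FROM scratch
COPY hello /hello
CMD ["/hello"]
EOF

# Count the filesystem layers of the official hello-world image.
# Guarded so the script still runs on a machine without Docker.
if command -v docker >/dev/null 2>&1; then
  docker pull hello-world &&
    docker image inspect --format '{{len .RootFS.Layers}}' hello-world
fi
```

The inspect command prints the number of filesystem layers, so you can compare it with any other image you have locally.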

This is great, but you will not be able to use the scratch image all the time, either because you can’t or because you don’t want to. Don’t worry, there are other lightweight base images that you can use, such as Alpine; just be sure that you are using the lightest one that fits your needs.

Resume steps and layers

Instructions in your Dockerfile create layers in your image: RUN, ADD and COPY add filesystem layers that increase its size, while instructions like CMD only add metadata. A layer is more than the code or package that you put inside it, and files deleted in a later layer still take up space in the earlier one. That’s why grouping some commands into a single step can help to optimize your image size. Let’s see it with an example.

Compare a non optimized image, where every command has its own RUN statement, with the same image built with those commands in a single RUN statement. They are the same, right? But the second one has fewer layers, and it is a lighter image!
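As a minimal sketch of this idea (not the Dockerfiles used for the original comparison), assume a Debian base image and a curl install chosen only for illustration. The layers-demo directory, the file names and the image tags are all hypothetical.

```shell
mkdir -p layers-demo

# One RUN per command: the package lists deleted by the last RUN still
# occupy space in the layer created by the first RUN.
cat > layers-demo/Dockerfile.unoptimized <<'EOF'
FROM debian:stable-slim
RUN apt-get update
RUN apt-get install -y curl
RUN rm -rf /var/lib/apt/lists/*
EOF

# Same commands in a single RUN: one layer, and the cleanup really frees space.
cat > layers-demo/Dockerfile.optimized <<'EOF'
FROM debian:stable-slim
RUN apt-get update \
 && apt-get install -y curl \
 && rm -rf /var/lib/apt/lists/*
EOF

# Build both variants and compare layers and sizes (only if Docker is available).
if command -v docker >/dev/null 2>&1; then
  for variant in unoptimized optimized; do
    docker build -f "layers-demo/Dockerfile.$variant" -t "layers-$variant" layers-demo &&
      docker history "layers-$variant"
  done
  docker image ls --filter "reference=layers-*"
fi
```

docker history lists the layers of each image with their sizes, and docker image ls shows the total size of both images side by side.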

Multistage Builds

Multistage Builds are the way to create an image that contains only the result of the build process. If we compare it with a software application, a Multistage Build is the way to ship the artifact instead of all the source code.

How do we do it? We write multiple stages using multiple FROM statements in a single Dockerfile (you can also use several Dockerfiles); some FROM statements start the stages that build the application, and the last one starts a new, lightweight image into which we copy only the final result.
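The stages described above can be sketched like this. It is only an illustration under some assumptions: the multistage-demo directory, the tiny Go program, the golang:1.22 tag and the golang-app-multistage image name are all hypothetical choices, not taken from any specific tutorial environment.

```shell
mkdir -p multistage-demo

# Tiny Go program standing in for a real application (hypothetical).
cat > multistage-demo/main.go <<'EOF'
package main

import "fmt"

func main() {
	fmt.Println("hello from a multistage build")
}
EOF

cat > multistage-demo/Dockerfile <<'EOF'
# Stage 1: full Go toolchain, used only to compile the binary.
FROM golang:1.22 AS builder
WORKDIR /src
COPY main.go .
RUN CGO_ENABLED=0 go build -o /app main.go

# Stage 2: the final image holds only the compiled binary.
FROM scratch
COPY --from=builder /app /app
ENTRYPOINT ["/app"]
EOF

# Build and show the size of the final image (only if Docker is available).
if command -v docker >/dev/null 2>&1; then
  docker build -t golang-app-multistage multistage-demo &&
    docker image ls golang-app-multistage
fi
```

CGO_ENABLED=0 asks Go for a statically linked binary, which is what lets the last stage start from scratch. Only the last stage ends up in the tagged image; the builder stage is left out.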

For this example we are going to use this Katacoda environment so you can play with it too. It’s really easy: just follow the steps to create a new image with a Dockerfile using Multistage Builds, and at the end run this extra command:

docker build -t golang-app-singlestage .

And then:

docker image ls

Now you can compare both methods: Singlestage Build and Multistage Build. Awesome!

It is a big size difference so now you know that Multistage Build is a good friend :)

.dockerignore

To talk about .dockerignore we first need to talk about the Docker build context. When we build a new image we do it inside a context: the set of files and folders that we need to create the image. We specify the context using a PATH or URL, and the Docker client sends it to the Docker server (the Docker daemon; don’t forget Docker is a client-server application), which executes the build. Then, inside the Dockerfile, the ADD and COPY statements add files from that context to the image.

A .dockerignore file (similar to a .gitignore file) is placed in the root of the build context directory and tells Docker which files and folders we don’t want to be part of the context, for size and security reasons. Basically, the Docker client looks at the .dockerignore file to know whether there are files or folders to exclude from the context, and does not send them to the Docker daemon.

Here are some examples (.dockerignore pattern matching syntax is based on Go filepath.Match()):

# ignore .git folder
.git
# ignore all *.secret files in all folders, including build root
**/*.secret

And here you have the .dockerignore file from the moby project.
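To try these patterns locally, here is a small sketch. The context-demo directory, its throwaway Dockerfile and the Markdown exception are hypothetical and only there to have something to build; lines starting with ! re-include files that an earlier pattern excluded.

```shell
mkdir -p context-demo

# Exclude everything Docker does not need to build the image.
cat > context-demo/.dockerignore <<'EOF'
# version control data
.git
# secrets anywhere in the context
**/*.secret
# all Markdown files except the README
*.md
!README.md
EOF

# Throwaway Dockerfile that copies whatever is left of the context.
cat > context-demo/Dockerfile <<'EOF'
FROM scratch
COPY . /context
EOF

# The legacy builder prints the size of the context it sends to the daemon
# ("Sending build context to Docker daemon ..."), which makes the effect visible.
if command -v docker >/dev/null 2>&1; then
  docker build -t context-demo context-demo
fi
```

Drop a few large or secret files into context-demo, build again with and without the .dockerignore file, and compare the size of the context that is sent.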

Squash

Squash is an option of the docker build command that squashes (or merges) the filesystem layers created by the build into a single new one. Let’s see how it works!

First you will need a Docker daemon with the experimental features enabled, because the squash option is still experimental.

Just to change things a little, we can enable them in the Docker Desktop preferences.

But you can always do it by changing your Docker daemon configuration file:

{
  "experimental" : true,
  "debug" : true
}

If we build the non optimized Docker image from the Resume steps and layers section with the squash option, we get the same image size as the optimized Docker image from that section.

Wait, what? Why? Well, basically we did the same thing in two different ways: in both cases we are merging what the RUN commands do. In the Resume steps and layers section we chose what to merge, and with the squash option everything is merged.
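A minimal sketch of such a build, assuming a daemon with experimental features enabled and the legacy builder (on newer setups where BuildKit is the default, the flag may be ignored or rejected, which is why DOCKER_BUILDKIT=0 is set here). The squash-demo directory, the Dockerfile contents and the image tag are hypothetical.

```shell
mkdir -p squash-demo

# Hypothetical Dockerfile with one RUN per command, left unmerged on purpose.
cat > squash-demo/Dockerfile <<'EOF'
FROM debian:stable-slim
RUN apt-get update
RUN apt-get install -y curl
RUN rm -rf /var/lib/apt/lists/*
EOF

# --squash merges the layers created by this build into a single new layer.
# It needs experimental features and the legacy (non-BuildKit) builder.
if command -v docker >/dev/null 2>&1; then
  DOCKER_BUILDKIT=0 docker build --squash -t squash-demo squash-demo &&
    docker history squash-demo
fi
```

docker history lets you check how the layers of the resulting image look after the merge.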

You have probably never heard of the squash option before. Why?

It is experimental and not everybody likes to use experimental features; in fact, you probably can’t enable the experimental features in your workplace.
It has a dark past with many malfunctions.
It creates a new layer by merging the previous ones, so it is difficult to share that layer with other images. Sharing a single layer between several images is part of the Docker soul and with the squash option we kind of lose that; with the Resume steps and layers tip we don’t lose it, because we choose which layers to merge.

Conclusions

Optimization is always a must, and with Docker it is very easy to create optimized images. Here I told you some tips, but there are more of them! Do you have your own tip? Please, tell me about it :)

What’s next

In the second article we will make a secure Docker image with:

Secrets.
Image Scanning.
Image signing.
Much more…