Look Docker, No Distro

Published in Nerd For Tech

9 min read · Jul 20, 2021

This is the second article in the four-part series, Minimizing & Securing Docker Images. Check out the other articles in the series:
1. Bigger .dockerignore, Smaller Docker Images
2. Look Docker, No Distro

When working with Docker, image size matters. Small images let us spin up and orchestrate dozens to hundreds of containers while using less disk space. Small images also reduce the attack surface of our containers, which protects the application and its data. Less disk space and a smaller attack surface both save money, so reducing Docker image sizes reduces costs.

A navy blue container ship with the words “YANG MING” on the side.

Let's set some expectations. This article assumes you are already familiar with .dockerignore. Although we will use .dockerignore, we won't spend much time on it; for an in-depth look, check out my previous article. Instead, we will focus on reducing base image sizes with distroless images, removing unneeded dependencies, and using multi-stage builds. We will dive deeper into securing Docker images and Dockerfiles in the next article. For now, simply reducing the image size (and, in turn, the attack surface) gets us moving in the direction of better security.

🦹 Smaller Distros To The Rescue

Pulling golang:1.16, or simply golang (the latest tag), produces the same image as golang:1.16-buster: all three share the image ID b09f7387a719. Buster is the code name for Debian 10, the stable Debian release at the time of writing. Unless you specifically need a full Debian system, this image is usually unnecessary. A better alternative is to use Alpine, a much smaller Linux distribution, as the base image.

The Alpine Golang image weighs in at 302MB, while its Debian counterpart is a heavyweight at 2.85 times that size. Already, we could run and orchestrate 2.85 times as many containers using the same disk space.

Using Snyk, we can scan the Docker images for vulnerabilities. golang:1.16-buster has 164 known vulnerabilities, 14 of them high severity. In comparison, the golang:1.16-alpine image has 0 known vulnerabilities. However, this is just the start.

The problem is that we are looking at it from the wrong angle. If our goal is to reduce image size, why keep hunting for smaller and smaller distros? Any distribution we find is bound to include packages we do not need, and those packages take up disk space and are likely to carry vulnerabilities of their own. Instead, we should build the image from the ground up. Much like a classic car enthusiast who knows exactly what is in their car, we will build our Docker images knowing exactly what goes into them.

🔧 The Project

The application we will build to showcase small Docker images is a simple API with one route. A GET request to the / route calls the IP Geolocation API (geo.ipify.org) to get the user's IP address, ISP, and location data, including latitude and longitude to six decimal places. All of this data is returned in the response.

If you want to download the code and try it out yourself, you will need to sign up for a free API key from:

https://geo.ipify.org/

The repositories for each language are linked below:

🐀 Golang
🚮 JavaScript
🐍 Python

🏎️ FROM scratch

Scratch is the most stripped-down base image Docker offers: it is completely empty. A scratch image contains nothing except the executable binary (and any other files) you copy into it. It has no shell and nothing extra.

Golang

Since Golang is a compiled language, we can use a scratch base image in a multi-stage build. We will also build the Golang version of the application in a variety of other ways:

buster single-stage build
buster multi-stage build
alpine single-stage build
alpine multi-stage build
distroless multi-stage build

Single vs Multi-stage

You have probably seen a Dockerfile that goes something like…

FROM ...
WORKDIR ...
COPY ...
CMD ...

However, multi-stage builds let us optimize and shrink our images by selectively copying artifacts from one build stage into another. You'll see some examples of multi-stage builds below, as we analyze the Dockerfiles of the different repos linked above.

The article linked below provides a deep dive into multi-stage builds. I’ve never gone this extreme with my builds, but I can see the use cases for it.

Using Multi-Stage Builds to Simplify And Standardize Build Processes: Using Native Dockerfile Tools to Move Information Left and Speed Up Delivery (medium.com)

A short excerpt from the article linked above provides a good explanation:

Multi-stage builds allow you to separate build, test, and run time environments without needing separate Dockerfiles. They allow you to minimize the actual size of the final Docker container that you deploy, because the various layers are no longer stored in the final container.
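To make that concrete, here is a minimal sketch of the "alpine multi-stage build" from the Go list above. It is not the Dockerfile from the linked repo: it assumes a Go module (go.mod) at the root of the build context, and the binary name geo and the alpine:3.14 runtime tag are illustrative choices. The first stage holds the whole Go toolchain; the final image only receives the compiled binary through COPY --from.

```dockerfile
# Stage 1: the full Go toolchain, used only to compile the binary.
FROM golang:1.16-alpine AS build
WORKDIR /src
# Copy go.mod (and go.sum, if present) first so the download step is cached.
COPY go.* ./
RUN go mod download
COPY . .
# CGO_ENABLED=0 produces a statically linked binary with no libc dependency.
RUN CGO_ENABLED=0 go build -o /bin/geo .

# Stage 2: a small runtime image; nothing from stage 1 survives unless copied.
FROM alpine:3.14
# CA certificates let the binary make HTTPS calls to the geolocation API.
RUN apk add --no-cache ca-certificates
COPY --from=build /bin/geo /usr/local/bin/geo
CMD ["geo"]
```

The single-stage variant would simply stop after the first stage, shipping the compiler, module cache, and source code along with the binary.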

Downsides of Scratch

There is one significant drawback to scratch images: they only work for compiled languages, where the code can be compiled into a binary and executed directly. Many languages, JavaScript and Python included, are interpreted rather than compiled, so a scratch image won't work for them. Their source code is not machine code, so it needs an interpreter to run, and, as stated previously, a scratch image has no interpreter; it is an empty container.

An Alternative To Scratch

There is still hope for shrinking image sizes for programs that are not written in compiled languages. Distroless images came out around 2017. Unlike scratch, distroless is not a single image that solves the problem; it is a class of minimal images that contain only your application and its runtime dependencies.

The static distroless image, gcr.io/distroless/static, is the simplest of all the distroless images. It does not include glibc; it contains a minimal Linux filesystem with:

📝 ca-certificates
🔒 A /etc/passwd entry for a root user
🗑️ A /tmp directory
⌚ tzdata

Once again, static is the simplest distroless image. A step up from static, with more packages added, is the image called base. (Confusing, right?!)

Scratch for Compiled

For compiled languages, such as Golang, distroless containers still offer benefits. Compare a scratch Dockerfile to a distroless Dockerfile.

With the scratch base image, we have to add CA certificates manually. Second, the binary runs as root. There is not even a shell to docker exec into, but rule #1 of Docker security still applies: never run your application as root.
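A minimal sketch of the scratch version shows both problems. It assumes a Go module at the root of the build context and a hypothetical binary named geo; it is an illustration, not the repo's Dockerfile. The CA bundle has to be copied in by hand, and nothing changes the default root user.

```dockerfile
FROM golang:1.16-alpine AS build
# Install the CA bundle explicitly so the copy below has a known source.
RUN apk add --no-cache ca-certificates
WORKDIR /src
COPY go.* ./
RUN go mod download
COPY . .
# Static binary: scratch has no libc for a dynamically linked one to load.
RUN CGO_ENABLED=0 go build -o /bin/geo .

FROM scratch
# scratch ships no CA certificates, so HTTPS calls fail without this copy.
COPY --from=build /etc/ssl/certs/ca-certificates.crt /etc/ssl/certs/
COPY --from=build /bin/geo /geo
# No USER instruction, so the process runs as root (UID 0).
ENTRYPOINT ["/geo"]
```

If you do stay on scratch, a numeric user such as USER 65534:65534 works even though scratch has no /etc/passwd.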

With a distroless base image, we do not have to add the CA certs manually or worry about running the application as root. The CA certs come with the static distroless image. As for permissions, we just need to select the image with the nonroot tag, gcr.io/distroless/static:nonroot.
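For comparison, here is a sketch of the distroless version, again assuming a Go module at the root of the build context and a hypothetical binary named geo. The final stage needs neither the manual certificate copy nor a USER line, because gcr.io/distroless/static:nonroot already sets a non-root default user.

```dockerfile
FROM golang:1.16-alpine AS build
WORKDIR /src
COPY go.* ./
RUN go mod download
COPY . .
# static has no glibc, so the binary must not need it.
RUN CGO_ENABLED=0 go build -o /bin/geo .

# static already includes ca-certificates; :nonroot runs as user nonroot (UID 65532).
FROM gcr.io/distroless/static:nonroot
COPY --from=build /bin/geo /geo
ENTRYPOINT ["/geo"]
```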

🔩 Building off of Distroless

The base distroless image, gcr.io/distroless/base, contains everything from the static image plus:

🇨 glibc
🌐 libssl
🔑 openssl

Base images are best used for Go apps that require libc/cgo, and for other mostly statically compiled applications that need libraries the static image doesn't provide.
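As an illustration of when base is needed, the sketch below assumes a hypothetical Go app that uses cgo (the binary name geo is also hypothetical). Building with CGO_ENABLED=1 on the Debian-based golang image links the binary against glibc, so at runtime it needs gcr.io/distroless/base rather than static.

```dockerfile
# Build on Debian so the binary links against glibc, which distroless/base provides.
FROM golang:1.16-buster AS build
WORKDIR /src
COPY go.* ./
RUN go mod download
COPY . .
# With cgo enabled, the binary dynamically links glibc, which static lacks.
RUN CGO_ENABLED=1 go build -o /bin/geo .

FROM gcr.io/distroless/base:nonroot
COPY --from=build /bin/geo /geo
ENTRYPOINT ["/geo"]
```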

🚮 JavaScript

To be honest, I rarely use base images myself. Static images usually serve my needs for compiled applications. However, base images are still very important. The Node distroless image, gcr.io/distroless/nodejs:14, contains everything from the base image plus Node version 14 and its dependencies.

Let’s compare the Dockerfiles behind the JavaScript images.

Buster

The buster base image Dockerfile is a simple multi-stage build. We copy and install only the production dependencies. We then copy the code. We transfer everything into a new base image, then start the application.
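A sketch of that two-stage build, assuming an npm project with a package-lock.json (which npm ci requires) and a hypothetical entry file index.js; the actual Dockerfile lives in the JavaScript repo linked earlier.

```dockerfile
FROM node:14-buster AS build
WORKDIR /app
COPY package*.json ./
# Production dependencies only (npm 6 syntax; npm 6 ships with Node 14).
RUN npm ci --only=production
COPY . .

FROM node:14-buster
WORKDIR /app
COPY --from=build /app ./
# index.js is a hypothetical entry point; use the app's real one.
CMD ["node", "index.js"]
```

Because both stages start from node:14-buster, the second stage saves little space: the final image still carries the full Debian userland.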

Running snyk container test geo-buster gives the output:

Tested 414 dependencies for known issues, found 334 issues.

That is a lot of vulnerabilities!

Alpine

The alpine base image Dockerfile follows a similar build to the buster image above.

Testing the image for vulnerabilities with Snyk gives the output:

✓ Tested 16 dependencies for known issues, no vulnerable paths found.

Distroless

The distroless (geo-distroless-min) Dockerfile is where things start to get interesting. We use Alpine as our first base image. We update the packages, add curl, and install node-prune. Then we copy and install all the dependencies, and copy the code. We use npm and webpack to build a minified and uglified version of our code, then run npm prune --production to remove all the dev dependencies. Next, node-prune trims node_modules even further. Finally, we copy the build and node_modules into a distroless image and run the application.
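The steps above can be sketched as follows. This is not the repo's Dockerfile; it assumes a package-lock.json, a "build" script in package.json that runs webpack, and webpack's default output file dist/main.js. The node-prune install line uses the gobinaries.com command from node-prune's README, which may have changed, so check the project's current instructions.

```dockerfile
# Stage 1: build on Alpine, which has npm and a shell for the build steps.
FROM node:14-alpine AS build
RUN apk add --no-cache curl \
 && curl -sf https://gobinaries.com/tj/node-prune | sh
WORKDIR /app
COPY package*.json ./
# Install all dependencies, including the dev ones webpack needs.
RUN npm ci
COPY . .
# Bundle with webpack, drop dev dependencies, then strip junk files from node_modules.
RUN npm run build \
 && npm prune --production \
 && node-prune

# Stage 2: distroless Node 14 runtime with no shell or package manager.
FROM gcr.io/distroless/nodejs:14
WORKDIR /app
COPY --from=build /app/dist ./dist
COPY --from=build /app/node_modules ./node_modules
# The image's entrypoint is node, so CMD only names the script.
CMD ["dist/main.js"]
```

One caveat with this design: Alpine uses musl while distroless is glibc-based, so any native addons compiled in the first stage may fail to load at runtime. If the app depends on native modules, build on a Debian-based Node image instead.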

What is node-prune?

node-prune is a small tool to prune unnecessary files from ./node_modules, such as markdown, typescript source files, and so on.

How We Reduce Node Docker Image Size In 3 Steps (medium.com)

Testing for vulnerabilities we find the following output:

Tested 9 dependencies for known issues, found 25 issues.

🏔️ But What About Alpine?

In the previous section we saw that the distroless build has 25 vulnerabilities while the Alpine build has none. Doesn’t this mean that it is better to use Alpine? Let’s take a look at the tradeoffs.

The vulnerabilities in the distroless image come from:

23 - glibc/libc6
 2 - openssl/libssl
 1 - gcc-8/libgcc1

These vulnerabilities come from the distroless base image itself, not from our application. Although they are not ideal, a distroless image offers some advantages over an Alpine image. Alpine comes with a package manager, apk, and a shell, ash. With the distroless image, bad actors cannot docker exec into the container or install new packages. There is a trade-off: nobody (this includes you!) can get into the container, but that same restriction is a big security advantage.

🐍 Python

Similar to the Node distroless image, the Python distroless image, gcr.io/distroless/python3:nonroot, is built on top of the base image but adds Python 3 and its dependencies.

Buster

The Buster Dockerfile is straightforward. We copy requirements.txt, upgrade pip, and install the requirements. Next, we copy the code. Then we copy everything into a new buster base image and run the app.
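A sketch of that build, assuming Python 3.9, dependencies listed in requirements.txt, and app.py as the entry point. Installing into a virtual environment at /venv is a choice made here, not necessarily what the repo does: pip would otherwise install packages into the system site-packages directory, which copying only the app folder to the second stage would miss.

```dockerfile
FROM python:3.9-buster AS build
# A virtualenv keeps every installed package under one directory we can copy.
RUN python -m venv /venv
ENV PATH="/venv/bin:$PATH"
COPY requirements.txt .
RUN pip install --upgrade pip \
 && pip install --no-cache-dir -r requirements.txt
WORKDIR /app
COPY . .

FROM python:3.9-buster
COPY --from=build /venv /venv
COPY --from=build /app /app
WORKDIR /app
ENV PATH="/venv/bin:$PATH"
CMD ["python", "app.py"]
```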

Buster has the most vulnerabilities. When running Snyk, the output is:

Tested 431 dependencies for known issues, found 349 issues.

Alpine

The Alpine Dockerfile is similar to the Buster Dockerfile except with a different base image.

Running snyk container test geo-alpine returns:

✓ Tested 37 dependencies for known issues, no vulnerable paths found.

Distroless

The distroless Dockerfile works a bit differently from the two above. We use apt to install binutils, and pip to install pyinstaller along with the requirements. Next, we copy the code and run pyinstaller on app.py. Finally, we copy the newly created dist folder into a new distroless image and run the app.
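A sketch of those steps, assuming Python 3.9, a requirements.txt, and app.py as the entry point. It uses PyInstaller's default one-folder mode, which writes the executable to dist/app/app. Because the frozen app bundles its own interpreter, the final stage overrides the image's Python entrypoint and runs that executable directly.

```dockerfile
# Stage 1: freeze the app into a native executable with PyInstaller.
FROM python:3.9-buster AS build
# PyInstaller uses objdump (from binutils) to inspect shared libraries.
RUN apt-get update \
 && apt-get install -y --no-install-recommends binutils \
 && rm -rf /var/lib/apt/lists/*
WORKDIR /src
COPY requirements.txt .
RUN pip install --no-cache-dir pyinstaller -r requirements.txt
COPY . .
# One-folder mode: dist/app/ holds the executable plus the libraries it bundles.
RUN pyinstaller app.py

# Stage 2: distroless runtime (glibc-based), running as the nonroot user.
FROM gcr.io/distroless/python3:nonroot
COPY --from=build /src/dist/app /app
ENTRYPOINT ["/app/app"]
```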

PyInstaller is a handy tool:

PyInstaller freezes (packages) Python applications into stand-alone executables…
PyInstaller’s main advantages over similar tools are that PyInstaller works with Python 3.5–3.9, it builds smaller executables thanks to transparent compression, it is fully multi-platform, and use the OS support to load the dynamic libraries, thus ensuring full compatibility.

PyInstaller Quickstart - PyInstaller bundles Python applications (www.pyinstaller.org)

Scanning the distroless image with Snyk returns:

Tested 25 dependencies for known issues, found 38 issues.

📦 Containerizing It

Is distroless the true and final savior? It's good, but it has trade-offs. Compared with the Alpine images, the distroless images here came with several known vulnerabilities and a slightly bigger image size. Distroless takes the upper hand by having no package manager or shell. Each application and its requirements need to be understood and analyzed before choosing the base image that best suits the needs of the project.

Now that you understand distroless, where it shines and where it falls short, you have the information you need to move forward and create your own secure, minimal Docker images.

📚 Additional Resources

Distroless Containers: Hype or True Value? | Hacker Noon (hackernoon.com)

Docker CMD vs ENTRYPOINT: What's The Difference & How To Choose (www.bmc.com)

Distroless is for Security if not for Size (dwdraju.medium.com)

How to Harden Your Containers With Distroless Docker Images (betterprogramming.pub)

grycap/minicon (github.com)

Use multi-stage builds (docs.docker.com)


Written by Starlight Romero