Look Docker, No Distro

Starlight Romero
Published in Nerd For Tech
9 min read · Jul 20, 2021


This is the second article in the four-part series, Minimizing & Securing Docker Images. Check out the other articles in the series:
1. Bigger .dockerignore, Smaller Docker Images
2. Look Docker, No Distro

When working with Docker, (image) size matters. A small image size allows us to spin up and orchestrate dozens to hundreds of containers while keeping the disk size small. A small image size also reduces the attack surface of our containers, protecting the application and data. By reducing the disk size and reducing the attack surface, we save money. Therefore, by reducing Docker image sizes, we reduce costs.

[Image: A navy blue container ship with the words “YANG MING” on the side.]

Let us set some expectations. This article assumes you are already familiar with .dockerignore. Although we will use .dockerignore, we will not spend much time on it; if you want an in-depth look, check out my previous article. Instead, we will focus on reducing base image sizes with distroless images, removing unneeded dependencies, and multi-stage builds. We will dive deeper into securing Docker images and Dockerfiles in the next article. For now, simply reducing the image size (and, in turn, the attack surface) gets us started toward better security.

🦹 Smaller Distros To The Rescue

At the time of writing, pulling golang:1.16, or pulling golang with no tag (which resolves to latest), produces the same result as pulling golang:1.16-buster: all three share the image ID b09f7387a719. Buster is the code name for Debian 10, the latest stable Debian release at the time of writing (version 10.9). In most cases you do not need a full Debian base image. A better alternative is Alpine, a much smaller Linux distribution to use as a base image.

The Alpine Golang image weighs in at 302MB, while its Debian counterpart is a heavyweight at 2.85x the size (roughly 860MB). Already we can run and orchestrate 2.85 times as many containers using the same disk space.

Using Snyk, we can scan the Docker images for vulnerabilities. The golang:1.16-buster image has 164 known vulnerabilities, 14 of them high severity. In comparison, the golang:1.16-alpine image has 0 known vulnerabilities. However, this is just the start.

The problem is that we are looking at this from the wrong angle. If our goal is to reduce the image size, why are we hunting for smaller and smaller distros? Any distribution we find is bound to include packages we do not need. Those packages take up disk space and may well carry vulnerabilities of their own. We should instead build the image from the ground up. Much like a classic car enthusiast who knows exactly what is in their car, we will build our Docker images knowing exactly what goes into them.

🔧 The Project

The application we will build to showcase our small Docker images is a simple API with one route. A GET request to the / route makes an API call to geo.ipify.org to get the user’s IP address, ISP, and location data, including the user’s latitude and longitude to six decimal places. All of this data is returned in the response.

If you want to download the code and try it out yourself, you will need to sign up for a free API key from:

https://geo.ipify.org/

The repositories for each language are linked below:

🏎️ FROM scratch

Scratch is the most stripped-down base image Docker offers. It is completely empty: the final image contains nothing except the executable binary (and any other files) that you add to it. There is no shell and nothing extra.

Golang

Since Golang is a compiled language, we can use a scratch base image in a multi-stage build. We will also build the Golang version of the application in a variety of other ways:

  • buster single stage build
  • buster multi-stage build
  • alpine single stage build
  • alpine multi-stage build
  • distroless multi-stage build

Single vs Multi-stage

You have probably seen a Dockerfile that goes something like…

FROM ...
WORKDIR ...
COPY ...
CMD ...

However, multi-stage builds allow us to optimize and reduce the size of our images by selectively copying artifacts from one build stage to another. You’ll see some examples of multi-stage builds below, as we analyze the Dockerfiles of the different repos linked above.
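To make the idea concrete, here is a minimal sketch of a two-stage Go build. It is not the Dockerfile from the repos: the binary name app, the /src and /out paths, the alpine:3.14 runtime tag, and the assumption that the main package sits at the repository root are all placeholders chosen for illustration.

```dockerfile
# Stage 1: build the binary using the full Go toolchain.
FROM golang:1.16-alpine AS build
WORKDIR /src
# Copy the module files first so the dependency layer is cached between builds.
COPY go.* ./
RUN go mod download
# Copy the rest of the source and compile a static binary.
COPY . .
RUN CGO_ENABLED=0 go build -o /out/app .

# Stage 2: start from a clean base and copy over only the compiled binary.
# The Go toolchain and source code from stage 1 are left behind.
FROM alpine:3.14
COPY --from=build /out/app /app
CMD ["/app"]
```

Only the last stage ends up in the image you ship, which is why the final size tracks the runtime base image rather than the build image.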

The article linked below provides a deep dive into multi-stage builds. I’ve never gone this extreme with my builds but I can see the use cases for it.

A short excerpt from the article linked above provides a good explanation:

Multi-stage builds allow you to separate build, test, and run time environments without needing separate Dockerfiles. They allow you to minimize the actual size of the final Docker container that you deploy, because the various layers are no longer stored in the final container.

Downsides of Scratch

There is one significant drawback to scratch images: they only work for compiled languages, where the code can be compiled into a binary and then executed. For interpreted languages, JavaScript and Python included, a scratch image won’t work. The source code is not machine code, so it needs an interpreter to run, and as stated previously, a scratch image has no interpreter. It is an empty container.

An Alternative To Scratch

There is hope for shrinking image sizes for programs that are not written in compiled languages. Distroless images came out around 2017. Unlike scratch, distroless is not a single image that solves the problem. Instead, distroless images are a class of minimal images which contain only your application and the application’s runtime dependencies.

The static distroless image, gcr.io/distroless/static, is the simplest of all the distroless images. It contains a minimal Linux system with:

  • 📝 ca-certificates
  • 🔒 A /etc/passwd entry for a root user
  • 🗑️ A /tmp directory
  • ⌚ tzdata

Once again, static images are the simplest distroless images. A step up from the static image, with more added packages, is the base image. (Confusing right?!)

Scratch for Compiled

For compiled languages, such as Golang, distroless containers still offer benefits. Compare a scratch Dockerfile to a distroless Dockerfile.

With the scratch base image, we have to add the CA certificates manually. We are also running the binary as root. Granted, there is not even a shell to docker exec into, but rule #1 of Docker security is: never run your application as root.
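A scratch-based build along these lines might look like the sketch below. This is an illustration rather than the Dockerfile from the repo: the binary name app and the /src and /out paths are assumptions.

```dockerfile
# Build stage: compile a static binary (CGO disabled, so it needs no libc).
FROM golang:1.16-alpine AS build
# Alpine's ca-certificates package provides /etc/ssl/certs/ca-certificates.crt.
RUN apk add --no-cache ca-certificates
WORKDIR /src
COPY . .
RUN CGO_ENABLED=0 go build -o /out/app .

# Final stage: an empty image, so everything must be copied in by hand.
FROM scratch
# Without this bundle, outbound HTTPS calls fail certificate verification.
COPY --from=build /etc/ssl/certs/ca-certificates.crt /etc/ssl/certs/
COPY --from=build /out/app /app
# There is no USER instruction, so the binary runs as root (UID 0).
ENTRYPOINT ["/app"]
```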

With a distroless base image, we do not have to manually add the CA certs or worry about running the application as root. The CA certs come with the static distroless image. As for permissions, we just need to select the image with the nonroot tag.
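As a rough sketch of the distroless variant (again with an assumed binary name app and assumed /src and /out paths, not the repo's actual file), the final stage shrinks to three lines:

```dockerfile
# Build stage: compile a static binary (CGO disabled, so it needs no libc).
FROM golang:1.16-alpine AS build
WORKDIR /src
COPY . .
RUN CGO_ENABLED=0 go build -o /out/app .

# Final stage: the static distroless image already ships CA certificates,
# and the nonroot tag runs the container as an unprivileged user.
FROM gcr.io/distroless/static:nonroot
COPY --from=build /out/app /app
ENTRYPOINT ["/app"]
```

Nothing has to be copied in besides the binary, and there is no USER instruction to remember.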

🔩 Building off of Distroless

The base distroless image, gcr.io/distroless/base, contains everything from the static image plus:

  • 🇨 glibc
  • 🌐 libssl
  • 🔑 openssl

Base images are best used for Go apps that require libc/cgo and for any other statically compiled applications that the static image can’t serve.

🚮 JavaScript

To be honest, I rarely use base images myself. Static images usually serve my needs for compiled applications. However, base images are still very important. The Node distroless image, gcr.io/distroless/nodejs:14, contains everything from the base image plus Node version 14 and its dependencies.

Let’s compare some of the Dockerfiles we used to get the results seen above.

Buster

The buster base image Dockerfile is a simple multi-stage build. We copy and install only the production dependencies. We then copy the code. We transfer everything into a new base image, then start the application.
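The steps above could be sketched roughly as follows. The node:14-buster tag, the /app directory, the index.js entry point, and the presence of a package-lock.json (which npm ci requires) are assumptions for illustration, not details taken from the repo.

```dockerfile
# Stage 1: install only the production dependencies, then add the code.
FROM node:14-buster AS build
WORKDIR /app
# Copy the manifests first so the install layer is cached between builds.
COPY package*.json ./
RUN npm ci --only=production
COPY . .

# Stage 2: transfer everything into a fresh base image and start the app.
FROM node:14-buster
WORKDIR /app
COPY --from=build /app ./
# index.js is a hypothetical entry point.
CMD ["node", "index.js"]
```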

Running snyk container test geo-buster gives the output:

Tested 414 dependencies for known issues, found 334 issues.

That is a lot of vulnerabilities!

Alpine

The alpine base image Dockerfile follows a similar build to the buster image above.

Testing the image for vulnerabilities with Snyk gives the output:

✓ Tested 16 dependencies for known issues, no vulnerable paths found.

Distroless

The distroless (geo-distroless-min) Dockerfile is where things start to get interesting. We use alpine as our first base image. We update the packages, add curl, and install node-prune. Then we copy and install all the dependencies. Next, we copy the code. We use npm and webpack to build a minified and uglified version of our code, then run npm prune --production to remove all the dev dependencies. After that, node-prune comes in and helps us reduce the size even more. Finally, we copy the build and node_modules over to a distroless image and run the application.

What is node-prune?

node-prune is a small tool to prune unnecessary files from ./node_modules, such as markdown, typescript source files, and so on.
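Put together, the distroless build described above might look something like this sketch. Several details are assumptions rather than facts from the repo: the node:14-alpine build image, a package.json script named build that runs webpack, webpack's default dist/main.js output path, and installing node-prune with the one-line installer from its README.

```dockerfile
# Stage 1: build and prune on Alpine.
FROM node:14-alpine AS build
# Update the package index, add curl, and install node-prune
# (installer command assumed from the node-prune README).
RUN apk update && apk add curl \
    && curl -sf https://gobinaries.com/tj/node-prune | sh
WORKDIR /app
COPY package*.json ./
# Install all dependencies, including the dev ones webpack needs.
RUN npm ci
COPY . .
# Bundle the code, drop the dev dependencies, then strip unneeded files
# (markdown, TypeScript sources, and so on) from node_modules.
RUN npm run build \
    && npm prune --production \
    && node-prune

# Stage 2: copy only the bundle and the pruned node_modules.
FROM gcr.io/distroless/nodejs:14
WORKDIR /app
COPY --from=build /app/dist ./dist
COPY --from=build /app/node_modules ./node_modules
# The distroless Node image already uses node as its entrypoint,
# so CMD only names the script to run.
CMD ["dist/main.js"]
```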

Testing for vulnerabilities we find the following output:

Tested 9 dependencies for known issues, found 25 issues.

🏔️ But What About Alpine?

In the previous section we saw that the distroless build has 25 vulnerabilities while the Alpine build has none. Doesn’t this mean that it is better to use Alpine? Let’s take a look at the tradeoffs.

The vulnerabilities in the distroless image come from:

23 - glibc/libc6
2 - openssl/libssl
1 - gcc-8/libgcc1

These vulnerabilities come not from our application but from the distroless base image itself. Although they are not ideal, using a distroless image over an Alpine image offers some advantages. Alpine comes with a package manager, apk, and a shell, ash. A distroless image has neither, so a bad actor cannot open a shell in the container with docker exec or install new packages. That is a trade-off, but nobody (this includes you!) being able to get into the container is a big advantage.

🐍 Python

Similar to the Node distroless image, the Python distroless image, gcr.io/distroless/python3:nonroot, is built off of the base image but with Python 3 and its dependencies.

Buster

The Buster Dockerfile is straightforward. We copy requirements.txt, upgrade pip, and install the requirements. Next, we copy the code. Then we copy everything to a new buster base image and run the app.
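One way to sketch this is shown below. The article does not say how the installed packages are carried into the second stage, so this sketch swaps in a virtual environment at /opt/venv for that purpose; the python:3.9-buster tag and the /app directory are likewise assumptions.

```dockerfile
# Stage 1: install the requirements into a virtual environment
# (an assumed technique, so they can be copied as one directory).
FROM python:3.9-buster AS build
RUN python -m venv /opt/venv
ENV PATH="/opt/venv/bin:$PATH"
WORKDIR /app
COPY requirements.txt .
RUN pip install --upgrade pip && pip install -r requirements.txt
COPY . .

# Stage 2: copy the environment and the code to a new buster image.
FROM python:3.9-buster
COPY --from=build /opt/venv /opt/venv
COPY --from=build /app /app
# Put the virtual environment's python first on the PATH.
ENV PATH="/opt/venv/bin:$PATH"
WORKDIR /app
CMD ["python", "app.py"]
```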

Buster has the most vulnerabilities. When running Snyk, the output is:

Tested 431 dependencies for known issues, found 349 issues.

Alpine

The Alpine Dockerfile is similar to the Buster Dockerfile except with a different base image.

Running snyk container test geo-alpine returns:

✓ Tested 37 dependencies for known issues, no vulnerable paths found.

Distroless

The distroless Dockerfile works a bit differently from the two above. We use apt to install binutils, and pip to install pyinstaller along with the requirements. Next, we copy the code and run pyinstaller on app.py. We copy the newly created dist folder to a new distroless image and run the app.

Pyinstaller is a handy tool.

PyInstaller freezes (packages) Python applications into stand-alone executables…

PyInstaller’s main advantages over similar tools are that PyInstaller works with Python 3.5–3.9, it builds smaller executables thanks to transparent compression, it is fully multi-platform, and use the OS support to load the dynamic libraries, thus ensuring full compatibility.
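A sketch of the PyInstaller approach could look like the following. The python:3.9-slim-buster build image and the /app and /dist paths are assumptions, and the sketch relies on PyInstaller's default one-folder mode, which writes the executable to dist/app/app when run on app.py.

```dockerfile
# Stage 1: freeze the application into a stand-alone executable.
FROM python:3.9-slim-buster AS build
# PyInstaller relies on binutils (objdump) on Linux.
RUN apt-get update \
    && apt-get install -y --no-install-recommends binutils \
    && rm -rf /var/lib/apt/lists/*
WORKDIR /app
COPY requirements.txt .
RUN pip install pyinstaller && pip install -r requirements.txt
COPY . .
# Default one-folder mode: output lands in /app/dist/app/.
RUN pyinstaller app.py

# Stage 2: copy only the dist folder into the distroless image.
FROM gcr.io/distroless/python3:nonroot
COPY --from=build /app/dist /dist
# Override the image's python entrypoint and run the frozen executable.
ENTRYPOINT ["/dist/app/app"]
```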

Testing the distroless image with Snyk gives the output:

Tested 25 dependencies for known issues, found 38 issues.

📦 Containerizing It

Is distroless the true and final savior? It’s good, but it has some trade-offs. Compared to Alpine images, distroless images come with several vulnerabilities and a slightly bigger image size. Distroless has the upper hand when it comes to having no package manager or shell. Each application and its requirements need to be understood and analyzed before choosing which base image best suits the needs of the project.

Now that you understand distroless, where it shines and where it falls short, you have the information you need to move forward and create your own secure and minimal Docker images.
