How to build better Docker images: Tested, scanned and optimized

Jordi Febrer Jordà
Worldsensing TechBlog
Jun 19, 2019

The last few months have been an exciting time: lots of new tools have arrived in the container ecosystem, Kubernetes (alone or through Docker EE) has become the de facto standard for orchestrating containers, Native Cloud Applications (NCA) have started to become popular, service meshes are the new black, and several Docker/Kubernetes conferences have taken place.

Dockercon Europe 2018 (Barcelona)

But all this excitement has to coexist with our current infrastructure and tools, and there’s no excuse not to improve them while we learn new things. That’s why I wanted to review our current images.

I’ve focused on three main subjects: testing, scanning and, no less important, slimming Docker images.

Test images

We test our code and check our environment with different types of tests, but what about our images? To do that I’m going to use Container Structure Tests, a framework developed by Google and OSS contributors that allows us to easily test images. Although I’m not going to use it that way in this post, it’s worth mentioning that this framework can test images even without Docker installed, which is useful in development pipelines because it avoids a Docker-in-Docker situation.

Tests have to be written in YAML or JSON, and there are four types of tests:

1. Command Tests. In this example we check that a custom Python package called myfoobarpackage has been installed:

schemaVersion: "2.0.0"  # required once, at the top of the config file
commandTests:
  - name: "init"
    command: "pip"
    args: ["list"]
    expectedOutput: ["myfoobarpackage"]

2. File Existence Tests. In this example we check the existence and absence of two files: .netrc shouldn’t exist (for security reasons), but requirements.txt should exist, meaning the files have been copied correctly:

fileExistenceTests:
  - name: 'netrc'
    path: '/root/.netrc'
    shouldExist: false
  - name: 'reqs'
    path: '/opt/requirements.txt'
    shouldExist: true

3. File Content Tests. In the following test we check that a file called myfoobarpackage.py contains the word foo but does not contain the word bar:

fileContentTests:
  - name: 'Contains only foo'
    path: '/opt/myfoobarpackage.py'
    expectedContents: ['.*foo.*']
    excludedContents: ['.*bar.*']
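To make the expected/excluded semantics concrete, here is a rough shell approximation of what a file content test checks, assuming each pattern may match anywhere in the file. The function name check_contents is hypothetical, and grep -E only stands in for the framework's own regex engine, so take it as a sketch and not as the tool's implementation.

```shell
# check_contents FILE EXPECTED_REGEX EXCLUDED_REGEX  (hypothetical helper)
# Succeeds only if FILE matches EXPECTED_REGEX and does not match EXCLUDED_REGEX.
check_contents() {
    file=$1
    expected=$2
    excluded=$3
    # The expected pattern must match at least one line of the file.
    if ! grep -Eq -- "$expected" "$file"; then
        echo "missing: $expected"
        return 1
    fi
    # The excluded pattern must not match any line of the file.
    if grep -Eq -- "$excluded" "$file"; then
        echo "forbidden: $excluded"
        return 1
    fi
    echo "ok"
}
```

Run inside the image, check_contents /opt/myfoobarpackage.py '.*foo.*' '.*bar.*' would mirror the YAML test above; the declarative version has the advantage of not needing a shell in the image.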

4. Metadata Test. Finally, the last test checks that the image defines an env var foo with value bar, exposes port 8080, uses /opt as its working directory and has a parameterized cmd that runs a Python file called myfoobarpackage.py:

metadataTest:
  env:
    - key: foo
      value: bar
  exposedPorts: ["8080"]
  entrypoint: []
  workdir: "/opt"
  cmd: ["python", "myfoobarpackage.py"]

These tests can be easily integrated into our CI/CD pipelines. Execution example:

if container-structure-test test --image my_docker_user/my_docker_image_name:my_docker_image_tag --config config.yaml; then
    echo "Yay! Let's publish the image.."
else
    echo "Error.."
    exit 1  # make the CI job fail instead of only printing a message
fi

More information here: Container Structure Tests (https://github.com/GoogleContainerTools/container-structure-test).

Scan images

Docker Hub, Docker EE and other solutions include scanning tools, but sometimes it is handy to have an OSS solution that can be flexibly integrated into our day-to-day development cycle.

With Clair and one of its clients, Klar, we can scan public and private images and, as with image testing, this can be integrated into our CI solution as well.

Instructions on how to install Clair: https://github.com/coreos/clair/blob/master/Documentation/running-clair.md

Klar can be downloaded as a binary from https://github.com/optiopay/klar/releases.

Example testing private images:

CLAIR_ADDR=0.0.0.0 CLAIR_THRESHOLD=10 DOCKER_USER=FOO \
    DOCKER_PASSWORD=BAR klar \
    my_docker_user/my_docker_image_name:my_docker_image_tag

Example testing public images:

CLAIR_ADDR=0.0.0.0 CLAIR_THRESHOLD=10 klar nginx:1.15

Result:

clair timeout 1m0s
docker timeout: 1m0s
no whitelist file
Analysing 3 layers
Got results from Clair API v1
Found 77 vulnerabilities
Unknown: 2
Negligible: 35
Low: 12
Medium: 18
High: 10

As we can see, the previous klar command shows the number of vulnerabilities that the image contains, classified by severity level: Unknown, Negligible, Low, Medium, High, Critical or Defcon1.
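If a pipeline needs its own rule on top of that summary (for example, "no more than N High vulnerabilities"), the counts can be extracted with a few lines of shell. This is a minimal sketch that assumes the summary keeps the "Severity: count" shape of the result shown above; severity_count and gate_on_high are hypothetical helper names, not part of klar.

```shell
# severity_count NAME  (hypothetical helper)
# Reads a klar summary on stdin and prints the count for one severity,
# or 0 when that severity is not listed.
severity_count() {
    awk -F': *' -v sev="$1" '$1 == sev { n = $2 } END { print n + 0 }'
}

# gate_on_high MAX  (hypothetical helper)
# Reads a klar summary on stdin and fails when the High count exceeds MAX.
gate_on_high() {
    high=$(severity_count High)
    if [ "$high" -gt "$1" ]; then
        echo "too many High vulnerabilities: $high"
        return 1
    fi
    echo "High vulnerabilities within limit: $high"
}

# Possible usage in CI (assumes klar and a reachable Clair server):
#   CLAIR_ADDR=0.0.0.0 klar nginx:1.15 | gate_on_high 0
```

Note that in a pipe like the one in the usage comment, the exit status seen by the CI job is the one from gate_on_high, not the one from klar.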

Finally, klar allows us to scan private images from public and private Docker registries, Google Container Registry and Amazon Elastic Container Registry.

Slim images

What about having smaller images, reducing the time to pull/push and increasing security by having fewer files/binaries?

I’m going to focus on two techniques to reduce image size: multi-stage builds and distroless images.

Multi-stage build

This technique is not new, but I think it is worth another look. It consists of using more than one FROM in a Dockerfile, which allows us to have multiple build stages.

Imagine that we have a Dockerfile to build a static website that needs compiled NPM packages, and we also need a web server. Using a multi-stage build we can start the Dockerfile from an official Node image, copy our files into it and compile all the dependencies there. Then, in a second stage, we can copy the resulting files into a shiny new build stage that starts from the Nginx image and runs its default entrypoint. Doing that we can easily save more than half a gigabyte:

FROM node AS builder

RUN node -v
RUN npm -v
COPY . /opt

WORKDIR /opt/app

RUN npm init -y

RUN echo "TEST <b>HTML</b>" >> index.html

FROM nginx:1.15-alpine

COPY --from=builder /opt/app /usr/share/nginx/html

RUN ls -la /usr/share/nginx/html/

Execution:

docker build -t "front" .
docker run -it -p 8003:80 front
docker images | grep front
front latest 9fc1efaf2f4d 4 minutes ago 17.8MB

From almost 1 GB and two images to a single standalone image of 17.8MB!
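Each FROM starts a new stage, and only the last stage ends up in the final image; earlier stages can be referenced from COPY --from by name or by index (starting at 0). As a small aid when reading a longer Dockerfile, the stages can be listed with awk. This is only a sketch: list_stages is a hypothetical helper, and it assumes simple "FROM image [AS name]" lines without extra flags.

```shell
# list_stages DOCKERFILE  (hypothetical helper)
# Prints one line per FROM instruction: stage index, base image, stage name
# ("-" when the stage has no name).
list_stages() {
    awk 'toupper($1) == "FROM" {
        name = (toupper($3) == "AS") ? $4 : "-"
        print n++, $2, name
    }' "$1"
}
```

For the Dockerfile above, this would list the node stage named builder at index 0 and the unnamed nginx:1.15-alpine stage at index 1.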

Distroless images

Using the previous technique with a distroless image as the final stage, we can improve our images even more (not necessarily in size, but in security). Distroless images ( https://github.com/GoogleContainerTools/distroless ) are more secure because they don’t include a shell by default and only contain the required binaries. These images are developed by Google and OSS contributors.

Although only a small list of distroless images is available, if we have different requirements we can create our own distroless Docker image (or just a Docker image) using Bazel (https://github.com/bazelbuild/rules_docker). Example of a Python image using a distroless image in a two-stage build:

FROM python:2.7.12-alpine AS builder

ENV MY_DIR /opt/
WORKDIR $MY_DIR
COPY app/ $MY_DIR/app
COPY requirements/ $MY_DIR/requirements
RUN apk add --no-cache --virtual .build-deps build-base \
    && pip install -r requirements/dev.txt
...
FROM gcr.io/distroless/python2.7

COPY --from=builder /opt/app /app
COPY --from=builder /usr/local/lib/python2.7/site-packages /usr/local/lib/python2.7/site-packages
WORKDIR /app

ENV PYTHONPATH=/usr/local/lib/python2.7/site-packages
ENTRYPOINT [ "python", "app.py" ]
CMD [ "param" ]

In conclusion, by following these techniques we’ll have fewer surprises in production, security will be on the radar again and the infrastructure will be much faster to manage.

These have been some simple examples, but there’s a lot more we can do to improve our current applications and infrastructure. Thanks a lot for reading!

If you found this post interesting, we are always hiring and interested in meeting all types of engineers, regardless of your skills or what tools you use day-to-day. Your intelligence, creativity, energy and enthusiasm are much more important to us than your experience with our stack.

Check out our careers page: https://worldsensing.wpengine.com/engineering/
