A Peek into Docker Images

Adam Gordon Bell
Nov 2, 2018 · 4 min read

A look into docker images using snakes

Let’s talk about docker images. A docker image is made up of one or more layers and some metadata. When you do a docker pull, these are retrieved from Docker Hub or your registry of choice.

That is simple enough. However, understanding how docker images work under the covers, and where the various digests and hashes returned by the docker client come from, is a little more challenging. Let’s walk through the structure of an example image and see what we learn.

First, let’s pull a fun docker image. How about a terminal game like snake?

After a docker run we will get a fun little snake game.

> docker run -ti dyego/snake-game
I can’t get past a score of 150

You may notice the following output was generated before the game started:

Unable to find image 'dyego/snake-game:latest' locally
latest: Pulling from dyego/snake-game
1160f4abea84: Pull complete
4d49542c61a4: Pull complete
3ee103c86f60: Pull complete
9a56d2eb1eed: Pull complete
67e0ebed9a3b: Pull complete
194910951a14: Pull complete
20ccf4425819: Pull complete
4b3db85b3b19: Pull complete
b612933e98de: Pull complete
ab455ad83399: Pull complete
Digest: sha256:b2b4751952d24fa810a91620aee5f49a1cdf7d05b472a209920f3310f1a84bc1
Status: Downloaded newer image for dyego/snake-game:latest

We can see that the docker image is made up of 10 layers. Each layer was pulled down by the docker client. Then a digest was printed, the docker pull completed, and the game started.

Pulling it apart

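We can use skopeo to copy the image from the registry into a local directory and look at the pieces:

> skopeo copy docker://dyego/snake-game:latest dir:./1160f4abea84cbe2f316db6306839d2704f09a04af763ee493dd92cb066c0865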

We get one file for each layer in our image, a manifest file, and one extra file, which we will get to later. The first twelve characters of each layer’s file name match the ID that docker pull printed for that layer.

We can also see that the file name of each layer is actually the sha256 of the contents of the layer:

> cat 1160f4abea84cbe2f316db6306839d2704f09a04af763ee493dd92cb066c0865 | shasum -a 256
1160f4abea84cbe2f316db6306839d2704f09a04af763ee493dd92cb066c0865 -
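The same check can be sketched in Python. This is only an illustration, assuming the layer files are named by their full hex digest as in the listing above; the helper names `sha256_of_file`, `is_content_addressed`, and `short_id` are made up for this example and are not part of docker or skopeo.

```python
import hashlib
from pathlib import Path


def sha256_of_file(path: Path, chunk_size: int = 1 << 20) -> str:
    """Return the hex sha256 of a file, reading it in chunks."""
    h = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def is_content_addressed(path: Path) -> bool:
    """True when the file name equals the sha256 of the file contents."""
    return path.name == sha256_of_file(path)


def short_id(digest: str) -> str:
    """First twelve hex characters of a digest, with any 'sha256:' prefix dropped."""
    return digest.split(":", 1)[-1][:12]
```

Calling `is_content_addressed(Path("1160f4abea84..."))` on a layer file should return True if the file is intact, and `short_id` gives the abbreviated form that docker pull displays.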

Finding #1:

Layers on top of layers

"layers": [
  {
    "mediaType": "application/vnd.docker.image.rootfs.diff.tar.gzip",
    "size": 1991501,
    "digest": "sha256:1160f4abea84..."
  },
  {
    "mediaType": "application/vnd.docker.image.rootfs.diff.tar.gzip",
    "size": 80230150,
    "digest": "sha256:9a56d2eb1eed..."
  }, ...

The manifest file lists the digest of each of its ten layers, as well as the size and media type of each layer. The ordering here is important because docker images use a union file system.
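As a rough sketch of reading that list programmatically, the snippet below parses a manifest fragment with Python's json module. The fragment is modelled on the excerpt above: it holds only the two layers shown there, and the digests stay truncated exactly as in the article. The function name `layers_in_order` is hypothetical.

```python
import json

# Fragment modelled on the manifest excerpt in the article. Only two layers
# are included and the digests are left truncated, so this is sample data,
# not the real manifest.
MANIFEST_JSON = """
{
  "layers": [
    {"mediaType": "application/vnd.docker.image.rootfs.diff.tar.gzip",
     "size": 1991501, "digest": "sha256:1160f4abea84..."},
    {"mediaType": "application/vnd.docker.image.rootfs.diff.tar.gzip",
     "size": 80230150, "digest": "sha256:9a56d2eb1eed..."}
  ]
}
"""


def layers_in_order(manifest: dict) -> list:
    """Return (digest, size) pairs in manifest order, base layer first."""
    return [(layer["digest"], layer["size"]) for layer in manifest["layers"]]


manifest = json.loads(MANIFEST_JSON)
for digest, size in layers_in_order(manifest):
    print(f"{digest}  {size} bytes")

total = sum(size for _, size in layers_in_order(manifest))
print("total compressed size of listed layers:", total)
```

The list order is preserved on purpose, since it is the order in which the layers are stacked.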

Each immutable layer is an overlay on top of the previous layers. In this case, since this image is based on golang:alpine, the first five layers are shared with many other images that build on golang:alpine.
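To make the overlay idea concrete, here is a toy model in Python. It is not how docker's storage drivers are implemented: each layer is just a dictionary from path to contents, and `None` stands in for a deletion marker. The file names and the `flatten` helper are invented for the example.

```python
from typing import Dict, List, Optional

# path -> contents; None marks a file that this layer deletes
Layer = Dict[str, Optional[str]]


def flatten(layers: List[Layer]) -> Dict[str, str]:
    """Apply layers bottom-up; later layers shadow earlier ones."""
    view: Dict[str, str] = {}
    for layer in layers:
        for path, contents in layer.items():
            if contents is None:
                view.pop(path, None)   # upper layer hides the file
            else:
                view[path] = contents  # upper layer adds or replaces the file
    return view


base = {"/etc/motd": "hello", "/bin/sh": "shell"}
app = {"/etc/motd": "welcome to snake", "/app/snake": "game binary"}
cleanup = {"/bin/sh": None}

in_order = flatten([base, app, cleanup])
reordered = flatten([app, base, cleanup])

print(in_order)
print(reordered)  # same layers, different order, different file system
```

The two results differ in the contents of `/etc/motd`, which is why the manifest has to record the layers as an ordered list rather than a set.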

Image Config

"config": {
  "mediaType": "application/vnd.docker.container.image.v1+json",
  "size": 5542,
  "digest": "sha256:97b9447a34ec..."
}

The config is the extra file I mentioned we would get back to. It is a JSON document that contains metadata about how the image was created.

> cat 97b9447a34eca52d4283759df0f47f42cb9629b3ab6058fca5a993cfacb1e7a8

And again its filename is its sha256 hash:

> cat 97b9447a34eca52d4283759df0f47f42cb9629b3ab6058fca5a993cfacb1e7a8 | shasum -a 256
97b9447a34eca52d4283759df0f47f42cb9629b3ab6058fca5a993cfacb1e7a8 -

We will cover the config file in more detail in a future article.

Image Digest

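The digest that docker pull reported is the sha256 hash of the manifest file, which we can compute the same way:

> cat manifest.json | shasum -a 256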

This structure is known as a Merkle tree.

An image can be referred to by the hash of its manifest and the manifest contains a list of the child dependencies of the image. Its dependencies are its layers and the config. Any change to any layer will cause its digest to change, which will cause the manifest to change, which will cause the entire image to have a different digest.
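The chain of digests can be illustrated with a small Python sketch. It builds a toy manifest that refers to its config and layers by digest and then hashes the manifest itself. The JSON layout and the byte strings are simplified stand-ins, not the real Docker manifest serialization, and `digest` and `build_manifest` are hypothetical helper names.

```python
import hashlib
import json


def digest(data: bytes) -> str:
    """Content digest in the 'sha256:<hex>' form used by image manifests."""
    return "sha256:" + hashlib.sha256(data).hexdigest()


def build_manifest(config: bytes, layers: list) -> bytes:
    """Serialize a toy manifest that refers to its children by digest."""
    doc = {
        "config": {"digest": digest(config), "size": len(config)},
        "layers": [{"digest": digest(blob), "size": len(blob)} for blob in layers],
    }
    return json.dumps(doc, sort_keys=True).encode()


config = b'{"os": "linux"}'
layers = [b"base layer bytes", b"app layer bytes"]

original = digest(build_manifest(config, layers))

# Change a single byte in the second layer and rebuild everything above it.
changed = digest(build_manifest(config, [layers[0], b"app layer bytez"]))

print(original)
print(changed)
print("image digest changed:", original != changed)
```

A one-byte change in a leaf produces a new layer digest, which changes the manifest bytes, which changes the image digest at the root.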

In fact, you can always refer to an image using its digest rather than a tag. Referring to it by its digest is more verbose, but a digest, unlike a tag, can never be repointed to a different image: any change to the image produces a different digest. This makes it a great choice in many situations. Here is how we would do this:

> docker pull dyego/snake-game@sha256:b2b4751952d24fa810a91620aee5f49a1cdf7d05b472a209920f3310f1a84bc1

Finding #2:

This is also how git works internally. Each commit can be referred to by its hash. In git, branches are just pointers to hashes of commits. This is the way docker image tags work as well. If you are familiar with how git commits form a tree, this intuition can guide you to understanding the docker image format. They are very similar.
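The pointer analogy can be sketched with two dictionaries in Python. This is a toy in-memory model, not a real registry API: `images`, `tags`, `push`, and `resolve` are all invented for the illustration, and the manifest contents are placeholder bytes.

```python
import hashlib

images = {}  # digest -> manifest bytes (immutable, content addressed)
tags = {}    # tag name -> digest (mutable pointer, like a git branch)


def push(tag: str, manifest: bytes) -> str:
    """Store a manifest under its digest and point the tag at it."""
    d = "sha256:" + hashlib.sha256(manifest).hexdigest()
    images[d] = manifest
    tags[tag] = d
    return d


def resolve(reference: str) -> bytes:
    """Look up by digest if the reference is one, otherwise follow the tag."""
    d = reference if reference.startswith("sha256:") else tags[reference]
    return images[d]


first = push("latest", b"manifest v1")
second = push("latest", b"manifest v2")

print(resolve("latest"))  # follows the tag to the most recent push
print(resolve(first))     # the digest still names the first image
```

After the second push the tag has moved, but the first digest keeps resolving to the same bytes, just as an old commit hash stays valid after a branch moves on.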

By looking a bit deeper at the docker image format, we now have a better and more hands-on understanding of how the format works.


Tenable TechBlog