Tags are about as reliable (and disposable) as this guy right here.
# docker pull <someimage>:<sometag>
# docker run <someimage>:<sometag>

Seems simple enough. You and I have done this, at this point, likely hundreds of times. But what happens when I need to be able to run my image as a variable number of containers over time as part of a deployment? You know, like actual real-world adult stuff. Tags are mutable and humans are prone to error. Not a good combination. Suddenly, that ‘<sometag>’ piece doesn’t look so great. Here we’ll dig into why the use of tags can be dangerous and how to deploy your containers across a pipeline and across environments, you guessed it, with determinism in mind.

Let’s first look at what I mean by ‘deployment’ in the context of containers. Remember why we all started using containers? They’re portable, lightweight compared to VMs, and their images are immutable. This immutability guarantees that any and all containers I instantiate from an image will be absolutely identical at inception. Surprise surprise, deterministic operations. Next, when I deploy my containers from a given image, I want to ensure that whether it’s today or 5 years from now, that specific deployment uses the very same image that I defined. As a best practice, any update or newer version of an image (meaning, at that point, a new image) should be rolled out as a new deployment. This holds true whether I’m running A/B tests, a Blue/Green deploy, etc. This is where we move beyond using image immutability for deterministic container instantiation and into deterministic deployments.

Deterministic deployments are not guaranteed because of the way Docker currently handles tag-based image pulls. In fact, the tagging mechanism in Docker today can undermine the concept of deterministic deployments entirely. And here is the rub: Docker has two primary (but not equal) identifiers for an image. The first and most familiar is, of course, the tag. This is what I referenced at the beginning with my example of ‘<sometag>’. Most often we find ‘:latest’ on the images we work with. We can also apply our own arbitrary custom tags to whatever images we like, which can result in a single image carrying multiple mutable tags, even across different repositories. Tags are simply human-readable labels that act as aliases for an image, and they can change easily and on a whim. For example, if you pull and run an image by ‘:latest’ today, you might get version 1.0 of some app. Great. Now wait a month or two and pull/run that image again. You’ll likely find a newer version of the app/workload than what you got in your first deployment, yet you’re still going against the same tag. Inherently, this means tags are not stable, unique identifiers, and they break the paradigm of determinism. I use ‘:latest’ in this example, but the same applies to any tag. Even ‘:1.0’ is simply an alias that can ultimately point to an entirely different image. All it takes is human error or an oversight in a development workflow that relies on tags like this.
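To make the alias problem concrete, here is a toy model in plain shell. It is not Docker itself: the directory layout and the `push`, `pull_by_tag`, and `pull_by_digest` helpers are hypothetical. Content is stored under its SHA-256 digest and a tag is just a small file naming a digest, which is roughly how a registry keeps the two apart. It assumes a POSIX shell and GNU coreutils `sha256sum`.

```shell
#!/bin/sh
# Toy model of tag vs. digest (hypothetical helpers, not the Docker CLI).
set -eu

registry=$(mktemp -d)
trap 'rm -rf "$registry"' EXIT
mkdir -p "$registry/blobs" "$registry/tags"

# push CONTENT TAG: store CONTENT under its digest, point TAG at it, print the digest.
push() {
  d="sha256:$(printf '%s' "$1" | sha256sum | cut -d' ' -f1)"
  printf '%s' "$1" > "$registry/blobs/$d"
  printf '%s' "$d" > "$registry/tags/$2"   # nothing stops a tag from being overwritten
  printf '%s\n' "$d"
}

# pull_by_tag TAG: resolve whatever the tag points at *right now*.
pull_by_tag() { cat "$registry/blobs/$(cat "$registry/tags/$1")"; }

# pull_by_digest DIGEST: the content behind a digest never changes.
pull_by_digest() { cat "$registry/blobs/$1"; }

v1=$(push "app version 1" "1.0")
echo "today:  1.0 -> $(pull_by_tag 1.0)"

# Weeks later, someone pushes different content under the same tag.
push "app version 2" "1.0" > /dev/null
echo "later:  1.0 -> $(pull_by_tag 1.0)"
echo "pinned: $v1 -> $(pull_by_digest "$v1")"
```

The same tag now returns different content, while the digest captured on day one still resolves to the original bytes.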

Not only is this precarious for me as an individual user, but when you start expanding this across teams and organizations, you run the risk of different folks running different images with the expectation that they are all the same when, in fact, they are not.

Key Point: If an application is updated or patched, I want that update to be explicitly tied to any and all current and future deployments to maintain versioning consistency and avoid container drift (yes, this is a thing; more on that later).

Looks good. Now assuming some arbitrary time has passed, let’s run it again.

Well, that wasn’t expected. Version 2? Someone tagged the image incorrectly. Note: The Digest values are different.

The second, and much less familiar, way of pulling and running an image is by digest [1]. An image may have multiple tags, but it only ever has a single digest. This is a SHA-256 value and serves as the image’s immutable identifier. Forever. The second you change anything about an image, via a Dockerfile build for example, you create a new image with a new immutable identifier, i.e. a new SHA-256 digest.
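The "same bytes, same digest; any change, new digest" property is easy to see by hashing two toy manifests directly. The JSON strings below are simplified, hypothetical stand-ins rather than real image manifests. Real manifest digests follow the same principle (SHA-256 over the manifest's exact bytes), but real manifests carry many more fields. It assumes GNU coreutils `sha256sum`.

```shell
#!/bin/sh
set -eu

# digest STRING: print "sha256:<hex>" for the exact bytes of STRING.
digest() {
  printf 'sha256:%s\n' "$(printf '%s' "$1" | sha256sum | cut -d' ' -f1)"
}

# Hypothetical, heavily simplified manifests.
build_a='{"layers":["base-os","app-1.0"]}'
build_b='{"layers":["base-os","app-1.0"]}'   # byte-for-byte identical to build_a
build_c='{"layers":["base-os","app-1.1"]}'   # one layer changed

da=$(digest "$build_a")
db=$(digest "$build_b")
dc=$(digest "$build_c")

echo "a: $da"
echo "b: $db"
echo "c: $dc"
[ "$da" = "$db" ] && echo "identical content -> identical digest"
[ "$da" != "$dc" ] && echo "changed content -> new digest"
```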

Now with the same scenario of my deployment, if I pull by the digest value of the image, I’m 100% guaranteed that I’m pulling the exact same image whether it be today or years from now, regardless of conventional tagging. The SHA value effectively becomes the source-of-truth-tag against my image. Now if my application is updated and I want to update my deployments, the link between my deployment and the new image (SHA value) should be re-established and refreshed, but that’s a separate discussion we will address later.
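In a pipeline, one way to enforce this is a small pre-deploy check that rejects any image reference not pinned by digest. The `is_pinned` and `check` helpers are hypothetical (not Docker commands). They only check the `name@sha256:<64 hex chars>` shape that `docker pull` accepts for digest pulls [1], not whether that digest actually exists in a registry. `someimage` is the article's placeholder and the digest value is a stand-in.

```shell
#!/bin/sh
set -eu

# is_pinned REF: succeed only if REF ends in @sha256:<64 lowercase hex chars>.
is_pinned() {
  printf '%s\n' "$1" | grep -Eq '@sha256:[0-9a-f]{64}$'
}

# check REF: report whether a deploy should accept REF.
check() {
  if is_pinned "$1"; then
    echo "OK      $1"
  else
    echo "REJECT  $1 (mutable tag, pin it by digest)"
  fi
}

hex=$(printf 'example' | sha256sum | cut -d' ' -f1)   # placeholder digest value

check "someimage:latest"
check "someimage:1.0"
check "someimage@sha256:$hex"
```

Running a check like this before every deploy turns "remember to pin" into a rule that a human error cannot quietly skip.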

Now that I’ve extended deterministic ops from just running a container to actually forming a concept of a deployment over time, I can now safely bring those containers into a multi-tenant deployment environment with more complex workflows.

But there’s a catch. At this point we need to address the confluence of pulling by tag and pulling by digest. As illustrated earlier, pulling by tag is far more usable for developers. The issue, however, is that pulling by digest does not bring down any of the image’s tags. Cue the sad trombone.

This is by design in the Docker Engine, which assumes that if you are pulling by digest, your tags are largely irrelevant. Well, as a developer, frankly, I’d like to keep my tags intact, thank you very much. I want the best of both worlds. Ultimately, I also want the ability to tie a digest value to a tag so that I can still predictably and reliably deploy images by the same tag (such as ‘:latest’), in line with my deployment intent.
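One generic way to get part of that "best of both worlds" is a resolve-once-then-pin step: look the tag up a single time, record the tag alongside the digest it pointed to, and have every later deploy pull by the recorded digest. Everything here is hypothetical. A flat `tag digest` file stands in for a registry, `resolve`, `pin`, and `deploy_ref` are illustrative helpers, and the zero-padded digests are placeholders, not real images. This is a plain lock-file pattern, not the Image Streams construct the follow-up covers.

```shell
#!/bin/sh
set -eu

work=$(mktemp -d)
trap 'rm -rf "$work"' EXIT

# Toy tag table standing in for a registry: one "<tag> <digest>" per line.
tags="$work/tags"
printf 'latest sha256:%064d\n' 1 > "$tags"

# resolve TAG: look up the digest the tag points at right now.
resolve() { awk -v t="$1" '$1 == t { print $2 }' "$tags"; }

# pin TAG: resolve once and record "TAG DIGEST" in a lock file for later deploys.
pin() { printf '%s %s\n' "$1" "$(resolve "$1")" > "$work/deploy.lock"; }

# deploy_ref: what actually gets pulled, always the digest recorded in the lock.
deploy_ref() { awk '{ print "someimage@" $2 }' "$work/deploy.lock"; }

pin latest
echo "first deploy (tag $(cut -d' ' -f1 "$work/deploy.lock")): $(deploy_ref)"

# The tag moves to new content...
printf 'latest sha256:%064d\n' 2 > "$tags"

# ...but redeploys read the lock, not the tag.
echo "redeploy:     $(deploy_ref)"
echo "tag now says: $(resolve latest)"
```

The tag survives as human-readable intent in the lock, while the digest decides what runs; refreshing the pin becomes an explicit, reviewable step.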

Loosely related to this, and rightfully pointed out by a good friend of mine, John Osborne (@oss_advocate), Docker currently has an issue with its own builds: two separate builds from the same Dockerfile result in different hash values. Given the context above, the implications for deterministic workflows in a larger deployment environment are pretty clear.
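One common reason identical Dockerfiles yield different digests is that build metadata is part of the hashed bytes. Examples include the image config's `created` timestamp and file modification times inside layers. The toy sketch below uses simplified, hypothetical JSON rather than a real image config. Two "builds" with the same layers but different timestamps hash differently, while pinning the timestamp makes them match. That is the idea behind reproducible-build conventions such as `SOURCE_DATE_EPOCH`. It assumes GNU coreutils `sha256sum`.

```shell
#!/bin/sh
set -eu

# digest STRING: SHA-256 of the exact bytes, printed as "sha256:<hex>".
digest() { printf 'sha256:%s\n' "$(printf '%s' "$1" | sha256sum | cut -d' ' -f1)"; }

# build CREATED: hypothetical, simplified image config for the same Dockerfile.
build() { printf '{"created":"%s","layers":["base-os","app-1.0"]}' "$1"; }

# Two builds of the same inputs, run at different times.
first=$(digest "$(build 2019-03-01T10:00:00Z)")
second=$(digest "$(build 2019-03-01T10:05:00Z)")

# Two builds with the timestamp pinned to a fixed value.
pinned_a=$(digest "$(build 1970-01-01T00:00:00Z)")
pinned_b=$(digest "$(build 1970-01-01T00:00:00Z)")

echo "timestamped builds: $first vs $second"
echo "pinned builds:      $pinned_a vs $pinned_b"
```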

In a follow-up, I’ll lay out how we can accomplish a marriage of flexible usability and determinism in container deployments at scale using a construct known as Image Streams.

[1] https://docs.docker.com/engine/reference/commandline/pull/#pull-an-image-by-digest-immutable-identifier

Tariq Islam

Father, Engineer, Googler. All posts and opinions are my own.
