Let’s make your Docker Image better than 90% of existing ones

Or why you should always Label your Docker Image.

Imagine this.

You’re working on a new project, an exciting one that’s Container native. You’re almost done, and now at the stage where you want to build the Docker Image that ships your little project out to the public. You build on top of Alpine Linux, and pack your tool into a nifty 80 MB Image that you tag latest and push to the public Docker Hub.

Quickly enough, people start basing their work on your tool. Cool, right? Then you fix some minor issues that are not ground-breaking but are good to have. You build the Image again and push it to the public Docker Hub as latest.

But this time, your changes have broken the way the tool used to work, and now the existing output, documentation, and tutorials no longer match its behaviour. You can’t release a new version, because the changes aren’t important enough.

You can mention the commit ID that latest points to on your Docker Hub page, but that’s going to become cumbersome soon, more so than releasing patch versions every time you make a minor fix. And the resulting Docker Image doesn’t contain that information for tools to work with. If only there were a way to include the commit ID as metadata in the Docker Image itself.

Or picture a scenario where you want to specify the licensing information of the Image in the Image itself, rather than in documentation or blog posts that reside outside the Image, so that any tool can inspect the Image and figure out its suitability for their requirements.

Or any other scenario that requires embedding metadata into the Docker Image for both human and machine readability.

This is where the Docker LABEL concept comes into play. Docker Labels allow you to specify metadata for Docker objects such as Images, Containers, and Volumes, and that metadata is stored as part of the object itself. We are interested in how we can leverage Labels for Docker Images.

Docker Labels to the rescue!

Specifying a Label for a Docker Image is simple. You just specify it as another Dockerfile instruction.

LABEL <label_name>="<label_value>"

For example, specifying a few labels to indicate the author and the build date for a particular Docker Image would look like the following in the Dockerfile.

FROM openjdk:jre-alpine
LABEL maintainer="dev@someproject.org"
LABEL build_date="2017-09-05"
COPY tool.tar.gz /mnt
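# Extract into /mnt so that /mnt/tool/tool.sh exists (assumes the archive contains a top-level tool/ directory).
RUN tar zxvf /mnt/tool.tar.gz -C /mnt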
CMD ["/mnt/tool/tool.sh"]

After this Image is built, docker inspect will be able to extract the LABEL information that we embedded into it at build time.
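
As a minimal sketch of that extraction, the Go-template --format option of docker inspect can print either all Labels or a single one. The helper function names below are hypothetical, and the mytool:latest tag in the usage comment is only an illustration; the Labels themselves live under .Config.Labels in the inspect output.

```shell
#!/bin/sh
# Hypothetical helpers wrapping `docker inspect`; not part of Docker itself.

# Print every Label on an image as a JSON object. $1 = image name.
image_labels() {
  docker inspect --format '{{ json .Config.Labels }}' "$1"
}

# Print the value of a single Label. $1 = image name, $2 = Label name.
image_label() {
  docker inspect --format "{{ index .Config.Labels \"$2\" }}" "$1"
}

# Example usage (requires a running Docker daemon and an existing image):
#   image_labels mytool:latest
#   image_label  mytool:latest maintainer
```

The second helper is handy in scripts, where a tool usually wants one value rather than the whole set.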

Is that it?

Yes, we solved that problem rather quickly, didn’t we? Don’t celebrate just yet though. We have another gaping problem staring at us now.

What should I label my Image with?

Yes, Labels allow us to specify the metadata, but that is all they do. The next obvious step is to come up with some kind of standard set of Labels that third-party tools can look for in Images.

Thankfully, label-schema already does this. label-schema is a namespace for a standard set of Docker build-time Labels that represent most of the metadata that would need to be embedded inside a Docker Image. These include:

  • build-date
  • name
  • description
  • url
  • vcs-ref
  • docker.cmd

among others.

Every Label is prefixed with org.label-schema, and the only mandatory Label is org.label-schema.schema-version, whose value should currently be 1.0.

Want to communicate the commit ID you built your Image at? Use vcs-ref. You can also use build-date for the same purpose, if you’re working with timestamps rather than commit IDs.

Let’s talk about build-date a bit more. It requires a timestamp value that is formatted according to the RFC 3339 standard. An example value would be as follows.

LABEL org.label-schema.build-date="2017-08-28T09:24:41Z"

You can generate the current date timestamp in this format in Bash using the following command.

date -u +'%Y-%m-%dT%H:%M:%SZ'
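
Before handing the value to a build, a script may want to confirm that it has the expected shape. The following is a rough sketch of such a check; the function name is_rfc3339_utc is hypothetical, and the pattern only covers the UTC form shown above (YYYY-MM-DDThh:mm:ssZ), so it is a sanity check rather than a full RFC 3339 validator.

```shell
#!/bin/sh
# Generate the timestamp with the same command as above.
build_date=$(date -u +'%Y-%m-%dT%H:%M:%SZ')

# Loose shape check for the UTC form YYYY-MM-DDThh:mm:ssZ.
# Hypothetical helper; it does not validate ranges (e.g. month 13 passes).
is_rfc3339_utc() {
  printf '%s\n' "$1" | grep -Eq '^[0-9]{4}-[0-9]{2}-[0-9]{2}T[0-9]{2}:[0-9]{2}:[0-9]{2}Z$'
}

if is_rfc3339_utc "$build_date"; then
  echo "build-date: $build_date"
else
  echo "unexpected timestamp format: $build_date" >&2
  exit 1
fi
```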

Combining this with Docker build ARGs, a build script can be written to pass the current build time to the Dockerfile.

FROM openjdk:jre-alpine
LABEL maintainer="dev@someproject.org"
# Declare the build argument so that --build-arg BUILD_DATE=... is available here.
ARG BUILD_DATE
LABEL org.label-schema.build-date=$BUILD_DATE
COPY tool.tar.gz /mnt
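# Extract into /mnt so that /mnt/tool/tool.sh exists (assumes the archive contains a top-level tool/ directory).
RUN tar zxvf /mnt/tool.tar.gz -C /mnt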
CMD ["/mnt/tool/tool.sh"]

The following docker build command passes the BUILD_DATE argument with a properly formatted RFC 3339 value.

docker build --no-cache=true --build-arg BUILD_DATE=$(date -u +'%Y-%m-%dT%H:%M:%SZ') -t mytool:latest .

label-schema is not only about Docker. It also has a set of Labels that can be used for acbuild label directives in App Container Build Specification, implemented in rkt.

Now that we have a set of standard Labels to work with, we just have to specify those in our Dockerfile in order to make our Docker Image stand out from almost 90% of the Images on the public Docker Hub. You heard that right. According to a survey done by Microscaling Systems, currently the most vocal advocate of label-schema, almost 90% of the Images hosted on the public Docker Hub are unlabelled.

Label your Images so that they can stand on their own.

Why label though?

Apart from the reasons stated above, it’s the same reason we write How-To guides in README.md files and complete tutorials on the GitHub wiki. It’s always good practice to embed as much metadata as possible into the immutable Docker Image you’re building, so that it can stand on its own in orchestration, management, and build tools. In this context, an Image that is properly labelled will stand out compared to an Image that has no metadata at all, even though the latter might be better in every other respect.

Now that we have Labelled our Docker Image, it’s time to boast about it.

Microbadger is a service offered by Microscaling Systems that analyzes the Images hosted on the public Docker Hub. It specifically looks at how Docker Labels are used in an Image. It also offers badges showing details of the Image, such as the number of layers, the commit ID (extracted from the org.label-schema.vcs-ref Label), and the version (extracted from the org.label-schema.version Label). For example, Microbadger shows these details for the Ballerina runtime Docker Image hosted on the public Docker Hub. The corresponding labeling in the Dockerfile is as follows.

FROM openjdk:jre-alpine
LABEL maintainer="tryballerina@gmail.com"

# Ballerina runtime distribution filename.

# Build arguments consumed by the Labels below.
ARG BUILD_DATE
ARG VCS_REF
ARG BUILD_VERSION

# Labels.
LABEL org.label-schema.schema-version="1.0"
LABEL org.label-schema.build-date=$BUILD_DATE
LABEL org.label-schema.name="ballerinalang/ballerina"
LABEL org.label-schema.description="Ballerina language runtime"
LABEL org.label-schema.url="http://ballerinalang.org/"
LABEL org.label-schema.vcs-url="https://github.com/ballerinalang/container-support"
LABEL org.label-schema.vcs-ref=$VCS_REF
LABEL org.label-schema.vendor="WSO2"
LABEL org.label-schema.version=$BUILD_VERSION
LABEL org.label-schema.docker.cmd="docker run -v ~/ballerina/packages:/ballerina/files -p 9090:9090 -d ballerinalang/ballerina"

(Image caption: Nice little visualization of the layers!)
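
A Dockerfile like the one above expects BUILD_DATE, VCS_REF, and BUILD_VERSION to be supplied at build time. The following is only a sketch of how a build script might gather them, assuming the Dockerfile declares matching ARG instructions and that the build runs inside a Git checkout. The function names and the version string in the usage comment are hypothetical.

```shell
#!/bin/sh
# Hypothetical build helper: gathers the three build arguments used by the
# Labels and assembles the `docker build` command.
set -eu

# UTC build time in the RFC 3339 form expected by org.label-schema.build-date.
build_date() {
  date -u +'%Y-%m-%dT%H:%M:%SZ'
}

# Short commit ID of the current checkout, or "unknown" if it cannot be read
# (for example, outside a Git repository).
vcs_ref() {
  git rev-parse --verify --short HEAD 2>/dev/null || echo "unknown"
}

# Build and tag the image. $1 = image name, $2 = version string.
build_image() {
  docker build \
    --build-arg "BUILD_DATE=$(build_date)" \
    --build-arg "VCS_REF=$(vcs_ref)" \
    --build-arg "BUILD_VERSION=$2" \
    -t "$1:$2" \
    .
}

# Example usage (requires Docker and a Dockerfile in the current directory;
# the version value is a placeholder):
#   build_image ballerinalang/ballerina 0.0.1-example
```

Falling back to "unknown" keeps the build from failing when no commit is available, at the cost of a less useful vcs-ref Label; a stricter script could abort instead.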

There you go! That’s how to improve your Docker Image a bit more. It also goes without saying that if you intend your public Docker Hub repository to become an official one, labeling is one of the stated Dockerfile best practices considered in the process.

So should you use Labels for every Image that you’re going to push to public Docker Hub? Most probably!

label-schema currently has a list of Labels as its standard for RC1, but if you think there should be more, you can participate in their discussions on the mailing list and in the GitHub issues. One particular addition that could be valuable (though the details are still a bit unclear) is a Label to specify the licensing information for an Image. This is under discussion in a GitHub issue.