The state of Docker container documentation: some workarounds and a vision for a possible future
We need to confront container documentation as the crucial, non-trivial problem that it is.
TL;DR — As far as I can tell, there’s currently no way of providing documentation for specific containers that we could fairly call canonical, “best practice,” or even all that widely used. This blog post suggests some currently available (but sadly not-great) workarounds but also points to what I think could be a fundamentally better path.
A few days ago I made an offhand tweet (a suggestion that container tooling ship a built-in way to bundle and surface documentation for images) that made a much bigger impression than I had anticipated.
I tweeted this largely out of frustration because I’d grown a bit weary of the development process surrounding containers along one axis in particular: documentation.
The gist of the responses to my tweet was essentially "Great idea!" and "This would be so useful!" So my suggestion seems to have struck a nerve. I suspect it did so because we have this vast firmament of containers and container-related tools at our disposal and yet we still don't have any canonical, "best practice," or even widely used way of documenting containers.
The result is that questions like these virtually never have readily available, no-Google-involved answers:
- How do I run this thing? Should I run a specific executable?
- What’s the “blessed” way to run it?
- Which ports should I use?
- How do I debug it or run its in-container test suites?
Sometimes the problem is that no one has provided answers to these questions anywhere. More often, in my experience, the answers are out there but suffer from a really fundamental discoverability gap.
The core problem: where do I look?
Suppose I've just pulled an image. Now what do I do? To find out, should I check the embedded README on Docker Hub? Should I try to track down the actual Dockerfile out there somewhere? Should I check the official docs of the company, platform, product, tool, etc.? And then what about me and my own development practices? How should I offer instructions to others?
Caveat — This post only talks about Docker containers. The problem I see, however, is absolutely not restricted to Docker, and I think that the solutions I suggest should be incorporated into any container platform.
Workaround 1: OCI annotations
One possibility is to use labels or annotations to point people to the right information. Brandon Mitchell, for example, suggested this path in his response to my tweet.
This helpfully alerted me to the existence of the Annotations spec, which is part of the Open Container Initiative's Image Format Specification. One of the pre-defined annotation keys in the spec is `org.opencontainers.image.documentation`, which is intended to point to the canonical documentation URL for the image.
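If the publisher applied that key as a plain image label, retrieving it might look something like this (image name hypothetical):

```shell
# Pull out the documentation URL label, if the publisher set one
docker inspect \
  --format '{{ index .Config.Labels "org.opencontainers.image.documentation" }}' \
  my-image
```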
I think that this is an okay solution, but just okay. It requires me to inspect the image, copy a URL, and then go to that address. That address may be just the right place with just the right info. It may also be the wrong place, e.g. a root URL for a massive documentation portal that requires a lot of further digging.
Workaround 2: Label Schema labels
After reading up on the OCI annotations spec, I searched further and stumbled on an effort called Label Schema, which seeks to codify some canonical (though optional) labels under the `org.label-schema` namespace that developers should try to include with containers whenever possible.
These include six documentation-specific labels: `org.label-schema.usage`, `org.label-schema.docker.cmd`, `org.label-schema.docker.cmd.devel`, `org.label-schema.docker.cmd.test`, `org.label-schema.docker.cmd.debug`, and `org.label-schema.docker.cmd.help`.
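In a Dockerfile, a few of these might look something like this (image name and commands illustrative):

```dockerfile
LABEL org.label-schema.usage="https://example.com/myapp/usage.md" \
      org.label-schema.docker.cmd="docker run -d -p 8080:8080 myorg/myapp" \
      org.label-schema.docker.cmd.test="docker run --rm myorg/myapp run-tests"
```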
I think that this is much better than the OCI annotations workaround for a variety of reasons:
- They deliver very specific information rather than a website of unknown quality or format.
- You can provide multiple bits of data rather than just a single URL.
- There are actual standards applied to some of the labels, e.g. the requirement that the `org.label-schema.docker.cmd.test` command return output via stdout and provide an exit code.
But there are some drawbacks to the Label Schema approach as well. One small issue worth mentioning: most users will likely interact with these labels by running `docker inspect` and parsing the JSON output with tools like jq, but that approach gets awkward when your keys are strings that contain periods. And yes, you can use Go templating as well, but that immediately leads us to the core of the problem: do you really want to resort to JSON parsing or Go templating to get simple bits of information that may be essential to the user-friendliness of the container?
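To make the awkwardness concrete: because the keys contain periods, jq needs the quoted-bracket form rather than plain dot access (label and image names illustrative):

```shell
# Dot access like .org.label-schema.docker.cmd won't work here;
# the whole dotted key has to be quoted inside brackets
docker inspect my-image \
  | jq -r '.[0].Config.Labels["org.label-schema.docker.cmd"]'
```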
Proposed solution: in-container documentation
Beyond these specific gripes, these two approaches just don't feel like big wins to me. They treat documentation as a problem on par with finding out who maintains an image, not as a vital part of the software development process.
I think the fundamental problem with both current workarounds is that they're, well, workarounds. They're not woven into the fabric of the tools. They're rooted in metadata, and metadata "solutions," regardless of the domain, usually end up feeling provisional and under-baked, even if they're "blessed" practices.
So my proposed solution is to bake documentation capabilities into Docker itself (other tools should listen up as well). I don't have fully fleshed-out interfaces yet, but I'll throw some out there. Imagine if you could run…
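…something like this, a hypothetical new subcommand:

```shell
# Hypothetical: render the docs bundled inside the image
docker docs my-image
```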
…and get HTML output for the container's docs. You could even pass an `--open` flag to open your browser to the page, or a `--formatted` flag to get pretty HTML output generated via a standardized template.
Maybe you could specify a docs directory and get a bundle of docs with a built-in sidebar nav and support for hyperlinks. Or some gentle soul could provide support for Asciidoc or reStructuredText. Or your supplied Markdown would be a template that supports variable interpolation. Or you could specify a URL for a Markdown file that gets downloaded and injected into the image data at build time. Let’s go nuts here.
And if you run `docker docs` and get an error because no docs were provided? Perhaps fall back to a commonly used label value. Or even better: your image doesn't pass your CI build because `docker build --enforce-docs` returns an error, so your image's end users never have that problem in the first place.
So how do you get the docs into the container in the first place? Let’s bake that into Dockerfile syntax:
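Imagine, say, a hypothetical `DOCS` instruction:

```dockerfile
# Hypothetical instruction: bundle a Markdown file as the image's docs
DOCS ./docs/README.md
```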
This capability would give us access to a single page that we could read to learn about the container. But we’d lose the kind of specific docs that we saw in workaround 2, where we could access a run command, test command, etc.
So let’s have a different command for that:
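Perhaps something like this (again, purely hypothetical syntax):

```shell
# Hypothetical: print structured usage info (run command, test command, etc.)
docker usage my-image
```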
And let’s provide those usage instructions via the Dockerfile:
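Again via a hypothetical instruction, perhaps:

```dockerfile
# Hypothetical: register a usage spec for `docker usage` to read
USAGE ./usage.json
```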
The `usage.json` file could look like this:
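Here's one sketch of what such a file might contain (all keys and commands hypothetical):

```json
{
  "run": "docker run --rm -d -p 8080:8080 IMAGE",
  "test": "docker run --rm IMAGE run-tests",
  "debug": "docker run --rm -it IMAGE sh",
  "help": "docker run --rm IMAGE --help"
}
```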
Let's support YAML too, while we're at it. The `IMAGE` name in the commands could be automatically inferred (as could other variables). And let's add a `docker build --enforce-usage` flag to make CI builds fail when this information isn't provided (or even provide checks to ensure that the supplied commands are valid and safe).
These interfaces would likely change over time. I’m mostly spitballing here. The important thing is that the mere fact of having these things baked into the tools would not-so-gently nudge people in the direction of using them. We’d begin to have a clear path here.
The bigger picture
Beyond day-to-day, CLI-driven development practices, I think that there could be major ecosystem impacts to having image-native documentation (neologism of the day?). Some possibilities:
- Self-documenting container clusters: Imagine if container orchestration platforms like Kubernetes and Docker Swarm gave you access to documentation for all the containers in your cluster via the web dashboard. You could pull open the docs in one browser tab and tweak configuration in the other. Or use the CLI to do the same. Imagine `kubectl docs my-http-server-pod`.
- Registry integration: Tools for accessing container docs — browsing, searching, collating — could be built into the very systems that store and deliver our containers. Registries could become the first place we go to find info, dethroning Stack Overflow as the tool of first resort.
- The downfall of the README regime: I’m mostly kidding here because I like READMEs and use them all the time. But it nonetheless feels odd to me that READMEs accessed via the public Internet are still our first resort in an ecosystem that’s defined by revolutionizing the way we package software.
I don’t know if my proposed solutions are all that great. There may be hidden downsides and even if adopted outright they may not gain a lot of traction. So feel free to disregard them and, even better, suggest your own!
Far more important than my specific suggested solutions is that we begin to see container documentation as a non-peripheral, non-trivial problem that has real consequences for our development and operations practices. If you read this and start thinking about container documentation as a problem then I feel like I have already succeeded.
Containers were supposed to revolutionize how software is built, distributed, and run. They were supposed to make easier and even downright joyful things that were once tedious and laborious. Let’s take that “supposed to” and keep running with it.
At the moment, container documentation is not easy and it’s certainly not joyful. But I think that with the right vision it could be the best thing about working with containers, another “killer app” that wins over the container skeptics. And I don’t believe that anything I’m proposing would present technical challenges that wouldn’t be worth the effort.
After all, we’ve already built this great new mechanism for encapsulating the software that we build, share, and use. We don’t need to fundamentally re-conceive that mechanism. We just need to use that mechanism to *ahem* contain one more very useful thing.