Bring your tools with you
Last year I joined Jimdo’s Werkzeugschmiede team as an Infrastructure Toolsmith. In brief, our goal is to provide a platform (PaaS) where all Jimdo developers can deploy and run their Dockerized applications with as little friction as possible. I’m impressed with what the team has built so far, and to be honest, also a bit surprised at just how much work goes into building and running such a platform. At the same time, helping the project move forward in the right direction has been a great opportunity for me to learn and experiment with new technologies.
Our platform, which we internally call Wonderland, utilizes Amazon ECS to run Docker containers on a managed cluster of EC2 instances. For the cluster instances, we decided to use CoreOS, a container-focused Linux distribution optimized for large-scale deployments. These are the basic building blocks of our microservices infrastructure. Everything else — from providing a deployment API to integrating with external services for metrics and logging — is our job. (We also have a chat bot called Alice, and of course, it’s not the only reference to Lewis Carroll’s novel in our stack.)
As is usually the case with non-trivial software, there’s always something — be it ever so small — that doesn’t work as expected. The system is misbehaving in some way and you want to figure out what the heck is going on. In that situation, you instinctively fire up your favorite debugging tools, the ones you trust and are comfortable with, no matter the circumstances. The only problem: the tools you need so desperately might not be available in the environment you’re supposed to scrutinize. Now what?
You might be tempted to run apt-get install or yum install to get the programs you want, even on a production server. What could possibly go wrong? Depending on your infrastructure, the answer is either “not much” (after debugging, you throw away the cluster instance and it will be replaced automatically) or “a lot” (you keep using the now tainted system — a unique snowflake among your servers).
With the rise of container technology, and Docker in particular, there’s another appealing option centered around one simple idea: you bring your tools with you.
As I said, something always goes wrong; our platform is no exception. For example, one time we had to debug an internal Go application that routes all Docker container output from our cluster instances to Papertrail. For some mysterious reason, the application would stop sending any logs out of the blue. (We have since moved on to using the fluentd logging driver, which turned out to be simpler and more reliable.)
In order to find out whether log data was being sent at all, I checked if tcpdump or ngrep were available under CoreOS. Unfortunately, the answer was no. Now I could have installed the missing tools inside the Docker container running the Go application (via docker exec). However, altering the system you want to observe is generally a bad idea as it may cause certain bugs to disappear or change their behavior (so-called Heisenbugs), making it hard to isolate the actual problem.
That’s one of the reasons why CoreOS prevents you from modifying the data in /usr by mounting it as read-only. This is where system binaries live and you’re not supposed to mess with them. Given that the root filesystem is writable, you could still install binaries to /opt/bin, for instance. However, doing so in an ad hoc way without using something like coreos-cloudinit would add another unique snowflake to your server farm. I’m afraid we’re going round in circles…
Of course, I wouldn’t be writing this article if there weren’t some delightful solution: CoreOS comes with a helpful little script called toolbox, which launches a container specifically for the purpose of bringing in your favorite command-line tools.
By default, the toolbox command will give you a container based on fedora:latest, but you may also use your own custom Docker image that comes with the tools you need. The spawned container will have full system privileges allowing you to inspect anything running on CoreOS — including other containers — with ngrep, tcpdump, and friends.
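To make the custom image part concrete, here is a minimal sketch. It assumes toolbox reads the variables TOOLBOX_DOCKER_IMAGE and TOOLBOX_DOCKER_TAG from ~/.toolboxrc, as the CoreOS documentation describes. The helper write_toolboxrc is a hypothetical name of my own, and the image, tag, and tcpdump filter are illustrative placeholders rather than what we actually ran.

```shell
#!/bin/sh
# Sketch only: write a toolbox config file that selects a custom image.
# write_toolboxrc is a hypothetical helper, not part of CoreOS.
#   $1 = path of the config file (toolbox reads ~/.toolboxrc)
#   $2 = Docker image name (illustrative)
#   $3 = Docker image tag (illustrative)
write_toolboxrc() {
    printf 'TOOLBOX_DOCKER_IMAGE=%s\nTOOLBOX_DOCKER_TAG=%s\n' "$2" "$3" > "$1"
}

# Typical session on a CoreOS host (shown as comments, since it needs a
# real cluster instance to run):
#   write_toolboxrc "$HOME/.toolboxrc" ubuntu 14.04
#   toolbox                                   # spawn the tools container
#   apt-get update && apt-get install -y tcpdump ngrep
#   tcpdump -i any -nn host <log-destination> # placeholder filter
```

Without a ~/.toolboxrc, the default fedora:latest image is used, in which case you would install the tools with Fedora's package manager instead of apt-get. Either way, the packages end up in the throwaway toolbox container, not on the host.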
Toolbox itself might not be the most exciting engineering achievement out there. Still, it’s a good example of the bring-your-own-tools (BYOT) pattern. Now that containers are all the rage and many companies are starting to embrace new ways of running services in production, I think it makes sense to also adopt the same technology in other areas like debugging.
For my part, I like the idea of having my tools at my disposal whenever I need them.
P.S. This article first appeared on my Production Ready mailing list.