Lessons learned running Docker in production
The other day, I came across an answer on Quora (via the excellent Software Engineering Daily site) in which Devrim Yasar, the CEO of Koding, wrote about their move away from containers and onto VMs. I found myself nodding and smiling repeatedly as I read, because it was clear that he had encountered many of the same difficulties we’ve struggled with over the last year or so of running Docker at Treehouse, and shared some of the same reservations. Encouraged by this, I thought I’d take the time to share our own experiences with Docker.
Before diving into the specifics of the troubles we’ve encountered, though, and how we’re working to solve those, it’d be helpful to first understand our use case.
As an online education provider, we have a few specific use cases to cover. We provide facilities for evaluating student learning via Quizzes and Code Challenges. In addition, we have an online IDE called Workspaces that allows students to quickly spin up coding environments.
Quizzes are essentially multiple-choice or fill-in-the-blank exam-style questions designed to test understanding of the material, and Code Challenges test a student’s ability to use the information they’ve learned to solve a problem.
Once a student has written code to solve the problem they’re presented with, they can submit their code for evaluation, and we perform validation on the backend, where we compare the submitted code to an accompanying suite of assertions to confirm that it satisfies the Challenge requirements. If you think this is starting to sound a lot like a CI system, you’re absolutely right. Though there are some significant differences, Code Challenges share many of the same requirements, so the solution looks similar, at least in broad strokes.
Workspaces allows students to work on more in-depth project-based learning, see how a large project evolves over time, and experiment with different approaches.
With these use cases in mind, here are some of the benefits we get from using Docker for these services:
Containerization is often (rightfully) described as a kind of lightweight virtualization, somewhere between a chroot and a VM. It lets developers and operators partition a host with minimal overhead by sharing the host kernel (avoiding the need to emulate hardware), and running just the processes that are needed, while also applying some resource constraints on things like CPU and memory utilization. Docker wraps all of this in a nice API, and handles some additional niceties, like doing dynamic port-mapping between the host and container networks. Since we’re running hundreds of Workspaces and dozens of Code Challenges at any given moment, being able to (more or less) isolate these from one another lets us pack a multitude of containers into a smaller number of hosts, reducing the cost of operating this kind of service.
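To make the port-mapping and resource-limit points concrete, here’s an illustrative Python sketch that starts a container with memory and CPU limits and lets Docker pick the host port. Publishing only a container port (`--publish 8080`) asks Docker for an ephemeral host port, which `docker port` then reports. The image name is hypothetical, and `--cpus` assumes Docker 1.13 or later.

```python
import subprocess
from typing import List, Tuple


def build_run_args(image: str, container_port: int, memory: str = "512m",
                   cpus: float = 1.0) -> List[str]:
    """Build a `docker run` command with resource limits whose container
    port is published on a host port chosen by Docker."""
    return [
        "docker", "run", "--detach",
        "--memory", memory,                # hard memory limit
        "--cpus", str(cpus),               # CPU limit (Docker 1.13+)
        "--publish", str(container_port),  # no host port given: Docker picks one
        image,
    ]


def parse_port_output(output: str) -> Tuple[str, int]:
    """Parse `docker port <container> <port>` output such as '0.0.0.0:32768'.
    Some Docker versions print one line per address family; use the first."""
    first = output.strip().splitlines()[0]
    host, _, port = first.rpartition(":")
    return host, int(port)


def launch(image: str, container_port: int) -> Tuple[str, int]:
    """Start a container and return the host address it was mapped to.
    Requires a running Docker daemon."""
    container_id = subprocess.run(
        build_run_args(image, container_port),
        check=True, capture_output=True, text=True,
    ).stdout.strip()
    mapping = subprocess.run(
        ["docker", "port", container_id, str(container_port)],
        check=True, capture_output=True, text=True,
    ).stdout
    return parse_port_output(mapping)
```

Because Docker chooses the host port, many copies of the same image can run side by side on one host without coordinating port numbers.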
Another nice thing about containers is that they lower the barrier to so-called immutable deployments by making image creation, promotion, and release part of the standard workflow. This is hugely beneficial for our use case, as we try to run all of our Code Challenge tests in “clean-room” environments to ensure consistent and repeatable results. Nothing is more frustrating than writing a working solution to a problem only to have it fail because of environmental Heisenbugs, or having your code behave differently in a Workspace than in a Code Challenge. By building base images with the same runtimes and libraries pre-packaged, we can stack the distinguishing features on top to “specialize” an image into an app, and by testing student code in clean-room container instances, we can ensure our students get a reliable, consistent experience from our platform.
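One way to express the clean-room idea is sketched below, under the assumption that evaluation images are referenced by content digest (`repo@sha256:...`) so every run starts from exactly the same bytes. Each submission gets a fresh container that is deleted when it exits. The image name and test command are hypothetical.

```python
import re
from typing import List

# A digest-pinned reference looks like "repo@sha256:<64 hex chars>".
_DIGEST = re.compile(r"[0-9a-f]{64}")


def is_pinned(image_ref: str) -> bool:
    """True if the reference names an immutable image by content digest."""
    name, sep, digest = image_ref.partition("@sha256:")
    return bool(name) and bool(sep) and _DIGEST.fullmatch(digest) is not None


def clean_room_args(image_ref: str, command: List[str]) -> List[str]:
    """Build a `docker run` command that evaluates one submission in a fresh,
    throwaway container created from a digest-pinned image."""
    if not is_pinned(image_ref):
        raise ValueError(f"image must be pinned by digest: {image_ref!r}")
    return [
        "docker", "run",
        "--rm",               # delete the container as soon as it exits
        "--read-only",        # the container's root filesystem is read-only
        "--tmpfs", "/tmp",    # scratch space that disappears with the container
        "--network", "none",  # no network access during the run
        image_ref,
        *command,
    ]
```

Pinning by digest rather than by a mutable tag like `latest` means a promoted image can’t change underneath a running service.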
Our Code Challenge system evaluates student code submissions at a rate measured in Hertz, with hard SLAs on time to return a pass/fail result. Providing fast feedback to students as they iterate on solutions is key to maintaining interest and minimizing the frustration of what is already a very challenging process (learning to code), so we do a lot on both the front- and back-ends of this system to cut down on any unnecessary delays. While Workspaces has less stringent start-up requirements, speed is always a factor for web services, so we pay close attention to start-up times here as well. With average container start-up times measured in milliseconds, Docker is a perfect fit for our demands.
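A hard time budget on an evaluation run can be sketched with the standard library alone. The function below runs a command, gives up after a deadline, and reports the elapsed time. One caveat it flags: on timeout, `subprocess.run` kills only the `docker` CLI client, not necessarily the container, so the hypothetical `on_timeout` hook is where an explicit cleanup would go.

```python
import subprocess
import time
from typing import Callable, List, Optional, Tuple


def run_with_deadline(
    cmd: List[str],
    deadline_s: float,
    on_timeout: Optional[Callable[[], None]] = None,
) -> Tuple[Optional[int], float]:
    """Run `cmd`, giving up after `deadline_s` seconds.

    Returns (exit_code, elapsed_seconds); exit_code is None on timeout.
    """
    start = time.monotonic()
    try:
        result = subprocess.run(cmd, timeout=deadline_s, capture_output=True)
        code: Optional[int] = result.returncode
    except subprocess.TimeoutExpired:
        # subprocess.run has already killed the child process. For `docker run`
        # that is only the CLI client, so the container itself may need an
        # explicit cleanup step such as `docker kill <name>`.
        if on_timeout is not None:
            on_timeout()
        code = None
    return code, time.monotonic() - start
```

For example, a run started with a hypothetical `--name challenge-123` could pass `on_timeout=lambda: subprocess.run(["docker", "kill", "challenge-123"])`.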
I won’t go on much more about the advantages of containers, since that’s been expounded on extensively elsewhere, but hopefully the above explains a few of the reasons we chose to build these services on Docker, before we get into some of the pitfalls and difficulties we’ve encountered.
Working with an entire machine rootfs archive as your method of deployment is great for repeatability, but it can be quite unwieldy in practice. We base our images on a minimal CentOS image (~190MB), and once the required collection of runtimes, libraries and apps is loaded, we usually end up pushing an image of around 2GB. This makes updates fairly slow to push to Docker Hub and slow to pull down to our Docker hosts. We’re investigating solutions for reducing our image sizes and speeding up pulls by using BitTorrent to distribute the images, but our need to support a broad range of possible applications inside our containers makes it difficult to safely trim down their size. So far, we don’t have a great solution for this problem.
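A first step in any trimming effort is finding which layers dominate. The sketch below ranks layers from tab-separated `docker history` output, assuming a Docker CLI whose `history` command supports `--format` and `--human=false` (so sizes are printed in bytes), e.g. `docker history --no-trunc --human=false --format '{{.Size}}\t{{.CreatedBy}}' <image>`. The sample layer commands in the test are made up.

```python
from typing import List, Tuple


def parse_layers(history_output: str) -> List[Tuple[int, str]]:
    """Parse tab-separated '<size in bytes>, <created-by>' lines, one per layer."""
    layers = []
    for line in history_output.splitlines():
        if not line.strip():
            continue
        size, _, created_by = line.partition("\t")
        layers.append((int(size), created_by.strip()))
    return layers


def largest_layers(history_output: str, top: int = 5) -> List[Tuple[int, str]]:
    """Return the `top` largest layers, biggest first."""
    layers = parse_layers(history_output)
    layers.sort(key=lambda layer: layer[0], reverse=True)
    return layers[:top]


def total_megabytes(history_output: str) -> float:
    """Sum every layer's size, in megabytes (10**6 bytes)."""
    return sum(size for size, _ in parse_layers(history_output)) / 1e6
```

This only shows where the bytes are; deciding what is safe to remove is still the hard part described above.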
Docker daemon latency
One consistent problem we have in Code Challenges is latency and deadlocks in the Docker daemon; especially at the rate at which we create and destroy containers, Docker doesn’t do a particularly good job of staying responsive. There are several open bug reports in which Docker’s maintainers are investigating the issue, which appears to be related to lock contention, but in the meantime, we don’t get the kind of consistent performance we’d like to see.
We address this in a couple of ways: distributing the load across a larger number of smaller Docker hosts to reduce the load on any given host, and automatically restarting the Docker daemon when API monitoring shows it failing to meet our latency criteria. Since all of our containers are highly disposable, we also nuke /var/lib/docker/containers and /var/lib/docker/volumes in the daemon’s pre-start step, which seems to help a lot as well.
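Here’s a minimal sketch of the restart-on-latency idea, assuming a systemd-managed host whose unit is named `docker` and the default API socket at `/var/run/docker.sock`. It times the Engine API’s `GET /_ping` endpoint over the Unix socket, restarts the daemon after a run of slow or failed probes, and includes a pre-start cleanup for the two directories mentioned above. The threshold, interval, and failure count are illustrative placeholders.

```python
import http.client
import shutil
import socket
import subprocess
import time
from pathlib import Path
from typing import Optional

DOCKER_SOCKET = "/var/run/docker.sock"


class UnixHTTPConnection(http.client.HTTPConnection):
    """An HTTPConnection that speaks HTTP over a Unix domain socket."""

    def __init__(self, socket_path: str, timeout: float) -> None:
        super().__init__("localhost", timeout=timeout)
        self.socket_path = socket_path

    def connect(self) -> None:
        self.sock = socket.socket(socket.AF_UNIX, socket.SOCK_STREAM)
        self.sock.settimeout(self.timeout)
        self.sock.connect(self.socket_path)


def ping_latency(socket_path: str = DOCKER_SOCKET,
                 timeout: float = 2.0) -> Optional[float]:
    """Time GET /_ping against the Engine API; None if it fails or times out."""
    conn = UnixHTTPConnection(socket_path, timeout)
    start = time.monotonic()
    try:
        conn.request("GET", "/_ping")
        ok = conn.getresponse().status == 200
        elapsed = time.monotonic() - start
    except (OSError, http.client.HTTPException):
        return None
    finally:
        conn.close()
    return elapsed if ok else None


class LatencyWatchdog:
    """Signal a restart after `max_failures` consecutive probes that either
    failed outright or took longer than `threshold_s`."""

    def __init__(self, threshold_s: float, max_failures: int) -> None:
        self.threshold_s = threshold_s
        self.max_failures = max_failures
        self.failures = 0

    def observe(self, latency: Optional[float]) -> bool:
        if latency is None or latency > self.threshold_s:
            self.failures += 1
        else:
            self.failures = 0
        if self.failures >= self.max_failures:
            self.failures = 0  # start counting afresh after a restart
            return True
        return False


def pre_start_cleanup(docker_root: Path = Path("/var/lib/docker")) -> None:
    """Wipe container and volume state; run only while the daemon is stopped,
    and only when every container is disposable."""
    for sub in ("containers", "volumes"):
        shutil.rmtree(docker_root / sub, ignore_errors=True)


def restart_docker() -> None:
    # Assumes the daemon runs as the systemd unit "docker".
    subprocess.run(["systemctl", "restart", "docker"], check=True)


def watch(interval_s: float = 10.0, threshold_s: float = 1.0,
          max_failures: int = 3) -> None:
    """Probe forever, restarting the daemon when it stays unresponsive."""
    dog = LatencyWatchdog(threshold_s, max_failures)
    while True:
        if dog.observe(ping_latency()):
            restart_docker()
        time.sleep(interval_s)
```

Requiring several consecutive bad probes, rather than one, avoids restarting the daemon (and killing its containers) over a single slow response.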
Daemon blast radius
Another problem, certainly not unique to our use case, stems from the Docker daemon running all containers as its child processes. As a result, restarting the Docker daemon trashes all of the running containers, even when their services are otherwise working just fine. This is another situation where distributing the load across a larger number of Docker hosts can help reduce the impact, but in the long term we’d like to see the Docker daemon spin up detached runc-based containers that can keep running while a Docker daemon instance is replaced. CoreOS’ rkt project seems to have gotten this right, and now that it’s past 1.0, we’re investigating the possibility of switching to rkt for this reason.
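For readers on newer releases: Docker 1.12 added a `live-restore` daemon option intended to keep containers running while the daemon is down. The sketch below merges that setting into a `daemon.json` file (normally `/etc/docker/daemon.json`) while preserving existing keys. It assumes a Docker release that supports the option, and the daemon still has to be restarted or reloaded before the setting applies.

```python
import json
from pathlib import Path


def enable_live_restore(config_path: Path) -> dict:
    """Set "live-restore": true in a daemon.json file, keeping other keys.
    Returns the resulting configuration."""
    config = {}
    if config_path.exists():
        text = config_path.read_text()
        if text.strip():  # treat an empty file as an empty config
            config = json.loads(text)
    config["live-restore"] = True
    config_path.write_text(json.dumps(config, indent=2) + "\n")
    return config
```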
Security’s not perfect… and neither are resource constraints
Running your own apps in containers is one thing, but running third-party code is another. It’s essential to follow best practices, and take a multi-layered approach: use a MAC policy system like SELinux, use capabilities, firewalls, security groups, and unprivileged users to limit containers to just the needed access, and don’t hesitate to kill off a container or host at the first sign of weirdness. Even then, you may be surprised to discover where bugs can lurk (e.g. CVE-2015-8660, a kernel bug we found which affected the overlay filesystem, a popular filesystem used with Docker), so keep up to date with security updates, and if you can, follow security mailing lists to stay on top of emerging issues. So far (knock on wood), we’ve been able to avoid any host-level compromises, but planning for when (not if) it happens is essential if you’re running untrusted code.
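As an example of layering these controls at the container level, the sketch below builds a `docker run` command for untrusted code: a non-root user, every Linux capability dropped, `no-new-privileges` set, networking disabled, and a read-only root filesystem. The UID (65534, commonly `nobody`), the image name, and the optional SELinux type are placeholders; host-level measures such as firewalls, security groups, and the SELinux policy itself sit outside this command.

```python
from typing import List, Optional


def untrusted_run_args(image: str, command: List[str], uid: int = 65534,
                       selinux_type: Optional[str] = None) -> List[str]:
    """Build a locked-down `docker run` command for third-party code."""
    args = [
        "docker", "run", "--rm",
        "--user", f"{uid}:{uid}",               # unprivileged user in the container
        "--cap-drop", "ALL",                    # drop every Linux capability
        "--security-opt", "no-new-privileges",  # block setuid-style escalation
        "--network", "none",                    # no network unless it's needed
        "--read-only",                          # immutable root filesystem
    ]
    if selinux_type:
        # Hypothetical SELinux type; must exist in the host's loaded policy.
        args += ["--security-opt", f"label=type:{selinux_type}"]
    return args + [image, *command]
```

Capabilities that a given runtime genuinely needs can then be added back individually with `--cap-add`, rather than starting from Docker’s default set.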
Additionally, while you can apply resource constraints to a container, a process exceeding them may or may not be reaped by the kernel fast enough to stop the whole system from tanking. Shortening the CFS scheduler period from its 100ms default (Docker’s --cpu-period option) can help with this. We recommend applying constraints to all available subsystems, and relaxing them as needed to provide your desired experience. Leave your containers unconstrained, and you’re sure to find them running over one another before too long :).
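Docker exposes the CFS bandwidth controls as `--cpu-period` and `--cpu-quota`, both in microseconds, and the CPU share a container gets is quota / period. With the default 100ms period, a container limited to half a CPU may use 50ms of CPU time per period before it is throttled until the next period; with a 10ms period the same limit is enforced in 5ms slices, so a runaway process is reined in sooner. The sketch below computes the quota and adds memory, swap, and process-count limits; the specific values are illustrative.

```python
from typing import List

DEFAULT_CFS_PERIOD_US = 100_000  # kernel default CFS period: 100 ms
MIN_CFS_US = 1_000               # Docker's minimum for period and quota: 1 ms
MAX_CFS_PERIOD_US = 1_000_000    # Docker's maximum period: 1 s


def cfs_quota_us(cpus: float, period_us: int = DEFAULT_CFS_PERIOD_US) -> int:
    """Convert a CPU share into a CFS quota: quota = cpus * period."""
    if not MIN_CFS_US <= period_us <= MAX_CFS_PERIOD_US:
        raise ValueError("period must be between 1 ms and 1 s")
    quota = round(cpus * period_us)
    if quota < MIN_CFS_US:
        raise ValueError("quota below 1 ms; use a longer period or more CPU")
    return quota


def constraint_args(cpus: float, period_us: int, memory: str,
                    pids: int) -> List[str]:
    """`docker run` options constraining CPU, memory, swap and process count."""
    return [
        "--cpu-period", str(period_us),
        "--cpu-quota", str(cfs_quota_us(cpus, period_us)),
        "--memory", memory,
        "--memory-swap", memory,    # equal to --memory: no extra swap allowed
        "--pids-limit", str(pids),  # caps runaway forking
    ]
```

A shorter period trades a little scheduling overhead for tighter enforcement, which suits hosts running untrusted, bursty workloads.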
Single-app containers are not always realistic
Docker Inc. strongly promotes a single-app-per-container deployment model, but lacks a facility like rkt’s or Kubernetes’ “pods” to help model multi-service apps composed of several distinct processes that cooperate and communicate in a tightly linked deployment. While linked containers help bridge this gap, they’re an unsatisfactory solution, as multiplying the number of containers you need to run only exacerbates the scaling problems that affect daemon responsiveness. To reduce complexity and the number of concurrent containers needed, our Workspaces system runs multiple services in systemd-as-init containers, and uses systemd to bring up Workspaces-related services like our web-based terminal, our filesystem API, and a web server students can use to preview their code.
This has worked pretty well for our use case, and lets us treat a single container as an “active Workspace” unit, reducing the number of Docker API calls needed to create or destroy a Workspace, and preventing a Workspace instance’s components from getting “orphaned” from their partner processes.
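For reference, a systemd-as-init container generally needs a few extra `docker run` options: tmpfs mounts for `/run` and `/run/lock`, read-only access to the host’s cgroup hierarchy, and a stop signal that systemd treats as a clean shutdown (`SIGRTMIN+3`). The sketch below is a hypothetical example assuming a CentOS 7-style image whose init is `/usr/sbin/init` and a cgroup v1 host; the individual services would be systemd units enabled inside the image. Creating and destroying a whole Workspace is then one call each.

```python
from typing import List


def workspace_create_args(image: str, name: str) -> List[str]:
    """One `docker run` that boots systemd as PID 1; systemd then starts
    every service unit enabled in the image."""
    return [
        "docker", "run", "--detach",
        "--name", name,
        "--tmpfs", "/run",              # systemd expects a writable /run
        "--tmpfs", "/run/lock",
        "--volume", "/sys/fs/cgroup:/sys/fs/cgroup:ro",  # cgroup v1 hosts
        "--stop-signal", "SIGRTMIN+3",  # systemd halts cleanly on this signal
        image,
        "/usr/sbin/init",               # run systemd as the container's init
    ]


def workspace_destroy_args(name: str) -> List[str]:
    """One call removes the container and every service running inside it."""
    return ["docker", "rm", "--force", name]
```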
Edge-cases are everywhere
Truth be told, containers and the tooling around them are still fairly nascent technologies. Though they’ve been in heavy use at industry giants for a while, their broader adoption is a relatively new phenomenon, and there’s lots of room to run into sharp edges. One of my favorite incidents where we encountered such an edge was in our Code Challenges app.
As most people who’ve worked with Docker will know, the typical Docker network setup involves a bridge interface that containers use as a gateway, with packets being masqueraded via the host’s primary IP address. We followed a very standard deployment for our Docker hosts, matching the above description, only to run into a strange issue where network calls made from inside containers would frequently time out. It looked like an intermittently faulty network on the hypervisors running our Docker hosts, but on further investigation (performing packet captures), it turned out that the replies to outbound network calls were in fact making it all the way back to the Docker host, but were then failing to be picked up by the bridge interface.
After some further packet inspection and a careful reading of some ‘ip monitor’ output, we determined that the MAC address of the docker bridge was changing between the outbound request and the returning packets! A little googling soon led us to a helpful blog post, describing a funny quirk of Linux networking, in which a bridge interface without a static MAC assigned will automatically assume the lowest-valued MAC address of the currently-attached interfaces. On our high-churn hosts, this happened regularly enough to cause significant disruption, so we now statically assign MAC addresses to the docker bridge, and haven’t had trouble since.
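The quirk is easy to model. In the simplified sketch below, a bridge with no explicitly assigned address takes the numerically lowest MAC among its attached ports, so attaching a container’s host-side veth with a lower MAC silently changes the bridge’s address. Assigning an address explicitly, the fix described above, keeps it stable; here that is shown as an `ip link set dev <bridge> address <mac>` command. The MAC values are made up, and persisting the setting across reboots depends on how your hosts configure networking.

```python
import random
from typing import Iterable, List, Optional


def mac_to_int(mac: str) -> int:
    """'02:42:ac:11:00:02' -> integer, so MACs compare numerically."""
    return int(mac.replace(":", ""), 16)


def bridge_mac(port_macs: Iterable[str], static_mac: Optional[str] = None) -> str:
    """Simplified model: with no static address, the bridge adopts the lowest
    MAC among its (non-empty set of) attached ports."""
    if static_mac is not None:
        return static_mac
    return min(port_macs, key=mac_to_int)


def random_local_mac(rng: random.Random) -> str:
    """Generate a unicast, locally administered MAC address."""
    octets = [rng.randrange(256) for _ in range(6)]
    octets[0] = (octets[0] & 0xFC) | 0x02  # clear multicast bit, set local bit
    return ":".join(f"{o:02x}" for o in octets)


def pin_bridge_mac_cmd(bridge: str, mac: str) -> List[str]:
    """iproute2 command that assigns a fixed MAC to the bridge."""
    return ["ip", "link", "set", "dev", bridge, "address", mac]
```

On a high-churn host, every container start attaches a new port, which is why the bridge’s address could change between a request going out and its reply coming back.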
This is but one of several surprising gotchas we found waiting in this brave new containerized world.
While containers may not be the panacea the Docker hype-machine would have you believe, they’re still an incredibly useful technology, and it’s easy to see how they’ve gotten so much traction and interest in such a short time. We’re grateful to Docker Inc., CoreOS, the Kubernetes team, and all the contributors and bloggers who make this such a rewarding and exciting ecosystem to participate in. Like all new technologies, there are lots of sharp edges, though, so keep your trusty packet inspector and process tracer to hand, and be sure to share your stories!