Making the Case for Docker
This post started out as an internal blog post explaining to colleagues why we should start using Docker in our everyday development and testing processes. Recognizing that the use cases are common to almost any software organization, I decided to publish it here.
I’ve been using Docker off and on since its initial release almost two years ago. Since then, Docker has fundamentally transformed the way I work on many software engineering and devops projects.
One of the things I’ve learned is that you can use Docker as much or as little as you want to. You can quickly prove out something as simple as a piped Linux command on a certain OS or something as complex as a clustered microservices architecture.
The following describes real-life use cases that occur in many organizations. For each use case, I describe how Docker can be used to solve inefficiencies in software engineering organizations.
What is Docker?
“It eliminates the friction between development, QA, and production environments”. — Docker.com
Even after using Docker for a while, I sometimes struggle to give the right elevator pitch on why it’s so great. That being said, I think “Why Did Docker Catch on Quickly and Why is it so Interesting?” by Adrian Cockcroft (formerly of Netflix) sums things up pretty well.
At the core of Docker is Docker Engine, a portable, lightweight runtime and packaging tool. A container run by Docker Engine comprises just the application and its dependencies.
The easiest way to interact with the Docker Engine is using the command line client. The client is capable of creating new images, running containers and linking containers together.
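To make those three client operations concrete, here is a minimal sketch in Python that only assembles the corresponding `docker` command lines rather than running them. The image tag `example/app:1.0`, the container names, and the helper function names are hypothetical placeholders of mine. Note that `--link` is treated as a legacy feature in recent Docker releases, so take this as an illustration rather than a recommendation.

```python
import shlex


def build_image(tag, context_dir="."):
    """Command line for creating a new image from a Dockerfile in context_dir."""
    return ["docker", "build", "-t", tag, context_dir]


def run_container(image, name, links=None):
    """Command line for starting a detached, named container.

    links is an optional list of "container:alias" strings passed to --link.
    """
    cmd = ["docker", "run", "-d", "--name", name]
    for target in links or []:
        cmd += ["--link", target]
    cmd.append(image)
    return cmd


if __name__ == "__main__":
    # Hypothetical names: a database container linked into an app container.
    plan = [
        build_image("example/app:1.0"),
        run_container("example/db:1.0", "db"),
        run_container("example/app:1.0", "app", links=["db:db"]),
    ]
    for cmd in plan:
        print(shlex.join(cmd))
```

Each list can be handed to `subprocess.run` on a machine where Docker is installed. Printing first makes it easy to check what would be executed.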
Docker also comes with RESTful APIs, which make it easy to automate and customize Docker workflows to fit your organization. I’ve used these well-documented APIs to write a Java client library and to create a thin wrapper to orchestrate versioned containers.
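As a rough illustration of what talking to that API looks like, the sketch below lists running containers by sending `GET /containers/json` to the Docker daemon. It assumes the daemon is listening on the default Unix socket at `/var/run/docker.sock`. The class and function names are my own, and this is not the Java client mentioned above.

```python
import http.client
import json
import socket


class UnixHTTPConnection(http.client.HTTPConnection):
    """HTTP connection that talks over a Unix domain socket instead of TCP."""

    def __init__(self, socket_path):
        super().__init__("localhost")
        self.socket_path = socket_path

    def connect(self):
        sock = socket.socket(socket.AF_UNIX, socket.SOCK_STREAM)
        sock.connect(self.socket_path)
        self.sock = sock


def list_containers(socket_path="/var/run/docker.sock"):
    """Return the parsed JSON list of running containers from the daemon."""
    conn = UnixHTTPConnection(socket_path)
    try:
        conn.request("GET", "/containers/json")
        return json.loads(conn.getresponse().read())
    finally:
        conn.close()


def summarize(containers):
    """Reduce the API response to (name, image) pairs.

    The API reports names with a leading slash, which is stripped here.
    """
    return [(c["Names"][0].lstrip("/"), c["Image"]) for c in containers]
```

Calling `summarize(list_containers())` on a host with a running daemon yields one pair per running container. On a host without Docker, `list_containers` simply raises a connection error.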
Docker Containers versus Virtual Machines
Virtual machines (such as those run under VMware or VirtualBox) each carry a full guest OS with its own memory management, plus the overhead of virtual device drivers. Resources are emulated for the guest OS by a hypervisor, which allows more than one guest OS to run in parallel on a single host.
Docker containers are executed by the Docker Engine rather than a hypervisor. Containers are smaller and start faster, and because the host’s kernel is shared they offer better performance and greater compatibility, at the cost of less isolation.
Use Case #1: Mimic a Production Environment During Development
One of the most challenging things about developing and testing software is that a product can behave differently depending on where it’s running.
In some organizations, engineers may use development-only tools in order to iterate quickly. The cost of that quick iteration shows up when the development runtime doesn’t match test or production. For example, an application might be scaled horizontally in production by running several app servers fronted by a web server. If the development process only runs a single app server, then there’s a real possibility that some bugs won’t surface until the product reaches a QA environment or, worse, a customer’s production environment.
At best, this non-parity between development, QA and production environments leads to lower quality and longer time to market because more time is spent fixing production issues rather than working on the next product feature.
The Docker Solution
Using Docker, it’s possible to still quickly iterate during development while running in a production-like environment. Because containers have very little overhead compared to traditional VMs, a more complete production environment can be run on an engineer’s workstation during development.
Additionally, the container(s) used during development could eventually be promoted to QA and production. This increases the confidence that whatever works in development will work in QA and production.
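To picture the earlier example of several app servers fronted by a web server, here is a small sketch that generates the `docker` commands for such a topology on one workstation. The image names, the `devnet` network name, the `8080:80` port mapping, and the function name are assumptions for illustration. The web server image would still need its own configuration to proxy to `app1` through `appN`.

```python
def production_like_stack(app_image, web_image, replicas=3, network="devnet"):
    """Build the commands for N app servers behind one web server.

    All containers join the same user-defined network so the web server
    can reach the app servers by container name.
    """
    commands = [["docker", "network", "create", network]]
    for i in range(1, replicas + 1):
        commands.append(
            ["docker", "run", "-d", "--name", f"app{i}",
             "--network", network, app_image]
        )
    # Only the web server publishes a port to the workstation.
    commands.append(
        ["docker", "run", "-d", "--name", "web",
         "--network", network, "-p", "8080:80", web_image]
    )
    return commands


if __name__ == "__main__":
    for cmd in production_like_stack("example/app:1.0", "example/web:1.0"):
        print(" ".join(cmd))
```

Changing `replicas` is all it takes to try the same setup with one app server or five, which is the kind of parity check that is hard to do with development-only tooling.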
Use Case #2: Empower Engineers to Develop and Run Portable Distributed Systems…On a Single Machine
A distributed application typically runs in production as a cluster with one or more dependent remote services. Due to the resource overhead of typical virtual machines, setting up this entire application on a workstation isn’t possible. This forces engineers to do things like debug remotely on a cluster in a data center which may be halfway across the country. Painful.
Reasons you might need to run a distributed system on a workstation:
- You need to recreate a bug that only occurs when two or more nodes are running.
- You’d like to debug a new feature that involves the interaction of two or more distributed services.
- You want to write a new implementation of a distributed cache.
Historically, an option might be to ask a system administrator to allocate several machines, install a specific OS, and install and configure all of the application dependencies. With a traditional virtualized environment, this is time-consuming and error-prone. Once you have everything set up and working as it should be, you then have to make it portable.
Another option is to spin up compute instances on a cloud provider such as Amazon or Google. While these providers make it super simple to spin instances up and down, they also charge you money to do so. Not only do you have to make your work on the cloud compute instances portable but you also have to spend money that could be used elsewhere in the organization.
The Docker Solution
With the low resource overhead of Docker, another option becomes available: develop and run the system on engineers’ workstations via containers. This empowers engineers to easily configure, debug and even experiment with proofs of concept without the help of a system admin and without using additional compute and financial resources. Best of all, containers give you portability for free. Whatever containers you run on a workstation can easily be run on bare metal, in a traditional VM, or in the cloud. Many of the major cloud providers are quickly adopting Docker as a first-class unit of deployment.
Use Case #3: Sharing Containers to Reduce Engineering Time
Sometimes processes in a software organization are one-offs which are repeated by different roles. Here is an example:
The Support, Development and QA workflow
The following steps might represent a common bug-fixing workflow:
- A support engineer takes half a day to configure a certain version of the product and reproduce a customer’s production bug.
- The support engineer now has to correctly document the installation and/or configuration steps necessary to get to the same application state to reproduce the bug.
- A product engineer follows the support engineer’s documented steps and takes another couple of hours to configure their workstation application instance to reproduce the customer bug. The product engineer fixes the issue and hands it off to a QA engineer.
- Next, the QA engineer has to perform the same steps the support and product engineers did to verify the customer bug is now fixed.
Note that each person in the bug-fixing workflow had to correctly repeat the same installation and/or configuration steps.
The Docker Solution
In software engineering, there’s a principle called “don’t repeat yourself” (DRY), but it’s traditionally only been applied to software code to avoid copy and paste operations. Using Docker, it’s possible to apply the same principle to the workflow described above. You can perform the install and configuration steps once, save the state of the container, then easily share it with colleagues by using a private repository (for example, on Docker Hub).
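The save-and-share step can be sketched as two small helpers: one for the support engineer who reproduced the bug, and one for the colleagues who pick it up. The repository name `example/product-bug`, the tag, and the container names are hypothetical. One caveat worth knowing is that `docker commit` does not capture data stored in mounted volumes.

```python
def share_container_state(container, repository, tag):
    """Commands to snapshot a configured container and publish the image."""
    image = f"{repository}:{tag}"
    return [
        ["docker", "commit", container, image],  # save the container's state
        ["docker", "push", image],               # publish to the repository
    ]


def fetch_shared_state(repository, tag, name):
    """Commands for a colleague to pull the snapshot and start it."""
    image = f"{repository}:{tag}"
    return [
        ["docker", "pull", image],
        ["docker", "run", "-d", "--name", name, image],
    ]


if __name__ == "__main__":
    # Support engineer, after half a day of setup:
    for cmd in share_container_state("repro", "example/product-bug", "ticket-1"):
        print(" ".join(cmd))
    # Product or QA engineer, a couple of minutes later:
    for cmd in fetch_shared_state("example/product-bug", "ticket-1", "repro"):
        print(" ".join(cmd))
```

With this in place, the documented installation steps in the workflow above shrink to an image name and tag that can be pasted into the bug ticket.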
A developer can figure out what Docker does, install it and do something useful with it in 15 minutes.
What are you waiting for?