What is Docker?

The buzzword that is taking the DevOps world by storm.

Quay.io
Dec 17, 2013

This is part 1 of a series where we will be exploring what Docker is, and when and how you can use it.

The year was 2006. Virtual machines were waging a war against process isolation for the mind share of programmers interested in deploying their applications into the data center in a close approximation of their development environment. You see, under the process model, systems administrators controlled the production environment with an iron fist. It was necessary to strictly control the versions of software dependencies, the allocation of IP addresses, and the amount of CPU consumed. Under the virtual machine model, each application or service could pretend it was the only thing running on the bare metal. It didn't even have to run the same operating system! Companies saw great savings from buying and operating fewer physical machines, or in some cases, none at all. A market worth tens of billions of dollars formed around virtual machines and the software to manage them.

The year is now 2013, and it is hard to find an infrastructure offering that doesn't use a hypervisor of some sort. Virtual machines are one of the few technologies that have lived up to their promises and predictions. For all of the good they have done for us, completely virtualizing the entire operating system also leaves many areas for improvement. Because they virtualize everything, including the kernel and the hardware, virtual machines are usually slow to start. At the very best, the start time would be equivalent to booting the OS directly on the metal, but due to virtualization overhead and subdivision of hardware resources, start-up times are often several times slower than they would be on the bare metal.

Virtual machines are also usually managed as a virtual appliance, and shipped around with all of the operating system and user-land software bundled. These images are often several gigabytes in size. The software application itself, on the other hand, usually comprises a tiny fraction of this space, often a few hundred megabytes or less.

Finally, you have to manage these virtual machines as if they were real machines. Someone or some software has to be responsible for provisioning the machines, installing the application software, setting up the networking and users, and making sure each machine remains healthy and operational.

Enter Docker

Today, there is a new contender called Docker, which is positioned somewhere along the spectrum of resource isolation between the processes of yore and the virtual machines of today. Fundamentally, Docker provides a way to separate out the concerns of process isolation and dependency management, without resorting to heavyweight virtual machines or relying on complex language-specific dependency management software. From an implementation perspective, Docker cleverly fuses two interesting Linux features, cgroups and layered file systems, into an open source package that aims to become the standard way for managing application containers across all stages of the development process: development, testing, staging, and production.

When working with Docker, one normally uses an instance of a Docker container as a drop-in replacement for a process. Generally, the Docker container will contain only the base layer of the operating system and whatever software dependencies are required to run the application that the container represents. For example, in a stack with a software load balancer, a high availability persistence layer, and an application server, each of the three services would be run in its own container. This would allow each service to have a separate and different run-time architecture and dependency versions, allowing them to be improved independently even when run on the same host. Due to the use of the layered file system, if these containers are all based on the same underlying OS, only a single copy of the OS would need to be stored.

Example architecture for a multi-tier application. Each service runs in a different Docker container. Containers can be controlled independently and communicate with each other through the local network stack.
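
As a rough sketch of the stack described above, each service could be started from its own image. The image names below (myorg/db, myorg/appserver, myorg/loadbalancer) are placeholders for illustration, not images referenced in this article:

    # Image names are placeholders for whatever your stack actually uses.
    docker run -d myorg/db                        # high availability persistence layer
    docker run -d myorg/appserver                 # application server
    docker run -d -p 80:80 myorg/loadbalancer     # load balancer, published on host port 80

Each of the three containers can then be upgraded, restarted, or replaced without touching the other two.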

The Docker daemon runs on the host operating system, exposing a RESTful API which can be consumed by a variety of language libraries, as well as a provided command line client. The daemon is responsible for managing the life-cycle of each container. Docker also virtualizes the networking stack within the container, allowing the same port to be bound multiple times and mapped to different host ports. This allows you to do things like run multiple copies of the same app server on a single host for purposes such as zero-downtime deployment or parallel testing. The containers generally start in under a second, allowing for rapid iteration. Because the containers are standard, a developer can run the same container on their desktop, in their continuous integration environment, on their PaaS, and directly on their production servers. Another benefit is that because Docker doesn't rely on virtualization technology such as Xen or KVM, it does not need access to the CPU’s hardware virtualization extensions, and can be run in hosted environments that are already built on virtualization, such as Amazon EC2 or DigitalOcean.
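
For example, two copies of the same hypothetical app server image can run side by side on one host, each binding the same port inside its container while being mapped to a different port on the host:

    # myorg/appserver is a placeholder image name.
    docker run -d -p 8081:8080 myorg/appserver   # first copy, reachable on host port 8081
    docker run -d -p 8082:8080 myorg/appserver   # second copy, reachable on host port 8082
    docker ps                                    # lists both containers and their port mappings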

Images

The terminology used when talking about a Docker installation can seem confusing at first, but it’s simple and intuitive when you take the time to understand it. The basic building block of a Docker container is the image. An image is like a slice of a virtual machine image, containing application code or binaries, as well as the execution environment and dependencies. An image may be built on top of another image, and that image may in turn be built on top of yet another. Each image contains only the incremental changes required to transform its base image into the state required by the image. Images may also contain metadata describing things such as how to run what is inside the image and which ports need to be exposed. An image serves as an instantiable template for containers.
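
A couple of commands make this layering visible; the image name below is a placeholder:

    docker images                    # lists the images stored locally
    docker history myorg/appserver   # shows each layer the image was built from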

Containers

Containers are the result of starting a process from an image and all of its dependencies. Usually a container will represent a single process or service, but containers can talk to each other through sockets or through a socket naming scheme called Docker links. Containers have a life-cycle that is nearly identical to that of processes: they can be started, stopped, or killed. Docker also provides the ability to create images from a previously run container, thus persisting any changes made while that container was running.
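
A minimal sketch of that life-cycle, again using a placeholder image name and a container ID as reported by docker ps:

    docker run -d myorg/appserver                     # start a container from an image
    docker ps                                         # list running containers
    docker stop <container-id>                        # stop it gracefully
    docker kill <container-id>                        # or kill it outright
    docker commit <container-id> myorg/appserver:v2   # persist its changes as a new image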

Dockerfiles

Building images by hand from running containers is tedious and error prone. For this reason, Docker also provides a mechanism, called Dockerfiles, to build these images that is similar to, but not as powerful as, traditional configuration management software. Dockerfiles allow you to script the actions that should make up each layer of an image. For example, in discrete steps you can instruct Docker to build an image by taking a default Ubuntu image, apt-get installing several dependencies, and then adding your application code. Each Dockerfile command creates a new image layer, and clever structuring of the commands allows those layers to be cached and re-used.
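
A minimal Dockerfile along those lines might look like the following; the package names, paths, and command are placeholders for whatever your application actually needs:

    # Start from a stock Ubuntu base image.
    FROM ubuntu
    # Install dependencies in their own cacheable layer (placeholder packages).
    RUN apt-get update && apt-get install -y python python-pip
    # Add the application code as another layer (placeholder path).
    ADD . /opt/myapp
    # Record which port the service listens on.
    EXPOSE 8080
    # Describe how to run what is inside the image.
    CMD ["python", "/opt/myapp/server.py"]

Built with something like docker build -t myorg/myapp ., each instruction produces its own layer, so a rebuild after a code change can reuse the cached dependency layer as long as the RUN line is unchanged.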

Repositories and Indices

In a clever application of the DVCS paradigm, you can also fork and tag images. Together, all of the forks of an image and its tags constitute a repository. This is analogous to a git repository, except that forks cannot be merged back together. (Merging may eventually be supported, but is not on any public road-map.) When a local repository is ready for distribution, it is stored in a service called an index. Extending the DVCS metaphor, repositories can be pushed to and pulled from indices, such as the public Docker index or our own private service, Quay.io. Once you have built and pushed your images to an index, you can use that index to deliver those exact same images to your staging or production stacks.
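
In practice that workflow looks roughly like this; the repository name and image ID are placeholders:

    docker tag <image-id> quay.io/myorg/myapp:v1   # tag a local image into a repository
    docker push quay.io/myorg/myapp                # push the repository to an index
    docker pull quay.io/myorg/myapp:v1             # pull the exact same image elsewhere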

What’s Next?

Like other cutting edge technologies, Docker is still maturing and thus has some restrictions. Docker is built on features available in the Linux kernel, such as cgroups, which means that, at least until version 1.0, Docker can only be run on Linux. Docker can also only be run on 64-bit kernels, which appears to be a developer bandwidth limitation rather than a technical one. Finally, Docker’s usage of cgroups and namespaces prevents it from running under OpenVZ (without a semi-virtualized kernel shim), and therefore on the cloud services which are built on top of it.

We at Quay.io think Docker is the next step in the evolution of the datacenter. When taken together, the container portability, incredible performance, and beautiful API make for a developer experience that is second to none. We are already running user supplied code in Docker containers, and have “bet the farm” on the growing importance of Docker in the infrastructure ecosystem with our new product Quay.io, which brings GitHub-like functionality to the management of private Docker containers. For more information on getting started with Docker, the guys over at Docker.io have a great getting started guide.

Stay tuned for part 2, where we will explore some specific use cases for Docker, and how you can use them to streamline some QA and DevOps operations within your organization.


Quay.io

Private hosted Docker registry that understands businesses. Quay.io is lovingly made in NYC by Co-Founders Jacob Moshenko and Joseph Schorr.