Docker 1.11 and more: Engine is now built on runC and containerd
Docker recently released new versions for their entire platform: Engine was bumped to 1.11, Swarm is now 1.2, and Compose and Machine are at 1.7 and 0.7 respectively. There is also an associated release for Docker Mac/Windows, Beta 10. This is a “tip-of-the-iceberg” kind of release, in the sense that while the user-facing changes are modest, the Engine underwent a massive overhaul to make it the first Open Container Initiative (OCI) compliant runtime. More specifically, Engine is now built upon runC and containerd.
OCI, runC, containerd… What’s the deal?
What is the Open Container Initiative?
The OCI was announced last June to establish an industry standard for container runtimes and formats. The goal of the OCI is to avoid a “balkanization” of the container ecosystem, and ensure that containers built with one engine can run on another. This is essential to achieve container portability. As long as Docker is the only runtime, it is the de facto industry standard; but with the availability (and adoption) of other engines, it is necessary to define technically “what is a container” so that the different implementations can agree on something.
What is runC?
runC is a lightweight tool that does one thing, only one thing, and does it well: it runs a container. If you followed the early history of the Docker Engine, you may know that it used to start and manage containers using the LXC utilities; then it switched to “libcontainer”. That “libcontainer” is the piece of code interfacing with Linux kernel facilities like cgroups and namespaces, which are some of the basic building blocks of containers. To put it simply, runC is basically a little command-line tool to leverage libcontainer directly, without going through the Docker Engine. It’s a standalone binary which takes an OCI container and runs it. For more information, read Solomon Hykes’ blog post.
What is containerd?
containerd is a simple daemon that uses runC (or any OCI-compliant alternative) to manage containers, and exposes its functionality over gRPC. containerd essentially exposes a CRUD interface around containers, whereas the Engine exposes not only containers but also images, volumes, networks, builds, etc. through a full-blown HTTP API. For an in-depth explanation, read the blog post by Michael Crosby.
How it all ties together
As mentioned above, Docker Engine is now built upon runC and containerd. Prior to 1.11, Engine handled volumes, networks, containers, etc. by itself. That work is now split across four components: Engine, containerd, runC, and containerd-shim, which sits between containerd and runC.
Docker Engine still does image management and then it hands over an image to containerd to run. containerd then uses runC to run the container.
containerd only deals with containers: it takes on the role of starting, stopping, pausing, and destroying them. Since the container runtime is isolated from the Engine, it will eventually be possible to restart or upgrade the Engine without having to restart the containers. Other benefits are that Linux-specific code was removed from the Engine, and that this change makes it easier to use other container runtimes while keeping the same Docker UI commands (so on the surface everything appears the same).
Since there are now four components, the standalone `docker` binary has been replaced by four binaries: `docker`, `docker-containerd`, `docker-containerd-shim`, and `docker-runc`. If you are on the host machine, you can list the Docker processes with `ps ax | grep docker` and see these running. Below, the Docker 3rd birthday example voting app is running, and if you grep for all docker processes from the host machine, you can see these binaries.
If you’re using Docker for Mac/Windows, you can run `docker run -it --pid host -v /:/hostfs --net host alpine chroot /hostfs` and then run `ps ax | grep docker` inside this container to see the running processes. `--pid host` makes the container use the host’s PID namespace and, similarly, `--net host` makes it use the host’s network namespace. For more information on `run`, look at the Engine reference.
If you look inside /var/run/docker/libcontainerd, you can see all of the containers you have running, along with the docker-containerd socket file.
Networking was improved as well. Engine 1.10 added an embedded DNS server allowing each container to map container names and network aliases to IP addresses. When multiple containers had the same network alias, the DNS server would return one stable record. In Engine 1.11, the DNS server now returns all records, in random order. This lets you use DNS round robin as a basic load balancing and failover mechanism. If you ping the net-alias multiple times, you could have a wide variety of results. You could have traffic all on one container, evenly balanced, or unevenly balanced. Remember: container name resolution only works on custom networks (created with `docker network create …`). See below for an example of creating a network and two NGINX containers running on that network with a shared alias.
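To see why the results of DNS round robin can vary so much, here is a toy simulation in Python. It is only a sketch of the behaviour described above, not Docker's DNS server: the `resolve` and `simulate` functions and the IP addresses are made up for illustration, and the client is assumed to always use the first address in the answer.

```python
import random
from collections import Counter


def resolve(records, rng):
    """Return every record for an alias, in random order (toy model)."""
    shuffled = list(records)
    rng.shuffle(shuffled)
    return shuffled


def simulate(records, lookups, seed=0):
    """Count how often each address comes first across many lookups."""
    rng = random.Random(seed)
    hits = Counter()
    for _ in range(lookups):
        answer = resolve(records, rng)
        hits[answer[0]] += 1  # assumption: a naive client takes the first address
    return hits


# Hypothetical IPs of two containers sharing one network alias.
backends = ["10.0.0.2", "10.0.0.3"]
print(simulate(backends, lookups=10))
```

With only a handful of lookups the split between the two addresses is often lopsided; it evens out as the number of lookups grows, which is why this is a basic load balancing mechanism rather than a precise one.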
Additionally, networks (and volumes) can now have labels, just like images.
Several improvements were made to Compose, including the ability to read environment variables from an env file instead of having to pass them on the command line.
Next, `docker-compose up` now starts services in parallel where possible, while still respecting dependency order. For instance, if your Compose file defines a redis service, and the database, front end, and worker can all start once redis is up, then those three are started at the same time.
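The idea of "parallel where possible, in dependency order" can be sketched in a few lines of Python. This is a simplified model, not Compose's actual scheduler: the `start_waves` function is hypothetical, and it groups services into waves where everything in a wave could be started at the same time.

```python
def start_waves(depends_on):
    """Group services into waves; services in one wave can start in parallel."""
    remaining = {name: set(deps) for name, deps in depends_on.items()}
    started = set()
    waves = []
    while remaining:
        # A service is ready once all of its dependencies have started.
        ready = sorted(n for n, deps in remaining.items() if deps <= started)
        if not ready:
            raise ValueError("dependency cycle or unknown service")
        waves.append(ready)
        started.update(ready)
        for name in ready:
            del remaining[name]
    return waves


# The example from the text: everything waits for redis.
services = {
    "redis": [],
    "db": ["redis"],
    "frontend": ["redis"],
    "worker": ["redis"],
}
print(start_waves(services))
# [['redis'], ['db', 'frontend', 'worker']]
```

Here redis starts alone, and the other three services form a single wave that can be started together.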
There were also a few changes and additions to the `docker-compose` commands. Two were added: `docker-compose up --build` and `docker-compose exec`. People were often running `docker-compose build` followed by `docker-compose up` when editing Dockerfiles, so a `--build` flag was added to `up` to do both in one step. The other command, `exec`, has the same functionality as it does in Docker Engine. Additionally, `docker-compose logs` now mimics `docker logs`: instead of displaying the entire logs of the containers and then streaming them, it only displays them. You have to use `docker-compose logs -f` to stream logs, like with `docker logs`. When streaming with `docker-compose logs -f`, Compose is now able to detect when you add new containers to your application, and automatically adds their logs to the stream.
Swarm 1.1 added experimental support for container rescheduling, and in 1.2 it is no longer experimental. Before Swarm 1.1, if you started up a cluster of 10 servers running 100 instances of a web front end and one of the servers crashed, nothing would happen. Now containers can be rescheduled upon node failure. You enable this by setting an environment variable or a label on a container when you start it, so that it gets monitored.
`docker run -d -e reschedule:on-node-failure <image>`
`docker run -d -l 'com.docker.swarm.reschedule-policies=["on-node-failure"]' <image>`
The Swarm manager, which keeps track of the nodes, continuously checks for a heartbeat from each node, and if a node comes back as unresponsive, the manager will try restarting it. If that node was running any container with a rescheduling policy, then the container is rescheduled somewhere else. The status can be checked in the Swarm manager logs, and there can be many managers.
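The rescheduling rule can be modelled in a few lines of Python. This is a sketch under stated assumptions, not Swarm's implementation: the `reschedule` function and the node and container names are hypothetical, and picking the least-loaded surviving node is an assumption made for the example (Swarm uses its own scheduling strategy).

```python
def reschedule(placements, policies, nodes, failed_node):
    """Move containers with an on-node-failure policy off a failed node."""
    survivors = [n for n in nodes if n != failed_node]
    result = dict(placements)
    if not survivors:
        return result  # nowhere to move anything
    for container, node in placements.items():
        if node != failed_node or policies.get(container) != "on-node-failure":
            continue  # unaffected, or no rescheduling policy set
        # Assumption: choose the surviving node with the fewest containers.
        load = {n: 0 for n in survivors}
        for placed_on in result.values():
            if placed_on in load:
                load[placed_on] += 1
        result[container] = min(survivors, key=lambda n: (load[n], n))
    return result


placements = {"web1": "node-a", "web2": "node-b", "batch": "node-a"}
policies = {"web1": "on-node-failure"}  # "batch" has no policy
print(reschedule(placements, policies, ["node-a", "node-b", "node-c"], "node-a"))
# {'web1': 'node-c', 'web2': 'node-b', 'batch': 'node-a'}
```

Only `web1` moves: `batch` was on the failed node too, but without a rescheduling policy it is left where it was.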
Previously, if you deleted images in your own registry, it would delete the logical representation of the image but still leave all of the data on disk. The data just wasn’t referenced by anything anymore. Think of it in programming terms: when you delete a variable, the data isn’t immediately deleted; the memory is either freed manually later or reclaimed by garbage collection. The registry now does garbage collection too.
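To make the analogy concrete, here is a minimal mark-and-sweep sketch in Python. It illustrates the general technique only and is not the registry's code: the `collect_garbage` function, the image names, and the blob names are all hypothetical.

```python
def collect_garbage(manifests, blobs):
    """Mark every blob still referenced by a manifest, then sweep the rest."""
    marked = set()
    for layers in manifests.values():
        marked.update(layers)  # mark phase
    deleted = sorted(b for b in blobs if b not in marked)  # sweep phase
    kept = {b: size for b, size in blobs.items() if b in marked}
    return kept, deleted


# "app:2" was deleted earlier, so only "app:1" still references blobs.
manifests = {"app:1": ["blob-a", "blob-b"]}
blobs = {"blob-a": 10, "blob-b": 20, "blob-c": 30}  # name -> size

kept, deleted = collect_garbage(manifests, blobs)
print(deleted)
# ['blob-c']
```

`blob-b` survives because a remaining image still uses it, while `blob-c`, which nothing references, is the only data reclaimed.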
Docker for Mac and Windows
Last month Docker publicly announced Docker for Mac and Windows with a limited beta. Docker for Mac and Windows don’t change anything under the hood, but they improve the user experience immensely. They provide a very similar feel to running Docker natively on Linux and you no longer need VirtualBox unless you’re trying to provision multiple hosts. For my take on Docker for Mac which I wrote shortly after the release see my blog post, “Say Hello to Docker for Mac”.
There have been a few new features since the Docker 1.11 release. In Docker for Mac, as of Beta 9, localhost is used for port forwarding instead of docker.local, which gives it more of the intended native Linux feel. Beta 10 made it so token validation is now done over an actual SSL tunnel (HTTPS). And Beta 11 upgraded the kernel and Compose. See the release notes for all of the new features, changes, and known issues for both Mac and Windows.