What does Docker mean in production today? An overview from scratch

Thomas Hilaire
Linagora Engineering
21 min read · Nov 9, 2015

This article introduces Docker through the points that matter most for running it in production. It begins with some background and Docker basics, then covers security and the impact on software architecture, and finishes with tooling and a few operational specifics. As a conclusion, I also give my personal feeling about switching to Docker for your production services; take it as just that :)

History

2012 Docker begins as an evolution of a tool that the dotCloud company was using internally

2013 March dotCloud decides to release version 0.1 as an open-source tool

2013 July Docker becomes the top priority of dotCloud

2013 September Red Hat starts to collaborate on Docker

2014 January Docker closes a $15M Series B funding round

2014 March Release of libcontainer; LXC is no longer used by default

2014 April Creation of the Docker Governance Advisory Board

2014 September Docker closes a $40M Series C funding round

2014 Oct/Dec Microsoft and IBM announce that they will integrate Docker into some of their products

2015 April Docker closes a $95M Series D funding round

2015 June Docker and other companies push for more standardization of container technologies

Licenses

Docker Engine, Registry, Compose, Machine, Swarm and all the open-source products are published under the Apache License, version 2.0 (January 2004).

More information about commercial products at https://www.docker.com/components-licenses

Why the Rocket fork?

The CoreOS team wants to work with standards and reusable components, in order to get the best interoperability with the rest of the world. Since Docker had been chosen as the container manager of their Linux distribution, CoreOS was actively contributing to Docker (and still does today), and their CTO Brandon Philips took a seat on the DGAB. But the paths Docker chose to grow did not live up to CoreOS expectations, for several reasons:

  • Docker is more a platform than a tool, as there is no separation of concerns between its features
  • The standardization manifesto was removed from the Docker sources

Therefore, CoreOS launched the Rocket (“rkt”) project so that security, image building, image distribution and so on are separated from the tool responsible for the container runtime. The goal is to give more room to alternatives from the community. We can also imagine that CoreOS does not want its business to be tied to Docker’s.

It seems that Docker and Rocket are friends again: they are now working together, with many other companies, on the Open Container Initiative. They have already tackled the runtime aspect and released “runC” as a Linux Foundation project.

Now let’s get closer to Docker and see what it is made of.

Working with the Engine

To understand the strengths and weaknesses of the Docker Engine in a production context, I will try to explain a few basic points that I think are relevant:

  • Execution under an isolated environment
  • Server — Client
  • Network
  • Images
  • Repository
  • Security
  • Impacts on software architecture

Execution under an isolated environment

Docker can be understood as a layer on top of Linux container technologies. For a while, LXC was the bridge used to take advantage of kernel features like namespaces and cgroups. The developers then abstracted this part of Docker so that it can use other providers, such as their own libcontainer (now the default), OpenVZ, libvirt, systemd-nspawn, qemu/kvm and LXC.

It’s important to understand that, unlike virtualization technologies, containers do not multiply the number of operating systems running on the physical machine. As described by the schema below, all containers share the same OS, saving resources that can be used by your business applications.

Server — Client

The Docker server (daemon) listens on a Unix socket (unix:///var/run/docker.sock) by default and exposes a REST API on it. This way, you can easily interact with the daemon; the example below shows how to list the images known on your machine.
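As an illustration, a minimal sketch of such a request, assuming a curl version recent enough (7.40+) to support Unix sockets; older setups can pipe the HTTP request through nc or socat:

    # Query the Remote API directly on the Unix socket to list local images
    curl --unix-socket /var/run/docker.sock http://localhost/images/json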

The complete documentation of this API can be found in the official Docker documentation.

Thanks to the Docker client, you do not have to know all of that: it makes the requests and formats the responses for you.
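The same query with the client is a single command:

    # The client queries the daemon and prints a formatted table of images
    docker images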

You can configure Docker to listen on an HTTP (TCP) socket instead of the Unix one, and secure the access with certificates. This makes it possible to automate some tasks, from an integration server for example, without having the client installed there: starting or stopping a container, retrieving logs, and so on.
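As a sketch, assuming Docker 1.8+ where the daemon is started with the docker daemon command (the certificate paths and hostname are placeholders):

    # Listen on both the local Unix socket and a TLS-protected TCP socket
    docker daemon \
      -H unix:///var/run/docker.sock \
      -H tcp://0.0.0.0:2376 \
      --tlsverify \
      --tlscacert=/etc/docker/ca.pem \
      --tlscert=/etc/docker/server-cert.pem \
      --tlskey=/etc/docker/server-key.pem

    # A remote client then authenticates with its own certificates
    docker --tlsverify \
      --tlscacert=ca.pem --tlscert=cert.pem --tlskey=key.pem \
      -H tcp://my-docker-host:2376 ps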

Network

When the Docker server starts, it creates a virtual network interface named “docker0”, then another “veth” interface for each running container. All containers on the same machine can therefore communicate through this private network. On the other hand, this design gives containers on distinct hosts no easy way to communicate with each other.

To let the outside world reach a container, a port of this container must be published on the host, and the public IP of the host must be used (see the example after this list). This method brings many constraints and makes industrializing container deployments hard:

  • A host cannot run two containers that want to publish the same port, e.g. multiple “Apache” containers on the same host cannot all be attached to port 80.
  • An application must know the IP address of the remote host running the container it is looking for.
  • A container knows neither its host’s IP nor its “port mapping”, so it cannot advertise to the world how it can be reached (for example by writing it into an etcd server).
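For example, publishing two Nginx containers on the same host forces each of them to use a distinct host port:

    # Expose container port 80 on host port 8080
    docker run -d -p 8080:80 --name web-1 nginx
    # A second container must pick another host port
    docker run -d -p 8081:80 --name web-2 nginx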

Images

An image is a template from which Docker is able to create containers. Many existing images are shared on the official repository (“debian”, “jenkins”, “postgresql”, etc.). We can also create images ourselves in two ways, always starting from an existing image:

  • by saving the file system state of a running container with the command docker commit
  • or by replaying a set of instructions written in a Dockerfile

Dockerfile usage is widespread as it allows sharing an image as a simple text file. It also makes it easy to understand what an image really does, by reading all the steps that build it. Below is an example of how an Nginx HTTP server image can be written as a Dockerfile. To build it, run: docker build -t my-nginx /path/to/my/Dockerfile/
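A minimal sketch of what this Dockerfile could contain, assuming debian:jessie as the base image:

    # Dockerfile for a simple Nginx HTTP server
    FROM debian:jessie

    # Install nginx and clean the apt cache to keep the image small
    RUN apt-get update && \
        apt-get install -y nginx && \
        rm -rf /var/lib/apt/lists/*

    # Port on which the server listens
    EXPOSE 80

    # Run nginx in the foreground so the container keeps running
    CMD ["nginx", "-g", "daemon off;"]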

Each image can be extended. Following this example, you can change the behavior of my-nginx by creating a new Dockerfile that references its name in the FROM instruction.
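For example, with a hypothetical site.conf configuration file added on top of the base image:

    FROM my-nginx
    # Add a custom virtual host configuration to the base image
    COPY site.conf /etc/nginx/conf.d/site.conf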

A container can be instantiated from an image with docker run my-image. If my-image is not yet known to your Docker host, a lookup in the repository is triggered to download it.

Repository

Once an image is built, there are different ways to share it on a repository:

On a SaaS platform (such as the Docker Hub)

At home or in a private cloud

  • free of charge, by using the “registry” container provided by the Docker team. Common features such as basic authentication, certificates or notifications can be enabled with a little configuration, …
  • with a subscription to Docker Trusted Registry (currently starting at $150/month). This product provides a web console to visualize its health state, statistics on hosted images, role-based access control, enterprise directory integration, …
  • with a subscription to CoreOS Enterprise Registry (currently starting at $10/month). The product seems similar to Docker Trusted Registry in some ways: it provides a web console to navigate between images, role-based access control, …

Security

Container capability restrictions

Capabilities are a Linux kernel feature that allows restricting the rights of a container against the kernel. The root user of a container only has the rights declared in the execdriver template used at boot time, so this user does not have the same rights as your real host root. You can see this template on GitHub; the current default capabilities are CHOWN, DAC_OVERRIDE, FSETID, FOWNER, MKNOD, NET_RAW, SETGID, SETUID, SETFCAP, SETPCAP, NET_BIND_SERVICE, SYS_CHROOT, KILL, AUDIT_WRITE.

With this feature you can, for example, remove the KILL capability for all users of a container if you are sure you will never need it. The default template is already quite restrictive, but you can reduce even further the power of an attacker who manages to get access to the container.
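Capabilities can be adjusted per container at run time, for example:

    # Start a container without the KILL capability
    docker run -ti --cap-drop=KILL debian bash
    # Or drop everything and re-add only what is strictly needed
    docker run -ti --cap-drop=ALL --cap-add=NET_BIND_SERVICE debian bash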

Compatibility with other security tools

Docker does not interfere with other security tools, so GRSEC, TOMOYO, AppArmor, etc. can be used alongside it. Red Hat even provides an execdriver template compatible with SELinux.

Flooding based attack

This kind of attack would make a service unusable just by saturating it with a huge quantity of requests.

The kernel provides another feature named “cgroups”, which allows grouping processes, monitoring them and reporting statistics on the resources they use. It also allows limiting consumption, to be sure that a container will not take all the CPU, I/O and memory of the physical machine.
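As a sketch, such limits can be set directly when starting a container (the values are arbitrary):

    # Cap the container at 512 MB of RAM and give it a reduced CPU share
    docker run -d --memory=512m --cpu-shares=256 --name web my-nginx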

Password management

It is a good idea not to store passwords or certificates in images: firstly for confidentiality reasons, secondly because such an image cannot be reused across environments (dev, testing, production).

Some tools have been created to fulfill this common need, like Vault or Keywhiz. With them, critical elements can be discovered by the container itself in a secure way.

Virtual machines to host containers

Notably thanks to dedicated hardware support (like Intel VT-d and VT-x), virtualization brings better isolation than containerization. Even if the namespace feature isolates processes well from each other, the host kernel is still shared, so a kernel-level attack from a container can reach the host.

Hosting containers inside virtual machines therefore enhances the isolation and lets admins limit the damage in case of a successful attack. Try to find the right ratio of containers per VM to keep the flexibility and the benefits of Docker.

See « Combining Containers and Virtual Machines ».

Trusting an image

Since version 1.8, Docker ships with an enhanced « notary » tool. A publisher can now sign an image, and users downloading it can use this signature to verify that it is the right one. Unfortunately, this tool does not aim to protect you against a publisher that provides a harmful image.

Updates policy

As it runs on the host kernel, a container inherits both its benefits and its weaknesses. That is why it is essential to keep the security components of your hosts as up-to-date as possible.

Because CoreOS has an evolved update process, choosing it as the operating system for your hosts makes sense. I will explain the particularities of this process later in this article.

Impacts on software architecture

At first glance, running an application inside containers or virtual machines changes nothing about its architecture. Even if they were not designed for it, we can build a Postgres cluster or a Cyrus Murder with Docker containers. But we will see here some good practices to get all the benefits that containerization can provide. Most of these practices are well explained at http://12factor.net, but some are more specific to containers.

Immutable Infrastructure

A container should not hold any internal state: we must be able to destroy it and recreate it on another host without losing any data. That is the “Immutable Infrastructure” concept, which implies thinking of your file system as read-only, keeping your application stateless, and so on. All non-volatile data must be persisted in a database or in a mounted volume that can be shared across your host cluster.

Dependency discovery

In order to ease deployment automation and repeatability, you should avoid any manual configuration step for your applications: they should be able to discover their dependencies by themselves. Such dependencies can be passwords, certificates, the IP or name of a remote service, or even simple configuration entries.

Vault, Etcd, Consul or ZooKeeper are dictionary-like stores built for such elements. They expose a simple HTTP API to access the data, and some of them even offer a notification system so that your applications always stay up-to-date on this configuration.
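As an illustration with etcd (v2 HTTP API on the default port 2379; the key and value are made up):

    # An operator (or a deployment script) publishes a configuration entry
    curl -X PUT http://127.0.0.1:2379/v2/keys/config/database/host -d value="10.1.2.3"

    # At startup, the application discovers the entry by itself
    curl http://127.0.0.1:2379/v2/keys/config/database/host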

Micro-services

A monolithic application is easy to conceptualize. At the beginning, we can quickly code the desired features, but over time the amount of code and the scope of responsibilities become really hard to manage effectively. A simple evolution can then become a major source of regressions, and manual tests take longer and longer to run.

To develop an application as micro-services, we must think about how to cut it into well-sized pieces that match the business logic.

Coding with micro-services brings flexibility because:

  • responsibilities are split, so we are able to find and fix a bug in a more isolated way
  • as the scope of a micro-service is small, far fewer manual tests have to be run after a change
  • we can choose a different language or framework for each micro-service
  • scaling can be done in a smarter way, as we can replicate only the loaded micro-services and not the whole application

We are now aware that adopting Docker to run applications may change your software architecture and can lead to complex network management. The following section is therefore about Docker tooling.

“Docker in production” toolbox

Production often involves many machines; unfortunately, the scope of the Docker Engine is bound to a single host. We will look at some tools that help to go to production with Docker, but we cannot cover all of them, as that could take a lifetime! So I suggest you take a look by yourself at tools like Helios from Spotify, Centurion from New Relic, Marathon from Mesosphere or even Chef (for Docker). Here I will only introduce:

  • Docker Compose, Machine and Swarm
  • Weave
  • Flannel
  • CoreOS
  • Kubernetes

Docker Compose, Machine and Swarm

The Docker suite for managing deployment and networking between machines is still in beta. It is not yet advised to use it in production.

Docker Compose

Previously known as « fig », Compose enlarges the scope of the Engine to work with multiple containers. It offers almost the same functionality as the Engine for image and container management, but it lets us work with groups, which becomes useful when your application runs across several containers. It has no notion of which host a container should live on; later on, that role will be handled by the integration with Swarm.

We can describe how our application is composed in a docker-compose.yml file, and Compose will then drive the Engine to bring the services up.
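A minimal sketch of such a file, in the (version 1) Compose syntax of that period; my-nginx is the image built earlier and the Postgres tag is arbitrary:

    # docker-compose.yml
    web:
      image: my-nginx
      ports:
        - "8080:80"
      links:
        - db
    db:
      image: postgres:9.4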

The available commands are quite basic: up, start, stop, restart, rm, kill, etc. Running the up command against the docker-compose.yml above can be translated into regular Docker Engine commands.
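Roughly, and leaving aside the container naming scheme Compose uses, it amounts to:

    # Start the database first, then the web container linked to it
    docker run -d --name db postgres:9.4
    docker run -d --name web -p 8080:80 --link db:db my-nginx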

Docker Machine

This tool helps create machines ready to run containers. It works with various “machine providers” thanks to its driver system: you can use the VirtualBox provider for development or testing, and OpenStack, VMware Fusion or Amazon EC2 for production.

Docker Machine is a bit like the hypervisor of the virtual machine world. For example, you can create a VirtualBox VM with the command docker-machine create --driver virtualbox dev-db-1. Other commands let you manage your machines: find their IP, update their Docker Engine, reboot or delete one of them. Aside from the integration with Engine and Swarm, the current roadmap of this tool is almost blank.
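A few of these commands, applied to the machine created above:

    docker-machine ls                      # list machines and their state
    docker-machine ip dev-db-1             # print the machine's IP address
    eval "$(docker-machine env dev-db-1)"  # point the local client at its Engine
    docker-machine upgrade dev-db-1        # update its Docker Engine
    docker-machine rm dev-db-1             # delete the machine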

Docker Swarm

We have seen that we can create and manage a machine inventory with Docker Machine; Swarm aims to exploit it. Swarm abstracts the machines away behind an extensible service capable of running containers. Since it offers no solution today for networking containers hosted on different machines, the interest of Swarm is still quite limited for large deployments.

However, I strongly suggest you watch its evolution because, as the official deployment tool of the Docker company, its weaknesses should be fixed soon and it might become a great tool. Below is a summary of the Swarm components:

Master: It receives all commands from the administrators and reflects changes on its nodes.

Node: It’s a physical or virtual machine where containers will live. Each node runs a Docker daemon which is managed by the master.

Agent: A Swarm agent is installed on each node; it listens to the Docker daemon activity to update the “discovery service”.

Discovery Service: A dictionary keeping the composition of the Swarm cluster up-to-date. It knows which node is running which container, and helps the master apply smart strategies when deploying new containers into the cluster: for example, the default policy deploys on the node running the fewest containers. Currently the role of “discovery service” can be played by the Docker Hub, Consul, ZooKeeper or Etcd.

Weave

The Weaveworks company has three tools targeting three Docker gaps: Weave Net for the network, Weave Run for service discovery and Weave Scope for cluster visualization.

Weave

A command-line tool able to start the « Weave Router » and « Weave DNS ». It enables the virtual Weave network on the host and is responsible for assigning IPs to newly created containers.

Weave Net, the router

It is a Docker container playing the router role; it listens on port 6783 of the host and knows how to connect to other nodes in order to route packets to remote containers.

Weave Run, the DNS

Another Docker container, which provides the « service discovery ». Each time a new container starts, this DNS is notified and links the container’s Docker name to its IP address. Thus, the new container can be reached by its name from every application in the Weave cluster.
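A rough sketch of the workflow with the weave command-line tool (flags and the automatic DNS registration vary between Weave versions, so treat this as an assumption rather than a reference):

    # On each host: start the Weave router, DNS and proxy
    weave launch
    # Point the Docker client at the Weave proxy so new containers join the network
    eval "$(weave env)"

    # Started with a hostname in the weave.local domain, this container
    # is registered in WeaveDNS under the name "hello-world"
    docker run -d -h hello-world.weave.local my-nginx

    # Another container on any host of the cluster can then reach it by name,
    # e.g. with "curl http://hello-world" from inside this shell
    docker run -ti -h curl-container.weave.local debian bash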

In the example shown above, it is possible to resolve the address http://hello-world from the “Curl Container”.

See http://blog.weave.works/2014/11/04/have-you-met-weavedns/

Weave Scope

Weave Scope is a web application that visualizes your Weave cluster. It shows basic container properties, links and network information between containers, as well as host OS data such as the hostname, the load, …

Note: the Weave network overlay adds a little CPU load, slightly reduces the bandwidth and increases the latency. While such drawbacks have no impact on most applications, they can be significant for those that require a very high-quality connection with the lowest possible latency between nodes (like many NoSQL databases).

This overhead comes from the software UDP encapsulation, done through the “pcap” library, of the packets exchanged on the virtual network. However, the Weave team is working on this weakness and wants to switch to other technologies like the “Open vSwitch Datapath”. This enhancement is still under development, but the results are promising: http://blog.weave.works/2015/06/12/weave-fast-datapath/

Flannel

Like Weave, Flannel aims to create a private network to ease the communication between containers on different hosts. Developed by CoreOS, it is the component which manages the network part of their operating system. Convenient by design and efficient thanks to its modern technology, it has been chosen by Google for Kubernetes. Below is the big picture that can be found on their GitHub page.

Configuration

We can see in the picture above that a “flanneld” agent is installed on each machine that runs containers. When this agent starts, it reads the Flannel cluster configuration from an Etcd server, then takes responsibility for a range of this network.

The configuration entry in Etcd is really short and easy to understand:
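For instance, a VXLAN-based configuration stored under Flannel's default key (/coreos.com/network/config) could be created like this:

    # Written once in Etcd, read by every flanneld agent at startup
    etcdctl set /coreos.com/network/config \
        '{ "Network": "10.1.0.0/16", "Backend": { "Type": "vxlan" } }'

Each flanneld agent then reserves a smaller subnet inside this range (a /24 by default) for its own host.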

Docker integration

We can provide a custom CIDR setting to the Docker daemon when it starts, using the --bip option. This way we control the IPs given to newly created containers. The MTU (Maximum Transmission Unit) can also be customized to get the best performance Flannel can offer.

Once the daemon has been configured, all its containers will be on the private Flannel network.
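A sketch of this integration: flanneld writes the subnet it obtained into /run/flannel/subnet.env, and the Docker daemon can then be started from those values:

    # Variables provided by flanneld (FLANNEL_SUBNET, FLANNEL_MTU)
    source /run/flannel/subnet.env
    # Give containers IPs from the Flannel subnet and use the matching MTU
    docker daemon --bip=${FLANNEL_SUBNET} --mtu=${FLANNEL_MTU}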

Packets encapsulation

Unlike Weave, Flannel already supports multiple encapsulation techniques such as udp, vxlan and host-gw. It also supports some specific cloud services: the aws-vpc backend can be used if you run Flannel on Amazon EC2, and gce on Google Compute Engine.

Note: thanks to its VXLAN encapsulation support, the Flannel network offers great performance. Several tests trying to measure it can be found on the web: while the bandwidth seems almost unaffected, a slight latency overhead is noticeable. However, it is so small that it should be invisible in most cases. You can get more details in these articles: http://www.generictestdomain.net/docker/weave/networking/stupidity/2015/04/05/weave-is-kinda-slow/, http://packetpushers.net/vxlan-udp-ip-ethernet-bandwidth-overheads/, http://www.slideshare.net/ArjanSchaaf/docker-network-performance-in-the-public-cloud

CoreOS

CoreOS is a Linux distribution based on Gentoo. Devoid of any « package manager », its only goal is to host containers. A CoreOS installation is really light: only a few services are shipped by default, like locksmith, docker and rocket. It is by deploying tools as containers that a CoreOS instance gets fleshed out.

Security is an important point for CoreOS: its update process runs continuously as a background task. The root partition uses an active/passive scheme; CoreOS always boots on the active one and updates are always applied to the passive one. The roles are switched when the system reboots (or with a specific command), so that you benefit from the latest improvements.

  • A CoreOS update is not a package-by-package change set, it is a new version of the whole system as a single unit
  • Multiple update channels are available: production machines will be configured on the “stable” channel whereas developer machines might be on “beta” or “alpha”
  • If anything goes wrong after an update, you can simply ask for a rollback to get back to the previous CoreOS version

Kubernetes

Sometimes called “K8s”, Kubernetes is a container deployment tool made by Google. The successor of “Borg” and “Omega”, it plays an important role in Google's huge clusters, so much so that they want to create a “Cloud Native Computing Foundation” around it. Kubernetes abstracts containers behind a higher-level concept, letting you manage “services”. This way, you can delegate all the tedious and complex orchestration work when you need to scale or update your applications: administrators define contracts, and Kubernetes fulfills them by managing containers on its machines. I will briefly explain below some of the components that enable this abstraction.

Kubectl: The command-line tool to administer your Kubernetes cluster.

Master: It receives the commands issued through kubectl, supervises the cluster and gives tasks to its nodes.

Node: A node is a physical machine or a VM, it hosts “pods” and receives instructions from the “master”. A “kubelet”, a “proxy” and Docker are installed on each node.

Kubelet: A “kubelet” is an agent which listens to Docker activity to know the state of the containers. For example, that allows it to restart a dead container.

Pod: With Kubernetes, all containers run inside “pods”. You can compose a multi-container application by defining a pod; the tool guarantees that every container of a pod runs on the same machine, so they can share resources more easily.

When a pod is instantiated, Kubernetes automatically chooses a healthy node with enough free resources to host it.

Replication controller: Pods can be created manually or through a “replication controller”. We can roughly see these controllers as the combination of a pod and a replication factor. Thanks to them, Kubernetes guarantees that the desired number of pods exists on the cluster. Like a kubelet, such a controller can automatically redeploy pods, a common use case being when a physical machine goes down.

Replication controllers can be used for:

  • make your application highly available, simply by setting the replication factor above 1
  • scale an application, by increasing or decreasing the replication factor
  • perform rolling upgrades
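As an illustration, a minimal replication controller manifest in the v1 API could look like this sketch (the image name and label keys are assumptions, chosen to match the “my-auth” example used below):

    apiVersion: v1
    kind: ReplicationController
    metadata:
      name: my-auth-rc
    spec:
      replicas: 3                  # Kubernetes keeps three pods alive at all times
      selector:
        app: auth-app
        env: production
      template:                    # pod template used to (re)create pods
        metadata:
          labels:
            app: auth-app
            env: production
        spec:
          containers:
          - name: my-auth
            image: my-registry/my-auth:1.0   # hypothetical image name
            ports:
            - containerPort: 8080

It would then be created with kubectl create -f my-auth-rc.yml and scaled later with kubectl scale rc my-auth-rc --replicas=5.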

Service: Take the example of an authentication micro-service named “my-auth” that you want to deploy in a distributed manner on your cluster. By declaring this “service”, Kubernetes allocates a fixed IP (called the “cluster IP”) and exposes it both in the environment variables of the pods and in its DNS. Thus, applications that wish to interact with this authentication service can simply use its name, for example https://my-auth/authenticate.

A service is a virtual object linked to real applications, represented by the pods. This association is made via the “labels” of the pods. When you create a service, you give Kubernetes a kind of “link request”, such as: the “my-auth-prod” service is composed of all the pods that carry the labels “auth-app” and “production”.
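Expressed as a manifest, such a link request could look like this sketch (the label keys and ports are assumptions):

    apiVersion: v1
    kind: Service
    metadata:
      name: my-auth-prod
    spec:
      selector:                # every pod carrying these labels backs the service
        app: auth-app
        env: production
      ports:
      - port: 443              # port exposed on the cluster IP
        targetPort: 8080       # port the pods actually listen on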

Such virtual services let Kubernetes, through the node proxies, provide some really interesting properties:

  • No need to know the names or locations of the pods, nor how many pods make up an application. You only use the fixed “cluster IP” of the service, instead of a pod IP that can change over time.
  • Proxies automatically route incoming requests to the pods, following affinity policies or for load-balancing purposes.

We can also declare a service as “public”: Kubernetes then exposes a port for it on each cluster node, and your service becomes reachable from the outside world!

Even if many points are specific to each tool, you can now imagine more easily how Docker containers can be structured and deployed in a production context.

Operational specifics

Images and host disk space

Images are downloaded to the host but are never deleted automatically, so you have to keep an eye on the available disk space and clean it up when required.
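A few commands help with this housekeeping:

    # See which images are stored on the host
    docker images
    # Remove a specific image
    docker rmi my-nginx
    # Remove "dangling" intermediate layers left behind by rebuilds
    docker rmi $(docker images -q -f dangling=true)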

Snapshot

With Docker you can reproduce the well-known “snapshot” mechanism that you are used to with VMs. But there are alternatives that fit the container spirit better.

By creating archives

Docker allows saving a file-system state as an archive through the save and export commands. Their behavior is close, but save works on an image and keeps its layers, while export works on a container and flattens its file system into a single layer. Such archives can then be restored with the load or import commands.
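In practice (the names are placeholders):

    # save/load work on an image and keep its layers
    docker save -o my-nginx.tar my-nginx
    docker load -i my-nginx.tar

    # export/import work on a container and flatten its file system
    docker export my-container > my-container.tar
    docker import my-container.tar my-nginx:flat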

By a high-level file system

One aspect of the Docker philosophy is that a container should not keep any internal state, so that it can be destroyed and recreated on demand without data loss. To achieve this, we often use the Docker volume feature to share folders between the host and its containers; the non-volatile data of an application is then written to the volume. The advantage of this method is being able to use a file system that natively knows how to take snapshots or incremental backups.
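For example, a Postgres container can write its data to a host directory (the host path is arbitrary) so that the container itself stays disposable:

    # Data written under /var/lib/postgresql/data lands on the host in /srv/pgdata
    docker run -d --name db -v /srv/pgdata:/var/lib/postgresql/data postgres:9.4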

Since Docker 1.8 this functionality has become very modular, with volume plugins supporting advanced storage systems such as Blockbridge, Ceph, ClusterHQ, EMC and Portworx.

Logs

When developing, you can write and exploit logs in the same way as before, whether your application ends up running in a container or in a virtual machine.

But in principle, a container is made to perform a single task and host a single service. That is why it is common to write a container’s logs to the standard output (stdout and stderr); you will often encounter this approach with images published on the Docker Hub. In that case, you are still able to centralize logs with the --log-driver Docker option. For example, if you are used to working with syslog, it is possible to forward the output of a container to your central syslog server.
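As a sketch (the syslog server address is a placeholder):

    # Forward everything the container writes to stdout/stderr to a remote syslog
    docker run -d \
      --log-driver=syslog \
      --log-opt syslog-address=tcp://192.168.0.42:514 \
      my-nginx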

The list of supported drivers is maintained at https://docs.docker.com/reference/logging/

Conclusion

Although its ecosystem sometimes seems very young, Docker is based on proven technologies that have been part of the kernel for several years. Managing a container-based network is a daily challenge for administrators, but once mastered, the benefits can be considerable. While most developers already use Docker daily on their machines, it is unclear who actually uses it in production and at what scale. But it is surely only a matter of time before the phenomenon gains even more momentum, thanks to the many IT actors actively working on its popularization: among them IBM, Red Hat and VMware, and even Microsoft has announced Docker support in the upcoming Windows Server 2016. However, is Docker a wise choice for everyone?

Even if some recent companies have effectively automated their processes end to end without a single container, I would clearly say yes for startups and software-as-a-service vendors: their applications are often “cloud ready”, as they have been developed with modern tools and with practices that fit Docker well. They have been able to grow their application needs alongside their learning curve of the Docker ecosystem.

It is not so easy for more traditional groups, who still see blocking gaps in Docker and its toolbox, such as the lack of visualization tools or of access-rights management for containers. Moreover, it would mean transforming the way their applications are tested, packaged, hosted and shipped. The transition to Docker really makes sense only if you manage to exploit it fully: you cannot simply replace VMs with containers, you must rethink your methodology, your applications and your infrastructure, and bet on the right tools with trained teams.
