Docker Fundamentals — about containers

In recent years, the Docker platform has become quite popular, judging by the number of developers wearing Docker merch in public. With the rise of containerisation, DevOps teams are gravitating towards containerised microservice architectures.

This article is the first in a series on Docker, and serves as a guide to understanding how Docker works.

What is Docker?

Docker is a tool that enables you to run applications in isolated environments known as containers. Containers are similar to virtual machines, but their processes run directly on top of the host kernel rather than on a guest operating system.

What does this mean?

Essentially, a process running inside a container is a process running on the host, except that it is bounded by some restrictions or limits. These restrictions may be CPU time, main memory, I/O, etc.

About cgroups

To apply these restrictions, Docker uses a kernel feature known as a cgroup (short for control group). A cgroup is a group of processes that is collectively bound by some restrictions or limits, as mentioned above. There can be multiple cgroups, and cgroups can contain nested cgroups, so they are arranged hierarchically. Here’s a high-level example of such a hierarchy.

cgroup 1 — process A and B, cgroup 2 — process C and D

The cgroup hierarchy above has 2 cgroups, each containing 2 processes. From now on, we will refer to such an arrangement of cgroups as a hierarchy.

Now, let’s say I have 4 processes: A, B, C and D. I want to apply the following restrictions on these processes:

  • Processes A and B together should use a maximum of 10% of CPU resources (for every x milliseconds, this group can receive only 0.1x milliseconds of CPU time).
  • Processes C and D together should use a maximum of 50 MB of main memory.

The solution would be to put processes A and B into cgroup 1, with a CPU limit of 10%, and C and D into cgroup 2, with a memory limit of 50 MB. This gives a single hierarchy with two cgroups, each carrying one limit.
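To make these two limits concrete, here is a minimal Python sketch that models the cgroups as plain objects and computes the values one would expect to see in the cgroup v1 control files `cpu.cfs_period_us`, `cpu.cfs_quota_us` and `memory.limit_in_bytes`. The `Cgroup` class, the 100 ms period and the reading of "50 MB" as 50 × 1024 × 1024 bytes are assumptions made for illustration. Nothing is written to the real cgroup filesystem.

```python
from dataclasses import dataclass, field
from typing import Optional

CFS_PERIOD_US = 100_000  # assumed scheduling period: 100 ms, in microseconds


@dataclass
class Cgroup:
    """Toy model of a cgroup: a name, member processes and optional limits."""
    name: str
    processes: list = field(default_factory=list)
    cpu_fraction: Optional[float] = None    # e.g. 0.10 for 10% of one CPU
    memory_limit_mb: Optional[int] = None   # e.g. 50 for 50 MB

    def v1_settings(self) -> dict:
        """Return the cgroup v1 control-file values matching these limits."""
        settings = {}
        if self.cpu_fraction is not None:
            # The group may run for `quota` microseconds in every `period`.
            settings["cpu.cfs_period_us"] = CFS_PERIOD_US
            settings["cpu.cfs_quota_us"] = round(CFS_PERIOD_US * self.cpu_fraction)
        if self.memory_limit_mb is not None:
            settings["memory.limit_in_bytes"] = self.memory_limit_mb * 1024 * 1024
        return settings


cgroup_1 = Cgroup("cgroup_1", ["A", "B"], cpu_fraction=0.10)
cgroup_2 = Cgroup("cgroup_2", ["C", "D"], memory_limit_mb=50)

for group in (cgroup_1, cgroup_2):
    print(group.name, group.processes, group.v1_settings())
```

Note that the quota belongs to the group, not to each member: A and B share the 10 ms of CPU time available in every 100 ms period.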

Simple enough, but what if I had some more complex requirements? Let’s say I needed processes A and B to use a maximum of 10% CPU time each. Also, processes A and C together must not consume more than 50 MB of memory, and processes B and D together must not consume more than 1 MB/s of I/O bandwidth or more than 100 MB of main memory.

This isn’t possible using just one hierarchy. Luckily for us, we can have multiple hierarchies at the same time. To understand how, we need to understand the concepts of cgroup subsystems and the cgroup filesystem. The kernel exposes cgroups through a virtual cgroup filesystem (cgroupfs), which is used to set and monitor the limits imposed on each cgroup. Each hierarchy is a directory tree in the cgroup filesystem, with one directory per cgroup. The cgroup filesystem is interfaced by cgroup subsystems, also called resource controllers. A cgroup subsystem is a piece of kernel code that controls the allocation of one resource (CPU time, main memory, etc.). Note that this multi-hierarchy model is that of cgroup v1; cgroup v2 uses a single unified hierarchy.

  • Each resource has a dedicated subsystem to handle allocation of that resource.
  • Each subsystem interfaces with a single hierarchy. No subsystem can use more than one hierarchy from the cgroup filesystem.
  • Two or more subsystems may share a hierarchy.
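One place where this relationship between hierarchies and subsystems is visible is `/proc/<pid>/cgroup`, where each line has the form `hierarchy-ID:controller-list:cgroup-path`. The sketch below parses a made-up sample in that format. The sample text, the cgroup paths and both helper functions are assumptions for illustration, not output from a real machine.

```python
# Hypothetical contents of /proc/<pid>/cgroup on a cgroup v1 system.
SAMPLE_PROC_CGROUP = """\
1:cpu:/cpu_a
2:memory,blkio:/mem_io_ac
"""


def parse_proc_cgroup(text):
    """Parse lines of the form 'hierarchy-ID:controller-list:cgroup-path'."""
    entries = []
    for line in text.splitlines():
        if not line.strip():
            continue
        hierarchy_id, controllers, path = line.split(":", 2)
        names = controllers.split(",") if controllers else []
        entries.append((int(hierarchy_id), names, path))
    return entries


def hierarchy_of_each_subsystem(entries):
    """Map every subsystem name to the hierarchy it is attached to."""
    mapping = {}
    for hierarchy_id, names, _path in entries:
        for name in names:
            mapping[name] = hierarchy_id
    return mapping


entries = parse_proc_cgroup(SAMPLE_PROC_CGROUP)
print(entries)
print(hierarchy_of_each_subsystem(entries))
```

In this sample, `memory` and `blkio` share hierarchy 2, while `cpu` has hierarchy 1 to itself, and no subsystem appears under more than one hierarchy.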

Let’s construct a multi-hierarchy cgroup filesystem for the above example.

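There are 2 hierarchies. Hierarchy_1 is used by the CPU subsystem and places A and B in separate cgroups, each limited to 10% CPU time. Hierarchy_2 is shared by two subsystems, memory and I/O: one cgroup holds A and C (50 MB of memory), the other holds B and D (100 MB of memory and 1 MB/s of I/O bandwidth). No subsystem interfaces with more than one hierarchy.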
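The layout can also be written down as data and checked against the rules listed earlier. The following Python sketch is one reading of the example, not a definitive one: the hierarchy and cgroup names are hypothetical, `blkio` stands for the I/O subsystem, and C and D are placed in a catch-all cgroup of hierarchy_1 on the assumption that every process belongs to exactly one cgroup per hierarchy.

```python
from collections import Counter

# hierarchy name -> (attached subsystems, {cgroup name: member processes})
hierarchies = {
    "hierarchy_1": (
        {"cpu"},
        {"cpu_a": {"A"}, "cpu_b": {"B"}, "cpu_rest": {"C", "D"}},
    ),
    "hierarchy_2": (
        {"memory", "blkio"},
        {"mem_io_ac": {"A", "C"}, "mem_io_bd": {"B", "D"}},
    ),
}


def subsystem_conflicts(layout):
    """Return subsystems attached to more than one hierarchy (should be empty)."""
    counts = Counter(
        subsystem
        for subsystems, _cgroups in layout.values()
        for subsystem in subsystems
    )
    return sorted(name for name, n in counts.items() if n > 1)


def membership_conflicts(layout):
    """Return (hierarchy, process) pairs where a process sits in several cgroups."""
    conflicts = []
    for hierarchy, (_subsystems, cgroups) in layout.items():
        counts = Counter(p for members in cgroups.values() for p in members)
        conflicts += [(hierarchy, p) for p, n in sorted(counts.items()) if n > 1]
    return conflicts


print(subsystem_conflicts(hierarchies))   # expected to be empty
print(membership_conflicts(hierarchies))  # expected to be empty
```

Attaching `memory` to hierarchy_1 as well would make `subsystem_conflicts` report it, which is the rule that no subsystem can use more than one hierarchy.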

So far, we have established that groups of processes can be subjected to certain limits or constraints using Linux cgroups. Now visualise a container running on your host machine. Let’s say this container has two processes, A and B, running inside. Using what we learned earlier, we can infer that Docker would place processes A and B (which run on the host) into a cgroup with some restrictions on memory, CPU, I/O, etc. That is how Docker allows processes to run in a containerised environment directly on top of the kernel, no hypervisor required :)

We’re by no means done yet though, because there is a problem. Consider this scenario: Process A is running inside the container, and process B is running outside the container. Process B performs its operations in small units of work. It computes a partial result, writes it to a file ‘/tmp/foo’ before performing some preprocessing for the next stage, and then proceeds to read the partial result from /tmp/foo and uses it for the next bit. It may happen, by accident or by design, that process A writes to the same file /tmp/foo before process B reads it for the next stage of its task. As you probably guessed, this is a vulnerability that can cause all sorts of problems.
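The scenario can be replayed in a few lines of Python. This is only a sequential simulation inside a single process: the temporary directory stands in for the shared `/tmp`, and the file contents are made up for illustration.

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as shared_tmp:
    shared_file = os.path.join(shared_tmp, "foo")  # stands in for /tmp/foo

    # Process B (outside the container) stores its partial result.
    with open(shared_file, "w") as f:
        f.write("partial=42")

    # While B is busy preprocessing, process A (inside the container)
    # writes to the very same path.
    with open(shared_file, "w") as f:
        f.write("unrelated data from A")

    # Process B comes back and reads what it believes is its own result.
    with open(shared_file) as f:
        what_b_reads = f.read()

print(what_b_reads)  # prints "unrelated data from A"
```

Process B never learns that its partial result was replaced, which is why the two processes must not see the same filesystem in the first place.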

Namespaces

To get around this problem, Docker and services like it use another kernel feature called a namespace.

A namespace allows the resources of a process to be isolated and virtualized.

A namespace for a resource is like a cage for the processes belonging to that namespace, where the processes cannot access any resource outside that namespace. For example, the mount namespace for a process specifies the mount points that are visible to that process. You may have figured out that this solves the above problem. We could have our own mount points for each process. We could mount a specific folder ‘/foo/bar/fizz/buzz/’ on the host as the root directory in the container. Now, process A cannot access /tmp/foo if its mount namespace has the root directory mounted on some dedicated folder. If you're familiar with Linux, you might have noticed that this is similar to chroot.
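The effect of re-rooting a process can be sketched with plain path arithmetic. The function below is a simulation only: `host_path` is a hypothetical helper, it ignores symlinks, and real isolation is enforced by the kernel rather than by string manipulation.

```python
import posixpath


def host_path(container_root, path_in_container):
    """Map a path as seen inside the container to the host path behind it."""
    # Normalise against "/" first so that ".." can never climb above the root.
    inside = posixpath.normpath(posixpath.join("/", path_in_container))
    return posixpath.join(container_root, inside.lstrip("/"))


root = "/foo/bar/fizz/buzz/"
print(host_path(root, "/tmp/foo"))        # /foo/bar/fizz/buzz/tmp/foo
print(host_path(root, "/../../tmp/foo"))  # /foo/bar/fizz/buzz/tmp/foo
```

Whatever path process A asks for, it ends up below `/foo/bar/fizz/buzz/`, so the host's own `/tmp/foo` stays out of reach.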

There are a few different kinds of namespaces —

  • Mount namespace: the one we just saw; it allows one or more processes in the namespace to have their own mount points, independent of other processes.
  • Process (PID) namespace: processes in the namespace cannot inspect processes outside it, e.g. PID 5 inside a namespace may be PID 1064 on the host. A process will only be able to inspect processes that are in its own namespace.
  • Network namespace: allows isolation of network devices, routing tables, etc.
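On Linux, the namespaces a process belongs to are listed as symbolic links under `/proc/<pid>/ns`. The following sketch reads them for the current process. The helper name `namespaces_of` is hypothetical, and the function returns an empty dictionary on systems where that directory is not available.

```python
import os


def namespaces_of(pid="self"):
    """Return {namespace type: identifier} for a process, or {} if unavailable."""
    ns_dir = f"/proc/{pid}/ns"
    result = {}
    try:
        names = sorted(os.listdir(ns_dir))
    except OSError:
        return result  # not Linux, /proc not mounted, or no such process
    for name in names:
        try:
            # Each entry is a symlink whose target identifies the namespace.
            result[name] = os.readlink(os.path.join(ns_dir, name))
        except OSError:
            continue  # not permitted to inspect this entry
    return result


for name, identifier in namespaces_of().items():
    print(f"{name:10} {identifier}")
```

Two processes that print the same identifier for a given type share that namespace. Running the snippet on the host and again inside a container should therefore show different identifiers for the namespaces the container has to itself.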

Congratulations, we’ve established the fundamental concepts that make containers work.

Containers vs VMs

A quick run-down of the advantages and disadvantages of containers compared to VMs:

Advantages:

  • Processes run directly on top of the host kernel, whereas a VM goes through a hypervisor, a hardware or software virtualisation layer that mediates interactions between the VM and the host.
  • Very small image sizes compared to VMs.
  • Multiple levels of nesting are possible with very low overhead, as opposed to VMs.
  • Can be started up and staged very quickly.

Disadvantages:

  • Security concerns: all containers share the host kernel, so a compromise or failure at the kernel level affects everything on the machine. For example, if a process in a container triggers a kernel panic, the host and all containers on it will crash.
  • It is not possible to run different kernel configurations or implementations for different containers — since all container processes are essentially running on the host kernel, we can’t have different kernel configurations for different containers. For such a use case, it’s better to use VMs.

All in all, containers make for a very useful tool when deploying a microservice architecture. In subsequent articles, we will look at actually implementing a containerised application. Stay tuned!
