Introduction to application containerization

Muhammad Aditya Hilmy
Published in HMIF ITB Tech
Apr 1, 2019

What if you want to deploy multiple applications but you only have one server? Short answer: use containerization.

Containerization in a nutshell

Application containerization is a method of running applications in isolated runtimes on the same OS, sharing the same kernel.

Normally, if you have a server at your disposal and you want to run a LAMP stack, you’d install a LAMP stack and just run it as a service. It will bind to the desired ports, say 80 and 443, and run in the background to serve clients’ requests. The configuration for that stack is kept in the file system of the OS, and the stack runs as an ordinary process.
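
As a rough sketch of that traditional approach on a Debian/Ubuntu server (package names differ on other distributions):

    # Install the stack directly on the host OS
    sudo apt install apache2 mariadb-server php
    # Run Apache as a background service; it binds to ports 80/443 on the host
    sudo systemctl enable --now apache2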

Now, what if you want to run two separate LAMP stacks? You can’t just install the second LAMP stack the way you installed the first one (well, at least you can’t install it again by typing sudo apt install apache2 a second time). Even if you could, managing them would be a hassle. You’d have two LAMP stacks sharing the same root folders, and without proper permission control, the two stacks could interfere with one another.

Say hello to application containerization.

Application containerization allows those separate stacks to be isolated from each other, in units known as containers. Each container can bind to the same ports, have its own root directory, and manage its own permissions. Each stack can operate as if it were the only LAMP stack on its own OS, even though it isn’t.
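
To make this concrete, here is a minimal sketch using Docker (the container names and host ports are arbitrary). Both containers listen on port 80 internally, yet they don’t conflict, because each one is mapped to a different port on the host:

    # Two isolated Apache containers, each believing it owns port 80
    docker run -d --name lamp-one -p 8080:80 httpd:2.4
    docker run -d --name lamp-two -p 8081:80 httpd:2.4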

How is it different from virtualization?

In a nutshell, containerization sounds a lot like virtualization. Both containerization and virtualization make each LAMP stack feel like it is running alone. The fundamental difference is how they are layered.

Virtualization vs Containerization. Source: https://freeloadbalancer.com/docker-container-free-load-balancer/

Both containerization and virtualization need a host server and a host OS. The difference is what runs on top of the host OS.

  • In virtualization, a hypervisor runs on top of the host OS, while in containerization, a container engine runs on top of the host OS.
  • A hypervisor runs full-fledged guest OSes on top of it, while a container engine runs containers on top of it.
  • An OS running on top of a hypervisor has its own kernel, while containers share the kernel of the host OS.
  • Both virtualization and containerization use images, but a container image is much smaller than an OS image.

Some of the advantages of containerization are:

  • Smaller image size, so images can be stored and distributed more efficiently (see the quick check after this list).
  • Faster startup time, since a container is far less complex than a full-fledged OS.
  • Lower overhead: on average, a container uses less computing power than a full OS, making it an efficient and cheaper alternative.
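
As a quick check of the size claim, you can pull a minimal image and list it. Sizes vary by image and version, but a base image like alpine is only a few megabytes, while a full OS image is usually orders of magnitude larger:

    # Pull a minimal Linux image and show how small it is
    docker pull alpine
    docker images alpine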

However, containerization has some drawbacks:

  • All containers share a single kernel, so it is not as secure as virtualization.
  • Containers must maintain compatibility with the container engine (and the host kernel), while a virtual machine can rely on standard processor instructions to operate.

Why should I containerize?

Let’s revisit the question from above. What if you want to deploy multiple applications but you only have one server? As mentioned above, containerization allows multiple applications to run simultaneously in a single OS while being isolated from each other.

Imagine that you have 4 separate applications, and each application must be able to handle 200 transactions per second at its peak usage. Each application has a different peak period. See illustration below.

Illustration of transactions per second over time for each application

Let’s say that a server is capable of handling 320 transactions per second. Without containerization (or virtualization, for that matter), you’ll need 4 servers, one each for applications A, B, C, and D. With that deployment, the computing capacity of each server is fully utilized only during its peak period and minimally utilized the rest of the time. This, of course, is inefficient. High-capacity servers are not cheap, and it is a waste of money to let them sit mostly idle.

What if we could pool the computing resources? Give most of the computing power to application A during A’s peak period; during B’s peak period, reallocate A’s computing power to application B, and so on. Instead of using 4 servers, each with a capacity of 320 transactions per second, just use one server and shift its capacity between applications.

That’s where containerization comes in handy.

Suppose we deploy applications A, B, C, and D into identical containers. Each container is reserved to handle 10 transactions per second, so the server can run at most 32 containers at the same time. Initially, we deploy an equal number of containers for each application, say 8 containers each (roughly 80 transactions per second per application). When A’s peak period arrives, we provision 20 containers for A and 4 containers each for B, C, and D. Application A now has the capacity to handle its peak load, and applications B, C, and D still have enough containers, since they are not running at peak capacity. The same logic applies to B, C, and D’s peak periods.

This is an oversimplification, obviously. The capacity of an application can vary depending on many factors, but the explanation above is sufficient to illustrate the concept of containerization.

A container’s short lifespan

In the example above, the number of containers for each application varies over time, which means containers can be created and destroyed at any moment. That nature makes containers highly flexible, but it also means that any data persisted within a container is gone when the container is destroyed. Containers should therefore be stateless by nature.

What if you want to run a database inside a container?

In small-scale deployments, most container engines, like Docker, allow you to mount a directory of the host OS so that it is accessible from inside the container. For example, you can bind /data/mysql on the host OS so that it appears as /var/lib/mysql inside the container. If the container is destroyed and recreated, the files kept in that directory are safe. You could call this kind of container stateful.

A (very) rough illustration of how volume mount works
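
A minimal sketch of such a bind mount with Docker (the host directory, password, and image tag here are just examples):

    # Keep MySQL's data directory on the host, outside the container's lifecycle
    docker run -d --name db \
      -v /data/mysql:/var/lib/mysql \
      -e MYSQL_ROOT_PASSWORD=changeme \
      mysql:5.7

If this container is destroyed and a new one is started with the same mount, the database files are still there.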

In large-scale deployments, however, the topic deserves its own article. If you want to dive deeper, try visiting these links:

Container orchestration

We have discussed an example with only 32 containers in deployment. What if you have many more, say thousands of containers? Managing each one of them manually is very time-consuming. You’d need to create and destroy hundreds of containers by hand, monitor each and every one of them, and, if a container encounters a fatal error, destroy it and provision a new one yourself. Even worse, what if your thousands of containers are scattered across hundreds of servers? You’d need to keep track of which server hosts which containers, and which application each container belongs to.

Tired of imagining how dull that job must be, software engineers came up with something called container orchestration.


The job of a container orchestrator is, well, to orchestrate different containers. Two of the most popular container orchestration platforms are Kubernetes and Docker Swarm. I’ll use Docker Swarm in explaining how they work.

First of all, a physical server (or VM instance) running the Docker container engine in a cluster is called a node. A cluster of nodes is called a swarm (you can call it a cluster, though). One or more nodes are elected as swarm managers, which act as the primary orchestrators.
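
Forming a swarm takes a couple of commands (the IP address below is an example; the actual join token is printed by the init command):

    # On the first manager node
    docker swarm init --advertise-addr 192.168.1.10

    # On each worker node, using the token printed by the command above
    docker swarm join --token <worker-token> 192.168.1.10:2377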

The smallest possible unit of work is the container. A group of containers running the exact same application (i.e. the same image) is called a service. You can specify how many containers should be provisioned for each service. You can also configure the application by specifying environment variables on the service. A service can also mount a directory of the host OS, as discussed previously.
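
Here is a sketch of creating such a service; the service name, image, and values are illustrative:

    # Run 8 replicas of an application as a Swarm service
    docker service create \
      --name app-a \
      --replicas 8 \
      --env APP_ENV=production \
      --mount type=bind,source=/data/app-a,target=/usr/local/apache2/htdocs \
      --publish 80:80 \
      httpd:2.4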

When a new container needs to be provisioned, the swarm manager finds the best node on which to place it, based on the workloads of the nodes, manually specified placement constraints, or other considerations. When a container terminates because of a fatal error, or fails to start, Docker Swarm automatically destroys the terminated container and provisions a new one until the desired number of containers is reached. It keeps doing this until the system is stable. This is called self-healing.
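
Returning to the earlier example, reallocating capacity for A’s peak period becomes a one-line change, and the swarm takes care of creating and destroying the underlying containers (the service names are illustrative):

    # Shift capacity toward application A during its peak
    docker service scale app-a=20 app-b=4 app-c=4 app-d=4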

Kubernetes takes this even further. It can autoscale containers based on the workload. It can also terminate existing containers if it sees that the node hosting them is overloaded, and reschedule them somewhere else. An orchestrator has many more features and functions than the points mentioned here.

With those concepts in mind, you can imagine that your application is floating in a sea of servers. It can move wherever and whenever it is needed; it has no permanent home (a nomad, if you will). It can replicate, or be destroyed in an instant, as needed. Your application will be ready to handle incoming traffic at a moment’s notice, swiftly and efficiently.


Summary

Application containerization is a way to run your application in isolation from other applications (much like a virtual machine does) in units called containers. Containers are lightweight and can be provisioned and destroyed easily, making them suitable for deployment in a constantly changing environment. Containers can also be managed by a container orchestrator to minimize the hassle of provisioning, destroying, controlling, and scaling them.
