An Init System inside the Docker Container
In this post, we’ll look at the PID 1 zombie reaping problem and how an init system inside the Docker container solves it. The ideal is “one process per container”, but in practice we often end up running a few processes inside a container, such as logging and monitoring agents.
By default, Docker does not run processes under an init process that properly reaps child processes, so a container can end up with zombie processes that cause all sorts of trouble.
The init process is the top-most process, started by the kernel when the system boots. It is responsible for starting the rest of the system, such as the SSH daemon, Apache/Nginx, etc. Each of them may in turn spawn further child processes.
Each process can spawn child processes, and each process has a parent except for the top-most process.
A child that terminates but has not been waited for becomes a “zombie”. The kernel keeps a minimal set of information about the zombie process (PID, termination status, resource usage information) so that the parent can later perform a wait to obtain information about the child.
The action of calling waitpid() on a child process in order to eliminate its zombie is called "reaping".
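To make the zombie state and reaping concrete, here is a minimal sketch in Python. It assumes a Linux system with /proc mounted; the helper names process_state and zombie_demo are made up for this illustration. The parent forks a child that exits right away, checks the child's state before waiting (it should show up as Z, a zombie), and then reaps it with waitpid().

```python
import os
import time


def process_state(pid):
    """Return the one-letter state from /proc/<pid>/stat, or None if the PID is gone."""
    try:
        with open(f"/proc/{pid}/stat") as f:
            data = f.read()
    except (FileNotFoundError, ProcessLookupError):
        return None
    # The state is the first field after the parenthesised command name.
    return data.rsplit(")", 1)[1].split()[0]


def zombie_demo():
    pid = os.fork()
    if pid == 0:
        os._exit(7)  # child: terminate immediately with exit status 7

    # Parent: wait (up to 5 seconds) until the exited child shows up as a zombie.
    deadline = time.monotonic() + 5
    while process_state(pid) != "Z" and time.monotonic() < deadline:
        time.sleep(0.01)
    state_before = process_state(pid)

    # Reaping: waitpid() collects the exit status and frees the process-table slot.
    _, status = os.waitpid(pid, 0)
    return state_before, os.WEXITSTATUS(status), process_state(pid)


if __name__ == "__main__":
    print(zombie_demo())
```

The third value returned is the state after reaping; once the parent has waited, the /proc entry for the child should no longer exist.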
Why Zombie processes are harmful
As long as a zombie is not removed from the system via a wait, it will consume a slot in the kernel process table, and if this table fills, it will not be possible to create further processes.
Relationship with Docker
Many people run only one process in a container and assume that is all there is to it. But most likely, this process will not behave like a proper init process. Instead of reaping adopted processes, it expects an init process to do that job, and rightly so.
Let’s look at a concrete example. Suppose you run a web server as PID 1 inside the container, and it runs a bash script. This bash script calls perl. Then the web server decides that the script is taking too long and kills it, but perl is not affected and keeps running. Now orphaned, perl is adopted by PID 1 (the web server). When perl finishes, it becomes a zombie. The web server doesn’t know about perl, so it doesn’t reap it, and the perl zombie stays in the system.
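The adoption step in this example can be observed with a small experiment. The following is a rough sketch in Python, assuming a Linux or other POSIX system; adoption_demo is a hypothetical name, and the two forked processes are only stand-ins for the bash script and perl. The "script" exits without waiting for its child, and the child then reports its new parent PID through a pipe.

```python
import os
import time


def adoption_demo():
    read_fd, write_fd = os.pipe()
    script_pid = os.fork()
    if script_pid == 0:
        # Stand-in for the bash script.
        os.close(read_fd)
        my_pid = os.getpid()
        if os.fork() == 0:
            # Stand-in for perl: it keeps running after the "script" is gone.
            deadline = time.monotonic() + 5
            while os.getppid() == my_pid and time.monotonic() < deadline:
                time.sleep(0.01)
            # Report who the parent is now that the original parent has exited.
            os.write(write_fd, str(os.getppid()).encode())
            os._exit(0)
        os._exit(0)  # the "script" dies without waiting for its child

    os.close(write_fd)
    os.waitpid(script_pid, 0)  # reap the "script" itself
    with os.fdopen(read_fd) as pipe:
        new_parent = int(pipe.read())
    return script_pid, new_parent


if __name__ == "__main__":
    script, adopter = adoption_demo()
    print(f"script PID: {script}, orphan's new parent PID: {adopter}")
```

Inside a container the new parent would normally be PID 1; on a desktop system it may instead be a per-session process that acts as a reaper. Either way, whether the orphan is ever reaped after it exits depends entirely on that adopting process.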
Many people also create Docker containers for third-party applications (MongoDB, MySQL, Apache) and run the application as the sole process inside the container. You’re running someone else’s code, so you can’t really be sure that those applications don’t spawn processes in such a way that they become zombies later.
So, to avoid this, you need to run a proper init system inside the Docker container.
A simple option is to use bash as the init process inside the Docker container.
CMD ["/bin/bash", "-c", "set -e && /path-to-your-app"]
Although bash properly reaps adopted child processes, it doesn’t handle signals properly. Suppose that you use kill to send a SIGTERM signal to bash. Bash terminates, but does not send SIGTERM to its child processes.
"docker stop" sends SIGTERM to the init process, and it should stop the container cleanly so that you can start it again later with "docker start".
When bash terminates, the kernel terminates the entire container with all processes inside. These processes are terminated uncleanly through the SIGKILL signal. Suppose your app is busy writing a file; the file could get corrupted if the app is terminated uncleanly.
Unfortunately, sending signals to child processes is not enough. The init process must also wait for child processes to terminate, before terminating itself. If the init process terminates prematurely then all children are terminated uncleanly by the kernel.
So clearly a more sophisticated init system (running as PID 1 inside the container) is required, one that does all of the following:
- Inherits orphaned child processes and reaps them
- Handles signals properly, forwarding them to its child processes
- Waits until all subprocesses have terminated before terminating itself
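The three requirements above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not how dumb-init or any other real init system is implemented: run_init and FORWARDED are hypothetical names, only SIGTERM and SIGINT are forwarded, and signals go only to the direct child rather than to every process in the container.

```python
import os
import signal
import sys

FORWARDED = (signal.SIGTERM, signal.SIGINT)  # assumption: just these two signals


def run_init(argv):
    """Run argv as a child, forward signals to it, and reap until no children remain."""
    child_pid = 0

    def forward(signum, _frame):
        # Requirement 2: pass the signal on to the application.
        if child_pid > 0:
            try:
                os.kill(child_pid, signum)
            except ProcessLookupError:
                pass  # the application has already exited

    # Install handlers before forking so an early signal is not lost.
    previous = {sig: signal.signal(sig, forward) for sig in FORWARDED}
    try:
        child_pid = os.fork()
        if child_pid == 0:
            try:
                os.execvp(argv[0], argv)  # exec resets the handlers to default
            except OSError:
                os._exit(127)  # the command could not be started

        exit_code = 0
        while True:
            try:
                # Requirement 1: -1 reaps any child, including adopted orphans.
                pid, status = os.waitpid(-1, 0)
            except ChildProcessError:
                break  # Requirement 3: leave only when no children remain
            if pid == child_pid:
                if os.WIFSIGNALED(status):
                    exit_code = 128 + os.WTERMSIG(status)
                else:
                    exit_code = os.WEXITSTATUS(status)
        return exit_code
    finally:
        # Restore the handlers that were in place before run_init was called.
        for sig, handler in previous.items():
            signal.signal(sig, signal.SIG_DFL if handler is None else handler)


if __name__ == "__main__" and len(sys.argv) > 1:
    sys.exit(run_init(sys.argv[1:]))
```

Saved under a file name of your choosing and started as the container's first process with the application command as its arguments, the script would take the PID 1 role. In practice a purpose-built init is the safer choice, since this sketch ignores details such as process groups and the remaining signals.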
Several init systems are available for containers, such as dumb-init, supervisord and others.
Disclaimer: Content and Image source has been mentioned. Special Credit to concerned folks.