Avoid running NodeJS as PID 1 under Docker images, when running them on Mesosphere, Kubernetes or any other orchestrator

Or, how we came up with using a tiny init system for app runners

Eugene Obrezkov

Published in Eugene Obrezkov, Feb 20, 2018

Why is that?

It turns out that, when it runs as PID 1, NodeJS does not receive and handle signals appropriately. By signals, I mean kernel signals such as SIGTERM, SIGINT, etc.

The following code will not work as expected if NodeJS runs as PID 1:

process.on('SIGTERM', function onSigterm() {
  // Do the clean-up job here (close connections, stop listeners, etc.),
  // but under PID 1 this handler never gets a chance to run.
  process.exit(0);
});

As a result, the process keeps running until it is terminated forcefully with SIGKILL, which means your “clean up” code is never called.
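A cheap safeguard is to detect at startup that the app is running as PID 1 and say so in the logs. This is a minimal sketch under our own assumptions, not part of the setup described in this article; `isPid1` and `warnIfPid1` are hypothetical helper names. It does not fix signal handling, it only makes the misconfiguration visible.

```javascript
'use strict';

// Hypothetical helper: true when the given pid is 1, i.e. this process is
// the init process of its PID namespace (typical for a container entrypoint).
function isPid1(pid = process.pid) {
  return pid === 1;
}

// Hypothetical startup check: log a warning instead of failing hard, so the
// app still starts but the problem shows up in the logs.
function warnIfPid1(log = console.warn, pid = process.pid) {
  if (!isPid1(pid)) return false;
  log('Running as PID 1: SIGTERM/SIGINT may not be handled as expected. ' +
      'Consider starting the container under an init such as `docker run --init` or Tini.');
  return true;
}

warnIfPid1();
```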

So what, you might say? Let me describe a real case.

Where does this occur?

At work (elastic.io), we use Mesosphere and Kubernetes as our orchestrators. When Mesos/Kubernetes decides to kill a task, the following happens.

Mesos sends SIGTERM and waits some time for the process to exit. If it has not exited by then, Mesos sends SIGKILL (a forced kill of the task) and marks the task as failed. Kubernetes follows the same flow.
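To make the stop flow concrete, here is a small simulation of it in Node: send SIGTERM, wait for a grace period, then send SIGKILL and report the task as failed. `stopTask` and `gracePeriodMs` are hypothetical names for illustration only; Mesos and Kubernetes implement this themselves, with their own configurable timeouts.

```javascript
'use strict';
const { spawn } = require('child_process');

// Simulated orchestrator stop flow: SIGTERM, wait up to `gracePeriodMs`,
// then SIGKILL. Resolves with how the child ended and whether it was forced.
function stopTask(child, gracePeriodMs) {
  return new Promise((resolve) => {
    let forced = false;
    const timer = setTimeout(() => {
      forced = true;
      child.kill('SIGKILL'); // cannot be caught: no clean-up code runs
    }, gracePeriodMs);

    child.once('exit', (code, signal) => {
      clearTimeout(timer);
      // A task that had to be force-killed is the one reported as failed.
      resolve({ code, signal, failed: forced });
    });

    child.kill('SIGTERM'); // polite request to shut down
  });
}

// Usage: a child that catches SIGTERM but never exits, like a consumer
// that keeps its listeners open.
const stubborn = spawn(process.execPath, ['-e',
  "process.on('SIGTERM', () => {}); console.log('ready'); setInterval(() => {}, 1000);"]);
stubborn.stdout.once('data', () => {
  stopTask(stubborn, 500).then((result) => console.log(result));
  // logs { code: null, signal: 'SIGKILL', failed: true }
});
```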

If you have a NodeJS application that listens for RabbitMQ messages and you do not close all the listeners on SIGTERM, it keeps listening and the process never exits, so SIGKILL arrives to do the job.
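A minimal sketch of the clean-up described above, assuming each resource (for example a wrapper around a RabbitMQ consumer) exposes an async `close()` method. `createShutdown` and the resource objects are hypothetical; with a real client library you would call whatever it provides for cancelling consumers and closing channels and connections. Of course, this only helps once the signal actually reaches the process.

```javascript
'use strict';

// Build a signal handler that closes every resource, then exits:
// code 0 if everything closed cleanly, 1 if any close() failed.
function createShutdown(resources, exit = process.exit, log = console.log) {
  let shuttingDown = false;
  return async function shutdown(signal) {
    if (shuttingDown) return; // ignore repeated signals during shutdown
    shuttingDown = true;
    log(`Received ${signal}, closing ${resources.length} resource(s)`);
    const results = await Promise.allSettled(resources.map((r) => r.close()));
    const failures = results.filter((r) => r.status === 'rejected').length;
    exit(failures === 0 ? 0 : 1);
  };
}

// Usage with a hypothetical consumer: one handler for both SIGTERM and SIGINT.
const consumer = { close: async () => { /* stop consuming, close channel */ } };
const shutdown = createShutdown([consumer]);
process.on('SIGTERM', () => shutdown('SIGTERM'));
process.on('SIGINT', () => shutdown('SIGINT'));
```

Using `Promise.allSettled` means one failing `close()` does not prevent the others from running; the failure is still surfaced through a non-zero exit code.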

Since our platform relies on the statuses returned by Mesos/Kubernetes, we ended up making false assumptions about the state of tasks, which led to unknown issues and wrong behaviour of the platform. We never want unexpected behaviour, do we?

What do best practices say about the PID 1 case?

Node.js was not designed to run as PID 1 which leads to unexpected behaviour when running inside of Docker. For example, a Node.js process running as PID 1 will not respond to SIGINT (CTRL-C) and similar signals. (reference)

Boom!

Imagine you have an app written in NodeJS that runs as a daemon on Mesos/Kubernetes and waits for a signal telling it to stop.

You have listeners for SIGTERM, so you can close all the connections the daemon uses and report that everything is OK with exit code 0.

But it will not. The NodeJS app does not even notice that someone wants to stop it, so it just keeps working until SIGKILL comes and makes a massacre.

What is the explanation from the UNIX perspective?

I found a great explanation in this article.

But there is a special case. Suppose the parent process terminates, either intentionally (because the program logic has determined that it should exit), or caused by a user action (e.g. the user killed the process). What happens then to its children? They no longer have a parent process, so they become “orphaned” (this is the actual technical term).
And this is where the init process kicks in. The init process — PID 1 — has a special task. Its task is to “adopt” orphaned child processes (again, this is the actual technical term). This means that the init process becomes the parent of such processes, even though those processes were never created directly by the init process.

And, of course, NodeJS is not designed to be an init system. That means each of our applications must run under some init process, which either spawns the app as its own child or adopts it later.

What is the solution? How did we fix the problem? How can we propagate kernel signals to our app?

Docker init

You can solve the issue by simply adding the --init flag when running a Docker image:

docker run --init your_image_here

It wraps your process with a tiny init system, which forwards signals to its child and makes sure that any orphaned processes are reaped.

Well, that works, but what if we need to remap exit codes? For instance, when Java exits because of SIGTERM, it returns exit code 143, not 0.

When reporting the exit status with the special parameter ‘?’, the shell shall report the full eight bits of exit status available. The exit status of a command that terminated because it received a signal shall be reported as greater than 128. (reference)

Docker init is not able to handle such cases. That is how we found our ideal solution for them: Tini.
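The "128 + signal number" rule quoted above is easy to check from Node, which exposes the platform's signal numbers in `os.constants.signals`. The helper names below are hypothetical; the arithmetic is exactly the one from the quote (SIGTERM is 15, so 128 + 15 = 143).

```javascript
'use strict';
const os = require('os');

// Exit status a POSIX shell reports for a process killed by `signalName`.
function exitStatusForSignal(signalName) {
  const num = os.constants.signals[signalName];
  if (num === undefined) throw new Error(`Unknown signal: ${signalName}`);
  return 128 + num;
}

// Inverse: which signal (if any) an exit status above 128 corresponds to.
function signalForExitStatus(status) {
  if (status <= 128) return null;
  const entry = Object.entries(os.constants.signals)
    .find(([, num]) => num === status - 128);
  return entry ? entry[0] : null;
}

console.log(exitStatusForSignal('SIGTERM')); // 143
console.log(exitStatusForSignal('SIGKILL')); // 137
console.log(signalForExitStatus(130));       // SIGINT
```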

Tini

Tini is the simplest init you could think of. All Tini does is spawn a single child (Tini is meant to be run in a container), and wait for it to exit all the while reaping zombies and performing signal forwarding. (reference)
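To illustrate the "spawn a single child and forward signals" part of that description, here is a rough Node sketch. It is an illustration under our own assumptions, not how Tini is implemented: it forwards only the three signals listed (a real init forwards more), and it does not reap orphaned grandchildren, because Node has no built-in API for reaping processes it did not spawn itself. That missing piece is exactly why a real init such as Tini is the better choice.

```javascript
'use strict';
const { spawn } = require('child_process');
const os = require('os');

// Signals this sketch forwards (an assumption; real inits forward more).
const FORWARDED = ['SIGTERM', 'SIGINT', 'SIGHUP'];

// Spawn one child, forward signals to it, resolve with its exit status.
function runForwarding(command, args) {
  const child = spawn(command, args, { stdio: 'inherit' });
  const forwarders = FORWARDED.map((sig) => [sig, () => child.kill(sig)]);
  for (const [sig, fn] of forwarders) process.on(sig, fn);
  const cleanup = () => {
    for (const [sig, fn] of forwarders) process.removeListener(sig, fn);
  };

  return new Promise((resolve, reject) => {
    child.once('error', (err) => { cleanup(); reject(err); });
    child.once('exit', (code, signal) => {
      cleanup();
      // A child killed by a signal is reported shell-style: 128 + signal number.
      resolve(code !== null ? code : 128 + os.constants.signals[signal]);
    });
  });
}

// Usage sketch (hypothetical wrapper file): node wrapper.js node app.js
// runForwarding(process.argv[2], process.argv.slice(3)).then((s) => process.exit(s));
```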

With a recent release of Tini, we were able to remap exit code 143 to 0, so we can run our Java and NodeJS processes under Docker with the following entrypoint:

ENTRYPOINT ["/tini", "-v", "-e", "143", "--", "/runner/init"]
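The idea behind the `-e 143` argument above fits in a few lines: exit codes listed for remapping are reported as 0, and everything else passes through unchanged. This is a sketch of the concept only; `remapExitCode` is a hypothetical helper, not part of Tini.

```javascript
'use strict';

// Report any exit code listed in `remapped` as 0 (success); pass others through.
function remapExitCode(code, remapped = [143]) {
  return remapped.includes(code) ? 0 : code;
}

console.log(remapExitCode(143));             // 0 (exited after SIGTERM: 128 + 15)
console.log(remapExitCode(1));               // 1 (a real failure is still reported)
console.log(remapExitCode(130, [130, 143])); // 0
```

Keeping real failures untouched matters here: the orchestrator should still see a failed task when the app crashes, just not when it shuts down cleanly on SIGTERM.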

Epilogue

That way we fixed all the issues related to kernel signal handling in our applications, so they can now receive signals and respond to them.

As a bonus, we got the ability to remap exit codes when a child process exits with 128 + SIGNAL. For example, when an application receives SIGTERM (signal 15), it may exit with 143 (128 + 15), which in our case means a normal exit of the process.

I hope this article helps you find unexpected behaviour in your own applications.

Follow me on Medium; you can ask any questions and get in touch with me on Twitter.

References

Eugene Obrezkov, Senior Software Engineer at elastic.io, Kyiv, Ukraine.