My process became PID 1 and now signals behave strangely

Aaron Kalair
HackerNoon.com
Dec 27, 2017


Or let's write our own init process

When your process runs as PID 1 in a Docker container, signal handling behaves differently to what you might expect.

First, let's sanity-check what happens when a process is not PID 1 on a “normal” system.

A simple Python process that just sleeps

And if we run it and send SIGTERM

It gets terminated, nothing surprising here

And now let’s run it as PID 1 in a Docker container

Run this container, exec into it and then send the same signal.

And now nothing happens!

Let's try this with a Go process that does something similar:

Pop this into a Docker container, run it, exec in and send it SIGTERM

And it's killed, just as it would be if it weren't running as PID 1.

So what’s going on here then?

Well, PID 1 is special in Linux. Among other things, the kernel does not apply a signal's default action to it, so it ignores any signal for which it has not explicitly installed a handler. From the Docker docs: https://docs.docker.com/engine/reference/run/#foreground

Note: A process running as PID 1 inside a container is treated specially by Linux: it ignores any signal with the default action. So, the process will not terminate on SIGINT or SIGTERM unless it is coded to do so.

The Go program escapes this because the Go runtime installs its own signal handlers, and by default a SIGTERM causes a Go program to exit. We could define handlers for those signals in every process we want to run in a Docker container, but this is a lot of work and we may not have the source code to do so. Furthermore, PID 1 has other responsibilities that we'll explore later.

So instead we could run a different process as PID 1 and have it proxy signals to the process we actually want to run, as well as perform the other duties of a standard init process.

There are numerous existing solutions that do this, for example:

Yelp's dumb-init: https://github.com/Yelp/dumb-init

Tini, which ships with Docker and is enabled with docker run --init: https://docs.docker.com/engine/reference/run/#specify-an-init-process

And many more which you can find by searching around.

But I’m going to write my own…

So let's start with the basics: I need a program that takes the name of another process to execute, and executes it.
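A minimal sketch of such a program, assuming the command and its arguments come from os.Args. It is an illustration rather than the code in the init-proc repository, and run is a hypothetical helper name.

```go
package main

import (
	"errors"
	"fmt"
	"os"
	"os/exec"
)

// run starts name with args, connects it to our stdio and blocks in Wait()
// until it exits. Wait() also reaps the child, so it never lingers as a
// zombie. It returns the child's exit code (-1 if it was killed by a signal).
func run(name string, args ...string) (int, error) {
	cmd := exec.Command(name, args...)
	cmd.Stdin, cmd.Stdout, cmd.Stderr = os.Stdin, os.Stdout, os.Stderr
	if err := cmd.Start(); err != nil {
		return -1, err
	}
	if err := cmd.Wait(); err != nil {
		// A non-zero exit is reported as *exec.ExitError; anything else is a real failure.
		var exitErr *exec.ExitError
		if !errors.As(err, &exitErr) {
			return -1, err
		}
	}
	return cmd.ProcessState.ExitCode(), nil
}

func main() {
	if len(os.Args) < 2 {
		fmt.Fprintln(os.Stderr, "usage: proxy <command> [args...]")
		os.Exit(2)
	}
	code, err := run(os.Args[1], os.Args[2:]...)
	if err != nil {
		fmt.Fprintln(os.Stderr, err)
		os.Exit(1)
	}
	if code < 0 {
		code = 1 // killed by a signal; report a generic failure
	}
	os.Exit(code)
}
```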

There are a few things to note about how we do this, because they will matter later.

After we Start() the new process, we call Wait(). This is important: Wait() blocks until the command exits and then cleans up any resources associated with it.

Failing to wait on a process you spawn leaves zombie processes hanging around after they've finished executing, each still consuming some resources.

From the man page — http://man7.org/linux/man-pages/man2/waitpid.2.html#NOTES

A child that terminates, but has not been waited for becomes a "zombie". The kernel maintains a minimal set of information about the zombie process (PID, termination status, resource usage information) in order to allow the parent to later perform a wait to obtain information about the child. As long as a zombie is not removed from the system via a wait, it will consume a slot in the kernel process table, and if this table fills, it will not be possible to create further processes.

So let's try out our new signal proxy. If we run it in a container…

We can see that our proxy process is now PID 1 and has spawned sleep-spawner.

Alright, the next step is to register our interest in all possible signals.

With sigHandler defined as:

It simply switches on all the signals Go supports — https://golang.org/pkg/syscall/#pkg-constants

It then uses the kill system call to forward the signal to the process being run.

Now let's use it to run our Python program and see if it handles SIGTERM correctly.

And it works!

Now let's take care of another thing PID 1 is responsible for: cleaning up zombie processes.

Imagine this scenario:

A --spawns--> B --spawns--> C

Now if B dies or exits before C, C becomes an orphan process. Who is C's parent now?

The kernel reparents orphaned processes to PID 1 (strictly, to the init process of their PID namespace, which inside our container is the init process we're writing), so the tree now looks like:

A --parent of--> C

Now when C exits, A receives the SIGCHLD signal and is responsible for calling wait on C to clean up the resulting zombie process.

So let's add this logic to the SIGCHLD case:

-1 means wait for any child process to change state, rather than a specific one, because when we get the signal we don't know the PID of the process that exited.

WNOHANG means that if no child process has changed state, the call returns immediately instead of blocking until one does.

Performing a wait on a terminated child cleans up its resources, preventing it from remaining a zombie process.

From the wait manpage — http://man7.org/linux/man-pages/man2/waitpid.2.html

In the case of a terminated child, performing a wait allows the system to release the resources associated with the child; if a wait is not performed, then the terminated child remains in a "zombie" state

Now there's just one more case to handle. Imagine:

A --spawns--> B --spawns--> C

Now C exits, but B doesn't call wait on it:

A --parent of--> B --parent of--> C (defunct zombie process)

wait only works on a process's own children, so no matter how many times our init process A called wait, it wouldn't clean up the resources C was using. (Note also that SIGCHLD is sent only to B, so A wouldn't even be aware of C exiting.)

Now B exits: A receives SIGCHLD, calls wait, and B is cleaned up nicely.

C is now an orphan that gets reparented to A, so we have:

A --parent of--> C (defunct zombie process)

We can see the above in action with some modifications to our sleeping program that produce processes whose parents exit before their children without calling wait.

It's available on GitHub here: https://github.com/AaronKalair/sleep-spawner

And if we run this we can see what the process tree looks like:

With our current implementation this will remain the situation forever, so we need to modify it slightly to handle cases like this:

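We take advantage of wait4's return value when combined with WNOHANG to call it in a loop every time we get a SIGCHLD signal. Looping matters because standard signals are not queued: if several children exit at about the same time, we may receive only one SIGCHLD for all of them.

Again from the man page (wait4's return value follows that of waitpid: http://man7.org/linux/man-pages/man2/waitpid.2.html):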

on success, returns the process ID of the child whose state has changed; if WNOHANG was specified and one or more child(ren) specified by pid exist, but have not yet changed state, then 0 is returned. On error, -1 is returned.

So we can keep calling Wait4 until it returns a value less than or equal to 0, knowing that each successful call cleans up one exited process.

Now if we run this, exec into the container and check with ps:

We can see that the zombies parented to PID 1 have now been cleaned up!

And there we have it: we've made a basic init process that forwards signals to processes running in Docker containers, so they behave the same way they would outside a container, and that cleans up zombie processes!

See the full source code here — https://github.com/AaronKalair/init-proc

Follow me on Twitter @AaronKalair
