Creating & troubleshooting Zombie processes in Python

Krishna Prasad
Published in Naukri Engineering
5 min read · Jul 30, 2020

At times you may have noticed a few processes that are lying around and can't be killed. These are zombie processes. Before discussing how to create and troubleshoot a zombie process, let's discuss what a zombie process is.

Zombie processes, as the name suggests, are dead processes: they perform no activity and do not respond to their surroundings. In Linux terms, a zombie process is a process that has finished its execution but whose parent process has not yet read its exit status code; during this period, the child process is a zombie. So if the parent process is busy, or has a bug that prevents it from reading the child's exit status code, the child remains a zombie indefinitely, or until the parent process exits, at which point the child is adopted by the init process, which reaps it from the process table.

Let's write a program that spawns child processes. As per the above definition, each child exits immediately, and we add a sleep in the parent process so that it is unable to read the children's exit status codes for some time, as shown below:

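 1 import os, sys, time
 2
 3 ttlForParent = 60  # seconds the parent sleeps before reaping
 4 for i in range(0, 10):
 5     pid_1 = os.fork()  # returns 0 in the child, the child's pid in the parent
 6     print("Hello Worlds!!!")
 7     if pid_1 == 0:
 8         sys.exit()  # the child exits here and becomes a zombie
 9
10 time.sleep(ttlForParent)  # the parent is busy and reads no exit status
11 os.wait()  # reaps one child; the rest are reaped by init once the parent exits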

What could be the output of the above program?

Every statement after line #5 is executed twice per iteration: once by the child process created at line #5 and once by the parent process. The for loop at line #4 runs 10 times in the parent, so "Hello Worlds!!!" is printed twice for i = 0, twice for i = 1 and so on, 20 times in total.

Since os.fork() returns 0 in the child process (and the child's process id in the parent), the condition at line #7 is true only in the child, so each child finishes its work and exits at line #8 via sys.exit(). Once all the child processes have been created and have exited, the parent process is still sleeping (line #10) and cannot read their exit status codes; during this period all ten child processes are zombies.
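
For contrast, here is a minimal sketch of a parent that does read the exit status of every child, so no zombie is left behind. The function name spawn_and_reap and the use of the loop index as the exit code are assumptions made for illustration; they are not part of the program above. The child uses os._exit() here only so that it leaves without running any handlers inherited from the parent.

```python
import os


def spawn_and_reap(n_children):
    """Fork n_children short-lived children, then reap every one of them.

    Returns a dict mapping each child's pid to its exit code.
    (Hypothetical helper, written for illustration only.)
    """
    pids = []
    for i in range(n_children):
        pid = os.fork()
        if pid == 0:
            # Child: exit immediately, using the loop index as exit code.
            os._exit(i)
        # Parent: remember the child's pid so it can be reaped later.
        pids.append(pid)

    exit_codes = {}
    for pid in pids:
        # waitpid() blocks until this child has exited, reads its exit
        # status and removes its entry from the process table.
        _, status = os.waitpid(pid, 0)
        exit_codes[pid] = os.WEXITSTATUS(status)
    return exit_codes
```

Calling spawn_and_reap(10) and then checking top or ps should show no zombie left from this run, because every child's status has been collected.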

Linux/Unix Command to find out the list of Zombie processes

Having seen how zombie processes are created, let's dive into the Linux commands that list them. One of the easiest ways to find out whether the system has any zombie processes is to run the top command and check the zombie count in the Tasks line, as below:

Tasks: 439 total,   1 running, 358 sleeping,   0 stopped,  10 zombie

Here, we have 10 zombie processes. Let's explore another command to find out more about them, such as the process id and the parent process id.

Since the state of a zombie process is Z, let's grep for 'Z' in the output of the ps ax command:

krishna@krishna-HP-ProBook-440-G5:/zombie$ ps ax | grep 'Z'
5409 ? Sl 0:00 eog
5445 pts/5 Z+ 0:00 [python3] <defunct>
5446 pts/5 Z+ 0:00 [python3] <defunct>
5447 pts/5 Z+ 0:00 [python3] <defunct>
5448 pts/5 Z+ 0:00 [python3] <defunct>
5449 pts/5 Z+ 0:00 [python3] <defunct>
5450 pts/5 Z+ 0:00 [python3] <defunct>
5451 pts/5 Z+ 0:00 [python3] <defunct>
5452 pts/5 Z+ 0:00 [python3] <defunct>
5453 pts/5 Z+ 0:00 [python3] <defunct>
5454 pts/5 Z+ 0:00 [python3] <defunct>
5459 pts/6 S+ 0:00 grep --color=auto Z
krishna@krishna-HP-ProBook-440-G5:/zombie$ ps -ejH | grep 5445
5445 5444 16799 pts/5 00:00:00 python3 <defunct>
krishna@krishna-HP-ProBook-440-G5:zombie$ ps -ejH | grep python3
5444 5444 16799 pts/5 00:00:00 python3
5445 5444 16799 pts/5 00:00:00 python3 <defunct>
5446 5444 16799 pts/5 00:00:00 python3 <defunct>
5447 5444 16799 pts/5 00:00:00 python3 <defunct>
5448 5444 16799 pts/5 00:00:00 python3 <defunct>
5449 5444 16799 pts/5 00:00:00 python3 <defunct>
5450 5444 16799 pts/5 00:00:00 python3 <defunct>
5451 5444 16799 pts/5 00:00:00 python3 <defunct>
5452 5444 16799 pts/5 00:00:00 python3 <defunct>
5453 5444 16799 pts/5 00:00:00 python3 <defunct>
5454 5444 16799 pts/5 00:00:00 python3 <defunct>
krishna@krishna-HP-ProBook-440-G5:/zombie$

As we can see, the command ps ax | grep 'Z' shows 10 processes with state Z+. For more details about the parent process, we could run ps -ejH | grep 5445 with a process id, or ps -ejH | grep python3 with the process name, both taken from the output of ps ax | grep 'Z'. Note that the second column of the ps -ejH output is the process group id; here it is 5444, which is also the pid of the parent python3 process. To get the list of parent process ids directly, we could also pipe the output of ps with -o stat,ppid through awk, keeping only the rows whose state starts with Z.

That gives us the parent process ids of the zombie processes.
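
The same lookup can also be sketched in Python. The snippet below is an illustration under assumptions, not the command used in this article: it assumes a ps that accepts -eo pid,ppid,stat (as procps on Linux does), and the function names zombies_by_parent and list_zombies are hypothetical.

```python
import subprocess
from collections import defaultdict


def zombies_by_parent(ps_output):
    """Group zombie pids by parent pid.

    ps_output is the text printed by `ps -eo pid,ppid,stat`:
    one process per line, columns PID, PPID and STAT.
    """
    groups = defaultdict(list)
    for line in ps_output.splitlines():
        fields = line.split()
        # Skip the header row and anything that is not a process row.
        if len(fields) < 3 or not fields[0].isdigit():
            continue
        pid, ppid, stat = fields[0], fields[1], fields[2]
        # A zombie's state starts with Z (for example Z or Z+).
        if stat.startswith("Z"):
            groups[int(ppid)].append(int(pid))
    return dict(groups)


def list_zombies():
    """Run ps and return {parent pid: [zombie pids]} for the whole system."""
    result = subprocess.run(
        ["ps", "-eo", "pid,ppid,stat"],
        capture_output=True, text=True, check=True,
    )
    return zombies_by_parent(result.stdout)
```

For the session shown above, list_zombies() would be expected to map the parent 5444 to the ten defunct python3 pids. Matching on the STAT column rather than grepping the whole line for 'Z' avoids picking up unrelated processes.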

What could we do now with these processes? Does the zombie process consume resources of the system?

Before answering the above question, we should be aware of what sys.exit() does in the above program, and how it differs from os._exit(). sys.exit() raises SystemExit, so the interpreter runs its normal cleanup handlers before the process ends; os._exit() is a low-level system call that exits directly without calling any cleanup handlers. In the above program, sys.exit() is called in the child process, so the child process finishes and exits while the parent program keeps running. When a process ends, all of the memory and resources associated with it are deallocated so that they can be used by other processes. However, the process's entry in the process table remains, and as demonstrated above it shows up in the process list of various commands. Thus, a zombie consumes almost no system resources beyond its process id and its process-table entry, and a few of them are nothing to worry about. However, like real zombies, they become more troublesome in large numbers or when their number grows over time, since process ids and process-table slots are limited. A large number of zombie processes indicates either a system issue or an application issue, depending on the source of the processes.
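
The difference between a normal exit and a low-level exit can be seen in a small experiment. In Python the two calls are sys.exit() and os._exit(). The sketch below runs a tiny script in a separate interpreter, registers a cleanup handler with atexit, and ends the script with whichever exit call is passed in. The names SNIPPET and run_and_capture are hypothetical and exist only for this illustration.

```python
import subprocess
import sys

# Script run in a separate interpreter: it registers a cleanup handler
# and then ends with the exit call substituted for {exit_call}.
SNIPPET = (
    "import atexit, os, sys\n"
    "atexit.register(lambda: print('cleanup ran'))\n"
    "{exit_call}\n"
)


def run_and_capture(exit_call):
    """Run SNIPPET with the given exit call and return what it printed."""
    code = SNIPPET.format(exit_call=exit_call)
    result = subprocess.run(
        [sys.executable, "-c", code],
        capture_output=True, text=True,
    )
    return result.stdout
```

run_and_capture("sys.exit(0)") returns the handler's output, because SystemExit lets the interpreter shut down normally, while run_and_capture("os._exit(0)") returns an empty string, because the process ends before any handler runs. In both cases the exited process stays a zombie until its parent reads the exit status, which subprocess.run() does here.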

How to kill the Zombie Processes?

As the name suggests, a zombie process is already dead, so the kill command with its default signal 15 (SIGTERM) has no effect on it. Even kill -9 (SIGKILL) will not work. Since the zombie has already exited and only its exit status code has not been read by the parent process, killing the parent process cleans up the zombie from the process table. Take care before killing the parent process, though: first inspect it with the strace or lsof Linux commands to learn more about what it is doing. The parent process should also be killed with the default signal 15 (SIGTERM), which asks it to exit cleanly and gives it a chance to read the exit status codes of its zombie children. Once the parent is gone, any remaining zombies are adopted by init, which reaps them.
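
That signals have no effect on a zombie can be checked with a short sketch. It assumes Linux and a Python build that provides os.waitid(); the function name kill_zombie_demo and the exit code 7 are arbitrary choices made for illustration.

```python
import os
import signal


def kill_zombie_demo():
    """Send SIGKILL to a zombie child and show that nothing changes.

    Returns (exited_normally, exit_code) as read by the parent afterwards.
    """
    pid = os.fork()
    if pid == 0:
        # Child: exit straight away with a recognisable exit code.
        os._exit(7)

    # Wait until the child has exited, but leave it unreaped (WNOWAIT),
    # so that it stays in the process table as a zombie.
    os.waitid(os.P_PID, pid, os.WEXITED | os.WNOWAIT)

    # The pid still exists, so kill() succeeds, but the process is
    # already dead and the signal changes nothing.
    os.kill(pid, signal.SIGKILL)

    # Only reading the exit status removes the zombie. The status still
    # reports the original exit code, not death by SIGKILL.
    _, status = os.waitpid(pid, 0)
    return os.WIFEXITED(status), os.WEXITSTATUS(status)
```

kill_zombie_demo() returns (True, 7): the child is reported as having exited normally with code 7, even though SIGKILL was sent to it while it was a zombie.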

But what happens if the parent process no longer exists in the process table and only zombie processes remain? Normally this state does not last, because orphaned processes are adopted by init (PID 1), which reaps them. If zombies linger anyway, the advice is simple: as we saw above, zombie processes take almost no system resources, so if they are few and their number is not growing quickly, you can leave them on your system, or you can simply reboot. A zombie that its adopting process never reaps is not going to be cleaned up without rebooting the machine.

