How to Make Sense of Distributed Processing With Python Daemons

Daemonize and scale your Python apps

Timothy Mugayi
Jan 28 · 10 min read

The term “daemon” was coined by the programmers of MIT’s Project MAC. It comes from Maxwell’s demon, an imaginary being from a thought experiment who constantly works in the background, sorting molecules. This led Unix systems to adopt the term.

In Unix parlance, a daemon is a long-running background process that can do virtually anything, from serving requests for services to running arbitrary, often long-running, tasks for day-to-day activities on Unix systems. Unlike conventional applications, daemons do not run under the direct control of an interactive user. Because they have no controlling terminal, they run silently in the background. The term originated in Unix, but most operating systems use daemons in one form or another. On Unix, daemon names conventionally end in “d”: examples include inetd, httpd, nfsd, sshd, named, and lpd.


Reasons to Daemonize Your Python Code

  • To have long-lived code execution that’s not part of your main program
  • To introduce detached distributed computation without the need for subprocesses that depend on your main application life cycle.

Daemon Characteristics

On Unix, you can list the running processes with ps or top:

$ ps -aux | less
$ top

On Windows, you can use Task Manager to view the list of running processes. Press Ctrl+Alt+Delete, select Task Manager, and then open the Details tab to see each process’s PID. Try executing the Python code below in your Python terminal:
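A minimal version of that snippet, assuming all it needs to do is print the interpreter’s process ID, plus the ID of the parent that launched it:

```python
import os

# The PID of this interpreter session, and of the process that started it
# (typically your shell). Both calls also work on Windows.
print(f"PID: {os.getpid()}")
print(f"Parent PID: {os.getppid()}")
```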

You’ll notice a PID is displayed, identifying the current Python session. Close your Python terminal by calling exit(), start a new one, and run the code again. You will typically see a different PID, because each interpreter session is a new process.

Every process has a parent process (the initial kernel-level process is usually its own parent). The parent is notified when the child terminates and the parent can obtain the child’s exit status.
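On Unix, you can observe this parent/child relationship from Python. The sketch below forks a child that exits with a chosen status, and the parent collects that status with os.waitpid(). The helper name child_exit_status is purely illustrative:

```python
import os


def child_exit_status(code):
    """Fork a child that exits with `code`; return the status the parent collects.

    Unix only: os.fork() does not exist on Windows.
    """
    pid = os.fork()
    if pid == 0:
        # In the child: exit immediately, skipping the parent's cleanup handlers.
        os._exit(code)
    # In the parent: block until the child terminates, then decode its status.
    _, status = os.waitpid(pid, 0)
    if not os.WIFEXITED(status):
        raise RuntimeError("child did not exit normally")
    return os.WEXITSTATUS(status)


print(child_exit_status(3))  # prints 3
```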

In addition to having a process ID, each process belongs to a process group.

  • A process group is a collection of one or more processes (usually associated with the same job) that can receive signals from the same terminal.
  • Each process group can have a process group leader, whose process group ID equals its process ID.
You can inspect process groups with:

$ ps -axj

  • The -a option shows the status of processes owned by others.
  • The -x option shows processes that don’t have a controlling terminal.
  • The -j option displays job-related information, such as the process group ID (PGID) and session ID.
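The same IDs that ps -axj reports are available from Python’s os module. A small Unix-only sketch that prints them for the current interpreter and checks whether it is its process group’s leader:

```python
import os

# Unix only. Compare these values with the row ps -axj shows for this process.
pid = os.getpid()
pgid = os.getpgrp()   # process group ID
sid = os.getsid(0)    # session ID; 0 means "the calling process"
print(f"PID={pid} PPID={os.getppid()} PGID={pgid} SID={sid}")
# A process group leader is the process whose PID equals its PGID.
print("process group leader:", pgid == pid)
```

Run from an interactive shell, the Python interpreter is usually its own process group leader, because the shell puts each job in a new group.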

Creating a Python Daemon


Prerequisites

Let's walk through how to create a daemon manually, without leveraging a third-party library. The only prerequisite is having the requests Python package installed.

$ mkdir daemons
$ cd daemons
$ touch daemon_func.py
$ pip install requests

Let's create a simple Python function: a fictitious news aggregator that pulls the latest top US headlines and writes them to a file every few seconds. Generated file names follow this format:

pid_{pid}_my_daemon_log_{timeref}.txt

This function will be the basis for illustrating how we create a Python daemon. You’ll need your own API key: create a free account on https://newsapi.org and add your key to the code snippet below:
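Here is a sketch of such a function. It calls NewsAPI’s v2 top-headlines endpoint with country=us. The helper names (log_filename, write_headlines), the NEWS_API_KEY placeholder, the one-headline-per-line file contents, and the output_dir/iterations parameters are assumptions for illustration:

```python
# Sketch of the news aggregator. Helper names and file contents are assumptions.
NEWS_API_KEY = "YOUR_API_KEY"  # replace with your key from https://newsapi.org
TOP_HEADLINES_URL = "https://newsapi.org/v2/top-headlines"


def log_filename(pid, timeref):
    """Return a name in the pid_{pid}_my_daemon_log_{timeref}.txt format."""
    return f"pid_{pid}_my_daemon_log_{timeref}.txt"


def write_headlines(titles, path):
    """Write one headline per line to the file at `path`."""
    with open(path, "w", encoding="utf-8") as fh:
        for title in titles:
            fh.write(f"{title}\n")


def main(interval=10, iterations=None, output_dir="."):
    """Fetch US top headlines every `interval` seconds and write them to a file.

    Runs forever unless `iterations` is given. Pass an absolute `output_dir`
    when daemonized, because daemons usually change directory to "/".
    """
    # Imports live inside main() for isolation.
    import os
    import time

    import requests

    done = 0
    while iterations is None or done < iterations:
        response = requests.get(
            TOP_HEADLINES_URL,
            params={"country": "us", "apiKey": NEWS_API_KEY},
            timeout=10,
        )
        response.raise_for_status()
        articles = response.json().get("articles", [])
        titles = [article.get("title") or "" for article in articles]
        timeref = time.strftime("%Y%m%d_%H%M%S")
        path = os.path.join(output_dir, log_filename(os.getpid(), timeref))
        write_headlines(titles, path)
        done += 1
        if iterations is None or done < iterations:
            time.sleep(interval)
```

Calling main(iterations=1) performs a single fetch, which is handy for checking your API key before daemonizing.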

Take note that all import statements have been included within the main function for convenience and code isolation. Execute the above function to make sure everything is working as expected.

Now let's create the function responsible for daemonizing our main() function.

Forking a daemon on Unix requires a specific sequence of system calls.

For starters, daemon processes must detach from their controlling terminal and process group. This is achieved by forking: when a process forks, the kernel creates a copy of it, called the child. On Unix we fork twice, terminating each parent process so that only the grandchild of the original process runs the daemon’s code.
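A minimal sketch of that double-fork sequence, assuming a Unix system. Beyond the fork, setsid, and chdir steps the article describes, it also redirects the standard streams to /dev/null, a common daemon convention. The function name daemonize is illustrative, and its messages mirror the sample output in this article:

```python
import os
import sys


def daemonize(workdir="/"):
    """Detach from the terminal with the classic Unix double fork.

    Returns only in the grandchild, which is the daemon; both ancestors exit.
    """
    print(f"This is the parent PID {os.getpid()}", flush=True)

    # First fork: the original parent exits, so the child is guaranteed not
    # to be a process group leader, which os.setsid() requires.
    if os.fork() > 0:
        os._exit(0)

    print("Detaching from parent environment", flush=True)
    # New session: the child leads a new session and process group
    # and has no controlling terminal.
    os.setsid()

    # Second fork: the session leader exits. The grandchild is not a session
    # leader, so it can never acquire a controlling terminal again.
    if os.fork() > 0:
        os._exit(0)

    # Don't keep a mounted file system busy.
    os.chdir(workdir)
    print(f"Detached Daemon PID {os.getpid()}", flush=True)

    # A daemon has no terminal: point stdin, stdout and stderr at /dev/null.
    sys.stdout.flush()
    sys.stderr.flush()
    devnull = os.open(os.devnull, os.O_RDWR)
    for fd in (0, 1, 2):
        os.dup2(devnull, fd)
    if devnull > 2:
        os.close(devnull)
```

In a script, you would call daemonize() first and then run your long-lived function. Everything after the call executes in the detached grandchild.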

In the above code, this is done by calling os.fork(). Forking decouples the daemon process from the calling terminal, so the daemon can keep running (typically as a server process without further user interaction, like a web server) even after the terminal is closed. Any code placed after the second fork executes in the background as a daemon. Executing our code produces the output below; note how the PID changes from the parent process to the detached process:

This is the parent PID 39813
Detaching from parent environment
Detached Daemon PID 39815
executing daemon background......

The call to os.setsid() creates a new session. The process becomes the leader of a new session and a new process group, and is disassociated from its controlling terminal.
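You can verify this behavior with a short Unix-only sketch. It forks a child, lets the child call os.setsid(), and reports the child’s IDs back through a pipe. The child must not already be a process group leader, or os.setsid() raises PermissionError; a freshly forked child never is. The helper name is illustrative:

```python
import os


def session_ids_after_setsid():
    """Return (pid, sid, pgid) of a forked child after it calls os.setsid()."""
    read_fd, write_fd = os.pipe()
    pid = os.fork()
    if pid == 0:
        try:
            os.close(read_fd)
            os.setsid()  # become leader of a new session and process group
            report = f"{os.getpid()} {os.getsid(0)} {os.getpgrp()}"
            os.write(write_fd, report.encode())
        finally:
            os._exit(0)
    os.close(write_fd)
    with os.fdopen(read_fd) as reader:
        data = reader.read()  # EOF once the child exits
    os.waitpid(pid, 0)
    child_pid, sid, pgid = map(int, data.split())
    return child_pid, sid, pgid
```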

The call to os.chdir("/") changes the current working directory to the root directory. The current working directory inherited from the parent could be on a mounted file system. Since daemons normally exist until the system is rebooted, if the daemon stays on a mounted file system, that file system cannot be unmounted.

To stop your daemon, open a terminal and enter kill {PID}. If the process does not exit, kill -9 {PID} forces it to stop.

If you’re on a Mac, you can press Cmd+Space, search for Activity Monitor, find the PID, and force quit the process. If you run the program multiple times, be sure to kill all the Python processes running your daemon code.

One reminder: creating child processes with Python’s multiprocessing module and setting daemon=True does not create a detached Unix daemon. The two have different characteristics. Additionally, a daemon is not a service.


When the multiprocessing module’s daemon flag is set to True, the process runs in the background. The key difference is that a daemonic process lives only as long as its parent: it terminates when it finishes its work or when the main process exits, and it is not allowed to create child processes of its own. It’s easy to assume that setting this flag achieves the same thing as a Unix daemon.
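To see the difference, here is a small Unix-only sketch that starts a daemonic child with multiprocessing and has it report its parent PID and session ID. It explicitly requests the fork start method so the example behaves the same on Linux and macOS. The helper names are illustrative:

```python
import multiprocessing
import os


def report_ids(queue):
    # Runs in the child: send back its parent PID and session ID.
    queue.put((os.getppid(), os.getsid(0)))


def run_daemonic_child():
    # "fork" start method (Unix only), so no module re-import happens in the child.
    ctx = multiprocessing.get_context("fork")
    queue = ctx.Queue()
    child = ctx.Process(target=report_ids, args=(queue,), daemon=True)
    child.start()
    parent_pid, session_id = queue.get(timeout=10)
    child.join()
    return child.daemon, parent_pid, session_id


is_daemon, parent_pid, session_id = run_daemonic_child()
# The "daemonic" child is still our direct child and shares our session,
# so it is not detached the way a double-forked Unix daemon is.
print(is_daemon, parent_pid == os.getpid(), session_id == os.getsid(0))
# prints: True True True
```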

Our initial code, with its double-forking magic, looks simple enough to integrate into your own Python projects, but to the trained eye, it’s missing a lot.

For starters, we don’t have proper PID state management in place. There’s no way to stop or restart the daemon in the event of exceptions, no way to know the state of background daemons, no way to know how many daemons have been invoked and are executing, and no way to control how many instances of the daemon may run at once. The PEP 3143 specification lists conditions and behaviors a program should follow to become a well-behaved Unix daemon process.
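One common way to add PID state management and enforce a single running instance is a locked PID file. The sketch below uses fcntl.flock (Unix only). The helper names acquire_pid_file and read_pid_file are hypothetical, not part of any library:

```python
import fcntl
import os


def acquire_pid_file(path):
    """Lock `path` and write our PID to it; return the open file descriptor.

    Returns None if another process (or another open descriptor) holds the
    lock. Keep the descriptor open for the daemon's lifetime: closing it
    releases the lock.
    """
    fd = os.open(path, os.O_RDWR | os.O_CREAT, 0o644)
    try:
        fcntl.flock(fd, fcntl.LOCK_EX | fcntl.LOCK_NB)
    except (BlockingIOError, PermissionError):
        os.close(fd)
        return None
    # We hold the lock: replace any stale contents with our own PID.
    os.ftruncate(fd, 0)
    os.write(fd, f"{os.getpid()}\n".encode())
    os.fsync(fd)
    return fd


def read_pid_file(path):
    """Return the PID recorded in `path`, or None if it's missing or empty."""
    try:
        with open(path) as fh:
            text = fh.read().strip()
    except FileNotFoundError:
        return None
    return int(text) if text else None
```

Because the lock belongs to the open file rather than the file’s existence, a PID file left behind by a crashed daemon does not block the next start.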

Let's take a look at two libraries that will make it easy to daemonize your Python applications.


#1. PEP 3143 Standard Daemon Process Library

$ pip install python-daemon==2.2.4

Let's create a new file called daemon_cxt.py. The bare minimum code you need to use the library is as follows:
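A minimal sketch, assuming python-daemon is installed. The function name run_as_daemon and the sleep loop are placeholders for your own program:

```python
def run_as_daemon():
    # Requires: pip install python-daemon. The imports live inside the
    # function so this file can be loaded without the library installed.
    import time

    import daemon

    # Entering the context detaches the process (fork, new session, second
    # fork, chdir to "/", closing open files); the block runs as the daemon.
    with daemon.DaemonContext():
        while True:
            time.sleep(10)  # placeholder for your program's real work
```

Call run_as_daemon() from your script’s entry point. Once the with block is entered, the process is detached from your terminal.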

A DaemonContext instance represents the behavior settings and process context for the program when it becomes a daemon. There are many more arguments that can be passed to the context, but to keep this article simple, we won’t dwell on them. Let’s circle back to our original example and wrap our original newsapi main function with it.

Append the code below to your daemon_func.py file:


Let's create a new function that starts the daemon:
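A sketch of such a function, assuming python-daemon’s DaemonContext together with TimeoutPIDLockFile from its daemon.pidfile module, which relies on the lockfile package. The parameter names work, pid_file, and working_directory are hypothetical, and the umask value is an assumption:

```python
def start_daemon(work, pid_file, working_directory="/"):
    """Run `work()` inside a python-daemon DaemonContext guarded by a PID file."""
    # Requires python-daemon; imported here so the module loads without it.
    import daemon
    from daemon import pidfile

    context = daemon.DaemonContext(
        working_directory=working_directory,
        umask=0o022,
        # The PID file is created once the process has detached, so it holds
        # the daemon's own PID. If the file already exists (for example,
        # because another instance is running), entering the context fails
        # after waiting up to acquire_timeout seconds.
        pidfile=pidfile.TimeoutPIDLockFile(pid_file, acquire_timeout=5),
    )
    with context:
        work()
```

Pass your news function as work, and point working_directory at a directory the daemon can write to, since relative file paths resolve against it.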


Next, let’s create a Python main function, so we can invoke the daemon:
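One way to build that entry point is with argparse. In the sketch below, the action choices, flag names, and /tmp defaults are assumptions for illustration:

```python
import argparse


def parse_args(argv=None):
    """Parse the daemon script's command line (flag names are illustrative)."""
    parser = argparse.ArgumentParser(description="Run the news aggregator as a daemon.")
    parser.add_argument(
        "action", nargs="?", choices=["start", "stop"], default="start",
        help="start the daemon (default) or stop a running one",
    )
    parser.add_argument(
        "-p", "--pid-file", default="/tmp/news_daemon.pid",
        help="where the daemon records its PID",
    )
    parser.add_argument(
        "-d", "--work-dir", default="/tmp",
        help="working directory for the daemon's output files",
    )
    return parser.parse_args(argv)
```

In the script’s if __name__ == "__main__": block, you would call parse_args() and then, depending on args.action, either start the daemon with args.pid_file and args.work_dir or stop the one recorded in the PID file.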


Try running the program by entering the Python command:

python daemon_func.py

What happens now? Just like in our first example, files will keep being generated every ten seconds. Take note of the PID: you can use it to kill the running process. To automate stopping the daemon, you can kill the process programmatically. You can make this part of the command-line argument check, using if/elif statements to decide whether to start or kill a daemon.
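A minimal sketch of a programmatic kill, assuming the daemon records its PID in a PID file. stop_daemon is a hypothetical helper. It sends SIGTERM, the signal a plain kill sends, rather than SIGKILL, so the daemon gets a chance to clean up:

```python
import os
import signal


def stop_daemon(pid_file, sig=signal.SIGTERM):
    """Send `sig` to the PID recorded in `pid_file`; return True if it was sent."""
    try:
        with open(pid_file) as fh:
            pid = int(fh.read().strip())
    except (FileNotFoundError, ValueError):
        return False  # no PID file, or it doesn't contain a number
    try:
        os.kill(pid, sig)
    except ProcessLookupError:
        # Stale PID file: the process is already gone, so clean it up.
        os.remove(pid_file)
        return False
    return True
```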

What if you wanted to spawn multiple daemons to get some form of distributed parallelism? The key is to ensure each spawned daemon gets its own unique PID file. You can then scale the number of daemons to the CPUs and cores your server or computer has, which the handy cross-platform library psutil can report:

>>> import psutil
>>> psutil.cpu_count()
8
>>> psutil.cpu_count(logical=False)
4
>>>
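Each daemon then needs its own PID file. A small sketch, assuming one daemon per logical CPU and a hypothetical prefix_index.pid naming scheme:

```python
import os


def daemon_pid_files(directory, prefix="news_daemon", count=None):
    """Return one unique PID-file path per daemon to launch.

    `count` defaults to the number of logical CPUs. psutil.cpu_count() reports
    the same figure, and psutil.cpu_count(logical=False) the physical cores.
    """
    if count is None:
        count = os.cpu_count() or 1
    return [os.path.join(directory, f"{prefix}_{index}.pid") for index in range(count)]
```

You would then start one daemon per path, for example by running your daemon script once per PID file. The daemons are independent processes, so the operating system can schedule them across cores.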

#2. Sleepy Daemon

To install the daemons library used for the sleepy daemon, run pip install directly against its Git repository to ensure you always get the latest code:

$ pip install git+https://github.com/kevinconway/daemons

Let's create a new Python file for the sleepy daemon code:

$ touch daemon_sleepy.py

Since we’re using sys.argv in this example, we can pass start, stop and restart as arguments in our script. Optionally, we can also rewrite the code so that the daemon script is executed programmatically.
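A sketch modeled on the sleepy daemon example in the daemons project’s README. The import path daemons.prefab.run, the RunDaemon base class, and its pidfile argument are taken from that README and should be treated as assumptions; check the repository for the current API. The helpers parse_action and main are illustrative:

```python
import logging
import os
import sys
import time

ACTIONS = ("start", "stop", "restart")
USAGE = f"usage: daemon_sleepy.py {{{'|'.join(ACTIONS)}}}"


def parse_action(argv):
    """Return 'start', 'stop' or 'restart' from argv, or exit with a usage message."""
    if len(argv) != 1 or argv[0] not in ACTIONS:
        raise SystemExit(USAGE)
    return argv[0]


def make_sleepy_daemon(pid_file):
    """Build the daemon object (assumed API of the `daemons` library)."""
    from daemons.prefab import run

    class SleepyDaemon(run.RunDaemon):
        def run(self):
            # The daemon's work loop: log a heartbeat every second.
            while True:
                logging.info("I'm awake!")
                time.sleep(1)

    return SleepyDaemon(pidfile=pid_file)


def main(argv=None):
    action = parse_action(sys.argv[1:] if argv is None else argv)
    here = os.getcwd()  # absolute paths, computed before the process detaches
    logging.basicConfig(filename=os.path.join(here, "sleepy.log"), level=logging.DEBUG)
    daemon = make_sleepy_daemon(os.path.join(here, "sleepy.pid"))
    getattr(daemon, action)()  # calls start(), stop() or restart()
```

With main() called from the script’s entry point, python daemon_sleepy.py start launches the daemon, and stop or restart control it through the PID file.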


Windows Daemons


Supervisor

Supervisor is meant to be used to control processes related to an application and starts like any other program at boot time. One of the cool things about Supervisor is that it ships with a simple built-in web interface to help you manage processes. Programs intended to be run under Supervisor should not daemonize themselves. Instead, they should run in the foreground. They should not detach from the terminal from which they’re started.

To avoid digressing and turning this piece into an hour-long tutorial, I’ll address Windows daemons in part two, so stay tuned. I’ll dive deeper into how to invoke a Windows service programmatically from our Python code. This achieves the same results as daemonizing Python functions on Unix systems, taking advantage of your CPUs and cores for parallelism. We will also explore Supervisor and see how it works.


Conclusion


Say you have some long-running data science or CPU-intensive computation that you want running in the background. You can design your code to accept a callback function and focus on writing custom code to handle the daemon’s results. The daemonization logic is abstracted away within your library, letting developers focus on what they do best: writing code.
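A minimal sketch of that design, assuming Unix and using the double fork described earlier. Unlike a script-level daemon, the caller returns immediately and keeps running. The function name run_detached and its signature are hypothetical:

```python
import os


def run_detached(task, callback, *args, **kwargs):
    """Run task(*args, **kwargs) in a detached, double-forked process.

    The result (or the exception raised) is passed to callback inside the
    detached process; the caller returns immediately and keeps running.
    """
    pid = os.fork()
    if pid > 0:
        # Caller: reap the short-lived intermediate child, then carry on.
        os.waitpid(pid, 0)
        return
    try:
        os.setsid()  # new session, no controlling terminal
        if os.fork() > 0:
            os._exit(0)  # intermediate child exits right away
        os.chdir("/")
        try:
            result = task(*args, **kwargs)
        except Exception as exc:  # hand failures to the callback too
            result = exc
        callback(result)
    finally:
        os._exit(0)  # never fall back into the caller's code
```

For example, run_detached(sum, save_result, range(101)) would compute 5050 in the background and hand it to a hypothetical save_result callback. The callback must write its result somewhere durable, such as a file or database, because it runs in a separate process.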

I hope this material has been helpful in getting you up to speed and has cleared up any doubts you may have.

Updated 06/02/2020: part two is now available.

Better Programming

Advice for programmers.

Written by Timothy Mugayi

Tech Evangelist, Instructor, Polyglot Developer with a passion for innovative technology, Father & Health Activist
