Understanding Multiprocessing with Python

Regan Willis
3 min read · Aug 2, 2020


Multiprocessing means using more than one processor so that, instead of running sequentially (like usual), a program is split into multiple processes that run in parallel. This can be a useful way to speed up your program. Parallelism is more complicated in Python than in many other programming languages because of the Global Interpreter Lock (GIL), which lets only one thread run Python bytecode at a time. Luckily, as is typical for Python, we have libraries to help us! We’ll use the standard-library modules multiprocessing and concurrent.futures. You can view the source code here.

Creating Multiple Parallel Processes

If you’ve been researching this already, you might have heard the word “pool” thrown around. A pool is a common way to implement parallel processing: it creates several processes that all work at the same time. You can do this with the regular multiprocessing library, but I think concurrent.futures is a little more intuitive, and it lets you switch from threads to processes without much change in syntax.
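To make the "switch from threads to processes" point concrete, here is a minimal sketch. The function names square and run_all are hypothetical and only for illustration; the two executor classes are the real ones from concurrent.futures.

```python
from concurrent.futures import ProcessPoolExecutor, ThreadPoolExecutor


def square(n):
    """Stand-in for real work. Defined at module level so it can be pickled."""
    return n * n


def run_all(executor_cls, numbers):
    """Run square() over numbers with whichever executor class is passed in."""
    with executor_cls(max_workers=4) as pool:
        # map() yields results in input order, whatever order tasks finish in.
        return list(pool.map(square, numbers))


if __name__ == "__main__":
    data = [1, 2, 3, 4]
    print(run_all(ThreadPoolExecutor, data))   # threads inside this process
    print(run_all(ProcessPoolExecutor, data))  # separate worker processes
```

The only thing that changes between the two calls is the class name. The `if __name__ == "__main__":` guard matters for the process version, because on some platforms worker processes re-import the main module.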

The code below is an example that functions as two concurrent threads, with one thread being the main program and the other holding a pool of processes. This means that all the computation in the pool is done in parallel, but it is not in parallel with the main thread. The main thread loops through the data looking for qualifying items and, when enough processes have been created, it switches to the process pool and starts computing in parallel.
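As a rough sketch of that pattern (not necessarily the original listing), the main loop below collects qualifying values and hands each full batch to a pool, waiting for the batch to finish before it continues. BATCH_SIZE, qualifies, process, and run are all hypothetical names, and the "even numbers qualify" rule is just a placeholder.

```python
from concurrent.futures import ProcessPoolExecutor

BATCH_SIZE = 3  # hypothetical: how many items to gather before using the pool


def qualifies(value):
    """Placeholder filter: treat even numbers as qualifying data."""
    return value % 2 == 0


def process(value):
    """Placeholder for the time-consuming computation."""
    return value * 10


def run(values, pool):
    """Scan values in the main thread; compute each full batch in the pool."""
    results, pending = [], []
    for value in values:
        if qualifies(value):
            pending.append(value)
        if len(pending) == BATCH_SIZE:
            # The main thread blocks here while the batch runs in parallel.
            results.extend(pool.map(process, pending))
            pending = []
    if pending:  # flush whatever is left over at the end
        results.extend(pool.map(process, pending))
    return results


if __name__ == "__main__":
    with ProcessPoolExecutor() as pool:
        print(run(range(10), pool))
```

Items inside a batch are computed in parallel with each other, but the main loop does nothing else until the batch is done, which is the "not in parallel with the main thread" behavior described above.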

To test that the processes are truly working in parallel, I made each process sleep, and all the processes slept at roughly the same time. Overall, the processes sleep for 13 seconds, so if this program were run sequentially it would take over 13 seconds to complete all the processes. However, because we are using multiprocessing, sending all the data points takes less than five seconds. The third process takes a little longer, so instead of waiting on it, the program goes on to complete all the other processes, and the third process finishes last.
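Here is a small sketch of that kind of sleep test. The durations in DELAYS are hypothetical and scaled down to fractions of a second so it runs quickly; nap and run_timed are illustrative names. It uses as_completed from concurrent.futures to collect tasks in the order they finish.

```python
import time
from concurrent.futures import ProcessPoolExecutor, as_completed

# Hypothetical sleep times in seconds; the third task is the slow one.
DELAYS = [0.2, 0.2, 0.6, 0.2, 0.2]


def nap(index, seconds):
    """Sleep to simulate work, then report which task this was."""
    time.sleep(seconds)
    return index


def run_timed(executor_cls, delays):
    """Return (task indices in completion order, elapsed seconds)."""
    start = time.perf_counter()
    with executor_cls(max_workers=len(delays)) as pool:
        futures = [pool.submit(nap, i, d) for i, d in enumerate(delays)]
        # as_completed() yields each future as soon as it finishes.
        order = [future.result() for future in as_completed(futures)]
    return order, time.perf_counter() - start


if __name__ == "__main__":
    order, elapsed = run_timed(ProcessPoolExecutor, DELAYS)
    print(f"completion order: {order}")
    print(f"elapsed: {elapsed:.2f}s, sequential sum: {sum(DELAYS):.1f}s")
```

If the tasks really overlap, the elapsed time should be close to the longest single sleep rather than the sum of all of them, and the slow task (index 2) should show up last in the completion order.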

Creating a Spin-Off, FIFO Idling Process

The problem with the pool is that although processes may be created in a first-in, first-out manner, there is no way to guarantee that the processes finish in order. Usually, this is an advantage: if the fifth process is taking a little longer, it’s no problem because the sixth and seventh processes can still keep going. However, you may find yourself in a position where a function can be done asynchronously, but the order of the output of that function matters. In this case, you need only one process that will spin off and work in parallel with the main thread. Here’s an example using the multiprocessing library:
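What follows is a minimal sketch of a single spin-off process fed by a queue, not necessarily the original listing. The names worker, inbox, outbox, and STOP are hypothetical, using None as the stop flag is an assumption, and doubling the value stands in for the slow computation.

```python
import multiprocessing as mp

STOP = None  # hypothetical stop flag: tells the worker to leave its loop


def worker(inbox, outbox):
    """Handle items strictly in arrival order until the stop flag arrives."""
    while True:
        item = inbox.get()    # blocks until something is in the queue
        if item is STOP:
            break
        outbox.put(item * 2)  # stand-in for time-consuming computation


if __name__ == "__main__":
    inbox, outbox = mp.Queue(), mp.Queue()
    proc = mp.Process(target=worker, args=(inbox, outbox))
    proc.start()  # the worker starts running and waits on inbox.get()

    for value in range(5):  # the main loop keeps going while the worker computes
        inbox.put(value)
    inbox.put(STOP)

    # Drain the results before join() so the worker is never stuck on a full pipe.
    results = [outbox.get() for _ in range(5)]
    proc.join()
    print(results)
```

Because there is only one worker and it reads from a first-in, first-out queue, the results come back in the same order the data was sent.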

The main thread is constantly looping, while sending some of the data to the function so it can do some time-consuming computation on it. The multiprocessing library includes a Queue class, and a queue can be passed as an argument to the function that runs in the new process. From the main thread, you can put data in the queue. Calling the get method removes data from the queue in the order it was sent. The process will start as soon as you call the start method, but the queue’s get method will, by default, wait until there is something in the queue before continuing the program. After the first piece of data is sent through the queue, the process can keep looping, calling get for each new item, until a flag is sent to stop the loop.

