In Python, choose builtin process pools over custom process pools

The multiprocessing module supports process-based parallelism in Python. Unlike thread-based concurrency, it lets a program use multiple CPUs to perform computations in parallel.

A common use of the multiprocessing module is to map data: apply a function f to every data item in a collection.

There are two ways to accomplish this with the multiprocessing module.

  1. Create a custom process pool using multiprocessing: Create multiple Processes that apply f to each data item they receive and are connected to the main process by two queues. Push input data to the processes via an input queue, and collect output (mapped) data from the processes via an output queue.
  2. Use the builtin support for process pools in multiprocessing: Create a Pool of processes and use a Pool method to map a collection of data items by applying f.

In the code snippet below, these options are realized via custom_pool and builtin_pool functions.
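For orientation, here is an independent minimal sketch of the two options. It is not the post's listing, and the line numbers cited in the text do not refer to it. The names square, custom_pool_map, builtin_pool_map, and the start_method parameter are hypothetical, chosen only for illustration.

```python
import multiprocessing as mp


def square(x):
    """Hypothetical stand-in for f; any top-level picklable function works."""
    return x * x


def _worker(f, in_q, out_q):
    # Apply f to (index, item) pairs until the None sentinel arrives.
    for index, item in iter(in_q.get, None):
        out_q.put((index, f(item)))


def custom_pool_map(f, items, workers=4, start_method=None):
    """Option 1: Processes connected to the main process by two queues."""
    ctx = mp.get_context(start_method)  # None selects the platform default
    in_q, out_q = ctx.Queue(), ctx.Queue()
    procs = [ctx.Process(target=_worker, args=(f, in_q, out_q))
             for _ in range(workers)]
    for p in procs:
        p.start()
    items = list(items)
    for pair in enumerate(items):
        in_q.put(pair)
    for _ in procs:
        in_q.put(None)  # one stop sentinel per worker
    results = [None] * len(items)
    for _ in items:
        index, value = out_q.get()  # results may arrive out of order
        results[index] = value
    for p in procs:
        p.join()
    return results


def builtin_pool_map(f, items, workers=4, start_method=None):
    """Option 2: let multiprocessing's Pool manage workers and queues."""
    ctx = mp.get_context(start_method)
    with ctx.Pool(processes=workers) as pool:
        return pool.map(f, items)


if __name__ == "__main__":
    data = range(10)
    print(custom_pool_map(square, data) == builtin_pool_map(square, data))
```

The custom version has to handle sentinels, result ordering, and joining by hand; Pool.map takes care of all three.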

Often, f depends on some fixed data that does not change with the data item being mapped. Instead of sending this fixed data to the processes along with every data item, the processes can be initialized once with the fixed data and then reuse it for each data item.

In the above code snippet, such initialization is accomplished in the custom pool option by having two input queues, one for initialization (lines 31, 36–37, 39, 16–17) and one for mapping, and in the builtin pool option by using the initialization support available in the Pool class (lines 82, 64–69).
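The two initialization mechanisms might look roughly like the sketch below. It is an illustration under assumptions, not the post's code: _init, _f, custom_map, builtin_map, and start_method are hypothetical names, and _f is a made-up function that merely touches the fixed data. The Pool arguments initializer and initargs are part of the standard multiprocessing API.

```python
import multiprocessing as mp

_fixed = None  # per-process copy of the fixed data, set once by _init


def _init(fixed_data):
    global _fixed
    _fixed = fixed_data


def _f(item):
    # Hypothetical f: combines the item with the fixed data.
    return item + len(_fixed)


def _worker(init_q, in_q, out_q):
    _init(init_q.get())  # fixed data arrives once, before any mapping
    for index, item in iter(in_q.get, None):  # None is the stop sentinel
        out_q.put((index, _f(item)))


def custom_map(items, fixed_data, workers=4, start_method=None):
    """Custom pool: a separate queue carries the fixed data to each worker."""
    ctx = mp.get_context(start_method)
    init_q, in_q, out_q = ctx.Queue(), ctx.Queue(), ctx.Queue()
    procs = [ctx.Process(target=_worker, args=(init_q, in_q, out_q))
             for _ in range(workers)]
    for p in procs:
        p.start()
    for _ in procs:
        init_q.put(fixed_data)  # one copy per worker
    items = list(items)
    for pair in enumerate(items):
        in_q.put(pair)
    for _ in procs:
        in_q.put(None)
    results = dict(out_q.get() for _ in items)
    for p in procs:
        p.join()
    return [results[i] for i in range(len(items))]


def builtin_map(items, fixed_data, workers=4, start_method=None):
    """Builtin pool: Pool runs the initializer once in every worker."""
    ctx = mp.get_context(start_method)
    with ctx.Pool(workers, initializer=_init, initargs=(fixed_data,)) as pool:
        return pool.map(_f, items)


if __name__ == "__main__":
    print(custom_map(range(5), [0] * 3) == builtin_map(range(5), [0] * 3))
```

In both variants each worker receives the fixed data exactly once, so the per-item messages stay small.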

The two options perform identical work:

  • 1 main process uses 4 worker processes to map 100 lists, each containing 100 ones, to integers and sums these integers (lines 44, 46-51, and 86).
  • The mapping and summing are repeated 9 times (lines 45 and 85).
  • The above two steps are performed for varying sizes of fixed data (lines 28, 30, 79, and 81).

So, when executed, we would expect both options to exhibit similar performance. But that isn't the case (with Python 3.6 on a MacBook Pro).

As the fixed data grows from 100K integers to 158M integers, the per-iteration time of the builtin pool option changes very little (almost linear), from about 0.04 to about 0.12 seconds, while that of the custom pool option changes drastically (clearly non-linear), reaching about 1.43 seconds. Comparatively, the builtin pool option was at times ~10 times faster than the custom pool option (about 7 times at 25M integers and about 12 times at 158M integers).
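The fixed-data sizes in the timing output (100,000 through 158,489,319) look like powers of ten with exponents spaced 0.8 apart, from 10^5 to 10^8.2. That spacing is an inference from the reported numbers, not something the post states. Under that assumption, the sizes can be regenerated as follows:

```python
# Assumption: sizes are 10**5.0, 10**5.8, ..., 10**8.2, truncated to integers.
sizes = [int(10 ** (5 + 0.8 * k)) for k in range(5)]

for size in sizes:
    print(f"{size:,} ints")

# Overall growth between the smallest and largest fixed data.
growth = sizes[-1] / sizes[0]
print(f"largest / smallest = {growth:,.0f}x")
```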

Further, an exception occurs during the execution of the custom pool option.

python-multiprocessing$ python3.6 builtin_pool
big_data 100,000 ints : 0.038294 seconds per iteration 99990000
big_data 630,957 ints : 0.048417 seconds per iteration 99990000
big_data 3,981,071 ints : 0.048765 seconds per iteration 99990000
big_data 25,118,864 ints : 0.058857 seconds per iteration 99990000
big_data 158,489,319 ints : 0.117942 seconds per iteration 99990000
python-multiprocessing$ python3.6 custom_pool
big_data 630,957 ints : 0.066467 secs per iteration 99990000
big_data 3,981,071 ints : 0.089601 secs per iteration 99990000
big_data 25,118,864 ints : 0.398886 secs per iteration 99990000
Traceback (most recent call last):
File "/Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/multiprocessing/", line 240, in _feed
File "/Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/multiprocessing/", line 200, in send_bytes
self._send_bytes(m[offset:offset + size])
File "/Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/multiprocessing/", line 398, in _send_bytes
File "/Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/multiprocessing/", line 368, in _send
n = write(self._handle, buf)
BrokenPipeError: [Errno 32] Broken pipe
big_data 158,489,319 ints : 1.427352 secs per iteration 99990000

The number 99990000 in the last column is the result of the map operation on the last iteration. Since this number does not change, none of the results in the last iteration were affected by the exception. However, it is unclear if and how the exception affected the results in intermediate iterations.
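To put the two runs side by side, the short calculation below divides the custom pool timings by the builtin pool timings. The figures are copied from the output above; the variable names are only illustrative, and the 100,000 size is left out of the comparison because the custom pool output shows no line for it.

```python
# Seconds per iteration from the two runs, keyed by fixed-data size.
builtin_secs = {
    100_000: 0.038294,
    630_957: 0.048417,
    3_981_071: 0.048765,
    25_118_864: 0.058857,
    158_489_319: 0.117942,
}
custom_secs = {
    630_957: 0.066467,
    3_981_071: 0.089601,
    25_118_864: 0.398886,
    158_489_319: 1.427352,
}

# How many times slower the custom pool was at each size present in both runs.
slowdown = {size: custom_secs[size] / builtin_secs[size] for size in custom_secs}

for size, ratio in slowdown.items():
    print(f"{size:>11,} ints: custom pool took {ratio:.1f}x as long")
```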

If you want to use the multiprocessing module in Python to write data-parallel code, then try the builtin support for process pools first. If it does not fit your purpose, then roll your own process pools. And always keep measuring.
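As a starting point for that measuring, a small timing helper could look like the sketch below. The helper name seconds_per_iteration is hypothetical; the default of 9 iterations mirrors the repetition count described earlier, and time.perf_counter is the standard-library clock intended for measuring short durations.

```python
import time


def seconds_per_iteration(task, iterations=9):
    """Run task() repeatedly; return (mean seconds per run, last result)."""
    result = None
    start = time.perf_counter()
    for _ in range(iterations):
        result = task()
    elapsed = time.perf_counter() - start
    return elapsed / iterations, result


if __name__ == "__main__":
    # Any zero-argument callable can be timed, e.g. a wrapped pool-based map.
    per_iter, result = seconds_per_iteration(lambda: sum(range(1_000_000)))
    print(f"{per_iter:.6f} seconds per iteration {result}")
```

Returning the last result alongside the timing makes it easy to check that both implementations compute the same value while comparing their speed.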
