Parallelising in Python (multithreading and multiprocessing) with practical templates

Motivation

LU ZOU
Python experiments
Jan 16, 2019


In a recent project, we needed to shorten code run times, so we investigated a few interesting methods, namely multithreading and multiprocessing. Here we provide some sample code and brief explanations on how to get your code running with these techniques. We hope this saves you some time searching for solutions.

In Python, multithreading and multiprocessing are popular methods to consider when you want to parallelise your programmes. The two methods and their differences are well explained in this article.

Performance was measured on the following Windows system:
- Windows 10, 4 cores, 16GB RAM, x64-based processor

# automatically display running time of a cell
%reload_ext autotime
# output results of multiple statements in one cell
from IPython.core.interactiveshell import InteractiveShell
InteractiveShell.ast_node_interactivity = "all"
# load libraries
from multiprocessing import freeze_support
from multiprocessing import Pool
from functools import partial
import threading
import math
import pandas as pd
import numpy as np
from random import randint
from time import sleep

Suppose the task is to determine whether a number is prime. The function below takes an integer as input and returns a tuple of the number and a boolean indicating whether it is prime.

# define a test function that determines whether or not an integer is a prime number
def is_prime(n):
    if (n < 2) or (n % 2 == 0 and n > 2):
        return (n, False)
    elif n == 2:
        return (n, True)
    elif n == 3:
        return (n, True)
    else:
        for i in range(3, math.ceil(math.sqrt(n)) + 1, 2):
            if n % i == 0:
                return (n, False)
        return (n, True)

We applied three methods to establish the baseline performance: looping, mapping and vectorizing. They are similar in terms of running time.

# set the maximum range to detect prime numbers in (0 to num_max-1)
num_max = 1000000
# looping
%timeit np.array([is_prime(x) for x in list(range(num_max))])
# mapping
%timeit np.array(list(map(is_prime, list(range(num_max)))))
# vectorizing
%timeit np.vectorize(is_prime)(list(range(num_max)))

Looping: 27.1 s ± 475 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)

Map: 26.2 s ± 667 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)

Vectorizing: 26.6 s ± 363 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)

Multithreading

For the prime number task, plain threading gives us no direct way to collect a function's return value. In order to extract the results, we append them to a shared list from each thread. We can do this because threads share the same memory space; it is harder with multiple processes, because each process has its own separate memory space.

The running time is much longer (over 2 minutes) than looping.

# multithreading version
def func_thread(n, out):
    out.append(is_prime(n))

x_ls = list(range(num_max))
thread_list = []
results = []
for x in x_ls:
    thread = threading.Thread(target=func_thread, args=(x, results))
    thread_list.append(thread)
for thread in thread_list:
    thread.start()
for thread in thread_list:
    thread.join()
# time: 2min 58s
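As an aside, if you would rather have thread workers hand back return values directly instead of appending to a shared list, the standard library's concurrent.futures module offers a higher-level interface. Below is a minimal sketch (not part of the original timing experiments; max_workers=8 is an arbitrary choice, and it still suffers from the same slowdown discussed below):

from concurrent.futures import ThreadPoolExecutor

# ThreadPoolExecutor.map returns the results of is_prime directly,
# so no shared results list is needed
with ThreadPoolExecutor(max_workers=8) as executor:
    results = list(executor.map(is_prime, range(num_max)))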

Why does it take much longer?

In Python, true parallel multithreading cannot be achieved because of the global interpreter lock (GIL). This lock prevents multiple threads from executing Python code at the same time.

However, Python still provides the threading library. When should we use it then? Although only one thread executes at a time, while one thread is waiting the others can make progress, so the waiting time overlaps. The following example demonstrates the time saved by running multiple threads.

The function does no calculation; it just waits a random 1 to 10 seconds. Applying this function nine times in a single thread took over 50 seconds, while with multiple threads we don't have to wait for one thread's sleep to finish before another starts, so the running time drops to under 10 seconds.

However, if the task is computationally intensive, Python multithreading brings no benefit. In contrast to the sleeping task, the prime check is CPU-bound: there is little I/O or other waiting time to overlap.

def sleeping(number):
    # sleeps a random 1 to 10 seconds
    rand_int_var = randint(1, 10)
    sleep(rand_int_var)
    # print("Thread " + str(number) + " slept for " + str(rand_int_var) + " seconds\n")

def sleep_thread():
    thread_list = []
    for i in range(1, 10):
        # instantiate the thread
        t = threading.Thread(target=sleeping, args=(i,))
        # keep the thread in a list so that it remains accessible
        thread_list.append(t)
    # from the main thread, start the child threads
    for thread in thread_list:
        thread.start()
    # the main thread waits in join(), until the child threads finish
    for thread in thread_list:
        thread.join()

%timeit np.array([sleeping(i) for i in range(1, 10)])

Time: 52.5 s ± 11.9 s per loop (mean ± std. dev. of 7 runs, 1 loop each)

%timeit sleep_thread()

Time: 9.44 s ± 728 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)

Multiprocessing

First of all, multiprocessing cannot be run interactively. On a Windows machine, anything in this package that spawns processes must sit under the if __name__ == "__main__": guard. This makes sure that the main module can be safely imported by a new Python interpreter, e.g. by a child process. One clear explanation can be found here.
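As a minimal sketch of that pattern (using a toy square function rather than the prime task), everything that creates worker processes lives under the guard:

from multiprocessing import Pool, freeze_support

def square(x):
    # toy task, used only to illustrate the guard
    return x * x

if __name__ == "__main__":
    freeze_support()  # only needed if the script is frozen into a Windows executable
    with Pool(processes=4) as pool:
        print(pool.map(square, range(10)))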

For the prime number task, we wrap the following code in a .py file and either run the whole file in the interpreter or run it from the terminal.

python prime_multiprocessing.py

It takes under 10 seconds to run the script using 6 processes; compared with looping, the running time drops to roughly a quarter.

Multiprocessing time: 6.412 seconds.

# prime_multiprocessing.py

import time
import math
from multiprocessing import Pool
from multiprocessing import freeze_support


'''Define a function to run multiple processes and pool the results together'''
def run_multiprocessing(func, i, n_processors):
    with Pool(processes=n_processors) as pool:
        return pool.map(func, i)


'''Define the task function'''
def is_prime(n):
    if (n < 2) or (n % 2 == 0 and n > 2):
        return False
    elif n == 2:
        return True
    elif n == 3:
        return True
    else:
        for i in range(3, math.ceil(math.sqrt(n)) + 1, 2):
            if n % i == 0:
                return False
        return True


def main():
    # time.clock() was removed in Python 3.8; perf_counter() is the replacement
    start = time.perf_counter()

    '''
    set up the parameters required by the task
    '''
    num_max = 1000000
    n_processors = 6
    x_ls = list(range(num_max))

    '''
    pass the task function, followed by the parameters, to the processors
    '''
    out = run_multiprocessing(is_prime, x_ls, n_processors)

    elapsed = time.perf_counter() - start
    print("Input length: {}".format(len(x_ls)))
    print("Output length: {}".format(len(out)))
    print("Multiprocessing time: {} mins\n".format(elapsed / 60))
    print("Multiprocessing time: {} secs\n".format(elapsed))


if __name__ == "__main__":
    freeze_support()  # only needed when freezing the script into a Windows executable
    main()
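Two practical tweaks may be worth knowing (a standalone sketch, not part of the timed experiment above; is_prime_above and the threshold of 100 are made up for illustration): the pool size can be derived from the machine's core count rather than hard-coded to 6, and functools.partial (imported in the notebook setup above but not otherwise used) lets pool.map call a function that takes more than one argument.

import math
from functools import partial
from multiprocessing import Pool, cpu_count, freeze_support

def is_prime_above(n, threshold):
    # hypothetical variant of the task: flag primes at or above a threshold
    if n < threshold or n < 2 or (n % 2 == 0 and n > 2):
        return False
    for i in range(3, math.ceil(math.sqrt(n)) + 1, 2):
        if n % i == 0:
            return False
    return True

if __name__ == "__main__":
    freeze_support()
    n_processors = max(1, cpu_count() - 1)  # derive the pool size from the core count
    with Pool(processes=n_processors) as pool:
        # partial() fixes the threshold, so pool.map only has to supply n
        out = pool.map(partial(is_prime_above, threshold=100), range(1000000))
    print("Primes found at or above 100: {}".format(sum(out)))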

Summary

In a piece of recent work, running time was a real constraint. After vectorising where possible, additional time-saving approaches were investigated. Multiprocessing was found to greatly reduce our execution time, and some working examples are given above.

Hopefully this guide allows you to save some time implementing multiprocessing in your next project.
