Speed up your code using multiprocessing in Python

Gaurav Singhal
3 min read · Nov 22, 2018

In my previous post, Multiprocessing in Python on Windows and Jupyter/Ipython — Making it work, I wrote about how to make Python multiprocessing work when you are using the devilish combination of Jupyter and Windows. In this post, we will see how it actually speeds up your code. Here is a brief review of the previous post:

Review: We need to define a worker (a function) that we want to execute in parallel, and then we simply use it. The only catch is that the worker needs to live in a separate .py file if you are on Jupyter. Then, to use it on Windows, you need to put the pool statement under an if clause. The pool statement is basically used to set the number of processes, as explained in the previous post.

Here is a worker which creates and sorts an array of a given size, which we store in defs.py:

import numpy as np

def createandsort(n):
    rand = np.random.RandomState(42) #Give a seed to reproduce results
    a = rand.rand(n) #Generate an array of size n
    a.sort() #Sort the array in place (ndarray.sort returns None, so there is nothing useful to return)

Then we just use it:

from multiprocessing import Pool
from timeit import default_timer as timer
import defs

#Create sizes for 3 arrays.
sizes = [10**1 for i in range(0,3)] #Size of each array is 10 here.

#Applying the function sequentially
tic = timer()
[defs.createandsort(size) for size in sizes]
tac = timer()
print("time for sequential sorting: ", tac-tic)

#Using multiprocessing
if __name__ == "__main__":
    pool = Pool(processes=3)
    tic = timer()
    pool.map(defs.createandsort, sizes)
    tac = timer()
    print("time for parallel sorting: ", tac-tic)

If you run the above code, you will see something like the following:

time for sequential sorting:  0.00013748130595558905
time for parallel sorting: 0.6173960894770971

What! Doing things in parallel is 4000 to 5000 times slower than doing them sequentially. What is wrong here? Actually, nothing: the extra time is overhead from starting the worker processes, sending the tasks to them, and then gathering the results. Since the array size is only 10, creating and sorting it takes very little time, so the overhead dominates. So let's try it on bigger arrays.
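Before we do, to convince ourselves that the extra time really is pool start-up and communication overhead rather than the sorting itself, here is a minimal sketch that maps a worker which does nothing at all (the noop function is hypothetical and not part of the original post; on Windows/Jupyter it would also need to live in a separate .py file, as described above):

from multiprocessing import Pool
from timeit import default_timer as timer

def noop(x):
    return x #The worker does essentially no work

if __name__ == "__main__":
    tic = timer()
    with Pool(processes=3) as pool: #Starting the worker processes is the expensive part
        pool.map(noop, range(3))
    tac = timer()
    print("overhead of an (almost) empty parallel job: ", tac-tic)

The time printed here is almost entirely overhead, because the workers have nothing to compute. With that established, here is the same experiment on bigger arrays: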

#Create sizes for 3 arrays.
sizes = [5 * 10**6 for i in range(0,3)] #Size of each array is 5 million here.

#Applying the function sequentially
tic = timer()
[defs.createandsort(size) for size in sizes]
tac = timer()
print("time for sequential sorting: ", tac-tic)

#Using multiprocessing
if __name__ == "__main__":
    pool = Pool(processes=3)
    tic = timer()
    pool.map(defs.createandsort, sizes)
    tac = timer()
    print("time for parallel sorting: ", tac-tic)

When each array is of size 5 million, we get the following result:

time for sequential sorting:  1.494535938597437
time for parallel sorting: 1.2176664572234586

So, the parallel version has slightly outperformed the sequential version. What about even bigger arrays, say 100 million elements each?

time for sequential sorting:  36.15379052735625
time for parallel sorting: 14.171565365067181

So the parallel version is roughly 2.5 times faster than the sequential version, which is what we were hoping for.
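If you want to see where the crossover happens on your own machine, a small sweep like the following (a sketch, reusing the same defs.createandsort worker; the sizes are arbitrary) times both versions over a range of array sizes:

from multiprocessing import Pool
from timeit import default_timer as timer
import defs

if __name__ == "__main__":
    for n in (10, 10**3, 10**5, 5 * 10**6, 10**7):
        sizes = [n]*3 #Three arrays of size n
        #Sequential version
        tic = timer()
        [defs.createandsort(size) for size in sizes]
        seq = timer() - tic
        #Parallel version with 3 worker processes
        with Pool(processes=3) as pool:
            tic = timer()
            pool.map(defs.createandsort, sizes)
            par = timer() - tic
        print("n =", n, "sequential:", seq, "parallel:", par)

For small n the sequential version should win easily; somewhere between a few hundred thousand and a few million elements the parallel version should start to pull ahead.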

What Next:

I hope to write a post with more practical applications of multiprocessing.

Notes:

  1. If you are using Linux, you do not need to define the worker in a separate file, and you do not need the if __name__ == '__main__': clause.
  2. I have used only 3 processes; you can use multiprocessing.cpu_count() to find the number of CPUs on your machine and size the pool accordingly (see the sketch below).
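A minimal sketch of that idea, assuming the same defs.createandsort worker from above:

import multiprocessing as mp
import defs

if __name__ == "__main__":
    n_cpus = mp.cpu_count() #Number of CPUs the OS reports
    sizes = [5 * 10**6 for _ in range(n_cpus)] #One 5-million-element array per CPU
    with mp.Pool(processes=n_cpus) as pool:
        pool.map(defs.createandsort, sizes)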

