Multiprocessing in Python on Windows and Jupyter/IPython: Making It Work

Gaurav Singhal
3 min read · Nov 21, 2018

This post is most useful if you are using Windows and Jupyter/IPython, or at least one of them.

Have you ever wanted to speed up your code but were too afraid to try multiprocessing? Or you tried, your fears came true, and nothing worked? Or you took another step and found out that Windows does not support forking, so child processes cannot be distinguished from the parent process and you need to include an `if __name__ == '__main__':` clause; you tried that and it still did not work, then you came across #TextTooDifficultToUnderstand and finally gave up? If any of the above is true, this post should get you going.

Question: Why does it not work?

Answer: Both Jupyter and Windows are to blame here. I will not go into the details in this post, but at the end you will find a few links that explain it better.
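As a quick check of the Windows side of the problem, the short sketch below lists the process start methods that Python offers on the machine you run it on. On Windows the list contains only 'spawn', whereas Linux also offers 'fork'. The variable name `methods` is just for illustration.

```python
import multiprocessing as mp
import sys

# Start methods supported by this platform.
# Windows offers only 'spawn'; Linux also lists 'fork'.
methods = mp.get_all_start_methods()

print("platform:", sys.platform)
print("available start methods:", methods)
```

With 'spawn', every child starts as a fresh Python interpreter and has to import whatever it needs, which is why the rest of this post is about making the worker importable.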

Question: How to make it work?

Answer: We just need to make two simple modifications. But before that, here is how it is done on Linux without Jupyter or IPython. This is probably the first thing you came across (or will come across). It consists of two steps: first, create a function; then use multiple processors to execute the function in parallel.

# Import Pool
from multiprocessing import Pool
# Define a worker: a function which will be executed in parallel
def worker(x):
    return x*x
# Assuming you want to use 3 processors
num_processors = 3
# Create a pool of worker processes
p = Pool(processes=num_processors)
# Get them to work in parallel
output = p.map(worker, [i for i in range(0, 3)])
print(output)

This code should work perfectly fine on Linux. However, if you run it in the Python shell on Windows (cmd → python), you will get an error like this: Can't get attribute 'worker' on <module '__main__' (built-in)>. And if you run it in Jupyter, it will get stuck forever and never complete the processing.
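One way to read that error message: multiprocessing hands the worker to the child process through pickle, and pickle stores a function by reference (its module name and function name), not by its code. The child then has to look the name up again. The sketch below shows this with a standard-library function, json.dumps, chosen only as a convenient stand-in for a function that lives in an importable module.

```python
import json
import pickle

# Pickle a module-level function from the standard library.
payload = pickle.dumps(json.dumps)

# The payload only names the module and the function; it does not
# contain the function's code.
print(payload)
print(b"json" in payload, b"dumps" in payload)

# Unpickling imports the module and looks the name up again.
restored = pickle.loads(payload)
print(restored is json.dumps)
```

A worker typed into an interactive shell or a notebook cell lives in `__main__`, and a freshly started child process has no matching `__main__` to find it in, which is consistent with the "Can't get attribute" message.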

All you need to do is:

  1. Define your worker in a separate .py file and import it.
  2. Add an if __name__ == '__main__': clause before calling your worker.

Suppose you save the worker in workers.py; the file will look like this:

def worker(x):
    return x*x
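If you would rather create that file without leaving the notebook, a minimal sketch along the following lines may help. It writes workers.py to disk and imports it. The temporary folder `module_dir` is an assumption made to keep the example self-contained; in a real notebook you would more likely write the file next to the notebook itself.

```python
import importlib
import sys
import tempfile
from pathlib import Path

# Assumption: a throwaway folder. In a notebook you would more likely
# write next to the notebook itself, e.g. Path.cwd().
module_dir = Path(tempfile.mkdtemp())

# Write the worker module to disk.
source = "def worker(x):\n    return x * x\n"
(module_dir / "workers.py").write_text(source)

# Make the folder importable, then import the freshly written module.
sys.path.insert(0, str(module_dir))
importlib.invalidate_caches()
workers = importlib.import_module("workers")

print(workers.worker(4))
```

If you later edit workers.py, the notebook keeps using the version it already imported until you restart the kernel or call importlib.reload(workers).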

Then import this file in Jupyter and call workers.worker inside an if clause. Your Jupyter code will look like this:

from multiprocessing import Pool
import workers
if __name__ == '__main__':
    num_processors = 3
    p = Pool(processes=num_processors)
    output = p.map(workers.worker, [i for i in range(0, 3)])
    print(output)

Let me know if this still does not work for you. I used Python 3.6.5.
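If you later move the same job out of the notebook and into a standalone script (run with python your_script.py), the if clause alone is generally enough, because the child processes can re-import the script file. Below is a minimal sketch of that variant. The helper name `square_all` is hypothetical, and the with-block is a small addition that shuts the pool down once the work is done.

```python
from multiprocessing import Pool


def worker(x):
    # The function each process runs on one input value.
    return x * x


def square_all(values, num_processors=3):
    # The with-block shuts the pool down once map() has returned.
    with Pool(processes=num_processors) as pool:
        return pool.map(worker, values)


if __name__ == "__main__":
    # Only the parent process runs this part; children that re-import
    # the script skip it.
    print(square_all(range(3)))
```

Keeping the pool creation inside a function that is only called under the if clause means that importing the file never starts new processes on its own.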

What Next:

In my next post, Speed up your code using multiprocessing in python, I will show how multiprocessing can actually improve performance, using a very simple but useful example.

Details of why the simple Linux version does not work on Windows/Jupyter and gets stuck forever:

  1. Why multiprocessing does not work in Jupyter or IPython or any other interactive shell: https://stackoverflow.com/a/23641560/4613606
  2. Why multiprocessing does not work on Windows without the if clause: https://stackoverflow.com/questions/20222534/python-multiprocessing-on-windows-if-name-main
