Threading vs Multiprocessing in Python

One of the hottest discussions I have found among Python developers, apart from the language's slow execution speed, is about problems with threading, with many of them complaining about the GIL (Global Interpreter Lock). Some even go so far as to say that Python and similar languages are not truly “concurrent” and cannot scale well.

In an earlier post I discussed the first problem in detail, explaining how to execute Python code at the speed of C (https://medium.com/@hitechpundir/execute-python-code-at-the-speed-of-c-extending-python-93e081b53f04).

This post attempts to address the second problem in detail.

While concurrency defines the problem, parallelism defines the implementation. There are many ways to achieve it, ranging from standard threading (or pthreads) to asynchronous programming, process forking and so on. Each of these paradigms has its own pros and cons, which you as a programmer have to understand before you can choose which one to use.

Many people, without understanding the pros and cons of each, approach the problem with what is now the near-ubiquitous solution to the concurrency “problem”: threads and threaded programming. That approach will be the cause of your pain and complaints. (Note that the approach, not the language itself, is the problem.)

Threading and GIL in Python

Both processes and threads are created within a given programming language and then scheduled to run, either by the interpreter itself (“green threads”), or by the operating system (“native threads”).

Guido says in response to a threading question:

“Nevertheless, you’re right the GIL is not as bad as you would initially think: you just have to undo the brainwashing you got from Windows and Java proponents who seem to consider threads as the only way to approach concurrent activities.

Just because Java was once aimed at a set-top box OS that didn’t support multiple address spaces, and just because process creation in Windows used to be slow as a dog, doesn’t mean that multiple processes (with judicious use of IPC) aren’t a much better approach to writing apps for multi-CPU boxes than threads.”

Source: https://mail.python.org/pipermail/python-3000/2007-May/007414.html

The GIL is an interpreter-level lock. It prevents multiple threads from executing Python byte code at the same time in the interpreter. Each thread that wants to run must wait for the GIL to be released by the other thread, which means that, as far as pure Python code is concerned, your multi-threaded application effectively runs one thread at a time. The GIL thereby prevents simultaneous access to Python objects by multiple threads.

Now the question that comes to mind is: “If we have the GIL, and a thread must own it to execute within the interpreter, what decides if the GIL should be released?”

The answer is byte code instructions. When a Python application is executed, it is compiled to byte code, the actual instructions that the interpreter uses to execute the application. Cached byte code files normally have an extension such as “.pyc” or “.pyo”. A given line of a Python application might be a single byte code instruction, while others, such as an import statement, may ultimately expand into many byte code instructions for the interpreter.
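
To see how source lines map to byte code, the standard-library dis module can disassemble a function. This is only a small sketch: the function add_item is made up for illustration, and the exact instruction names differ between CPython versions.

```python
import dis


def add_item(items, value):
    # One short source line...
    items.append(value * 2)
    return items


# ...is compiled into several byte code instructions.
instructions = list(dis.get_instructions(add_item))
for ins in instructions:
    print(ins.opname, ins.argrepr)
print("total instructions:", len(instructions))
```

The listing typically shows separate instructions for loading the list, looking up append, doing the multiplication and making the call, all for a single line of source.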

For pure Python code, the CPython 2 interpreter forces the GIL to be released every hundred byte code instructions (CPython 3.2 and later use a time-based interval instead, 5 milliseconds by default). This means that if you have a complex line of code that acts as a single byte code instruction, the GIL will not be released for as long as that instruction takes to run.
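
On CPython 3.2 and later, the forced release is triggered after a time interval rather than an instruction count, and the sys module exposes that interval. A small sketch, assuming Python 3 (the value 0.01 is just an arbitrary example):

```python
import sys

# Interval (in seconds) after which the interpreter asks the running
# thread to give up the GIL so that another thread can run.
original = sys.getswitchinterval()
print("switch interval:", original)

# The interval is tunable; a larger value means fewer forced switches.
sys.setswitchinterval(0.01)
changed = sys.getswitchinterval()
print("after change:", changed)

# Restore the original value so the rest of the program is unaffected.
sys.setswitchinterval(original)
```

Changing the interval only affects how often threads take turns; it does not let two threads run Python code at the same time.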

C extensions, however, are an exception.

Using Py_BEGIN_ALLOW_THREADS and Py_END_ALLOW_THREADS, a C extension can voluntarily release the GIL and reacquire it afterwards. (More on this may follow in another post.)

Py_BEGIN_ALLOW_THREADS

… Do some blocking I/O operation …

Py_END_ALLOW_THREADS

Coming back to the main discussion: the fact is that the GIL does prevent you as a programmer from using multiple CPUs simultaneously with threads. Python as a language, however, does not.

Note that the GIL does not prevent a process from running on a different processor of the machine. It simply allows only one thread to run at a time within the interpreter.

So multiprocessing, not multithreading, is what allows you to achieve true parallelism.

Let's understand all this through some benchmarking, because only that will lead you to believe what is said above. And yes, that should be the way to learn: experience it rather than just read about it. Once you have experienced something, no amount of argument can convince you of the opposite.

import random
from threading import Thread
from multiprocessing import Process

size = 10000000  # Number of random numbers to add to each list
threads = 2      # Number of threads (or processes) to create

# One independent list per worker, so no locking is needed
my_list = [[] for _ in range(threads)]


def func(count, mylist):
    # CPU-bound work: append `count` random numbers to the given list
    for _ in range(count):
        mylist.append(random.random())


def multithreaded():
    jobs = []
    for i in range(threads):
        thread = Thread(target=func, args=(size, my_list[i]))
        jobs.append(thread)
    # Start the threads
    for j in jobs:
        j.start()
    # Ensure all of the threads have finished
    for j in jobs:
        j.join()


def simple():
    # Baseline: do the same work sequentially in the main thread
    for i in range(threads):
        func(size, my_list[i])


def multiprocessed():
    processes = []
    for i in range(threads):
        # Each child fills its own copy of the list; the parent's lists stay empty
        p = Process(target=func, args=(size, my_list[i]))
        processes.append(p)
    # Start the processes
    for p in processes:
        p.start()
    # Ensure all processes have finished execution
    for p in processes:
        p.join()


if __name__ == "__main__":
    # Run exactly one variant per benchmark run
    multithreaded()
    # simple()
    # multiprocessed()

Benchmarking results:

To benchmark, set the threads parameter in the file to 2, 3 and 4 in turn for the results below. In the main block, run one of multithreaded, simple or multiprocessed at a time (by commenting out the other two, as above) and use

time python threadbenchmark.py

to get the time consumed. (For benchmarking, look only at the real value in the output, without worrying about the user and sys times.)
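
Where the shell's time command is not available, a similar wall-clock figure can be taken from inside the script. A minimal sketch, assuming Python 3; the helper timed and the workload busy are made up for illustration and are not part of the benchmark file:

```python
import time


def timed(fn, *args):
    """Run fn(*args) and return (result, elapsed wall-clock seconds)."""
    start = time.perf_counter()
    result = fn(*args)
    return result, time.perf_counter() - start


def busy(n):
    # Stand-in CPU-bound workload for the example.
    return sum(i * i for i in range(n))


result, elapsed = timed(busy, 100000)
print("result=%d elapsed=%.4fs" % (result, elapsed))
```

Unlike the shell's time, this excludes interpreter start-up, so the numbers will be slightly lower than the real value for the same run.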

Execution time in seconds for 2, 3 and 4 threads.

                  simple    threading   multiprocessing
threads = 2       4.124      5.539       2.034
threads = 3       6.391      13.772      3.376
threads = 4       9.194      17.641      4.720

So threading is even slower than simple sequential execution. This follows from the behaviour of the GIL discussed above and should no longer surprise us: the threads cannot run in parallel, and they pay an extra cost for contending over the lock.

What truly gave a performance benefit in terms of execution speed is multiprocessing. So the language does not limit us at all. Of course, just as multithreading has to deal with synchronization, deadlocks and so on, multiprocessing requires us to deal well with IPC (Inter-Process Communication), which can be discussed in another post.

So we can conclude that threads should not be used for CPU-bound tasks. This does not mean, however, that threads are not useful. So where are threads useful in Python?
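
As a taste of what IPC involves: a child process works on its own copy of the data, so results have to be sent back to the parent explicitly. A minimal sketch using multiprocessing.Queue, assuming Python 3; the function names and sizes are made up for illustration:

```python
import random
from multiprocessing import Process, Queue


def make_numbers(count):
    # Pure helper: build a list of random floats.
    return [random.random() for _ in range(count)]


def worker(count, queue):
    # Runs in the child process; only a small summary travels back.
    numbers = make_numbers(count)
    queue.put((len(numbers), sum(numbers)))


def run_workers(n_procs, count):
    queue = Queue()
    procs = [Process(target=worker, args=(count, queue)) for _ in range(n_procs)]
    for p in procs:
        p.start()
    # Read the results before joining so a child never blocks on a full pipe.
    results = [queue.get() for _ in procs]
    for p in procs:
        p.join()
    return results


if __name__ == "__main__":
    for length, total in run_workers(2, 100000):
        print("child produced %d numbers, sum=%.2f" % (length, total))
```

Sending a summary rather than the whole list keeps the pickling cost of the queue small; shipping large objects between processes can easily eat up the time saved by running in parallel.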

  • In GUI applications to keep the UI thread responsive
  • IO tasks (network IO or filesystem IO)
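
The I/O case can be sketched with time.sleep standing in for a blocking network or disk call (an assumption made to keep the example self-contained). Blocking calls like this release the GIL while they wait, so the waits overlap instead of adding up. The names fake_io and run_io_threads are made up for illustration.

```python
import time
from threading import Thread


def fake_io(delay, results, index):
    # time.sleep stands in for a blocking read; the GIL is released meanwhile.
    time.sleep(delay)
    results[index] = "done"


def run_io_threads(n_threads, delay):
    results = [None] * n_threads
    jobs = [Thread(target=fake_io, args=(delay, results, i))
            for i in range(n_threads)]
    start = time.perf_counter()
    for j in jobs:
        j.start()
    for j in jobs:
        j.join()
    return results, time.perf_counter() - start


results, elapsed = run_io_threads(4, 0.2)
print("4 waits of 0.2s took %.2fs in total" % elapsed)
```

Run sequentially, the four waits would take about 0.8 seconds; with threads the total should stay close to a single wait.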
