Python multithreading: A myth?

Yash Suresh Chandra
4 min read · Apr 14, 2019


Multithreading is an easy and convenient way for developers to achieve concurrency. It is often preferred over multiprocessing because:

  1. Threads are lightweight: creating a thread takes far less time and fewer resources than spawning a new process.
  2. Resource sharing: threads of the same process can share data and resources directly, with no message passing or inter-process communication needed.
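
The resource-sharing point can be sketched as follows: several threads update one ordinary in-process object directly, with a lock guarding the update. The names `shared` and `add_many` and the counts are illustrative assumptions, not part of any library.

```python
import threading

shared = {"total": 0}      # ordinary object, visible to every thread in the process
lock = threading.Lock()    # guards the read-modify-write on shared["total"]

def add_many(n):
    # Each thread increments the same counter; no message passing is involved
    for _ in range(n):
        with lock:
            shared["total"] += 1

threads = [threading.Thread(target=add_many, args=(10_000,)) for _ in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()

print(shared["total"])
```

With separate processes, the same counter would have to live in shared memory or be passed around through a pipe or queue.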

On a multi-core processor, running threads in parallel can help with:

  1. Faster execution: a task can be divided into sub-tasks, each of which can run on a different core simultaneously.
  2. Responsiveness: an application stays responsive when long-running tasks are handed to worker threads, so the main thread is never busy for long.
  3. Fewer cache misses: a value fetched and cached by one thread can be read from the cache by other threads.
  4. Many more…

IN PYTHON

Most languages support multithreading through built-in libraries, and Python is one of them. Because it is concise and has a gentle initial learning curve, Python is often chosen over other languages for faster development. Developers can exploit the many features the language provides; one of them is the threading module.

Using multithreading in Python is very easy.

import threading

…and you have used it. Not that difficult.

All the benefits we discussed earlier are at your service with just one line of code: faster execution, responsiveness, better utilization, and so on.
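
In practice a little more than the import is involved. A minimal sketch of creating, starting, and joining threads might look like this; the `greet` function and the worker names are hypothetical.

```python
import threading

results = {}

def greet(name):
    # Runs in its own thread; stores a value the main thread reads after join()
    results[name] = f"hello from {name}"

workers = [threading.Thread(target=greet, args=(f"worker-{i}",)) for i in range(3)]
for w in workers:
    w.start()   # begin running greet() in a new thread
for w in workers:
    w.join()    # wait for that thread to finish

print(sorted(results))
```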

EXPERIMENTS

Simple experiments in Python can show you some amusing results. A piece of code that counts to 100000000 using 1 to 4 threads was run on a quad-core 2.5 GHz Intel Core i7 -

And the results were -

time taken to count till 100000000 with 1 threads: 4.129178
time taken to count till 100000000 with 2 threads: 4.141240
time taken to count till 100000000 with 3 threads: 4.142110
time taken to count till 100000000 with 4 threads: 4.166194

Very amusing. The time taken is almost constant; in fact, it crept up slightly as we increased the number of threads.
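
An experiment of this kind can be sketched as below. This is a reconstruction under assumptions, not the original listing: it assumes the count is split evenly across the threads, and the names `count_down` and `timed_run` are made up for illustration. The total is kept far smaller than 100000000 so the run finishes quickly.

```python
import threading
import time

def count_down(n):
    # Pure-Python, CPU-bound loop: it never blocks on I/O
    while n > 0:
        n -= 1

def timed_run(total, num_threads):
    """Split `total` iterations evenly over `num_threads` threads; return elapsed seconds."""
    share = total // num_threads
    threads = [threading.Thread(target=count_down, args=(share,))
               for _ in range(num_threads)]
    start = time.perf_counter()
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return time.perf_counter() - start

TOTAL = 2_000_000  # assumption: reduced from 100000000 to keep the run short
for k in (1, 2, 3, 4):
    print(f"time taken to count till {TOTAL} with {k} threads: {timed_run(TOTAL, k):.6f}")
```

If threads ran in parallel, the time with 4 threads would be expected to approach a quarter of the single-thread time; absolute numbers will vary by machine and interpreter.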

SOLVING MYSTERY

Some Python implementations, such as CPython, use a GIL (Global Interpreter Lock). As the name suggests, it is a lock (in effect a binary semaphore) on the Python interpreter that ensures only one thread can use the interpreter at any point in time. This becomes the bottleneck when you try to use threading in Python. Multiple eligible threads compete for this lock so that they can do their work. So essentially, even if you have created many threads, only one will be doing its work at a time.

That’s why, in our example above, the time is almost the same in all cases. Only one thread was executing at any point in time, so it was as good as all the work being done by a single thread.

Even though the GIL limits Python’s ability to run threads in parallel, it comes with other advantages -

  1. CPython manages memory mainly through reference counting, with a garbage collector to handle reference cycles. The GIL prevents the bugs and inconsistencies that would arise if threads changed an object's reference count simultaneously.
  2. C libraries that are not thread-safe can be integrated easily.
  3. Single-threaded programs deal with only one lock (the GIL), which makes them faster than they would be with many fine-grained locks.
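
The reference counts mentioned in the first point can be observed in CPython with `sys.getrefcount`. The sketch below only shows that binding a second name to an object raises its count by one; it is this kind of bookkeeping that the GIL keeps consistent across threads. The variable names are illustrative.

```python
import sys

obj = []                         # a fresh object with one name bound to it
before = sys.getrefcount(obj)    # the call itself adds one temporary reference
alias = obj                      # bind a second name to the same object
after = sys.getrefcount(obj)

print(before, after)             # `after` is one higher than `before` on CPython
```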

SO WHY MULTITHREADING?

Any thread that needs to execute must first acquire the GIL. If one thread held it indefinitely, the others would starve, so at regular intervals (not necessarily regular time intervals) the Python virtual machine makes the running thread give up the lock so that other threads can use it. In Python 2 the interval was 100 ticks (roughly, bytecode instructions); since Python 3.2, CPython uses a time-based switch interval instead, 5 milliseconds by default. The GIL is also released whenever a thread has to wait for I/O (reading from a file, a socket, etc.), which is why threads remain useful for I/O-bound work.
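
The I/O case can be illustrated with a small sketch. Here `time.sleep` stands in for a blocking read, which is an assumption made to keep the example self-contained; `fake_io` and `run_sleepers` are hypothetical names. Because the GIL is released while a thread is blocked, the waits overlap instead of adding up.

```python
import sys
import threading
import time

def fake_io(seconds):
    # Stand-in for a blocking file or socket read; the GIL is released while sleeping
    time.sleep(seconds)

def run_sleepers(num_threads, seconds):
    """Start `num_threads` threads that each block for `seconds`; return elapsed time."""
    threads = [threading.Thread(target=fake_io, args=(seconds,))
               for _ in range(num_threads)]
    start = time.perf_counter()
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return time.perf_counter() - start

print("switch interval (seconds):", sys.getswitchinterval())
print("4 threads, 0.2 s of waiting each:", round(run_sleepers(4, 0.2), 3))
```

Run sequentially, the four waits would take about 0.8 seconds in total; with threads, the elapsed time should stay close to a single wait.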

While one thread holds the GIL, all other threads waiting for it are queued. When a thread releases the GIL, it signals the operating system, and the OS chooses the next thread from the queue (a priority queue) to acquire it.
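
The hand-off described above can be mimicked with a toy model. This is not how CPython implements the GIL; it is a sketch in which an ordinary `threading.Lock` named `toy_gil` plays the role of the interpreter lock and each thread gives it up after a fixed number of "ticks". All names and numbers are assumptions.

```python
import threading

toy_gil = threading.Lock()   # stand-in for the interpreter lock (toy model only)
TICKS_PER_TURN = 100         # how much work a thread does before giving up the lock
log = []                     # (thread name, ticks done) in the order turns were taken

def worker(name, total_ticks):
    done = 0
    while done < total_ticks:
        with toy_gil:                        # only the holder may "run"
            turn = min(TICKS_PER_TURN, total_ticks - done)
            done += turn
            log.append((name, turn))         # safe: appended while holding the lock
        # Lock released here; the OS decides which waiting thread gets it next

threads = [threading.Thread(target=worker, args=(f"t{i}", 300)) for i in range(3)]
for t in threads:
    t.start()
for t in threads:
    t.join()

print(log)
```

The order of names in `log` varies between runs, since the next holder is chosen by the operating system's scheduler rather than by the program.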

DOES THIS ALWAYS HAPPEN?

The GIL is not present in all Python implementations. CPython and PyPy have a GIL. Other implementations, such as Jython and IronPython, do not, so they can run threads in parallel as expected.
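
To see which implementation a script is running on, the standard library's `platform.python_implementation()` can be used. The sketch below also probes `sys._is_gil_enabled()`, a private helper that exists only on CPython 3.13 and later; on other interpreters the lookup falls back to `None`. The function name `describe_interpreter` is hypothetical.

```python
import platform
import sys

def describe_interpreter():
    """Return (implementation name, GIL status or None when it cannot be queried)."""
    impl = platform.python_implementation()   # e.g. 'CPython', 'PyPy', 'Jython', 'IronPython'
    # sys._is_gil_enabled is private and only present on CPython 3.13+
    check = getattr(sys, "_is_gil_enabled", None)
    gil = check() if check is not None else None
    return impl, gil

print(describe_interpreter())
```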

FINAL THOUGHTS

CPython provides the threading module, but for CPU-bound code it behaves like a single-threaded system. It lets us create threads so that multiple jobs can proceed in the background; the only catch is that one thread executes at a time. This has both the advantages and the disadvantages described above. To do things in parallel, we can always use multiprocessing in Python: the multiprocessing module provides the parallelism that multithreading was supposed to provide but cannot.
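
A minimal sketch of the multiprocessing route, assuming the same kind of counting workload split evenly across worker processes (`count_down` and `timed_pool_run` are illustrative names):

```python
import multiprocessing
import time

def count_down(n):
    # CPU-bound loop; each worker process runs it under its own interpreter and GIL
    while n > 0:
        n -= 1
    return n

def timed_pool_run(total, num_procs):
    """Split `total` iterations over `num_procs` processes; return elapsed seconds."""
    share = total // num_procs
    start = time.perf_counter()
    with multiprocessing.Pool(processes=num_procs) as pool:
        pool.map(count_down, [share] * num_procs)
    return time.perf_counter() - start

if __name__ == "__main__":   # guard needed where child processes re-import this module
    for k in (1, 2, 4):
        print(f"{k} processes: {timed_pool_run(2_000_000, k):.6f}")
```

Starting processes has its own cost, so for small workloads like this one the gain may be modest; the benefit shows up as the per-process work grows.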

REFERENCES

  1. https://en.wikipedia.org/wiki/Thread_(computing)
  2. http://www.dabeaz.com/python/UnderstandingGIL.pdf
  3. https://realpython.com/python-gil/
  4. https://wiki.python.org/moin/GlobalInterpreterLock
  5. https://docs.python.org/3/library/gc.html
