Visualizing Threading vs Multiprocessing in Python

Image by Hitesh Choudhary

Threading and multiprocessing are surprisingly hard to explain clearly in the context of Python, yet they are essential for performance on heavier projects. Done right, they can make a Python program much faster.

After starting a project that involved spawning Chromium drivers with Selenium in Python, I ran into problems with processing power. The drivers consumed a lot of CPU and I needed to figure out how to optimize them in Python. The obvious first idea was to look into threading and multiprocessing.

I advise having some knowledge of the following keywords to fully understand these topics. They are also worth studying to get a deeper understanding of this field.

  • Concurrency vs Parallelism
  • Mutex/Semaphore
  • Multiprocessing
  • Race Condition
  • Scheduling

Let’s cut to the chase!

Python program execution

A Python program, running on the standard CPython interpreter, is designed so that only one thread executes Python code at a time, which in practice means it uses one processor or CPU at a time. The interpreter is designed this way mainly to solve memory management issues. For programs running on multiple CPUs at once, problems related to shared memory and race conditions appear. Languages such as C, C++ and Go can utilize all available CPUs and therefore have to deal with them. By running Python code on one CPU at a time, the interpreter avoids these and other problems and stays easy to use.

However, this design quickly becomes a limitation for more advanced projects. The mechanism behind it is the Python Global Interpreter Lock (GIL). The interpreter uses this lock to control which thread may execute Python code. More experienced programmers will recognize it as a mutex. The GIL allows only one thread to execute Python bytecode at any one time. Its function is depicted more clearly in the image below. The consequence is that, by default, one Python program effectively uses only a single core for its Python code.

GIL’s main role.
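
The GIL itself lives inside the interpreter and is not something you lock by hand, but the mutex idea behind it can be sketched with an ordinary threading.Lock. The example below is only an illustration under that assumption; the names counter, counter_lock and add_many are made up for this sketch.

```python
import threading

counter = 0
counter_lock = threading.Lock()  # a mutex: only one thread can hold it at a time


def add_many(n):
    """Increment the shared counter n times, taking the lock for each update."""
    global counter
    for _ in range(n):
        with counter_lock:  # other threads wait here until the lock is free
            counter += 1


# Four threads all update the same variable.
threads = [threading.Thread(target=add_many, args=(10_000,)) for _ in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()

print(counter)  # 4 threads x 10_000 increments each
```

The GIL plays a similar role for the interpreter's own internal state: whichever thread holds it is the one allowed to run Python code.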

Luckily, there are ways to work around the GIL. Libraries exist to implement threading and multiprocessing in Python.

Threading

Threading in Python can be implemented using the threading lib. However, threading is not the same as using several processes. Because of the GIL, only one of the threads created in Python can execute Python code at any moment, so CPU-heavy threads effectively share one CPU. The interpreter and the operating system's scheduler switch between the threads to decide which one runs next.

Python runs multiple threads and the GIL lets only one of them run Python code at a time.
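
As a minimal sketch of the threading lib, the snippet below starts three threads that each do a CPU-bound calculation. The function names sum_of_squares and worker and the input sizes are assumptions chosen for illustration. The threads all finish with correct results, but because of the GIL they take turns rather than computing side by side.

```python
import threading


def sum_of_squares(n):
    """CPU-bound work written in pure Python."""
    return sum(i * i for i in range(n))


results = {}


def worker(name, n):
    # Each thread stores its result under its own key.
    results[name] = sum_of_squares(n)


threads = [
    threading.Thread(target=worker, args=(f"thread-{i}", 100_000 * (i + 1)))
    for i in range(3)
]
for t in threads:
    t.start()
for t in threads:
    t.join()  # wait until every thread has finished

for name in sorted(results):
    print(name, results[name])
```

For pure-Python calculations like this one, do not expect the threaded version to be faster than a plain loop.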

However, there are exceptions.

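  1. If the operations are input/output based, the thread releases the GIL while it waits, so other threads can run in the meantime. For example, Selenium drivers are controlled over the HTTP protocol, which is an input/output operation. So threads that run drivers spend most of their time waiting and can overlap, while the browsers themselves run as separate processes that the operating system can place on any CPU.
  2. Libraries built on C and C++ can run on multiple CPUs, because native code is able to release the GIL while it works. For example, NumPy is a library that is built on C, so many NumPy operations can use more than one processor.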

While they run, these operations do not touch Python objects, so they do not cause the memory management issues the GIL protects against and can be distributed to optimize performance. This is depicted in the figure below.

Exceptions to the GIL.
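
The input/output exception can be sketched without any network access by using time.sleep as a stand-in for a slow request, since sleeping also releases the GIL. The function fake_request and the delays are hypothetical; a real project would call Selenium or an HTTP client here instead.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def fake_request(delay):
    """Stand-in for an I/O call: the thread waits and the GIL is released."""
    time.sleep(delay)
    return delay


delays = [0.2, 0.2, 0.2, 0.2]

start = time.perf_counter()
with ThreadPoolExecutor(max_workers=4) as pool:
    # All four waits overlap instead of running back to back.
    results = list(pool.map(fake_request, delays))
elapsed = time.perf_counter() - start

print(f"waited {sum(delays):.1f}s in total, finished in {elapsed:.2f}s")
```

On an otherwise idle machine the elapsed time should be close to the longest single delay rather than the sum of all delays.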

Multiprocessing

If the operations are not part of the exceptions mentioned above and you still need to run them in parallel, then the multiprocessing library or concurrent.futures with its ProcessPoolExecutor is the way to go.

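These can spread the operations across all CPUs. Each process gets its own Python interpreter and its own GIL, which is how the multiprocessing libs go around the GIL completely. The price is that processes do not share memory by default: data has to be passed between them explicitly, and anything you do share can bring back the memory management issues mentioned above. This is visualized in the figure below.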

Bypassing the GIL.
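
A minimal sketch of this approach, assuming a CPU-bound task that is not covered by the exceptions above, could use ProcessPoolExecutor from concurrent.futures. The functions count_primes and run_in_processes and the chosen limits are illustrative only.

```python
import os
from concurrent.futures import ProcessPoolExecutor


def count_primes(limit):
    """CPU-bound work: count the primes below limit by trial division."""
    count = 0
    for n in range(2, limit):
        if all(n % d for d in range(2, int(n ** 0.5) + 1)):
            count += 1
    return count


def run_in_processes(limits):
    """Run count_primes for every limit, each call in a worker process."""
    with ProcessPoolExecutor() as pool:
        return list(pool.map(count_primes, limits))


if __name__ == "__main__":
    # The guard keeps worker processes from re-running this block on import.
    print("CPUs reported by the OS:", os.cpu_count())
    print(run_in_processes([10_000, 20_000, 30_000]))
```

Arguments and return values are pickled and sent between processes, so this pays off mainly when each task does enough work to outweigh that overhead.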

Key takeaways/Conclusion

  • The multiprocessing lib's processes are very different from the threading lib's threads. The multiprocessing lib spawns completely new Python interpreters, which can run the operations on other cores. The threading lib spawns threads within the same process, and only one of them executes Python code at a time.
  • The GIL is released during input/output operations and by many C libraries, so these can run on multiple cores.
  • The GIL can be bypassed completely with the multiprocessing lib.

Thank you for reading! Hope it was straight to the point and clarified some misconceptions.

Trym Andreassen

Cybernetics and Robotics Student, published researcher and active learner.