When is it appropriate to use concurrency libraries in Python, and when do they actually improve performance?
To answer this question, we first need to discuss how the Python interpreter works. Throughout this post, the interpreter in question is CPython, the reference implementation of the Python language and the interpreter most widely used by developers.
CPython was created by Guido van Rossum, the creator of the Python language. As the language grew in popularity, the community created other interpreters, such as IronPython (.NET), Jython (JVM), PyPy, and Stackless.
When executing code, the interpreter relies on a safety mechanism called the GIL, or Global Interpreter Lock. This lock was originally introduced to eliminate memory management problems in concurrent scenarios. Before a thread can execute the interpreter's instructions, it must acquire the GIL.
Locks are one synchronization technique. A lock is an abstraction that allows at most one thread to own it at a time. Holding a lock is how one thread tells other threads: “I’m changing this thing, don’t touch it right now.” — MIT Software Construction
This means that, within a single process, only one thread executes Python code at any given moment while a Python application runs.
Global Interpreter Lock
Many in the Python community regard the GIL as one of the main factors behind the language's popularity. Thanks to the GIL, classic concurrent-programming problems such as race conditions on the interpreter's internal state are avoided, and interoperability with C libraries is straightforward, which was decisive for the adoption of Python by developers coming from other languages.
Understanding how the Global Interpreter Lock works is essential for developing efficient Python applications.
IO-bound and CPU-bound applications
Once a thread acquires the Global Interpreter Lock, it keeps it while it executes instructions. It then releases the lock, either because it has finished or because the interpreter has asked it to give other threads a turn. The other threads follow the same acquire-and-release cycle throughout the application's execution.
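CPython exposes how long a thread may keep running before the interpreter asks it to release the GIL. A minimal sketch using the standard `sys` module; the printed value depends on your interpreter and its settings:

```python
import sys

# Interval, in seconds, after which CPython asks the running thread
# to release the GIL so that another waiting thread can acquire it.
interval = sys.getswitchinterval()
print(f"switch interval: {interval} s")
```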
An exception occurs in applications that make blocking calls outside the interpreter, such as IO operations. In this case, the running thread releases the GIL when it makes the system call to perform IO and acquires it again only after the IO operation completes.
This type of application is called IO-bound because its performance depends directly on the performance of its IO operations.
IO-bound applications suffer little to no performance penalty from the interpreter's use of the GIL.
As an example of an IO-bound application, we will use one that fetches information about great Allan Holdsworth albums from the Cover Art Archive API. The following code snippet defines the functions that we will use to benchmark sequential execution against parallel execution with threads.
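A minimal sketch of what such helper functions could look like. The names `fetch_cover_art` and `timed`, the injectable `opener` parameter, and the choice of `urllib` are assumptions made for illustration, not the original code. The endpoint expects a MusicBrainz release ID (MBID), which you have to supply yourself:

```python
import json
import time
from urllib.request import urlopen

# Cover Art Archive endpoint for one release; {mbid} is a MusicBrainz release ID.
COVER_ART_URL = "https://coverartarchive.org/release/{mbid}"


def fetch_cover_art(mbid, opener=urlopen, timeout=10):
    """Return the parsed JSON document the API serves for one release.

    `opener` is a hypothetical hook so the function can be exercised
    without network access; it defaults to urllib's urlopen.
    """
    with opener(COVER_ART_URL.format(mbid=mbid), timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))


def timed(func, *args):
    """Call func(*args) and return (result, elapsed seconds)."""
    start = time.perf_counter()
    result = func(*args)
    return result, time.perf_counter() - start
```

While `fetch_cover_art` waits for the response, the calling thread is blocked on IO, which is exactly the situation in which the GIL is released.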
The following code performs the sequential execution:
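A minimal sketch of a sequential run. So that it works without network access, the hypothetical `simulated_request` stands in for the real API call by sleeping, which blocks the thread much as a network request would. `run_sequential` is likewise an illustrative name:

```python
import time


def simulated_request(item, delay=0.1):
    """Stand-in for one API call: block for `delay` seconds, then return."""
    time.sleep(delay)
    return item


def run_sequential(task, items):
    """Call task(item) for every item, one after another, in the calling thread."""
    start = time.perf_counter()
    results = [task(item) for item in items]
    return results, time.perf_counter() - start


if __name__ == "__main__":
    results, elapsed = run_sequential(simulated_request, range(5))
    print(f"sequential: {len(results)} requests in {elapsed:.2f}s")
```

Each request starts only after the previous one has finished, so the total time is roughly the sum of the individual waits.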
The parallel execution, using five threads, is done as follows:
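A minimal sketch of the threaded run with five workers, assuming `concurrent.futures.ThreadPoolExecutor` from the standard library (the original may have used `threading` directly). As before, the hypothetical `simulated_request` sleeps in place of a real network call:

```python
import time
from concurrent.futures import ThreadPoolExecutor


def simulated_request(item, delay=0.2):
    """Stand-in for one API call: block for `delay` seconds, then return."""
    time.sleep(delay)
    return item


def run_threaded(task, items, max_workers=5):
    """Call task(item) for every item using a pool of worker threads."""
    start = time.perf_counter()
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # map() hands items to idle threads and yields results in input order.
        results = list(pool.map(task, items))
    return results, time.perf_counter() - start


if __name__ == "__main__":
    results, elapsed = run_threaded(simulated_request, range(5))
    print(f"threaded: {len(results)} requests in {elapsed:.2f}s")
```

Because a blocked thread releases the GIL, the five waits overlap instead of adding up.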
Comparing the total duration of each run, the threaded approach was up to 5 times faster. Because this is an IO-bound process, using threads provides a significant performance gain.
On the other hand, we have CPU-bound applications. Their performance depends directly on how fast instructions are processed. Besides being severely penalized by the Python interpreter's instruction execution speed, they cannot be sped up by parallelizing with threads because of the GIL.
As an example of a CPU-bound application, we will use a function that sums the numbers from 0 to n ten times. The following code passage defines this procedure:
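A minimal sketch of such a function. The names are illustrative, and treating n as inclusive is an assumption. The explicit loop keeps the work in interpreted Python code:

```python
def sum_to_n(n):
    """Return 0 + 1 + ... + n using an explicit Python loop."""
    total = 0
    for i in range(n + 1):
        total += i
    return total


def sum_to_n_ten_times(n):
    """Repeat the summation ten times and return the ten totals."""
    return [sum_to_n(n) for _ in range(10)]
```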
The sequential execution is performed as follows:
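A minimal sketch of the sequential run, assuming the task is executed five times so that the workload matches a five-thread run. `run_sequential` and the value of n are illustrative choices:

```python
import time


def sum_to_n_ten_times(n):
    """Sum the integers 0..n (inclusive) ten times and return the ten totals."""
    totals = []
    for _ in range(10):
        total = 0
        for i in range(n + 1):
            total += i
        totals.append(total)
    return totals


def run_sequential(n, jobs=5):
    """Run the CPU-bound task `jobs` times, one after another."""
    start = time.perf_counter()
    results = [sum_to_n_ten_times(n) for _ in range(jobs)]
    return results, time.perf_counter() - start


if __name__ == "__main__":
    _, elapsed = run_sequential(200_000)
    print(f"sequential: {elapsed:.2f}s")
```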
The parallel execution, also using five threads, is performed as follows:
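A minimal sketch of the threaded run, assuming one `threading.Thread` per job. The names `run_threaded` and `worker` are illustrative:

```python
import threading
import time


def sum_to_n_ten_times(n):
    """Sum the integers 0..n (inclusive) ten times and return the ten totals."""
    totals = []
    for _ in range(10):
        total = 0
        for i in range(n + 1):
            total += i
        totals.append(total)
    return totals


def run_threaded(n, workers=5):
    """Run the CPU-bound task once in each of `workers` threads."""
    results = [None] * workers

    def worker(slot):
        # Each thread writes to its own slot, so no extra locking is needed.
        results[slot] = sum_to_n_ten_times(n)

    threads = [threading.Thread(target=worker, args=(slot,)) for slot in range(workers)]
    start = time.perf_counter()
    for thread in threads:
        thread.start()
    for thread in threads:
        thread.join()
    return results, time.perf_counter() - start


if __name__ == "__main__":
    _, elapsed = run_threaded(200_000)
    print(f"threaded: {elapsed:.2f}s")
```

The threads never block on IO, so each one has to hold the GIL to make progress and they end up taking turns.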
In the final comparison, there was no noticeable improvement between the two implementations. Because this is a CPU-bound application, using threads does not improve performance, due to the Global Interpreter Lock.
In this case, I recommend considering other options to maximize the performance of your application: using multiple processes with the multiprocessing module, using a more suitable Python interpreter, or, if possible, using another language better suited to CPU-bound tasks.
Categorizing an application as CPU-bound or IO-bound is a decisive step in defining its architecture.
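As a rough illustration of the first option, here is a minimal sketch using `multiprocessing.Pool`. The name `run_in_processes`, the worker count, and the value of n are assumptions. Each worker is a separate process with its own interpreter and its own GIL:

```python
import time
from multiprocessing import Pool


def sum_to_n_ten_times(n):
    """Sum the integers 0..n (inclusive) ten times and return the ten totals."""
    totals = []
    for _ in range(10):
        total = 0
        for i in range(n + 1):
            total += i
        totals.append(total)
    return totals


def run_in_processes(n, workers=5):
    """Run the CPU-bound task once in each of `workers` processes."""
    start = time.perf_counter()
    with Pool(processes=workers) as pool:
        # Arguments and results are pickled and sent between processes.
        results = pool.map(sum_to_n_ten_times, [n] * workers)
    return results, time.perf_counter() - start


if __name__ == "__main__":
    # The guard is required on platforms that start workers by re-importing this module.
    _, elapsed = run_in_processes(200_000)
    print(f"processes: {elapsed:.2f}s")
```

Starting processes and passing data between them has a cost, so this tends to pay off only when each task does enough work to outweigh that overhead.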
With Python, we must understand how the code interpreter works to ensure that our code meets performance requirements.
I hope you enjoyed reading this post. Thank you for your time. And if you have more to spare, listen to some Holdsworth music :)
Take care, and keep coding!