Grok the GIL, What’s This and Why Do We Need This?
Motivation
P.S. I quote many statements from other sources in this Medium post; please see the References for details.
TOC
1. Motivation
2. TOC
3. Concepts
* GIL
* What Problem Did the GIL Solve for Python?
* Are There Other Locks to Implement?
* Why Was the GIL chosen as the solution?
* Conclusions
* Key Takeaways for GIL
* Cooperative Multitasking vs Preemptive Multitasking
* Multi-Processing vs Multi-Threading vs Async
4. Demo
* Benchmarking with Multi-Threading and Async
5. Key Takeaways
Concepts
GIL — What Problem Did the GIL Solve for Python?
Python uses reference counting for memory management. It means that objects created in Python have a reference count variable that keeps track of the number of references that point to the object. When this count reaches zero, the memory occupied by the object is released.
Let’s take a look at a brief code example to demonstrate how reference counting works:
>>> import sys
>>> a = []
>>> b = a
>>> sys.getrefcount(a)
3
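In the above example, the reference count for the empty list object [] was 3. The list object was referenced by a, b, and the argument passed to sys.getrefcount().
The problem is that this reference count variable needs protection from race conditions in which two threads increase or decrease its value at the same time. If that happens, memory may be leaked and never released or, worse, released while a reference to the object still exists. Protecting the count with a lock avoids this, and the GIL is a single, interpreter-wide lock that does exactly that.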
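Why a shared counter needs protection can be sketched at the Python level. This is only an analogy and not CPython's actual C implementation: `ToyObject`, `incref_unprotected`, `incref_protected`, and `run_threads` are hypothetical names made up for this illustration.

```python
import threading


class ToyObject:
    """Hypothetical stand-in for an object that carries a reference count."""

    def __init__(self):
        self.refcount = 0
        self._lock = threading.Lock()

    def incref_unprotected(self):
        # Read, then write: another thread may update refcount between the
        # two steps, in which case one of the increments is lost.
        current = self.refcount
        self.refcount = current + 1

    def incref_protected(self):
        # The lock makes the read-modify-write step atomic.
        with self._lock:
            current = self.refcount
            self.refcount = current + 1


def run_threads(method, n_threads=4, n_calls=20_000):
    """Call `method` n_calls times from each of n_threads threads."""
    def work():
        for _ in range(n_calls):
            method()

    threads = [threading.Thread(target=work) for _ in range(n_threads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()


protected = ToyObject()
run_threads(protected.incref_protected)
print(protected.refcount)  # 4 * 20_000 = 80000

unprotected = ToyObject()
run_threads(unprotected.incref_unprotected)
print(unprotected.refcount)  # may be lower than 80000 if updates were lost
```

Whether the unprotected version actually loses updates depends on the interpreter version and on timing, so the second number is not guaranteed to differ from the first.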
Are There Other Locks to Implement?
Multiple locks (for example, one lock per object or per group of objects)
* pros: performant in multi-threaded programs, since threads only contend for the objects they actually share
* cons:
* Deadlocks (deadlocks can only happen if there is more than one lock)
* Decreased performance in single-threaded programs, caused by the repeated acquisition and release of locks.
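The deadlock risk of having more than one lock can be illustrated with a small sketch. It assumes two hypothetical locks, `lock_a` and `lock_b`, taken in opposite order by two threads; timeouts and a barrier are used only so that the demo terminates instead of hanging.

```python
import threading

lock_a = threading.Lock()
lock_b = threading.Lock()
barrier = threading.Barrier(2)
got_second_lock = {}


def worker(name, first, second):
    with first:
        barrier.wait()  # both threads now hold their first lock
        # Each thread wants the lock the other one is holding.
        acquired = second.acquire(timeout=0.2)
        got_second_lock[name] = acquired
        barrier.wait()  # keep holding `first` until both attempts are over
        if acquired:
            second.release()


t1 = threading.Thread(target=worker, args=("t1", lock_a, lock_b))
t2 = threading.Thread(target=worker, args=("t2", lock_b, lock_a))
t1.start()
t2.start()
t1.join()
t2.join()

# Neither thread could get its second lock; without the timeout both
# would wait forever.
print(sorted(got_second_lock.items()))  # [('t1', False), ('t2', False)]
```

With a single lock this particular cycle cannot form, because there is no second lock to wait for while holding the first.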
Why Was the GIL Chosen as the Solution?
Many extensions were being written for existing C libraries whose features were needed in Python. To prevent inconsistent changes, these C extensions required thread-safe memory management, which the GIL provided.
The GIL is simple to implement and was easily added to Python. It provides a performance increase to single-threaded programs as only one lock needs to be managed.
C libraries that were not thread-safe became easier to integrate. And these C extensions became one of the reasons why Python was readily adopted by different communities.
As you can see, the GIL was a pragmatic solution to a difficult problem that the CPython developers faced early on in Python’s life.
But a program whose threads are entirely CPU-bound, e.g., a program that processes an image in parts using threads, not only becomes effectively single-threaded because of the lock but also takes longer to run than the same program written to be entirely single-threaded.
This increase is the result of acquire and release overheads added by the lock.
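A rough way to observe this is to time the same CPU-bound work run sequentially and split across two threads. The sketch below is an assumption-laden illustration: `countdown`, `COUNT`, and the two timing helpers are hypothetical names, the workload size is arbitrary, and the numbers depend on the machine and the interpreter (a free-threaded CPython build without the GIL would behave differently).

```python
import time
from threading import Thread

COUNT = 2_000_000  # arbitrary workload size, chosen to keep the run short


def countdown(n):
    # Pure CPU work: no I/O, so the thread never waits on anything
    # that would let another thread do useful work in the meantime.
    while n > 0:
        n -= 1


def time_sequential():
    start = time.perf_counter()
    countdown(COUNT)
    return time.perf_counter() - start


def time_two_threads():
    # Same total work, split in half across two threads.
    threads = [Thread(target=countdown, args=(COUNT // 2,)) for _ in range(2)]
    start = time.perf_counter()
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return time.perf_counter() - start


sequential = time_sequential()
threaded = time_two_threads()
print(f"sequential: {sequential:.3f}s, two threads: {threaded:.3f}s")
```

On a GIL-enabled interpreter the two-thread timing is typically no better than the sequential one, and it can be slightly worse.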
Conclusions
The creator and BDFL of Python, Guido van Rossum, gave an answer to the community in September 2007 in his article "It isn’t Easy to Remove the GIL":
"I’d welcome a set of patches into Py3k only if the performance for a single-threaded program (and for a multi-threaded but I/O-bound program) does not decrease."
Key Takeaways for GIL
- It makes non-thread-safe C extensions and libraries easier to integrate into the Python ecosystem.
- In multithreaded programs, the GIL keeps reference counts consistent, so CPython's memory management stays thread-safe.
- Single-threaded programs stay fast, because only one lock needs to be managed.
Cooperative Multitasking vs Preemptive Multitasking
- cooperative: a task gives up control to other tasks voluntarily; switching is controlled by the application (the developer).
- preemptive: a task is forced to give up control involuntarily; switching is controlled by a scheduler, typically the OS.
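Preemption can be seen with two threads that never give up control on their own. This is a minimal sketch: `busy` and `progress` are hypothetical names, and the 0.3 second run time is an arbitrary choice.

```python
import sys
import threading
import time

progress = {"t1": 0, "t2": 0}
deadline = time.monotonic() + 0.3  # arbitrary run time for the demo


def busy(name):
    # No sleep, no I/O, no explicit yield: the thread never offers to stop.
    while time.monotonic() < deadline:
        progress[name] += 1


threads = [threading.Thread(target=busy, args=(name,)) for name in progress]
for t in threads:
    t.start()
for t in threads:
    t.join()

# Both counters are above zero, so each thread was given time to run
# even though neither one cooperated.
print(progress)
print(sys.getswitchinterval())  # how often CPython asks a thread to hand over the GIL
```

In CPython the switch interval can be changed with `sys.setswitchinterval()`.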
Multi-Processing vs Multi-Threading vs Async
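- multiprocessing: task scheduling is done by the operating system, and each process has its own interpreter and its own GIL.
- multithreading: threads are OS threads that are switched preemptively. In CPython the interpreter also decides which thread holds the GIL, asking the running thread to release it after a switch interval (5 ms by default).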
- asynchronous: scheduling is done by what’s called the event loop. Developers can specify in their code when a task voluntarily gives up the CPU so that the event loop can schedule another task. For this reason, this is also called cooperative multitasking.
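The cooperative hand-off in the asynchronous model can be sketched with `asyncio`. The coroutine name `task` and the shared `log` list are assumptions made for this illustration; `await asyncio.sleep(0)` is the point where a task voluntarily returns control to the event loop.

```python
import asyncio


async def task(name, steps, log):
    for i in range(steps):
        log.append(f"{name}{i}")
        # Voluntarily give control back to the event loop so that
        # another task can be scheduled.
        await asyncio.sleep(0)


async def main():
    log = []
    await asyncio.gather(task("a", 2, log), task("b", 2, log))
    return log


log = asyncio.run(main())
print(log)  # ['a0', 'b0', 'a1', 'b1']
```

If the `await` line were removed, each task would run to completion before the next one started, because nothing would ever hand control back to the event loop.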
Demo
Benchmarking with Multi-Threading and Async
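The benchmark itself is not reproduced in this text. As a stand-in, here is a minimal sketch that compares multi-threading and async on simulated I/O-bound work. Everything in it is an assumption made for illustration: the I/O is faked with sleeps, and `N_TASKS`, `DELAY`, and the helper names are hypothetical.

```python
import asyncio
import time
from concurrent.futures import ThreadPoolExecutor

N_TASKS = 20   # number of simulated requests (arbitrary)
DELAY = 0.05   # seconds each simulated request spends waiting (arbitrary)


def blocking_io(_):
    # time.sleep releases the GIL, like a real blocking I/O call would.
    time.sleep(DELAY)


async def async_io():
    # asyncio.sleep hands control back to the event loop while waiting.
    await asyncio.sleep(DELAY)


def run_threads():
    start = time.perf_counter()
    with ThreadPoolExecutor(max_workers=N_TASKS) as pool:
        list(pool.map(blocking_io, range(N_TASKS)))
    return time.perf_counter() - start


async def _run_async_tasks():
    await asyncio.gather(*(async_io() for _ in range(N_TASKS)))


def run_async():
    start = time.perf_counter()
    asyncio.run(_run_async_tasks())
    return time.perf_counter() - start


thread_time = run_threads()
async_time = run_async()
print(f"threads: {thread_time:.3f}s, async: {async_time:.3f}s")
print(f"sequential would take about {N_TASKS * DELAY:.2f}s")
```

Both variants should finish far faster than running the waits one after another. Any difference between the two comes mostly from thread creation and context-switching overhead, so a real benchmark would need real I/O and many more tasks.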
Key Takeaways
1. GIL
* Pros of using the GIL
* Performant in single-threaded programs, since multiple locks would decrease performance through the repeated acquisition and release of locks.
* CPython memory management is thread-safe.
* Cons of using the GIL
* Makes CPU-bound multi-threaded programs effectively single-threaded.
2. Async vs Multi-threading: context switching and preemptive multitasking add overhead to multi-threading.
References
- https://www.gushiciku.cn/pl/pIpG/zh-tw
- https://super9.space/archives/tag/python
- https://blog.louie.lu/2017/05/19/深入-gil-如何寫出快速且-thread-safe-的-python-grok-the-gil-how-to-write-fast-and-thread-safe-python/
- https://realpython.com/python-gil/