Unlocking Python's Multi-Core Potential: How Python 3.12's GIL Update Transforms Data Processing

Bauke Brenninkmeijer
Published in Devjam
7 min read · Aug 19, 2024

Introduction

Python 3.12 introduced significant changes to the Global Interpreter Lock (GIL), a long-standing feature in CPython, Python's C-based reference implementation. These modifications represent one of the most important updates in this version, and one of the bigger changes in recent Python history, potentially revolutionizing how Python handles concurrency and parallelism. Since data processing relies strongly on parallelism, it is interesting to see how this field will be impacted.

In this post, we will dig into what the GIL is and how it currently impacts parallelism and data processing in Python. We will also discuss the changes to the GIL proposed in the relevant PEP, along with their challenges and possibilities, and compare this change with threading and multi-processing. Knowledge of Python and parallel programming is helpful but not required.

Historical Context of the GIL

“But I also don’t expect it (the GIL) to go away until someone other than me goes through the effort of removing it, and showing that its removal doesn’t slow down single-threaded Python code.” — Guido van Rossum (2007)

It has finally happened 17 years later. And Guido didn’t even have to do everything.

The Global Interpreter Lock has been a fundamental part of CPython since its inception. It's a mutex (or lock); "mutex" is short for "mutual exclusion," a mechanism that ensures a Python object can only be accessed by one thread at a time. The GIL has served several purposes:

  1. Ensuring thread-safety in multi-threaded programs
  2. Simplifying memory management
  3. Facilitating easy integration with C libraries by providing an intermediate layer with Python.

However, the GIL has also been a significant bottleneck for multi-threaded applications, especially on multi-core systems. It has limited Python's ability to fully utilize modern hardware, particularly in CPU-bound tasks.
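To make the bottleneck concrete, here is a minimal sketch (not from the original article) that runs a CPU-bound countdown twice sequentially and then on two threads. Under the GIL, only one thread executes Python bytecode at a time, so the threaded version typically takes about as long as the sequential one:

```python
import threading
import time

def count_down(n):
    # Pure-Python, CPU-bound work: while the GIL is held, only one
    # thread can execute this bytecode at a time.
    while n > 0:
        n -= 1

N = 2_000_000

# Run twice sequentially.
start = time.perf_counter()
count_down(N)
count_down(N)
sequential = time.perf_counter() - start

# Run the same work on two threads. Despite having two cores available,
# the GIL prevents the threads from running the loop in parallel.
start = time.perf_counter()
threads = [threading.Thread(target=count_down, args=(N,)) for _ in range(2)]
for t in threads:
    t.start()
for t in threads:
    t.join()
threaded = time.perf_counter() - start

print(f"sequential: {sequential:.2f}s, threaded: {threaded:.2f}s")
```

Exact timings vary by machine, but on a standard CPython build the two numbers are usually close, which is exactly the limitation the per-interpreter GIL targets.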

Key Changes in Python 3.12’s GIL

For all details, see PEP 684.

Python 3.12 introduced a major change to the GIL, with the technical name “A Per-Interpreter GIL.” The key modifications include:

  1. Per-Interpreter GIL: Each sub-interpreter now has its own GIL, allowing for true parallelism across different interpreters.
  2. Improved GIL Implementation: The core GIL algorithm has been optimized for better performance, even within a single interpreter.
  3. Sub-interpreter API: The CPython API has been updated to manage sub-interpreters, each with its own isolated state and GIL. A Python API will arrive in Python 3.13.

Impact on Performance and Concurrency

These changes have several significant implications for parallel processing and concurrency going forward. While sub-interpreters themselves are not new, the fact that each now has its own GIL is a major overhaul that will change how parallel Python code is written:

  1. Improved Parallelism: Applications can now achieve better performance on multi-core systems by utilizing sub-interpreters, each running with its own GIL.
  2. Better Scalability: CPU-bound tasks can be distributed across multiple interpreters, potentially scaling linearly with the number of available cores.
  3. Fine-grained Concurrency: Developers can now design their applications around native concurrency, instead of working around the limitations of older approaches such as threading or multiprocessing.

Limitations for multi-threaded Python applications

While these changes are significant, it is important to be aware of the limitations.

  • Existing Code: Python programs using threads within a single interpreter will not automatically benefit from these changes. They will still be bound by the interpreter’s GIL.
  • Explicit Usage Required: Developers need to explicitly use sub-interpreters to take advantage of the new parallelism capabilities. This might require redesigning programs.
  • Data Sharing Challenges: Sub-interpreters operate in isolated environments, which can make sharing data between them challenging. New patterns and possibly new libraries will need to be developed to facilitate efficient inter-interpreter communication.
  • C Extension Compatibility: Not all C extensions may be immediately compatible with sub-interpreters, potentially limiting their use in some scenarios.

Impact on data processing

The data ecosystem in Python is extensive, with many parts of the machine learning stack being delegated to lower-level languages such as C (NumPy, pandas, SciPy), Scala (Spark, Kafka), and Java (Hive, Trino). Almost all of these frameworks use concurrency and parallelization to increase processing speed, some within the same machine, some distributed. So where can we expect speed increases? Mainly in the Python + C stack. Part of the reason is that CPython provides good interoperability with C while still suffering from the GIL; the other part is that most C-based frameworks run on a single machine, where the GIL typically plays a bigger role than in distributed frameworks.

Looking at the data stack, you can expect speed improvements in tasks such as data preprocessing, feature engineering, model training, and hyperparameter optimization, following changes in scikit-learn, pandas, NumPy, and more. We can even imagine Matplotlib generating plot components, or two plots, in parallel.

Additionally, when you process large amounts of data using multi-processing, you incur a large memory overhead, since all objects need to be copied: a common situation with NumPy arrays or pandas dataframes. Sub-interpreters will be a viable alternative if they avoid this problem. However, we must consider that using multi-processing is much easier than working with sub-interpreters, meaning this will typically have to be integrated into the libraries and cannot be done ad hoc by data scientists or engineers.
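The copy overhead can be sketched as follows (a minimal illustration using plain lists instead of NumPy arrays, to keep it self-contained): with multiprocessing, each worker process receives its own pickled copy of its chunk of the data, which is exactly the memory overhead described above.

```python
from multiprocessing import Pool

def sum_squares(chunk):
    # Each worker process gets its own copy of `chunk`, transferred via
    # pickling; this per-worker copy is multiprocessing's memory overhead.
    return sum(x * x for x in chunk)

parallel = sequential = None  # populated in the main process below

if __name__ == "__main__":
    data = list(range(100_000))
    # Split the data into four strided chunks, one per worker.
    chunks = [data[i::4] for i in range(4)]
    with Pool(processes=4) as pool:
        parallel = sum(pool.map(sum_squares, chunks))
    sequential = sum(x * x for x in data)
    print(parallel == sequential)  # True
```

A sub-interpreter-based version could in principle skip the pickling round-trip for shareable objects, which is what would make it attractive for large arrays.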

What This Means for Data Scientists and Analysts

If we consider the changes outlined in the previous section, it becomes clear that the impact for data scientists and analysts will be limited in scope.

The optimizations achievable with sub-interpreters will be integrated into the libraries, and while they might be enabled using a parameter, data scientists and analysts will generally not enable this type of parallelism themselves. While parallel processing is currently not a rarity, it's also not too common. Big parts of Python and NumPy are still single-threaded, with some parts optionally using different types of parallel processing through function parameters. Developers typically use single-threaded processing as the default and fallback, to make sure their programs are compatible with different Python versions and the SciPy stack.
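This "parallelism behind a parameter" pattern can be sketched with a hypothetical library-style helper (the `transform` and `n_jobs` names are illustrative, not from any real library): single-threaded by default, with parallelism as an explicit opt-in.

```python
from concurrent.futures import ProcessPoolExecutor

def square(x):
    return x * x

def transform(values, fn, n_jobs=1):
    """Apply `fn` to every value; single-threaded unless n_jobs > 1."""
    if n_jobs == 1:
        # Safe default: behaves identically on every Python version.
        return [fn(v) for v in values]
    # Opt-in parallelism; `fn` must be picklable (a top-level function).
    with ProcessPoolExecutor(max_workers=n_jobs) as ex:
        return list(ex.map(fn, values))

if __name__ == "__main__":
    print(transform([1, 2, 3], square))            # [1, 4, 9]
    print(transform([1, 2, 3], square, n_jobs=2))  # [1, 4, 9]
```

If libraries adopt sub-interpreters, the expectation is that only the body behind such a parameter changes, while the user-facing call stays the same.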

Threading, multi-processing and sub-interpreters

Taking a step back, there are currently two methods of speeding up your processes in Python: threading and multiprocessing. In short, threading allows parts of a process to run concurrently within the same memory space, while still respecting the GIL. Multiprocessing uses a separate memory space for each parallel process, with a GIL per Python instance, sidestepping the GIL's limitations. It's important to understand how sub-interpreters with a per-interpreter GIL stack up against these options.
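The two existing options can be contrasted with a short sketch (my example, not from the article) using the standard-library `concurrent.futures` executors on the same CPU-bound task:

```python
from concurrent.futures import ThreadPoolExecutor, ProcessPoolExecutor

def cpu_task(n):
    # CPU-bound pure-Python work.
    return sum(i * i for i in range(n))

thread_results = process_results = None  # populated in the main process below

if __name__ == "__main__":
    inputs = [200_000] * 4

    # Threads: shared memory space, one shared GIL, so CPU-bound work
    # gains little from the extra workers.
    with ThreadPoolExecutor(max_workers=4) as ex:
        thread_results = list(ex.map(cpu_task, inputs))

    # Processes: separate memory space and a GIL per process, so the work
    # truly runs in parallel, at the cost of copying inputs and results
    # between processes.
    with ProcessPoolExecutor(max_workers=4) as ex:
        process_results = list(ex.map(cpu_task, inputs))

    print(thread_results == process_results)  # True
```

Sub-interpreters aim to sit between these two: parallel execution like processes, but within a single process and with cheaper communication.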

In the table below, you can get a feeling of how these compare in usage and their interaction with the GIL.

  Method            | Memory space            | GIL                 | Overhead
  Threading         | Shared                  | One, shared         | Low
  Multi-processing  | Separate per process    | One per process     | High
  Sub-interpreters  | Isolated, same process  | One per interpreter | Moderate

Seeing how sub-interpreters compare with threading and multi-processing, we can see there is a fitting place for them in the ecosystem. With lower overhead than multi-processing and a GIL per interpreter instead of one shared lock, they occupy a useful niche between threading and multi-processing.

Getting Started with the new GIL

As mentioned, the Python 3.12 update only allows interacting with the per-interpreter GIL through the CPython C API, so working with it is challenging. There is a proposal for Python 3.13 to include a Python API, which we can take a look at. For more info and examples, see the specification of PEP 554.

To create a new interpreter that runs some code:

import threading
import interpreters  # proposed stdlib module from PEP 554; not yet available

interp = interpreters.create()

def run():
    interp.exec('print("during")')

t = threading.Thread(target=run)
print('before')
t.start()
t.join()
print('after')

It is also possible to set values in the __main__ namespace of the sub-interpreter, like so:

import textwrap as tw

interp = interpreters.create()
interp.set_main_attrs(a=1, b=2)
interp.exec(tw.dedent("""
    res = do_something(a, b)  # do_something must be defined in the sub-interpreter
"""))
res = interp.get_main_attr('res')

With the .get_main_attr method, we can retrieve values from the sub-interpreter, meaning we can also get information from it while it is running. This opens up many possibilities for interaction; you can already imagine passing data to a sub-interpreter to be processed and returned.


We would love to hear how you intend to use these new features! Please share your usage and ideas below.

Conclusion

In conclusion, this change is significant and exciting, possibly one of the most impactful changes to Python in recent years. With the possibility of speeding up countless operations in Python without relying on the blunt instrument of multi-processing, we could see many more tailored, specific improvements to the ecosystem.

We will all have to see how widely this will be adopted. Implementation will be complex and challenging in a well-established ecosystem with mature libraries. However, we cannot deny the benefits; if the per-interpreter GIL can deliver on its promises, it cannot be ignored.

If you want to look ahead beyond the changes to the GIL, make sure to look at the Python 3.13 Summary, which will also include the Python API for sub-interpreters.

If you found this interesting, please leave a clap and follow for more updates related to Python and data processing!
