Escape Python’s GIL with NumPy

Parallel Workloads in Data Science (Part 1)

Tasos Pardalis
Road to Full Stack Data Science

NumPy minus Python GIL equals speed

This is the first post in a series on working with parallel workloads in Python. While researching the topic, I realised that I first needed to understand how parallel workloads work and why they aren’t as straightforward to implement in Python. As I see it, understanding Python’s GIL and how to work around it is a fundamental part of writing parallel code in Python. Once all the parts are published, I’ll publish a longer post connecting them into one. The smaller parts make the content easier to digest.

Python Independent Parallel Processes

In Python, you can use the multiprocessing module to run independent processes in parallel. Because it uses subprocesses instead of threads, it lets the programmer fully utilise multiple processors on a given machine. It runs on both Unix and Windows.
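To make this concrete, here is a minimal sketch of spreading a CPU-bound task over worker processes with multiprocessing.Pool. The function name sum_of_squares, the input sizes and the choice of two workers are my own illustrative assumptions, not part of any particular workload.

```python
import multiprocessing as mp


def sum_of_squares(n):
    """CPU-bound toy task (hypothetical): sum of i*i for i in range(n)."""
    return sum(i * i for i in range(n))


if __name__ == "__main__":
    # Illustrative workload sizes, chosen arbitrarily.
    inputs = [200_000, 300_000, 400_000, 500_000]

    # Each worker is a separate Python process with its own interpreter,
    # so the tasks can run on different cores at the same time.
    with mp.Pool(processes=2) as pool:
        results = pool.map(sum_of_squares, inputs)

    for n, total in zip(inputs, results):
        print(n, total)
```

The `if __name__ == "__main__":` guard matters on platforms that start workers by importing the main module (Windows, for example); without it, each worker would try to create its own pool.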

However, with the standard CPython implementation, threads alone cannot fully use the underlying hardware, because the global interpreter lock (GIL) prevents multiple threads from executing Python bytecode simultaneously.
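One rough way to see this effect is to time the same pure-Python loop run sequentially and then across several threads. The sketch below assumes a standard CPython build with the GIL enabled; the helper names (count_down, run_sequential, run_threaded) and the loop size are hypothetical, and the actual timings will vary by machine and Python version.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def count_down(n):
    """Pure-Python CPU-bound loop; it needs the GIL to execute bytecode."""
    while n > 0:
        n -= 1
    return n


def run_sequential(n, repeats):
    """Run count_down `repeats` times in one thread; return elapsed seconds."""
    start = time.perf_counter()
    for _ in range(repeats):
        count_down(n)
    return time.perf_counter() - start


def run_threaded(n, repeats):
    """Run count_down once per thread in `repeats` threads; return elapsed seconds."""
    start = time.perf_counter()
    with ThreadPoolExecutor(max_workers=repeats) as executor:
        # list() consumes the results, so every task has finished here
        list(executor.map(count_down, [n] * repeats))
    return time.perf_counter() - start


if __name__ == "__main__":
    N = 2_000_000  # illustrative size, adjust for your machine
    print(f"sequential: {run_sequential(N, 4):.2f}s")
    print(f"4 threads:  {run_threaded(N, 4):.2f}s")
```

On a GIL-enabled interpreter I would expect the two timings to be roughly similar, since only one thread can execute Python bytecode at any moment. Treat the numbers as indicative rather than as a benchmark.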

Pause! If you are a self-taught Python developer like myself, by now you are…

I am a full-stack Data Scientist with a passion for innovation. I like gaining business value from data and automating analysis.