Multiprocessing for Data Scientists in Python

Why pay for a powerful CPU if you can’t use all of it?

Sebastian Theiler
Analytics Vidhya


An Intel i9-9900K with 8 cores costs between $450 and $500.

That’s a lot of money to be spending on a CPU.
And if you can’t utilize it to its fullest extent, why even have it?

Multiprocessing lets us use our CPUs to their fullest extent. Instead of running a program line by line in a single process, we can run multiple segments of code at once, or run the same segment of code multiple times in parallel. When we do this, the work can be split among the cores of our CPU, meaning that we can complete calculations much faster.

And luckily for us, Python has a built-in multiprocessing library.

The main feature of the library is the Process class. When we instantiate Process, we pass it two arguments: target, the function we want it to run, and args, the arguments we want to pass to that target function.

import multiprocessing

# func is the function to run; x, y, and z are the arguments passed to it
process = multiprocessing.Process(target=func, args=(x, y, z))
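To make the instantiation step concrete, here is a minimal sketch. The function square is a hypothetical stand-in for whatever we actually want to compute; it is not part of the library. The sketch only constructs the Process object and inspects it, to show that nothing runs at this point.

```python
import multiprocessing


def square(x):
    # Hypothetical target function, used only for illustration.
    print(x * x)


if __name__ == "__main__":
    # args is a sequence of positional arguments, so a single argument
    # needs a trailing comma: (5,) rather than (5).
    process = multiprocessing.Process(target=square, args=(5,))

    # Nothing has run yet: no child process exists until .start() is called.
    print(process.pid)         # None
    print(process.is_alive())  # False
    print(process.exitcode)    # None
```

Creating the object is cheap; the operating system only creates a new process once we start it.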

After we instantiate the class, we can start it with the .start() method.

process.start()

On Unix-based operating systems such as Linux and macOS, a process that has finished but has not been joined becomes a zombie process. We can resolve this with process.join(), which waits for the process to finish and then cleans it up.
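Putting the pieces together, a full create, start, and join cycle might look like the sketch below. The names work and run_and_join are hypothetical helpers made up for this example, and time.sleep stands in for a real computation.

```python
import multiprocessing
import time


def work(seconds):
    # Hypothetical stand-in for a real computation.
    time.sleep(seconds)


def run_and_join(n_processes=3, seconds=0.2):
    """Start n_processes children, join them all, and return their exit codes."""
    processes = [
        multiprocessing.Process(target=work, args=(seconds,))
        for _ in range(n_processes)
    ]
    for p in processes:
        p.start()  # returns immediately; the children now run concurrently
    for p in processes:
        p.join()   # block until this child has exited, then clean it up
    return [p.exitcode for p in processes]


if __name__ == "__main__":
    print(run_and_join())
```

An exit code of 0 means the child finished normally. Starting every process before joining any of them is what lets them run at the same time; calling join() right after each start() would run them one after another. The `if __name__ == "__main__":` guard matters on platforms that launch children by re-importing the script, such as Windows and macOS.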
