Python Memory and Multiprocessing

Here’s how you can speed up CPU-bound programs

Jun Wu
Jun Wu
Dec 9, 2019 · 3 min read
Image for post
Image for post

Python is powerful. Python is used widely in the data science and machine learning community. Python is versatile. It’s widely used by large technology companies such as DropBox, Google, Instagram, and Spotify. Python enables developers to be productive quickly.

Need I say more? Python is a great programming language.

However, experienced Python developers will tell you to watch that memory usage. You’ve got to program mindfully to get the most out of Python. In data science and machine learning, there’s a need for speeding up CPU-bound programs. It’s often done by leveraging the art of multiprocessing.

Use the Latest and Greatest

Before we dive in, are you using Python 3 instead of Python 2? Do you know the newest tricks in Python 3? Starting with Python 3.7, there are significant performance improvements.

Here’s the list of optimizations from RealPython:

  • Less overhead in calling many methods in the standard library.
  • Method calls are up to 20% faster.
  • The startup time of Python itself is reduced by 10–30%.
  • Importing typing is seven times faster.

Understanding Python Memory Management

First, it’s good to get a solid understanding of how Python management works. You can start by reading this blog post. Then, to get an idea of how the Global Interpreter Lock works, and CPython’s Memory Management, read this blog post on RealPython.

The Art of Multiprocessing

Now you have a basic understanding of how Python memory management works, you know that you have to sidestep the Global Interpreter Lock. In Python, instead of using threads, you can use multiprocessing.

Multiprocessing is a part of concurrent programming. You can read about the big picture and the difference between speeding up I/O-bound programs and CPU-bound programs here.

In machine learning and data science, your code is often CPU bound, meaning that the program’s bottleneck is at the CPU level. This means that multiprocessing is often the better method to use for performance enhancement than other methods. By leveraging the power of multiple CPUs and multiple cores on a single machine, you can improve the performance of your programs.

Multiprocessing in Python will allow you to:

  • run processes in separate memory spaces.
  • take advantage of multiple CPUs and cores.
  • communication model.
  • give each process its own Python Interpreter and its own GIL.

The real advantage of multiprocessing is that you don’t get a lot of the usual multithreading problems such as data corruption and deadlocks. Although you will have a larger memory footprint than with a multithreading model, it’s a good tradeoff.

It is possible to share data between processes. You can read about this here.

Getting started with multiprocessing is easy, here are a few resources:

The one thing you have to remember with multiprocessing is that the entire memory is copied into a subprocess. That’s why memory profiling is important.

Memory Profiling

If you haven’t used the Python memory-profile package, now’s the time to start using it.

Here are a few resources that will help you understand memory leaks (particularly in reference to data science/machine learning work):

When Multiprocessing is Not Enough

Experienced developers who use Python multiprocessing library often use it for different types of machine learning and data science projects.

However, bench-marking actual examples to see whether they can handle numerical data and stateful computing is a good exercise.

If you haven’t used charm4py, you should try it while reading Juan Galvez’s arguments in the above article to get a complete picture of this debate.

With Python 3’s performance enhancements, Python is faster than ever. Understanding Python memory management, and taking full advantage of multiprocessing, will allow you to speed up your Python CPU bound programs using multiple CPUs or multiple cores. At the same time, you can keep track of memory usage with memory profiling.

With careful consideration, your Python programs will run faster than ever.

Better Programming

Advice for programmers.

By Better Programming

A weekly newsletter sent every Friday with the best articles we published that week. Code tutorials, advice, career opportunities, and more! Take a look

By signing up, you will create a Medium account if you don’t already have one. Review our Privacy Policy for more information about our privacy practices.

Check your inbox
Medium sent you an email at to complete your subscription.

Jun Wu

Written by

Jun Wu

Writer, Technologist: Tech|Future|Leadership (Forbes-AI, Behind the Code)

Better Programming

Advice for programmers.

Jun Wu

Written by

Jun Wu

Writer, Technologist: Tech|Future|Leadership (Forbes-AI, Behind the Code)

Better Programming

Advice for programmers.

Medium is an open platform where 170 million readers come to find insightful and dynamic thinking. Here, expert and undiscovered voices alike dive into the heart of any topic and bring new ideas to the surface. Learn more

Follow the writers, publications, and topics that matter to you, and you’ll see them on your homepage and in your inbox. Explore

If you have a story to tell, knowledge to share, or a perspective to offer — welcome home. It’s easy and free to post your thinking on any topic. Write on Medium

Get the Medium app

A button that says 'Download on the App Store', and if clicked it will lead you to the iOS App store
A button that says 'Get it on, Google Play', and if clicked it will lead you to the Google Play store