Concurrency in Python
What is concurrency?
Concurrency is working on multiple things at the same time. In Python, this can be done in several ways:
- By letting multiple threads take turns. This is what the threading module offers.
- By firing off a task and continuing to do other stuff, instead of waiting for an answer from the network or disk. This is how asynchronous IO operates, with the asyncio module.
- By using multiple processes, with the multiprocessing module. This way we can truly do more than one thing at a time, using multiple processor cores. This is called parallelism.
Is concurrency really what you want?
Before you consider concurrency, which can be quite tricky, always take a good look at your code and algorithms first. Many speed and performance issues can be resolved by implementing a better algorithm or adding caching. Entire books are written about this subject, but some general guidelines to follow are:
- Measure, don’t guess. Measure which parts of your code take the most time to run. Focus on those parts first.
- Implement caching. This can be a big optimization if you perform many repeated lookups from disk, the network, and databases.
- Reuse objects instead of creating a new one on each iteration. Python has to clean up every object you created to free memory. This is called garbage collection. The garbage collection of many unused objects can slow down your software considerably.
- Reduce the number of iterations in your code if possible, and reduce the number of operations inside iterations.
- Avoid (deep) recursion. It requires a lot of memory and housekeeping for the Python interpreter. Use things like generators and iteration instead.
- Reduce memory usage. In general, try to reduce the usage of memory. For example: parse a huge file line by line instead of loading it in memory first.
- Don’t do it. Do you really need to perform that operation? Can it be done later? Or can it be done once, and can the result of it be stored instead of calculated over and over again?
- Use PyPy or Cython. You can also consider an alternative Python implementation; there are speedy Python variants out there. See below for more info on this.
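As a small illustration of the caching advice above: Python's standard library ships functools.lru_cache, which memoizes repeated calls to a function. The naive recursive Fibonacci below is just an example workload, not something from the text:

```python
from functools import lru_cache

@lru_cache(maxsize=None)  # remember every result this function has produced
def fib(n):
    """Naive recursive Fibonacci; unusably slow without the cache."""
    if n < 2:
        return n
    return fib(n - 1) + fib(n - 2)

print(fib(100))  # returns instantly thanks to memoization
```

Without the decorator, fib(100) would make an astronomical number of recursive calls; with it, each value is computed exactly once.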
Do you still consider concurrency for your Python software? Great; keep reading!
The difference between threads and processes
A thread is an independent sequence of execution, but it shares memory with all the other threads belonging to your program. A Python program has, by default, one main thread. You can create more of them and let Python switch between them. This switching happens so fast that it appears like they are running side by side at the same time.
A process is also an independent sequence of execution. Unlike threads, a process has its own memory space that is not shared with other processes. A process can clone itself, creating two or more instances.
The following image illustrates this:
Asynchronous IO is not threading, nor is it multiprocessing. In fact, it is a single-threaded, single-process paradigm. I will not go into asynchronous IO in this article.
The GIL: a blessing, a curse or both?
Python has one peculiarity that makes concurrent programming harder. It’s called the GIL, short for Global Interpreter Lock. The GIL makes sure there is, at any time, only one thread running. Because only one thread can run at a time, it’s impossible to make use of multiple processors with threads. But don’t worry, there’s a way around this.
The GIL was invented because CPython’s memory management is not thread-safe. With only one thread running at a time, CPython’s internals are safe from race conditions.
Race conditions? Thread-safety?
These terms might be new to you, so let’s define them:
- A race condition occurs when multiple threads can access and change shared data at the same time.
- Thread-safe code only manipulates shared data in such a way, that it does not interfere with other threads.
As mentioned already, threads share the same memory. With multiple threads running at the same time, we don’t know the order in which the threads access shared data. Therefore, the result of accessing shared data is dependent on the scheduling algorithm. This algorithm decides which thread runs when. Threads are “racing” to access/change the data.
As an example, let’s create a shared variable a, with a value of 2:

a = 2

Now suppose we have two threads, thread_one and thread_two. They perform the following operations:

thread_one: a = a + 2
thread_two: a = a * 3

If thread_one is able to access a first and thread_two second, the result will be:

a = 2 + 2, so a is now 4.
a = 4 * 3, so a is now 12.

However, if it so happens that thread_two runs first, and then thread_one, we get a different output:

a = 2 * 3, so a is now 6.
a = 6 + 2, so a is now 8.

So the order of execution obviously matters for the output. There’s an even worse possible outcome though. What if both threads read a at the same time, do their thing, and then assign the new value? They will both see that a = 2. Depending on which one writes its result first, a will eventually be 4 or 6. Not what we expected! This is what we call a race condition.
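Here is a runnable sketch of such a race. The time.sleep call is artificial; it is only there to force the unlucky interleaving reliably, where every thread reads the shared value before any thread has written its update back:

```python
import threading
import time

counter = 0  # shared data

def unsafe_increment():
    global counter
    local = counter      # 1. read the shared value
    time.sleep(0.05)     # 2. simulate some work; other threads run meanwhile
    counter = local + 1  # 3. write back, possibly overwriting another update

threads = [threading.Thread(target=unsafe_increment) for _ in range(10)]
for t in threads:
    t.start()
for t in threads:
    t.join()

print(counter)  # expected 10, but the race makes it (almost certainly) 1
```

Wrapping steps 1–3 in a threading.Lock would make the function thread-safe and restore the expected result of 10.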
Race conditions are difficult to spot, especially for software engineers that are unfamiliar with these issues. Also, they tend to occur randomly, causing erratic and unpredictable behavior. These bugs are notoriously difficult to find and debug. It’s exactly why Python has a GIL — to make life easier for the majority of Python users.
But if the GIL holds us back in terms of concurrency, shouldn’t we get rid of it or be able to turn it off? It’s not that easy. Other features, libraries, and packages have come to rely on the GIL, so something must replace it or else the entire ecosystem will break. This turns out to be a difficult problem to solve. If it interests you, you can read more about this on the Python wiki.
Multiprocessing with Python
After this fairly long, but necessary explanation, we are ready for some example code and experiments. Let’s get to work!
Our test function
We first define a function that we can use to benchmark our different options. All the following examples use the same function, called heavy:

def heavy(n, myid):
    for x in range(1, n):
        for y in range(1, n):
            x * y
    print(myid, "is done")

The heavy function is a nested loop that does multiplication. It is a CPU-bound function. If you observe your system while running it, you’ll see CPU usage close to 100%. You can replace it with anything you want, but beware of race conditions: don’t use shared objects or variables.
We’ll be running this function in different ways, exploring the differences between a regular single-threaded Python program, multithreading, and multiprocessing.
Option 1: The baseline
Each Python program has at least one thread: the main thread. Below you’ll find the single-threaded version, which serves as our baseline in terms of speed. It runs our heavy function 80 times, sequentially:
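A minimal sketch of this baseline, using the heavy function from above (the argument 500 for n is an assumption, not a value from the text):

```python
import time

def heavy(n, myid):
    for x in range(1, n):
        for y in range(1, n):
            x * y
    print(myid, "is done")

if __name__ == "__main__":
    start = time.time()
    for i in range(80):
        heavy(500, i)  # run heavy 80 times, one after the other
    end = time.time()
    print("Took:", end - start, "seconds")
```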
Option 2: using threads
In the following example, we use multiple threads to run heavy, again 80 times. Each invocation gets its own thread:
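A sketch of this threaded version, again assuming the heavy function and the value n=500 from the baseline:

```python
import threading
import time

def heavy(n, myid):
    for x in range(1, n):
        for y in range(1, n):
            x * y
    print(myid, "is done")

if __name__ == "__main__":
    start = time.time()
    threads = []
    for i in range(80):
        t = threading.Thread(target=heavy, args=(500, i))  # one thread per call
        threads.append(t)
        t.start()
    for t in threads:
        t.join()  # wait for all 80 threads to finish
    end = time.time()
    print("Took:", end - start, "seconds")
```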
If Python didn’t have the GIL, this would be much faster. But despite having 80 threads, it runs roughly as fast as the baseline. The baseline is, in fact, even a little faster, because it does not have the overhead of creating threads and switching between them.
This is the GIL at work. Each thread takes turns, instead of running all at once. If heavy had been an I/O-bound function, however, this would have given us a tremendous speed increase. Let’s test this! We can simulate I/O-bound code by using time.sleep(). I modified the heavy function in the next code fragment:
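A sketch of that modification, with the CPU work replaced by a two-second sleep (the unused n parameter is kept so the calling code stays the same):

```python
import threading
import time

def heavy(n, myid):
    time.sleep(2)  # simulate waiting on the network or disk
    print(myid, "is done")

if __name__ == "__main__":
    start = time.time()
    threads = []
    for i in range(80):
        t = threading.Thread(target=heavy, args=(500, i))
        threads.append(t)
        t.start()
    for t in threads:
        t.join()
    end = time.time()
    print("Took:", end - start, "seconds")
```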
Even though we have 80 threads all sleeping for two seconds, this code still finishes in a little over two seconds. While sleeping, Python will schedule other threads to run. Sweet!
Option 3: using multiprocessing
Now we’ll try true parallelism, with the multiprocessing library. As you can see, the code looks almost the same as the threaded version. The threading and multiprocessing libraries are intentionally made very similar. But the 80 invocations of heavy finish roughly twice as fast this time!
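A sketch of the multiprocessing version; compared to the threaded code, essentially only the import and the class name change:

```python
import multiprocessing
import time

def heavy(n, myid):
    for x in range(1, n):
        for y in range(1, n):
            x * y
    print(myid, "is done")

if __name__ == "__main__":  # required on platforms that spawn processes, e.g. Windows
    start = time.time()
    processes = []
    for i in range(80):
        p = multiprocessing.Process(target=heavy, args=(500, i))  # one process per call
        processes.append(p)
        p.start()
    for p in processes:
        p.join()  # wait for all 80 processes to finish
    end = time.time()
    print("Took:", end - start, "seconds")
```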
My test system (a small desktop computer) has only two CPU cores, which explains the factor of two. If I run this code on my brand-new laptop, with four faster CPU cores, it’s more than four times faster. This demonstrates the linear speed increase multiprocessing offers us in the case of CPU-bound code.
Option 4: using multiprocessing with a pool
We can make the multiprocessing version a little more elegant by using multiprocessing.Pool(p). This helper creates a pool of p processes. If you don’t supply a value for p, it will default to the number of CPU cores in your system, which is a sensible choice. By using the Pool.map() method, we can submit work to the pool. This work comes in the form of a simple function call:
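A sketch of the pooled version. Because Pool.map passes a single argument to the function, functools.partial is used here to fix n at 500 (a value carried over from the earlier sketches, not from the text):

```python
import multiprocessing
import time
from functools import partial

def heavy(n, myid):
    for x in range(1, n):
        for y in range(1, n):
            x * y
    print(myid, "is done")

if __name__ == "__main__":
    start = time.time()
    with multiprocessing.Pool() as pool:          # defaults to one process per CPU core
        pool.map(partial(heavy, 500), range(80))  # heavy(500, 0) ... heavy(500, 79)
    end = time.time()
    print("Took:", end - start, "seconds")
```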
The runtime for this version is roughly the same as the non-pooled version, but it has to create fewer processes so it is more efficient. Instead of creating 80 processes, we create four and reuse those each time.
To summarize:
- Use the threading library (or asyncio) for I/O-bound software: it’s lightweight and improves performance considerably.
- Use the multiprocessing library for CPU-bound problems: it leverages the full potential of all the CPUs in your system.
- Consider other options as well: perhaps optimizing your algorithms or using another Python implementation is enough!