I consider myself an efficient person. I always try to use my time wisely: when I have some time to kill, like waiting in line to run errands, I bring my laptop and get some work done, and I exercise while listening to podcasts, audiobooks, and other nerdy stuff. This week I started training at a new gym where you can solve sudokus while using the treadmill, and I discovered that I can’t do both at the same time! That’s when I realized that Python and I have a huge thing in common: we are both very bad at multitasking.
To understand why multitasking in Python works differently than in other languages, we first have to understand the difference between sequential execution, concurrency, and parallelism.
Sequential means running one task at a time. Let’s say I invited a few friends over for dinner, and I want to bake some cakes for them. I get the recipes for my 3 favorite cakes: chocolate, cheese, and caramel. Since my baking skills are not at their best, I can only make one cake at a time. So first I make the chocolate cake; when it finishes baking I start making the cheesecake, and then the caramel cake. If every cake takes about 10 minutes to mix the ingredients and 50 minutes to bake, in total I spend 3 hours making those cakes, which is a lot of time.
Concurrency means making progress on multiple tasks at the same time, but not necessarily simultaneously. Let’s say my baking skills are a bit better now: I can start mixing the chocolate cake, and once it’s in the oven, start mixing the cheesecake, and so on and so forth. I am using the baking time, which is idle time for me, to make progress on the other cakes, but I never do two things simultaneously. This way it takes about 1 hour and 20 minutes for all 3 cakes to be ready. Not too bad.
Parallelism means running multiple operations at the same time, or as we call it in day-to-day life, multitasking. Let’s say I call 2 of my friends to help me with the baking (if they want to eat the cakes, they should help too!). Now we can mix the 3 cakes simultaneously and bake them all at the same time, which takes only 1 hour.
So how are Python threads related to the bakery I just started in my kitchen?
Well, in most programming languages, threads run in parallel. In Python we have something called the GIL, which stands for Global Interpreter Lock: a lock that allows only one thread to hold control of the Python interpreter. This means that only one thread can be executing Python code at any point in time. Python wasn’t designed with multi-core personal computers in mind, so it wasn’t built around multiple processes or multiple threads. The GIL enforces the lock whenever a thread accesses a Python object, to stay on the safe side. It is definitely not ideal, but it’s a pretty effective mechanism for safe memory management.
So does that mean threads in Python are useless?
Absolutely not! Even though we cannot execute threads in parallel, we can still run them concurrently. This works well for tasks that, like in the baking example, involve some waiting time. I/O-bound problems cause the program to slow down because it frequently has to wait for input/output from some external resource. They arise whenever the program works with things that are much slower than the CPU, like a database or a network.
Let’s look at the following code for baking some cakes:
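The original snippet isn’t embedded here, so below is a minimal sketch of what it could look like. The function names, the flavor parameter, and the 2- and 8-second sleeps (a scaled-down stand-in for the 10 and 50 minutes from the story) are all assumptions:

```python
import time

def mix_ingredients(flavor, mix_seconds=2):
    # Mixing is hands-on work: it keeps us busy the whole time.
    print(f"Mixing the {flavor} cake...")
    time.sleep(mix_seconds)

def bake_a_cake(flavor, bake_seconds=8):
    # Baking is idle time: the oven does the work while we just wait.
    print(f"Baking the {flavor} cake...")
    time.sleep(bake_seconds)

def make_a_cake(flavor, mix_seconds=2, bake_seconds=8):
    mix_ingredients(flavor, mix_seconds)
    bake_a_cake(flavor, bake_seconds)
    print(f"The {flavor} cake is ready!")
```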
Now let’s add the ability to run those in sequence and with multithreading, plus a decorator to measure the time of each run.
Now we can start baking:
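Here is a sketch of how that could look. The cake helper is repeated (in a flattened form) so the snippet runs on its own; the decorator and runner names are assumptions:

```python
import threading
import time
from functools import wraps

def timeit(func):
    """Print how long each run takes."""
    @wraps(func)
    def wrapper(*args, **kwargs):
        start = time.perf_counter()
        result = func(*args, **kwargs)
        print(f"{func.__name__} took {time.perf_counter() - start:.2f} seconds")
        return result
    return wrapper

def make_a_cake(flavor, mix_seconds=2, bake_seconds=8):
    # Repeated from the previous snippet so this one is self-contained.
    print(f"Mixing the {flavor} cake...")
    time.sleep(mix_seconds)
    print(f"Baking the {flavor} cake...")
    time.sleep(bake_seconds)
    print(f"The {flavor} cake is ready!")

@timeit
def bake_sequentially(flavors, **timings):
    for flavor in flavors:
        make_a_cake(flavor, **timings)

@timeit
def bake_with_threads(flavors, **timings):
    threads = [
        threading.Thread(target=make_a_cake, args=(flavor,), kwargs=timings)
        for flavor in flavors
    ]
    for thread in threads:
        thread.start()
    for thread in threads:
        thread.join()

flavors = ["chocolate", "cheese", "caramel"]
bake_sequentially(flavors)   # roughly (2 + 8) * 3 = 30 seconds
bake_with_threads(flavors)   # roughly 2 + 8 = 10 seconds: the sleeps overlap
```

Notice that the threads spend almost all their time sleeping, which is exactly when the GIL is released, so they can overlap.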
Running this code, we can see that it’s much faster to bake the cakes using the multithreading approach, because it maximizes the use of resources: while one operation is waiting, the program keeps working on the others, which clearly improves performance.
Despite the efficiency we have seen in this example, using multithreading has some downsides. The operating system knows about each thread and can interrupt it at any given moment to start running a different thread. This can cause race conditions, which is something we have to keep in mind when using this approach. The other thing is that the number of threads we can have is limited by the operating system. In this example we have only a few tasks, but in real-life scenarios we can have a lot of them, so with this technique the performance is capped by the number of threads the system makes available.
So how can we do it better?
Python 3.4 introduced a package called asyncio. Asyncio is in fact a single-threaded, single-process design, yet it gives a feeling of concurrency. To do so, it uses coroutines (small units of code) that can be scheduled cooperatively, switching between them at well-defined points. Asyncio builds on generators and coroutines to pause and resume tasks. Let’s now add the ability to run our baking using the asyncio library:
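Again as a sketch, with the same assumed names and durations as before, but now using asyncio.sleep so the event loop can switch to another cake while one is “in the oven”:

```python
import asyncio

async def mix_ingredients(flavor, mix_seconds=2):
    print(f"Mixing the {flavor} cake...")
    await asyncio.sleep(mix_seconds)

async def bake_a_cake(flavor, bake_seconds=8):
    print(f"Baking the {flavor} cake...")
    # While this coroutine is suspended on the sleep, the event loop
    # is free to make progress on the other cakes.
    await asyncio.sleep(bake_seconds)

async def make_a_cake(flavor, mix_seconds=2, bake_seconds=8):
    await mix_ingredients(flavor, mix_seconds)
    await bake_a_cake(flavor, bake_seconds)
    print(f"The {flavor} cake is ready!")
```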
The keyword await passes control back to the event loop; it basically suspends the execution of the surrounding coroutine. When Python encounters an await bake_a_cake expression inside make_a_cake, await says: “Suspend the execution of make_a_cake until bake_a_cake returns a result. In the meantime, go do something else.” We can run this in the following manner:
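For instance, using asyncio.gather to schedule all three cakes on the same event loop (the coroutine is repeated in a flattened form so the snippet runs on its own):

```python
import asyncio
import time

# Repeated from the previous snippet so this one is self-contained.
async def make_a_cake(flavor, mix_seconds=2, bake_seconds=8):
    print(f"Mixing the {flavor} cake...")
    await asyncio.sleep(mix_seconds)
    print(f"Baking the {flavor} cake...")
    await asyncio.sleep(bake_seconds)
    print(f"The {flavor} cake is ready!")

async def main(flavors, **timings):
    start = time.perf_counter()
    # gather schedules all the coroutines concurrently on one event loop
    await asyncio.gather(*(make_a_cake(flavor, **timings) for flavor in flavors))
    return time.perf_counter() - start

elapsed = asyncio.run(main(["chocolate", "cheese", "caramel"]))
print(f"Making all the cakes took {elapsed:.2f} seconds")
```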
We get about the same performance as with multithreading, around 10 seconds. When running a lot of tasks, however, asyncio will yield better results, since the number of threads is limited by the operating system, while asyncio can schedule as many coroutines as needed.
The other advantage over threading is that using await makes it visible where the scheduling points are, which makes it much easier to reason about race conditions. These are also less frequent than with threading, since we are using a single thread. It’s important to understand that concurrency problems are not gone completely: we cannot simply ignore the other concurrent tasks. With the asyncio approach, needing locks is a much less common situation, but we should remember that every await call breaks the critical section.
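A small, hypothetical illustration of that last point: a read-modify-write with an await in the middle is no longer atomic, because every task gets suspended at the await and they all end up reading the same stale value:

```python
import asyncio

counter = 0

async def unsafe_increment():
    global counter
    current = counter        # read ...
    await asyncio.sleep(0)   # ... suspend: every await is a scheduling point
    counter = current + 1    # ... write: the other tasks already ran in between

async def main():
    await asyncio.gather(*(unsafe_increment() for _ in range(100)))

asyncio.run(main())
print(counter)  # prints 1, not 100: all 100 tasks read the old value 0
```

Guarding the whole read-modify-write with an asyncio.Lock (held across the await) would restore the expected count of 100.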
So does that mean that using asyncio is always better?
No! Both threading and asyncio work better for I/O-bound tasks, but that’s not the case for CPU-bound tasks. A CPU-bound task is, for example, a task that performs an arithmetic calculation: it is CPU bound because the rate at which it progresses is limited by the speed of the CPU. Trying to use multiple threads here won’t speed up the execution; on the contrary, it might degrade overall performance. But we can try to use processes, since every process can run on a different CPU core, so we are basically adding more compute power to our calculation. Each Python process gets its own Python interpreter and memory space, so the GIL won’t be a problem. Let’s change our baking method to do some calculations:
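For example, a hypothetical CPU-bound “cake” that just crunches numbers (the checksum loop is an assumption standing in for any heavy calculation):

```python
def make_a_cake(flavor, iterations=10_000_000):
    # Pure computation: no idle time that another task could use.
    total = sum(i * i for i in range(iterations))
    print(f"The {flavor} cake is ready! (checksum: {total})")
    return total

make_a_cake("chocolate")  # keeps one CPU core fully busy while it runs
```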
And let’s add the ability to run this using multiple processes, so we can compare it with a sequential run and with multithreading:
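Putting it together as one self-contained sketch (the helper names are assumptions; the __main__ guard matters because multiprocessing may re-import the module in the child processes):

```python
import multiprocessing
import threading
import time

def make_a_cake(flavor, iterations=10_000_000):
    # CPU-bound "baking": no waiting, just computation.
    return sum(i * i for i in range(iterations))

def bake_sequentially(flavors):
    for flavor in flavors:
        make_a_cake(flavor)

def bake_with_threads(flavors):
    threads = [threading.Thread(target=make_a_cake, args=(f,)) for f in flavors]
    for thread in threads:
        thread.start()
    for thread in threads:
        thread.join()

def bake_with_processes(flavors):
    processes = [multiprocessing.Process(target=make_a_cake, args=(f,)) for f in flavors]
    for process in processes:
        process.start()
    for process in processes:
        process.join()

def timed(label, runner, flavors):
    start = time.perf_counter()
    runner(flavors)
    print(f"{label} took {time.perf_counter() - start:.2f} seconds")

if __name__ == "__main__":
    flavors = ["chocolate", "cheese", "caramel"]
    timed("Sequential", bake_sequentially, flavors)
    timed("Threads", bake_with_threads, flavors)      # no faster: the GIL serializes the work
    timed("Processes", bake_with_processes, flavors)  # scales with the available cores
```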
In this example, we can see that we get about the same results when baking the cakes sequentially and with multithreading. Even though multithreading uses a concurrency mechanism, there is no waiting time here for it to exploit, and it can even be slower than the sequential run because of context switching.
Multiprocessing does improve the performance, but using multiple processes is heavier than using multiple threads, so we should keep in mind that this could become a scaling bottleneck. Processes also do not share memory, since each one gets its own memory space.
So ok now with all these options, when should we use each one?
From a performance perspective:
For CPU bound tasks — processes
For IO-bound tasks:
For a few tasks — threads
For a lot of tasks — asyncio
If we also look at other aspects, like code readability and quality, I would always prefer asyncio over threads, because using await makes the code much clearer and leaves less room for concurrency errors, even if running multiple threads can sometimes give slightly better performance.