asyncio — the underrated weapon for machine learning

A practical guide to using asyncio in machine learning applications

Bishwarup Bhattacharjee
7 min read · May 21, 2022

Over the last few years, I have spent significant time deploying machine learning applications to production. When I was an active participant on Kaggle in 2017, little did I care about model complexity or serving performance, e.g., latency and throughput. Back then, I would do a lot of data preparation and feature engineering, try out state-of-the-art models, perform rigorous hyperparameter tuning, and experiment with fancy augmentation techniques. My only goal was to create the most accurate model. Don't get me wrong, those are critical parts of a machine learning project, and oftentimes you can emerge with a better model if you know your stuff and perform the above steps well. However, that's not where an ML project usually ends in real life.

When you are working in an organisation, your goal is not only to create a model that is accurate but also to let other stakeholders leverage that model to build applications on their side. These stakeholders could be direct customers or internal users working on automation or any other project where ML could add business value. To share your trained model with others, you then have to deploy it, which often means hosting the model in the cloud so that people can send requests to it and obtain predictions.

Whether you are on the developer side, writing the code for model serving, or on the consumer side, writing scripts to integrate a third-party service into your application, you could benefit immensely from understanding how to use asyncio in Python. In the following sections, we shall go over the basic concepts of asynchronous programming in Python while introducing the fantastic asyncio package and its functionality.

I have divided this article into three parts so you can follow it easily without getting overwhelmed by too many details at once. In this part, I shall cover the typical use cases where you should consider adopting asynchronous programming in general, and I will wrap up with an example demonstrating the benefit of asyncio over some parallel programming techniques, e.g., multi-processing and threading.

In part 2, I will introduce you to the fundamental concepts of asynchronous programming, explain what a coroutine is, and go over the most common functionality of the Python asyncio package.

In part 3, I will talk about async in the context of ML model deployment. In particular, I will walk through a typical benefit of asynchronous programming when deploying models through the awesome FastAPI framework. In this part, you will finally see why you should really care about async and what it can mean for you once you know the details of async programming in Python.

With that out of the way, let's dive right in. All the code used throughout this blog post is available in my GitHub repository.

When should you use async

Have you ever had a use case where you were required to download a bunch of images, audio files, or text transcripts before you could run your model on them? Or have you come across a situation where you had to pull a number of files from databases or feature stores and merge them prior to making predictions with your model?

The key detail to notice here is that the above tasks can be accomplished in parallel (or at the same time). If you agree with that, the next question you might ask yourself is how to go about parallelising these tasks. Fair enough, let's look at our options. For the sake of completeness, let's also consider the most naïve approach, where we download these files one at a time.

1. Sequential

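The snippet below is a minimal sketch of the sequential version (the Unsplash endpoint URL and the image-loading details are placeholders; the exact snippet lives in the repository):

```python
import time
from io import BytesIO

import requests
from PIL import Image

# Placeholder endpoint that returns a random 300x300 image.
IMAGE_URL = "https://source.unsplash.com/random/300x300"


def download_random_image(index: int) -> None:
    """Download a single random image and print its size."""
    response = requests.get(IMAGE_URL, timeout=30)
    image = Image.open(BytesIO(response.content))
    print(f"image {index} shape: {image.size}")


if __name__ == "__main__":
    print("Downloading the images sequentially...")
    start = time.perf_counter()
    for i in range(10):
        download_random_image(i)
    print(f"elapsed: {time.perf_counter() - start:.2f} seconds")
```
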
In the above code snippet we are downloading ten random images from Unsplash using the Python package requests, and we are logging the total time it takes to perform the operation. There's nothing more to it, so I will go ahead and show you the output (what I observe may differ from what you get, depending on your system configuration and network speed).

Downloading the images sequentially...
image 0 shape: (300, 300)
image 1 shape: (300, 300)
image 2 shape: (300, 300)
image 3 shape: (300, 300)
image 4 shape: (300, 300)
image 5 shape: (300, 300)
image 6 shape: (300, 300)
image 7 shape: (300, 300)
image 8 shape: (300, 300)
image 9 shape: (300, 300)
elapsed: 18.88 seconds

That took a total of 19 seconds to download all the images. That's not so good. Let's switch gears and explore our next option.

2. Multi-processing

To stick to the topic of this article, I won't go into the details of multi-processing in Python, but at a high level it means spawning a number of separate processes on your CPU that can run computations independently and in parallel. Each process has its own dedicated memory (RAM) and is oblivious to the existence of the others, while a parent process manages them on your behalf, collecting and collating the output and other information as necessary.

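Here is a comparable sketch of the multi-processing version, reusing the same download_random_image helper with a pool of four worker processes (again an approximation of the snippet in the repository):

```python
import time
from io import BytesIO
from multiprocessing import Pool

import requests
from PIL import Image

IMAGE_URL = "https://source.unsplash.com/random/300x300"  # placeholder endpoint


def download_random_image(index: int) -> None:
    """Download a single random image and print its size."""
    response = requests.get(IMAGE_URL, timeout=30)
    image = Image.open(BytesIO(response.content))
    print(f"image {index} shape: {image.size}")


if __name__ == "__main__":
    print("downloading images using multi-processing...")
    start = time.perf_counter()
    # Four worker processes, each downloading its share of the images.
    with Pool(processes=4) as pool:
        pool.map(download_random_image, range(10))
    print(f"elapsed: {time.perf_counter() - start:.2f} seconds")
```
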
When I run the above, I get the output below:

downloading images using multi-processing...
image 3 shape: (300, 300)
image 2 shape: (300, 300)
image 1 shape: (300, 300)
image 0 shape: (300, 300)
image 4 shape: (300, 300)
image 6 shape: (300, 300)
image 5 shape: (300, 300)
image 7 shape: (300, 300)
image 8 shape: (300, 300)
image 9 shape: (300, 300)
elapsed: 5.67 seconds

Whoa! Our execution time dropped from 19 seconds to 6 seconds. That's more than a 3x speed-up. One detail to notice is that I am creating four processes for this example. You can play around with different numbers of processes and check whether you can cut down the time even further. Remember though, more processes do not always mean better performance, as there is overhead in spawning and orchestrating these processes internally. Alright, could we do better than this? Let's find out.

Do note that multi-processing is better suited to tasks that are heavily dependent on computation, e.g., executing your model or running your feature engineering script. In the example above, however, we are barely utilising our CPU; we are mostly waiting for the downloads to finish, during which the CPU sits idle. Such tasks are commonly known as I/O-bound tasks.

3. Multi-threading

Our third option is to use threads instead of processes. Threads are a bit more abstract and harder to explain, and as above we shall not go into the weeds of defining them. It is, however, worth noting that unlike processes, threads do not necessarily have separate memory and they often share resources amongst themselves. This makes multi-threaded programs in Python trickier to implement and can easily lead to confusing results if done incorrectly.

For our simple program, though, we do not need to worry about such inconsistent behaviour. So let's proceed with the implementation.

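The sketch below swaps the process pool for a thread pool (the pool size of ten is my choice for this example and not necessarily what the repository uses):

```python
import time
from concurrent.futures import ThreadPoolExecutor
from io import BytesIO

import requests
from PIL import Image

IMAGE_URL = "https://source.unsplash.com/random/300x300"  # placeholder endpoint


def download_random_image(index: int) -> None:
    """Download a single random image and print its size."""
    response = requests.get(IMAGE_URL, timeout=30)
    image = Image.open(BytesIO(response.content))
    print(f"image {index} shape: {image.size}")


if __name__ == "__main__":
    print("downloading images using multi-threading...")
    start = time.perf_counter()
    # Ten threads share one process; each blocks on its own network request.
    with ThreadPoolExecutor(max_workers=10) as executor:
        list(executor.map(download_random_image, range(10)))
    print(f"elapsed: {time.perf_counter() - start:.2f} seconds")
```
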
When I run this code, I get the output as shown below.

downloading images using multi-threading...
image 1 shape: (300, 300)
image 2 shape: (300, 300)
image 3 shape: (300, 300)
image 0 shape: (300, 300)
image 6 shape: (300, 300)
image 7 shape: (300, 300)
image 4 shape: (300, 300)
image 5 shape: (300, 300)
image 9 shape: (300, 300)
image 8 shape: (300, 300)
elapsed: 3.88 seconds

That's roughly a further 33% reduction in time over multi-processing. Did you notice the ordering of the output, by the way? Unlike sequential execution, multi-processing and multi-threading do not guarantee the order of the output. That's something to remember when using these techniques in your program.

Okay, I have been building my case so far. Now let's get to the real deal and take async for a ride as our final option.

4. asyncio

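A sketch of the async version using aiohttp is shown below (the coroutine name and session handling are my approximation of the snippet in the repository):

```python
import asyncio
import time
from io import BytesIO

import aiohttp
from PIL import Image

IMAGE_URL = "https://source.unsplash.com/random/300x300"  # placeholder endpoint


async def download_random_image_async(session: aiohttp.ClientSession, index: int) -> None:
    """Fetch a single image; awaiting the response yields control to the event loop."""
    async with session.get(IMAGE_URL) as response:
        data = await response.read()
    image = Image.open(BytesIO(data))
    print(f"image {index} shape: {image.size}")


async def main() -> None:
    print("Downloading images using async...")
    start = time.perf_counter()
    async with aiohttp.ClientSession() as session:
        # Schedule all ten downloads concurrently on a single thread.
        await asyncio.gather(
            *(download_random_image_async(session, i) for i in range(10))
        )
    print(f"elapsed: {time.perf_counter() - start:.2f} seconds")


if __name__ == "__main__":
    asyncio.run(main())
```
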
Above is the equivalent code for downloading those images using asynchronous programming. This time we could not reuse the download_random_image function from the other implementations, because that function relies on the requests library, which does not support asynchronous calls. Instead, we use another library, aiohttp, which lets us perform the downloads asynchronously.

If you don't understand the last two statements clearly, that's okay. In fact, I don't want you to put too much emphasis on the code itself. Here we're just demonstrating the use case and comparing the different approaches. Once you understand the fundamentals of asyncio, these details will automatically start making sense.

Alright, let’s run the code above, shall we?

Downloading images using async...
image 1 shape: (300, 300)
image 7 shape: (300, 300)
image 3 shape: (300, 300)
image 4 shape: (300, 300)
image 9 shape: (300, 300)
image 6 shape: (300, 300)
image 2 shape: (300, 300)
image 8 shape: (300, 300)
image 10 shape: (300, 300)
image 5 shape: (300, 300)
elapsed: 1.16 seconds

The hype is real. Still don't trust me? Try executing the code and see it for yourself. We got more than a 3x speed-up over threading and a 16x speed-up over the sequential code. That's insane, isn't it? But what makes async programming so powerful? Oh, by the way, did I mention that the above code runs on a single process and a single thread? 🤯

We shall unravel the secrets behind such performance in the next part of this blog post. Stay tuned.


Bishwarup Bhattacharjee

Building ML @EagleView , Kaggle Competitions Grandmaster, Computer Vision, MLOps