Python’s Concurrency Model

What are the differences between asyncio, threading, and multiprocessing?

Jetinder Rathore
Hashmap, an NTT DATA Company
5 min read · Jan 14, 2022


What is concurrency and how can it be used to speed up your Python programs? In this blog post, I’ll start by explaining the differences between concurrency and parallelism. From there I will break down the different Python concurrency methods available (asyncio, threading, and multiprocessing) and the tradeoffs with each approach.

So let’s get started!

Concurrency vs Parallelism

Concurrency is when an application can make progress on multiple tasks during overlapping time periods. It doesn’t necessarily mean that these tasks will ever be running at the same instant.

Parallelism, on the other hand, is when an application divides a task into multiple subtasks and literally performs them at the same time, often using multiple cores.

Confused? Let’s understand this with a real-world analogy:

Let us say as a CEO of a company you want to get two tasks done for an upcoming conference:
1. Fly to Atlanta from Houston for a conference

2. Prepare a presentation

You can assign both tasks to one employee: ask them to fly to Atlanta and also prepare the presentation while waiting for the flight at the airport, or on the flight itself. That's the concurrent way of getting these tasks done.

Your other option is to assign the two tasks to two different employees and have them work simultaneously. That's the parallel way of getting it done.

In programming terms, concurrency can be achieved by multitasking on a single-core machine. It is often achieved using scheduling algorithms that divide the CPU’s time.

However, parallelism is performing multiple tasks in parallel on a multi-core (or multiple CPUs on a single motherboard) machine.

Now that you have an idea of what concurrency is and how it’s different from parallelism, I’ll take you through the various available options to achieve the same in Python.

1. asyncio Coroutines

asyncio is a library that comes with Python. With it, you can get more work done by not blocking on individual tasks/coroutines that can run independently.

When a program runs IO-bound code, the CPU spends most of its time doing nothing at all, because the operation currently in progress is waiting for something elsewhere. asyncio lets you write code so that while one coroutine is waiting for something to happen, another task/coroutine can use the CPU to get something else done.

Only one coroutine runs at a time. This means asyncio is not a parallel model but a concurrency model. The minimum schedulable unit (a task that can be run independently) in this model is an "awaitable."

One important concept to understand in this model is how state is shared among tasks. You no longer have just one stack per thread. Instead, each thread has an object called an event loop. The event loop maintains a list of tasks, each task having its own stack. Only one task runs at any point in time, until it reaches a point where it has to wait; at that point it yields, and the event loop resumes one of the other waiting tasks. In this way, global state is shared yet remains consistent within a task, because only one task is executing at any moment.
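The yield-while-waiting behavior described above can be seen in a minimal sketch (the coroutine names and delays are illustrative, not from the article): two coroutines that each sleep for 0.2 seconds finish in roughly 0.2 seconds total, not 0.4, because each `await` hands control back to the event loop.

```python
import asyncio
import time

async def fetch(name: str, delay: float) -> str:
    # Simulates an IO-bound wait; await yields control to the event loop
    await asyncio.sleep(delay)
    return f"{name} done"

async def main() -> list:
    # Both coroutines wait concurrently, so total time is about
    # max(delays), not their sum.
    return await asyncio.gather(fetch("a", 0.2), fetch("b", 0.2))

start = time.perf_counter()
results = asyncio.run(main())
elapsed = time.perf_counter() - start
print(results, f"{elapsed:.2f}s")
```

If the two waits ran sequentially, `elapsed` would be around 0.4 seconds; with the event loop interleaving them, it stays close to 0.2.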

2. Python Threads

Python threads are a common way of getting tasks done concurrently. Threads let your program run multiple flows of control, each doing some work independently. However, because of the Global Interpreter Lock (GIL), only one thread executes Python bytecode at any point in time. There are other ways to get multiple tasks done simultaneously, but they all come with their own complexities.

Global state is shared among threads, but it is guaranteed consistent only for a single operation of the interpreter.

Tasks that spend a lot of time waiting are good candidates for this model. However, getting it right in a highly concurrent environment can be hard and tricky. Python has abstractions like ThreadPoolExecutor that help manage groups of threads, but you can still encounter deadlocks or race conditions if the code isn't written with these scenarios in mind. Python also provides synchronization primitives like Lock and RLock to help, but it still takes careful work to get right.
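The race condition mentioned above, and the Lock that prevents it, can be sketched in a few lines (the counter and worker function are illustrative): four threads each increment a shared counter, and the lock ensures no updates are lost.

```python
import threading
from concurrent.futures import ThreadPoolExecutor

counter = 0
lock = threading.Lock()

def increment(n: int) -> None:
    global counter
    for _ in range(n):
        # Without the lock, the read-modify-write below could interleave
        # across threads and lose updates.
        with lock:
            counter += 1

# Four threads, each adding 10,000 to the shared counter
with ThreadPoolExecutor(max_workers=4) as pool:
    for _ in range(4):
        pool.submit(increment, 10_000)

print(counter)
```

With the lock in place the final count is always 40,000; drop the `with lock:` and the result may fall short, because interleaved `counter += 1` operations can overwrite each other.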

3. Multiprocessing

This model is quite similar to the threading model, except that it uses subprocesses instead of threads. Because each process gets its own interpreter (and its own GIL), this model allows Python code to run in parallel on multiple processors simultaneously.

Handling global state is comparatively easier in this model: when a parent process spawns a child process, the parent's state is copied into the child. Changes the parent makes to its state afterward are not visible to the child, and vice versa: changes made by the child are not visible to the parent. This makes the model easy and safe to use.

Conclusion

Hopefully, with this article, you have gained an understanding of parallelism and concurrency. I also hope the different available Python concurrency options I covered gave you a good understanding of the tradeoffs with each approach.

Additional Resources

Ready to Accelerate Your Digital Transformation?

At Hashmap, an NTT DATA Company, we work with our clients to build better, together. We are partnering with companies across a diverse range of industries to solve the toughest data challenges — we can help you shorten time to value!

We offer a range of enablement workshops and assessment services, data modernization and migration services, and consulting service packages for building new data products as part of our service offerings. We would be glad to work through your specific requirements. Connect with us here.

Jetinder Singh is a Senior Software engineer for Hashmap, an NTT DATA Company, working across industries with a group of innovative technologists and domain experts accelerating high-value business outcomes for our customers.

