How we optimized service performance using the Python Quart ASGI framework, and reduced costs by 90%

Aditi Lath
Super.com
Nov 17, 2020 · 9 min read

At Snaptravel, we have built a search engine that handles over 1,000 requests/sec, ingests over 1TB/day of data and processes over $1MM/day in sales while maintaining 99.9+% uptime. Every request to our search engine requires us to make 40+ network calls to third-party APIs and consolidate the results before applying business logic.

Initially, all of our services were implemented using AWS Lambda; later we decided to migrate all our high-QPS (queries per second) services to Python's asyncio module. In this article, we will cover how the migration to asyncio helped us decrease cost by 90% and optimize performance, cover some basics of building an asyncio application in Python, and share the lessons we learned along the way.

Overview of Asyncio

Before we talk about asyncio, let’s discuss what concurrency is and the problems it helps to solve.

Concurrency is the ability of a program to be decomposed into parts that run independently of each other. It helps solve two types of problems: CPU-bound and IO-bound. A CPU-bound program spends most of its time processing data, whereas an IO-bound program spends most of its time waiting on slow operations like network connections or the file system, and speeding it up involves overlapping the time spent waiting. Since our system makes 40+ network calls per request, it is IO-bound, and asyncio lets us solve the IO-bound problem.

For the purpose of this article, we will mainly talk about two concurrency approaches: asyncio and threading.

Why Asyncio over threading?

  1. CPU Context Switching: Asyncio gives the application control over context switching: a task yields only at an explicit await point while waiting for IO, whereas with threading the OS can preempt a thread at any moment and the CPU pays an overhead for every context switch. Because asyncio tasks never give up control or get interrupted in the middle of an operation, sharing resources is simpler and the program scales more elegantly.
  2. Race Conditions: Since asyncio only switches at defined points, it is far less prone to race conditions; with threading, a task can be interrupted at any point, so shared data must be made thread-safe.
  3. No Blocking: With asyncio there is no blocking on network traffic, disk reads, etc., which allows holding long-running connections with very little performance impact.
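To illustrate the race-condition point, here is a small sketch (names are illustrative, not from the original article) in which ten coroutines increment a shared counter without any lock. Because asyncio only switches between tasks at await points, the read-modify-write of the counter can never be interleaved mid-statement; the equivalent threaded code would need a lock to be safe.

```python
import asyncio

counter = 0

async def increment(n: int) -> None:
    global counter
    for _ in range(n):
        # Safe without a lock: asyncio only switches at await points,
        # so nothing can interleave inside this statement.
        counter += 1
        await asyncio.sleep(0)  # explicit yield point between iterations

async def main() -> None:
    # ten coroutines hammer the same counter concurrently
    await asyncio.gather(*(increment(1000) for _ in range(10)))

asyncio.run(main())
print(counter)  # 10000
```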

Working with Asyncio

The asyncio module provides a framework for writing single-threaded concurrent code using coroutines, an event loop, futures and tasks. Two keywords matter when writing async code: async and await. The async keyword before a function marks it as asynchronous, and await suspends the coroutine until the awaited operation completes, letting the event loop run something else in the meantime.

Let's take a look at an example. With asyncio, the output has interleaved print statements: when an async function yields control at an awaitable operation (IO, sleep, a Redis call, etc.), it steps off the loop so another task can run in the meantime. In the following example, asyncio.sleep() is a non-blocking call that lets the surrounding function temporarily hand control to another function. If we ran the same code synchronously, it would take ~3 sec, whereas the asynchronous code takes ~1 sec.
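A minimal sketch of such an example, with three coroutines each sleeping for one second (function names are illustrative assumptions; the exact interleaving of the prints depends on scheduling):

```python
import asyncio
import time

async def call(name: str) -> None:
    print(name)
    await asyncio.sleep(1)  # non-blocking: yields control to the event loop
    print("Finished sleeping")

async def main() -> None:
    # all three coroutines run concurrently, so total time is ~1 sec, not ~3
    await asyncio.gather(call("One"), call("Two"), call("Three"))

start = time.time()
asyncio.run(main())
elapsed = time.time() - start
print(f"Total elapsed time: {elapsed}")
```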

Output:

Two
Three
One
Finished sleeping
Finished sleeping
Finished sleeping
Total elapsed time: 1.0041160583496094

Core Concepts:

  • Coroutines: Coroutines are asynchronous functions declared with the async/await syntax; when awaited, they either return a result or raise an exception. A coroutine that is blocked on a network call can suspend itself temporarily and hand control back so that another coroutine can run.
  • Event Loop: An event loop is a loop that runs asynchronous coroutines and tasks, performs network IO operations and runs subprocesses. The event loop is used to register, execute and cancel calls and to delegate costly function calls to a pool of threads. It also keeps track of active coroutines: when one coroutine releases control because it is blocked, the event loop passes control to another.
  • Futures and Tasks: A future is a low-level awaitable object that represents the eventual result of an asynchronous operation; futures make it possible to schedule multiple coroutines that run at the same time. A task is a subclass of future that wraps a coroutine. asyncio.ensure_future()/asyncio.create_task() schedule the coroutine to run on an event loop.

Let's consider the following two examples of how coroutines and the event loop are used, and compare the time each takes to execute:
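First, a sketch of the sequential version, where each coroutine is awaited directly, one after another (function names and delays are illustrative assumptions chosen to match the quoted timings):

```python
import asyncio
import time

async def call(name: str, delay: int) -> None:
    await asyncio.sleep(delay)
    print(name)

async def main() -> None:
    # awaiting each coroutine directly runs them one after another,
    # so the total time is the SUM of the delays (1 + 2 + 1 = ~4 sec)
    await call("One", 1)
    await call("Two", 2)
    await call("Three", 1)

start = time.time()
asyncio.run(main())
elapsed = time.time() - start
print(f"Total elapsed time: {elapsed}")
```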

Output:

One
Two
Three
Total elapsed time: 4.011263847351074

The problem with the example above is that the second call() coroutine doesn't start executing until the first one has finished, which defeats the purpose of using coroutines and asyncio in general.

If we run the same example using futures or tasks, the program takes ~2 sec to execute, which is the maximum delay across all the tasks. This happens because scheduling the coroutines as tasks lets them all run at the same time.
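A sketch of the task-based version, using asyncio.create_task() so all three coroutines are scheduled on the event loop at once (names and delays are illustrative assumptions):

```python
import asyncio
import time

async def call(name: str, delay: int) -> None:
    await asyncio.sleep(delay)
    print(name)

async def main() -> None:
    # create_task schedules each coroutine immediately; they run concurrently,
    # so the total time is the MAXIMUM delay (~2 sec), not the sum
    tasks = [
        asyncio.create_task(call("One", 1)),
        asyncio.create_task(call("Two", 2)),
        asyncio.create_task(call("Three", 1)),
    ]
    await asyncio.gather(*tasks)

start = time.time()
asyncio.run(main())
elapsed = time.time() - start
print(f"Total elapsed time: {elapsed}")
```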

Output:

One
Three
Two
Total elapsed time: 2.004169225692749

Asyncio in Python web applications

Quart is a Python ASGI web framework and, for existing Flask apps, the easiest way to adopt asyncio: the Quart API matches the Flask API, so the transition is as simple as replacing flask with quart and adding the async and await keywords. Since some of our applications were implemented with Flask and others with Lambda, we went with Quart because of the ease of migration.

How migration from lambda to Quart helped with performance and cost reduction

  • Helped reduce latency for some of the high-QPS endpoints.
  • Helped reduce infrastructure cost by >90%. With Lambda we were spending ~$1,000/day; after migrating to asyncio we spend ~$50/day, with the cost driven by the number of EC2 instances used. We have been able to further optimize infrastructure cost by using a combination of on-demand and spot instances on AWS.
  • Our search engine makes 40+ parallel network calls per request, but with Lambda we hit performance issues as soon as a request fanned out to 10+ network calls, resulting in increased latency and more Lambda invocations, and thus higher infrastructure cost.
  • Lambda has a 6MB limit on the invocation payload, whereas after migrating to asyncio we had no such data size limitation.

Upgrading from Flask to Quart

  • Helped reduce latency by 2x without requiring major changes in the code.
  • Migrating from the Flask app to the Quart app helped us scale traffic from ~150 requests per second to 300+ requests per second, with a 95% reduction in error rate during peak hours. The Flask app wasn't able to scale efficiently and often had to be restarted during traffic spikes.

Lessons learnt when working with asyncio

  • Asyncio results in a significant improvement for IO-bound applications, but not for CPU-bound ones: CPU-bound operations occupy the event loop and prevent other tasks from running, which limits asyncio's effectiveness.
  • Scale out with more processes or with connected microservices.
  • In order to benefit from asyncio, the whole stack needs to be asyncio-aware and running on the event loop. Any code in the application that is not asyncio-aware and makes a network call will block all other connections while that call is in flight, for example using the requests library instead of requests-async.
    Let's take a look at an example where blocking code is implemented synchronously vs asynchronously, and compare the response time of each:
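A sketch of the synchronous-blocking version described, using time.sleep() inside a coroutine (the function names and the 5-second sleep are assumptions chosen to match the quoted output):

```python
import asyncio
import time

async def call(name: str) -> None:
    print(name)
    time.sleep(5)  # blocking call: freezes the entire event loop for 5 sec
    print("Finished sleeping")
    print("Running coroutine")

async def main() -> None:
    print("Calling main function:")
    # even though the coroutines are gathered, the blocking sleep
    # serializes them: total time is 3 x 5 = ~15 sec
    await asyncio.gather(call("One"), call("Two"), call("Three"))
    print("End of main function")

start = time.time()
asyncio.run(main())
elapsed = time.time() - start
print(f"Total elapsed time: {elapsed}")
```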

Output: In this example, time.sleep() is a blocking call made from inside a coroutine, so it stops everything for the duration of the sleep. The execution time ends up being ~15 sec and the code effectively runs synchronously, as seen in the following output.

Calling main function:
One
Finished sleeping
Running coroutine

Two
Finished sleeping
Running coroutine
Three
Finished sleeping
Running coroutine
End of main function
Total elapsed time: 15.00484681129455

Now changing the time.sleep() blocking call to a non-blocking call:
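The same sketch with the blocking time.sleep() swapped for the non-blocking await asyncio.sleep() (again, names and the 5-second delay are assumptions matching the quoted output):

```python
import asyncio
import time

async def call(name: str) -> None:
    print(name)
    await asyncio.sleep(5)  # non-blocking: other coroutines run during the sleep
    print("Finished sleeping")
    print("Running coroutine")

async def main() -> None:
    print("Calling main function:")
    # the three sleeps now overlap, so total time is ~5 sec instead of ~15
    await asyncio.gather(call("One"), call("Two"), call("Three"))
    print("End of main function")

start = time.time()
asyncio.run(main())
elapsed = time.time() - start
print(f"Total elapsed time: {elapsed}")
```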

Output: In comparison to the previous example, call() is now an async function using asyncio.sleep(), a non-blocking call that lets it temporarily hand control to another function. The execution time has gone down from 15 sec to 5 sec. The lesson: any blocking code implemented synchronously will block all other connections.

Calling main function:
One
Two
Three

Finished sleeping
Running coroutine
Finished sleeping
Running coroutine
Finished sleeping
Running coroutine
End of main function
Total elapsed time: 5.004288911819458
  • Debugging is harder than with synchronous code: pdb statements do not step over await calls properly, and there is no way to manually execute a task in an event loop because pdb would cause the event loop to halt.
    A few ways to make debugging easier: use aioconsole, which gives you a Python prompt attached to the running event loop; set PYTHONASYNCIODEBUG=1 when running async code; or enable the event loop's debug mode.
# aioconsole gives a Python prompt attached to a running event loop
pip3 install aioconsole
apython  # opens a console with a running asyncio event loop

# or enable asyncio debug mode via an environment variable
PYTHONASYNCIODEBUG=1 python3 test_asyncio.py

# or enable debug mode on the loop itself
loop = asyncio.get_event_loop()
loop.set_debug(True)
# logs coroutines taking more than 0.001 seconds to be executed
loop.slow_callback_duration = 0.001
loop.run_until_complete(main())
  • Latency profiling is hard: analyzing blocking latency for coroutines is difficult when a change puts more blocking work onto the event loop, which can leave coroutines clogged up and waiting on resources for a long time. A coroutine waits for the maximum timeout before raising an error if the underlying network call doesn't return, so it is generally good practice to specify a timeout for future tasks instead of waiting for the maximum time.
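As a sketch of the timeout advice, asyncio.wait_for() caps how long a coroutine is awaited; slow_call here is a hypothetical stand-in for a third-party network call that may hang:

```python
import asyncio
from typing import Optional

async def slow_call() -> str:
    # stands in for a third-party network call that may hang
    await asyncio.sleep(10)
    return "response"

async def fetch_with_timeout() -> Optional[str]:
    try:
        # cap the wait at 2 seconds instead of the library's maximum timeout
        return await asyncio.wait_for(slow_call(), timeout=2.0)
    except asyncio.TimeoutError:
        return None  # log and handle the timeout instead of clogging the loop

result = asyncio.run(fetch_with_timeout())
print(result)
```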
  • There might be a memory leak somewhere in the Python asyncio/Quart stack, so we use --graceful-timeout (the time a worker has to finish its in-flight requests when it is being killed) and --max-requests (the maximum number of requests a worker will process before restarting):
gunicorn -w 20 -k uvicorn.workers.UvicornWorker app:app -t 45 --graceful-timeout 45 --max-requests 10000
  • We needed additional monitoring for our asyncio application: we track event loop blocking latency and the number of scheduled tasks. This helps debug degraded service performance and tells us whether we need to scale the service or whether one of the network connections is degrading it. Here is the article that we found helpful.

Sample to use the monitoring utils in async app:
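The monitoring utilities themselves are not reproduced here; the following is a minimal sketch under our own assumptions (monitor_event_loop_lag is a hypothetical name) of measuring event loop blocking latency and the scheduled-task count. The idea: sleep for a fixed interval and measure how much later than the interval the loop actually wakes up; the excess approximates time the loop spent blocked.

```python
import asyncio

async def monitor_event_loop_lag(interval: float = 1.0, report=print) -> None:
    """Periodically measure how late the loop wakes up; the excess over
    `interval` approximates the time the loop spent blocked."""
    loop = asyncio.get_running_loop()
    while True:
        start = loop.time()
        await asyncio.sleep(interval)
        lag = loop.time() - start - interval
        # send the lag and scheduled-task count to your metrics backend here
        report(lag, len(asyncio.all_tasks()))

async def main() -> None:
    # run the monitor alongside the application's other work
    monitor = asyncio.create_task(monitor_event_loop_lag(0.1))
    await asyncio.sleep(0.35)  # application work would happen here
    monitor.cancel()

asyncio.run(main())
```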

Libraries that we use for asyncio in python:

  • Quart: an asyncio alternative to Flask; a Python ASGI web microframework.
  • Uvicorn: an async alternative to gunicorn workers; ASGI instead of WSGI.
  • requests-async: async functionality for the requests library. The work has since been taken over by httpx, but both work fine.
  • aredis: an async Redis client ported from redis-py. One limitation is not being able to specify socket_timeout, but stream_timeout can be used to achieve the same.
  • aioredis: we no longer use it. It establishes the Redis connection by scheduling a coroutine at runtime, and the drawback in our use was that if the primary node failed, it couldn't reconnect to the replica without restarting the server.
  • List of other libraries for asyncio: https://github.com/python/asyncio/wiki/ThirdParty

Summary

If you are looking to build a high-QPS service in Python, asyncio is a great way to achieve concurrency for programs performing heavily IO-bound work. Using asyncio allowed us to handle thousands of requests, each fanning out to dozens of network calls, which would otherwise have required hundreds of thousands of parallel threads. Migrating our services from Lambda or Flask to Quart thus helped us efficiently scale our search engine while minimizing infrastructure cost.

Since asyncio in Python is fairly new, the API is still changing, which makes it slightly harder to work with. Only use asyncio if you really need the performance gain and understand the tradeoffs.

Interested in working with us to tackle these kinds of challenges? We’re hiring. Check out some of our open opportunities:

Snaptravel Careers

Backend Software Engineer

Senior Backend Software Engineer
