
When to Define Your Endpoint as Sync or Async in FastAPI

The internal workings of how FastAPI handles sync & async endpoint requests

--

FastAPI is an immensely popular web application development framework in the Python community. At work, we’ve implemented most of our web services using FastAPI. Recently, we were tasked with optimizing a FastAPI endpoint for latency. As part of that, we ran some performance benchmarks to compare the latency of sync vs async implementations of the same endpoint. We then did some further reading to understand how FastAPI and Uvicorn internally handle requests for sync and async endpoints.

We decided to collate our findings and conclusions into this blog post for two reasons:

  • To help someone understand how FastAPI request handling varies between sync and async endpoints.
  • To help someone (working with FastAPI) decide between a sync and an async implementation of their endpoint.

We’ll use a working toy example throughout to demonstrate the benchmarking numbers and conclusions.

Note: This blog post can be read by everyone, but some sections are a bit more technical. To get the most out of it, it’s good to have a basic understanding of threading, asynchronous programming, and FastAPI. There are some excellent resources available; this one is my personal favorite.

With the context set, and without further ado, let’s get started.

Let’s consider the FastAPI application below, with two endpoints having more or less the same implementation:

# app.py
import time
import asyncio
from fastapi import FastAPI

app = FastAPI()

@app.get("/sync_endpoint")
def sync_endpoint():
    print('Inside Endpoint')
    time.sleep(0.3)
    return {"message": "Hello, FastAPI!"}

@app.get("/async_endpoint")
async def async_endpoint():
    print('Inside Endpoint')
    await asyncio.sleep(0.3)
    return {"message": "Hello, FastAPI!"}

# Run the app using uvicorn if executed directly
if __name__ == "__main__":
    import uvicorn
    uvicorn.run(app, host="0.0.0.0", port=8000)

Both endpoints share the same structure: they print a message, sleep for 0.3 seconds, and then return a response. The sync endpoint sleeps via a blocking call, whereas the async endpoint sleeps via a non-blocking one.
In general, whenever there is an IO-bound task (fetching data from another service, a database call, etc.), simulated here with sleep, async tends to do better.

The reason is that in sync code, the entire thread is blocked until the operation completes, and during this time it can execute nothing else. In async code, by contrast, a single thread can manage multiple tasks concurrently by “switching” between them whenever one is waiting on I/O. This switching happens quickly and efficiently, without the overhead of thread management.
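To make the contrast concrete, here is a minimal, FastAPI-independent sketch (ours, for illustration) that runs 1,000 blocking sleeps on a 40-thread pool versus 1,000 non-blocking sleeps on a single event loop:

# compare_waits.py -- a minimal sketch contrasting the two models
import time
import asyncio
from concurrent.futures import ThreadPoolExecutor

def blocking_wait():
    time.sleep(0.3)  # occupies its thread for the full duration

async def non_blocking_wait():
    await asyncio.sleep(0.3)  # yields to the event loop while waiting

def run_on_threads(n, pool_size=40):
    start = time.time()
    # each task occupies a thread; only pool_size tasks can wait at once
    with ThreadPoolExecutor(max_workers=pool_size) as pool:
        for _ in range(n):
            pool.submit(blocking_wait)
    # the `with` block waits for all submitted tasks to finish
    return time.time() - start

async def run_on_event_loop(n):
    start = time.time()
    # one thread; all the waits overlap on the event loop
    await asyncio.gather(*[non_blocking_wait() for _ in range(n)])
    return time.time() - start

if __name__ == "__main__":
    print(f"40 threads : {run_on_threads(1000):.2f}s")  # ~25 waves x 0.3s
    print(f"event loop : {asyncio.run(run_on_event_loop(1000)):.2f}s")  # ~0.3s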

Let’s write a simple script to hit our endpoints with concurrent requests and benchmark the latency for each.

import sys
import time
import asyncio

import httpx

async def send_request(client, url):
    try:
        response = await client.get(url)
        return response.status_code, response.text
    except Exception as e:
        print(f'Request error occurred: {e}')
        return None, None

async def send_concurrent_requests(url, num_requests):
    limits = httpx.Limits(max_connections=200, max_keepalive_connections=200)
    async with httpx.AsyncClient(limits=limits, timeout=None) as client:
        responses = await asyncio.gather(
            *[send_request(client, url) for _ in range(num_requests)])
    return responses

if __name__ == "__main__":
    endpoint, num_request = sys.argv[1], int(sys.argv[2])
    start_time = time.time()
    responses = asyncio.run(
        send_concurrent_requests(
            f'http://localhost:8000/{endpoint}',
            num_request
        )
    )
    end_time = time.time()
    print(f'Completed {num_request} requests in '
          f'{end_time - start_time:.2f} seconds')
    successful = [res for res in responses if res[0] == 200]
    print(f'Successful Responses: {len(successful)}, '
          f'Failed Responses: {num_request - len(successful)}')

The script hits the provided endpoint with num_request concurrent requests. Below are the benchmark results when we execute the script for different numbers of concurrent requests:

Sync Endpoint Benchmark Results:
Completed 1000 requests in 29.42 seconds
Completed 1500 requests in 58.71 seconds

Async Endpoint Benchmark Results:
Completed 1000 requests in 30.27 seconds
Completed 1500 requests in 62.13 seconds

As we can see, the times taken by the sync and async endpoints are in the same ballpark. In fact, sync has a slight edge, which is counter-intuitive given our earlier explanation. So why did this happen? To answer that, we need to understand how FastAPI internally handles sync and async endpoint requests.

Request handling when the endpoint is synchronously defined

  • When a request comes in for a synchronous endpoint, Uvicorn detects it and assigns it to one of the threads in the thread pool.
  • The assigned thread will execute the synchronous code. While this thread is busy, it can’t handle any other requests.
  • Once the thread finishes processing the request, the response is sent back to the client, and the thread is returned to the pool, ready to handle another request.
  • Therefore, the number of concurrent synchronous requests our application can handle is limited by the size of the thread pool (40 by default). If all the threads are busy processing requests, new requests have to wait until a thread becomes available (a sketch for tuning this limit follows this list).
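Since this limit directly caps sync concurrency, here is a hedged sketch of how the pool size can be raised, assuming a recent FastAPI where sync routes run through AnyIO’s default thread limiter (the limit value of 100 is illustrative):

# app.py (sketch) -- raising the worker pool used for sync endpoints
import anyio
from fastapi import FastAPI

app = FastAPI()

@app.on_event("startup")
async def raise_thread_limit():
    # must run inside the event loop, hence the startup hook;
    # the default limiter ships with 40 tokens (threads)
    limiter = anyio.to_thread.current_default_thread_limiter()
    limiter.total_tokens = 100  # allow up to 100 concurrent sync requests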

Request handling when the endpoint is asynchronously defined

  • When a request arrives at an async endpoint, Uvicorn detects it and hands it to the event loop, which manages the execution of the coroutine (the async function). The event loop can handle multiple async requests concurrently without blocking.
  • When an async endpoint reaches an await statement (e.g. waiting for an IO operation), the event loop can switch to handling another request. This allows multiple requests to be processed in a seemingly parallel fashion, even though they all run on a single thread. Unlike with synchronous endpoints, Uvicorn does not dispatch each request to a separate thread, as the sketch below demonstrates.
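An easy way to verify this behaviour yourself is to log the current thread name inside each endpoint: under concurrent load, the sync endpoint prints a different worker-thread name per request, while the async endpoint always prints the event loop’s thread. A small sketch:

# thread_check.py -- a sketch to observe which thread serves each request
import time
import asyncio
import threading
from fastapi import FastAPI

app = FastAPI()

@app.get("/sync_endpoint")
def sync_endpoint():
    # expect varying worker-thread names under concurrent load
    print(f'sync handled by: {threading.current_thread().name}')
    time.sleep(0.3)
    return {"message": "Hello, FastAPI!"}

@app.get("/async_endpoint")
async def async_endpoint():
    # expect the same event-loop thread every time, e.g. "MainThread"
    print(f'async handled by: {threading.current_thread().name}')
    await asyncio.sleep(0.3)
    return {"message": "Hello, FastAPI!"}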

To dig deeper, apart from the official documentation, you can refer to this answer.

Therefore, at a 0.3-second wait, the blocking time in the synchronous endpoint is absorbed by the fairly large thread pool, which is why we end up with similar times for both implementations.
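A back-of-the-envelope calculation (ours, using the numbers above) shows why the sleep itself is not the bottleneck at 0.3 seconds:

# lower_bound.py -- rough floor for the sync benchmark's sleep time
import math

num_requests, pool_size, sleep_s = 1000, 40, 0.3
waves = math.ceil(num_requests / pool_size)  # 25 "waves" of up to 40 requests
print(f'at least {waves * sleep_s:.1f}s spent sleeping')  # 7.5s

# The measured ~29s is dominated by everything else: connection handling,
# (de)serialization, GIL contention and client-side overhead -- costs that
# hit the async version too, which is why both land in the same ballpark.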

This means that if the implementation becomes more IO-intensive (the waiting time increases further), threads will be blocked for longer, which in turn will push the latency of the sync endpoint higher. To corroborate this, let’s increase the sleep time from 0.3 seconds to 1 second in both implementations to simulate a longer IO-bound task.

# app.py
# everything remains the same as before; only the sleep time is increased

# -- same code
@app.get("/sync_endpoint")
def sync_endpoint():
    print('Inside Endpoint')
    time.sleep(1)
    return {"message": "Hello, FastAPI!"}

@app.get("/async_endpoint")
async def async_endpoint():
    print('Inside Endpoint')
    await asyncio.sleep(1)
    return {"message": "Hello, FastAPI!"}

# -- same code

Now, let’s reuse our benchmarking script to measure the latencies of both endpoints under concurrent load.

Sync Endpoint Benchmark Results:
Completed 1000 requests in 41.32 seconds
Completed 1500 requests in 78.12 seconds

Async Endpoint Benchmark Results:
Completed 1000 requests in 34.37 seconds
Completed 1500 requests in 67.69 seconds

As we can see, the async implementation now shows noticeably lower latencies. This difference will be magnified further as more IO-bound work is added to the implementation. Therefore, if the endpoint implementation is IO-intensive, it’s highly recommended to go for the async implementation over the sync one.

Sidenote on CPU and Memory consumption

Memory consumption was higher in the sync implementation than in the async one, which makes sense given that more threads are spawned and maintained, along with the thread-switching overhead. CPU utilization was lower for the sync endpoint, i.e., it wasn’t peaking as much as for the async one. This also makes sense: in the sync case the threads sit blocked on the IO-bound task, while in the async case the event loop switches to a new request whenever the current one is awaiting.
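For anyone wanting to reproduce these observations, here is a rough sketch of the kind of monitoring loop we mean, assuming psutil is installed (pip install psutil) and the Uvicorn process ID is known:

# monitor.py -- sample the server's threads, CPU and memory while benchmarking
import sys
import psutil

if __name__ == "__main__":
    proc = psutil.Process(int(sys.argv[1]))  # PID of the uvicorn process
    while True:
        cpu = proc.cpu_percent(interval=1.0)          # % CPU over the last second
        rss = proc.memory_info().rss / (1024 * 1024)  # resident memory in MiB
        print(f'threads={proc.num_threads():<4} cpu={cpu:5.1f}% mem={rss:7.1f} MiB')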

Conclusions

  • For requests with short waits, async and sync perform almost the same regardless of the concurrency level.
  • For requests with relatively long waits (when the endpoint is more IO-intensive), async is preferred, since the event loop isn’t blocked during the IO wait, whereas in the sync implementation the threads are (see the sketch after this list for the case where a blocking call is unavoidable).
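One practical corollary: if an async endpoint has no choice but to call a blocking library, it should not call it directly, since that blocks the event loop for everyone. A hedged sketch using run_in_threadpool, which FastAPI re-exports from Starlette (the URL is illustrative):

# offload.py -- keeping the event loop free when a blocking call is unavoidable
import requests  # a sync-only client, standing in for any blocking library
from fastapi import FastAPI
from fastapi.concurrency import run_in_threadpool

app = FastAPI()

@app.get("/mixed_endpoint")
async def mixed_endpoint():
    # the blocking call runs on the worker thread pool,
    # so the event loop stays free to serve other requests
    response = await run_in_threadpool(requests.get, "http://example.com")
    return {"status": response.status_code}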

--
