
Optimizing REST API Performance

Jesum
7 min read · Jul 13, 2023

This blog assumes you are using FastAPI, though some of the recommendations are not tied to any specific API framework.

Use a faster JSON serializer

In case you are not aware, the standard library's json module that everyone uses by default is slow. I'm not going to run my own benchmark tests because there are plenty you can find with a quick Google search.

Why is this important and how does it impact FastAPI? The standard way to respond with a JSON payload in FastAPI is to construct an instance of JSONResponse. This is the default behaviour. However, JSONResponse is slow. If you look into FastAPI’s source code, JSONResponse is really just:

from starlette.responses import JSONResponse as JSONResponse

which really just uses Python's json module to do a json.dumps. You can see it in Starlette's source code here: https://github.com/encode/starlette/blob/d6007d7198c35c1a7ed81e678a81c3bca86bee5e/starlette/responses.py#L185.

A quicker solution is to switch to the excellent orjson module. You can clearly see here https://github.com/tiangolo/fastapi/blob/f7e3559bd5997f831fb9b02bef9c767a50facbc3/fastapi/responses.py#L29 that FastAPI's ORJSONResponse uses orjson.dumps, which is significantly faster than json.dumps (most benchmarks show roughly a 6-fold speedup during serialization).

So, make the switch to ORJSONResponse instead! See https://fastapi.tiangolo.com/advanced/custom-response/#use-orjsonresponse for documentation. Using it is really simple; you can just do something like this:

response_object = MyPydanticClass(data=<somedataobject>)
return ORJSONResponse(content=response_object.dict(), status_code=200)
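
For a fuller picture, here is a minimal, self-contained sketch. The MyPydanticClass model and /items route are made-up names for illustration; passing default_response_class (documented in the FastAPI custom-response page linked above) also makes orjson the default for every route:

from fastapi import FastAPI
from fastapi.responses import ORJSONResponse
from pydantic import BaseModel

# Make orjson the default serializer for every route in the app.
app = FastAPI(default_response_class=ORJSONResponse)

class MyPydanticClass(BaseModel):
    data: str

@app.get("/items")
def get_items():
    # Build the response from a Pydantic model, then hand a plain dict to orjson.
    response_object = MyPydanticClass(data="some data")
    return ORJSONResponse(content=response_object.dict(), status_code=200)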

Double Pydantic serialization

Do you use jsonable_encoder (https://fastapi.tiangolo.com/tutorial/encoder/#using-the-jsonable_encoder) to encode all your responses before sending them back to the caller?

If you can help it, don't. You should be using Pydantic models generously throughout your code. Use them to define your API response models as well (see https://fastapi.tiangolo.com/tutorial/response-model/ for examples). In fact, if possible, standardize on a common Pydantic model for your API responses. Your front-end developers will thank you for it. We use https://google.github.io/styleguide/jsoncstyleguide.xml. As an example, our base response object is declared like this:

from typing import Optional

from pydantic import BaseModel, Field

class BaseDataObject(BaseModel):
    kind: Optional[str] = Field(title="Kind", description="Indicates type of resource requested.", example="device")
    fields: Optional[list] = Field(title="Fields", description="Provides a list of fields included in a single-response request.", example="")
    id: Optional[str] = Field(title="Id", description="A globally unique string used to reference the object.", example="")
    # pagination fields, applicable for collection responses
    itemsPerPage: Optional[int] = Field(title="Page Size", description="Number of results per page.", example=100)
    startIndex: Optional[int] = Field(title="Page Index", description="Page number.", example=1)
    totalItems: Optional[int] = Field(title="Total Items", description="Total number of results.", example="")
    items: Optional[list] = Field(title="Items", description="A list of the requested collection.", example="")

class BaseResponseObject(BaseModel):
    data: Optional[BaseDataObject]

So, what's this got to do with performance? If you are already composing your responses as Pydantic models, please do not call jsonable_encoder to convert them into JSON-compatible structures. That just causes another round of Pydantic serialization. You are much better off doing something like this:

data_object = MyBeautifulPydanticList(
    itemsPerPage=request_object.page_size,
    startIndex=request_object.page_num,
    totalItems=total_count,
    items=[]
)

for item in results:
    try:
        data_point = MyBeautifulPydanticItem(**item)
    except Exception as e:
        logging.error(f"Error parsing data returned: {str(e)}")
        continue
    data_object.items.append(data_point)

response_object = MyBeautifulPydanticResponse(data=data_object)
return ORJSONResponse(content=response_object.dict(), status_code=200)

See? No call to jsonable_encoder is needed; a simple call to Pydantic's dict() method is sufficient. Then wrap it all up in ORJSONResponse, which will serialize it nicely. Here's some sample code to illustrate what happens:

from pydantic import BaseModel
from datetime import datetime
from fastapi.responses import ORJSONResponse, JSONResponse
from fastapi.encoders import jsonable_encoder
import time
from typing import List

class MyBeautifulPydanticModel(BaseModel):
    birthday: datetime
    married: bool
    name: str
    weight_kg: int

class MyList(BaseModel):
    items: List[MyBeautifulPydanticModel]

one_item = MyBeautifulPydanticModel(birthday=datetime.now(), married=False, name="Mad Dude", weight_kg=5000)
a = MyList(items=[one_item] * 88000)

start_time = time.time()
response_object_fast = ORJSONResponse(a.dict(), status_code=200)
end_time = time.time()
print(f"response_object_fast: {end_time - start_time}")

start_time = time.time()
response_object_classic = JSONResponse(jsonable_encoder(a), status_code=200)
end_time = time.time()
print(f"response_object_classic: {end_time - start_time}")

and the output I get is:

response_object_fast: 7.745699882507324
response_object_classic: 19.436201095581055

That's already a ~2.5x speedup!

GZip Middleware

This is a no-brainer. Implement the GZip Middleware — https://fastapi.tiangolo.com/advanced/middleware/#gzipmiddleware. There’s example code and it’s literally as simple as:

from fastapi import FastAPI
from fastapi.middleware.gzip import GZipMiddleware

app = FastAPI()
app.add_middleware(GZipMiddleware, minimum_size=1000)

Tweak minimum_size to your requirements. Set it too high and responses below the threshold go uncompressed, inflating the HTTP stream; set it too low and you spend CPU and memory compressing tiny payloads for little gain. Remember: JSON is essentially text, and text compresses really well with GZip.

OpenTelemetry Instrumentation

Another no-brainer decision. Want to know which Python function is the slowest? Please, please use OpenTelemetry Instrumentation. You can read more about this in my other 3-part blog here.
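
As a minimal sketch (assuming the opentelemetry-instrumentation-fastapi package, which provides the auto-instrumentation shown here), wiring it up can be as simple as:

from fastapi import FastAPI
from opentelemetry.instrumentation.fastapi import FastAPIInstrumentor

app = FastAPI()

# Automatically creates a span for every request the app handles;
# exporter and tracer-provider configuration is omitted for brevity.
FastAPIInstrumentor.instrument_app(app)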

REDIS Caching

Where possible, and where it makes sense, use a cache. REDIS is nice because it's blazingly fast (it literally runs at RAM speed).

At Human Managed, we built a cache wrapper class powered by REDIS in the background. More than 99% of our REST APIs use this class. Furthermore, because each API is intimately familiar with the data it serves, the API code can specify a suitable TTL for its REDIS keys.

The REDIS cache is completely transparent to the API code. For example, we have a backend database powered by Snowflake. Access to the database goes through a custom-written RDBMS class. This class uses the REDIS cache wrapper internally, initialized with sane defaults (e.g. REDIS key TTLs). Whenever our API code calls the RDBMS class, checks are made against REDIS for cached results.

One thing to note: we don't derive the cache key from the output Snowflake returns, as hashing large result sets is a waste of resources. Instead, the cache key is a 128-bit hash calculated from the SQL statement (and its bind parameters).
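
To make the idea concrete, here is a minimal sketch of that keying strategy. This is not our actual wrapper: it assumes the redis-py package, and run_sql is a hypothetical database call standing in for the real query layer.

import hashlib
import json

import redis

cache = redis.Redis(host="localhost", port=6379)

def cached_query(sql_string, binds, ttl_seconds=300):
    # MD5 gives a 128-bit digest; the key depends only on the SQL text
    # and its bind parameters, never on the query output.
    hash_digest = hashlib.md5(
        (sql_string + json.dumps(binds, sort_keys=True)).encode()
    ).hexdigest()
    cached_data = cache.get(hash_digest)
    if cached_data is not None:
        return json.loads(cached_data)
    results = run_sql(sql_string, binds)  # hypothetical database call
    cache.setex(hash_digest, ttl_seconds, json.dumps(results, default=str))
    return results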

When de-serializing data from REDIS, avoid using the slow and expensive ast.literal_eval function. Our REDIS wrapper class does not do any translation of data — that is delegated to the caller i.e. the RDBMS wrapper class, which does this:

if cached_data is not None:
    logging.info(f"SQL query results found in cache for: {sql_string}. Retrieving results.")
    try:
        # rdbms_records = ast.literal_eval(cached_data)  # slow; replaced with json.loads
        rdbms_records = json.loads(cached_data)
    except Exception as e:
        logging.error(f"Error parsing cached document. {str(e)}")
        self._cache_client.remove_object(hash_digest)
        rdbms_records = None
logging.info("Results retrieval function completed.")

We have other data access wrapper classes: one for Cube.dev, one for Airtable, one for MongoDB, etc. Each of them decides how data is encoded and decoded when interfacing with the REDIS wrapper class.

Upgrade to Pydantic v2 and FastAPI v0.100.0

In case you missed it, Pydantic v2's core was completely rewritten in Rust, and it is now insanely fast (between 5x and 50x faster). FastAPI supports Pydantic v2 starting with v0.100.0.

The author of FastAPI has also said that in a future release, if a response model is defined, data will be serialized using the Rust core of Pydantic, which means we will enjoy the speed boost out-of-the-box 🚀!!
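
One migration note worth flagging: v2 renamed the serialization methods used throughout this post. A quick sketch (check the official migration guide for your exact version):

from pydantic import BaseModel

class Item(BaseModel):
    name: str

item = Item(name="demo")
data = item.model_dump()           # v2 replacement for .dict()
json_str = item.model_dump_json()  # v2 replacement for .json()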

Lazy Loading

I’m not going to go into the intricacies of lazy loading on front-end technologies. This optimization point only applies to front-ends that render responses pulled from REST APIs.

I am going to say this: the simplest strategy to support it is server-side paging. In your FastAPI code, implement standardized parameters for page number and page size, and make them available across your slower-running APIs. This allows the front-end developer to use simple paging to implement lazy loading, which can give the end-user the impression of a snappier UI. Generally speaking, any solution that lets you retrieve data in consecutive logical sets (pages happen to be one such logical set) will fit the bill.

For example:

  1. on page load, call the API /getdata?pagenum=1&pagesize=100
  2. after step 1 completes, call the API /getdata?pagenum=2&pagesize=100 and append the data to your UI element
  3. after step 2 completes, call the API /getdata?pagenum=3&pagesize=100 and append the data to your UI element
  4. etc.

The only thing I would caution here is that pagenum and pagesize must be applied together with whatever filters and groupings you apply to the data retrieval. Otherwise, your data is going to look wrong to the user. For example, each of the steps above should be doing something along the lines of:

SELECT * FROM TABLE WHERE AGE > 5 LIMIT 100 OFFSET 0
SELECT * FROM TABLE WHERE AGE > 5 LIMIT 100 OFFSET 100
SELECT * FROM TABLE WHERE AGE > 5 LIMIT 100 OFFSET 200
etc.

Always keep the same filter criteria of AGE > 5.
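
Putting it together, here is a minimal sketch of such an endpoint. The table, the AGE filter, and the get_connection helper are hypothetical; only the pagenum/pagesize pattern is the point.

from fastapi import FastAPI, Query

app = FastAPI()

@app.get("/getdata")
def get_data(pagenum: int = Query(1, ge=1), pagesize: int = Query(100, ge=1, le=1000)):
    # Translate page number and size into LIMIT/OFFSET.
    offset = (pagenum - 1) * pagesize
    # The filter criteria (AGE > 5) must stay identical on every page.
    sql = "SELECT * FROM TABLE WHERE AGE > 5 LIMIT %s OFFSET %s"
    rows = get_connection().execute(sql, (pagesize, offset))  # hypothetical DB call
    return {"items": list(rows), "startIndex": pagenum, "itemsPerPage": pagesize}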
