Profiling ORM in FastAPI

Isaac Vidas
IndigoAg.Digital
Published in
20 min readSep 24, 2020

At Indigo Ag, we love Python and we also love creating fast APIs.

This is why for one of our projects, we decided to use FastAPI, an open source Python web framework for building… you guessed it, fast APIs!

It also enables us to write production-ready APIs that benefit from Python type hints for faster development and fewer bugs.

You can read more about FastAPI in the documentation.

FastAPI ❤ SQLAlchemy

FastAPI has an excellent integration with SQLAlchemy for working with relational databases.

SQLAlchemy enables us to use the Object Relational Mapper (ORM) pattern and create Python models and objects that will represent our database tables and entities.

The ORM converts/maps between the objects that are defined in the code and the database tables which helps to make the code more readable, maintainable and reusable.

For more information about how to use SQLAlchemy in FastAPI, please refer to this documentation.

The N+1 problem

While ORM has a lot of benefits, there’s also a cost.

While the ORM enables us to write most of our code in Python, it requires us to add another layer between our code and the database.

This can get a bit complicated when we aren’t aware of the internal mechanism of how, when and in what way the queries we define in our code are actually executed in the database.

One of the most common problems with ORM is the N+1 problem.

The N+1 problem can occur when we work with tables that have foreign keys to other tables. When we iterate over the related objects of an object we queried for, additional queries or select statements might be executed in the background.

This can result in a large amount of additional transactions which can reduce performance and increase the load on our database.

In this blog post we aim to demonstrate the N+1 problem in a FastAPI application and how we can use a profiler to solve it.

The code for the project can be found on GitHub.

Alright, enough with introductions, let’s get to it!

Our demo project

The code in this blog post can be found in GitHub.

The demo app we will use is a simplified example from the FastAPI SQL (Relational) Databases tutorial.

Our version will be much smaller and we will support only one request type for returning all users and their items from our system.

You can follow the FastAPI tutorial or clone the web application from the GitHub repository.

High level architecture

Our web application will have the following components, as shown in the image above:

  1. FastAPI web server that serves a GET request for /users for the following entities: Users and Items
  2. SQLite database with two tables:
    Users — each user can have multiple items
    Items — each item has a user
  3. The server’s user and item entities are mapped to their corresponding tables in the SQLite database through SQLAlchemy’s ORM

Required packages

Our demo app will be using the following packages, so make sure that you install them in your Virtualenv / Conda env:

You can find the list of required packages in requirements.txt file:

fastapi==0.61.0
uvicorn==0.11.8
Sqlalchemy==1.3.19
sqltap==0.3.11

Source files

We will have the following files under our source folder:

.
├── README.md
├── app.db
├── fastapi_orm_example
│ ├── __init__.py
│ ├── app.py
│ ├── database.py
│ ├── models.py
│ └── schemas.py
├── report.txt
├── requirements.txt
└── run_server.sh

Database.py

In this file we define the database that we’re going to use in our app:

from sqlalchemy import create_engine
from sqlalchemy.ext.declarative import declarative_base
from sqlalchemy.orm import sessionmaker

SQLALCHEMY_DATABASE_URL = "sqlite:///./app.db"

engine = create_engine(
SQLALCHEMY_DATABASE_URL, connect_args={"check_same_thread": False}
)
SessionLocal = sessionmaker(autocommit=False, autoflush=False, bind=engine)

Base = declarative_base()

Models.py

In the models.py file, we define the classes that represent the tables in our database.

As described, we will have two main entities which are User and Item.

We define a relationship between User and Item — a user can have multiple items, and each item has one user.

from sqlalchemy import Boolean, Column, ForeignKey, Integer, String
from sqlalchemy.orm import relationship
from .database import Base


class User(Base):
__tablename__ = "users"

id = Column(Integer, primary_key=True, index=True)
email = Column(String, unique=True, index=True)
hashed_password = Column(String)
is_active = Column(Boolean, default=True)

items = relationship("Item", backref="owner")


class Item(Base):
__tablename__ = "items"

id = Column(Integer, primary_key=True, index=True)
title = Column(String, index=True)
description = Column(String, index=True)
owner_id = Column(Integer, ForeignKey("users.id"))

Schemas.py

This file will describe our Pydantic models:

from typing import List
from pydantic import BaseModel


class Item(BaseModel):
id: int
title: str
owner_id: int
description: str = None

class Config:
orm_mode = True


class User(BaseModel):
id: int
email: str
is_active: bool
# items: List[Item] = [] # We will add this later

class Config:
orm_mode = True

App.py

Our main application file will create the app session and define the API call functions:

from typing import List

import sqltap
import uvicorn
from fastapi import Depends, FastAPI, HTTPException
from starlette.requests import Request
from sqlalchemy.orm import Session

from . import models, schemas
from .database import SessionLocal, engine

# Create the database tables
models.Base.metadata.create_all(bind=engine)

app = FastAPI()

# Dependency
def get_db():
try:
db = SessionLocal()
yield db
finally:
db.close()


@app.get("/users/", response_model=List[schemas.User])
def read_users(skip: int = 0, limit: int = 100, db: Session = Depends(get_db)):
users = db.query(models.User).offset(skip).limit(limit).all()
return users


if __name__ == '__main__':
uvicorn.run(app, host='0.0.0.0', port=8000)

This app will support one API call which will return all the users that we have in our system.

Populating our database

The last thing we need to do is to populate our database with some data so that we can simulate the issue and how to fix it.

We will add additional code to our app.py file so that every time we start the server we’ll purge our User and Item tables and populate them again with demo data so that we’ll always start with the same environment and database records.

Add the following code to app.py and don’t forget to add the @app.on_event(“startup”) decorator to make sure that this code will run when the server starts:

@app.on_event("startup")
async def reset_db():
try:
db = SessionLocal()
db.query(models.Item).delete()
db.query(models.User).delete()

for i in range(50):
user = models.User(email=f"user{i}@email.com", hashed_password=f"pwdforuser{i}")
db.add(user)
db.commit()

users = db.query(models.User).all()
for user in users:
for i in range(20):
user_item = models.Item(title=f"Item{i}", description=f"Item{i} description", owner=user)
db.add(user_item)
db.commit()
finally:
db.close()

After this, our app.py file should look like this:

from typing import List

import sqltap
import uvicorn
from fastapi import Depends, FastAPI, HTTPException
from starlette.requests import Request
from sqlalchemy.orm import Session

from . import models, schemas
from .database import SessionLocal, engine

# Create the database tables
models.Base.metadata.create_all(bind=engine)

app = FastAPI()

# Dependency
def get_db():
try:
db = SessionLocal()
yield db
finally:
db.close()

@app.on_event("startup")
async def reset_db():
try:
db = SessionLocal()
print("deleting all items and users")
db.query(models.Item).delete()
db.query(models.User).delete()

print("Populating users")
for i in range(50):
user = models.User(email=f"user{i}@email.com", hashed_password=f"pwdforuser{i}")
db.add(user)
db.commit()

print("Populating items")
users = db.query(models.User).filter()
for user in users:
for i in range(20):
user_item = models.Item(title=f"Item{i}", description=f"Item{i} description", owner=user)
db.add(user_item)
db.commit()
finally:
db.close()


@app.get("/users/", response_model=List[schemas.User])
def read_users(skip: int = 0, limit: int = 100, db: Session = Depends(get_db)):
users = db.query(models.User).offset(skip).limit(limit).all()
return users


if __name__ == '__main__':
uvicorn.run(app, host='0.0.0.0', port=8000)

Running the server

The run_server.sh file will contain the command to run the server:

uvicorn fastapi_orm_example.app:app --reload

In order to run the server, follow these steps:

  1. Open a new terminal and navigate to your project’s root folder
  2. Activate your virtual environment and be sure to install dependencies
  3. Run the server — sh run_server.sh

You should expect to see the following output:

(fastapi_orm_example) $ sh run_server.sh
INFO: Uvicorn running on http://127.0.0.1:8000 (Press CTRL+C to quit)
INFO: Started reloader process [91071] using statreload
INFO: Started server process [91082]
INFO: Waiting for application startup.
INFO: Application startup complete.

From a different terminal, run the following curl command (you might need to install jq or remove it from the command):

curl -X GET "http://0.0.0.0:8000/users/?limit=100" -H "accept: application/json" | jq

You should get the following response:

% Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
Dload Upload Total Spent Left Speed
100 2682 100 2682 0 0 99k 0 --:--:-- --:--:-- --:--:-- 100k
[
{
"id": 1,
"email": "user0@email.com",
"is_active": true
},
{
"id": 2,
"email": "user1@email.com",
"is_active": true
},
{
"id": 3,
"email": "user2@email.com",
"is_active": true
},
{
"id": 4,
"email": "user3@email.com",
"is_active": true
},
...

Now, we have our server running and we were able to run a query that returned all of the users in our database. Phew, that was a lot just for one simple demo app…

In the next part we’ll inspect the queries using sqltap, simulate an N+1 issue and fix it.

Inspecting our queries

Now, that our app is ready, we can start inspecting the queries we’re making against our database.

What we need is a way to inspect and profile the SQL queries we are running in our API.

We’re interested in getting details about the queries such as the translation of the ORM queries to the sql statements, the number of queries that are running, how much time each query is taking and where each query originated from in the code.

There are many packages and tools that can help us with this. For example, if we were writing our application with Django Web Framework we could have used the Django Debug Toolbar. Similarly, if we were writing our application with Flask we could have used the Flask Debug Toolbar.

However, as FastAPI is a new framework in the neighborhood, we will use a more generic package for profiling our queries.

Entering SQLTap

According to the SQLTap package documentation, the tool will help us to understand:

  • how many times a SQL query is executed
  • how much time a SQL query takes
  • where your application is issuing SQL queries from

That’s great, because this is exactly what we’ve been looking for!

Creating FastAPI middleware

In order to use SQLTap, we will introduce it as middleware in our application so it will run for every http request that will pass through our API.

Add the following code to our main app.py file:

import sqltap

@app.middleware("http")
async def add_sql_tap(request: Request, call_next):
profiler = sqltap.start()
response = await call_next(request)
statistics = profiler.collect()
sqltap.report(statistics, "report.txt", report_format="text")
return response

The code above does the following:

  1. Imports the sqltap package
  2. Defines the add_sql_tap function with the @app.middleware(“http”) decorator, which will wrap our function in a middleware and also ensure that it will only be activated by an HTTP request. You would want to make sure that this middleware will only run in dev or test environments as the profiler can and will affect the general performance of your application.
  3. In the function, we first initialize our sqltap profiler
  4. Processes the original http request
  5. Collects the statistics from the profiler and appends them to our file report.txt. For this example, the filename is hardcoded, however, this value should be set in the config file of the application.
  6. Returns the response that we got from the server back to the client

After adding this, our app.py file should look like this:

from typing import List

import sqltap
import uvicorn
from fastapi import Depends, FastAPI
from starlette.requests import Request
from sqlalchemy.orm import Session
from sqlalchemy.orm import joinedload

from . import models, schemas
from .database import SessionLocal, engine

# Create the database tables
models.Base.metadata.create_all(bind=engine)

app = FastAPI()


# Dependency
def get_db():
try:
db = SessionLocal()
yield db
finally:
db.close()


@app.on_event("startup")
async def reset_db():
try:
db = SessionLocal()
# deleting all items and users
db.query(models.Item).delete()
db.query(models.User).delete()

# Populate users table
for i in range(50):
user = models.User(email=f"user{i}@email.com", hashed_password=f"pwdforuser{i}")
db.add(user)
db.commit()

# Populate items table
users = db.query(models.User).filter()
for user in users:
for i in range(20):
user_item = models.Item(title=f"Item{i}", description=f"Item{i} description", owner=user)
db.add(user_item)
db.commit()
finally:
db.close()


@app.middleware("http")
async def add_sql_tap(request: Request, call_next):
profiler = sqltap.start()
response = await call_next(request)
statistics = profiler.collect()
sqltap.report(statistics, "report.txt", report_format="text")
return response


@app.get("/users/", response_model=List[schemas.User])
def read_users(skip: int = 0, limit: int = 100, db: Session = Depends(get_db)):
users = db.query(models.User).offset(skip).limit(limit).all()
return users


if __name__ == '__main__':
uvicorn.run(app, host='0.0.0.0', port=8000)

That’s it! We’ve set up our new middleware and we’re ready to collect our stats and analyze the report.

Running a query and analyzing the report

If your server is still running, stop it and run it again by running the following command from the project’s root path:

sh run_server.sh

Then, from a different terminal run the same curl command that we used previously:

curl -X GET "http://0.0.0.0:8000/users/?limit=100" -H "accept: application/json" | jq

We should get the same results that we got last time.

There will be a new file under the project’s root path called report.txt which will contain the analysis that was collected by sqltap :

========================================================================
====== SQLTap Profiling Report ======
========================================================================
Report Generated Time: 2020-09-06 20:31:16

========================================================================
====== Summary ======
========================================================================
Total queries: 1
Total time: 0.00 second(s)
Total profiling time: 0.00 second(s)

========================================================================
====== Details ======
========================================================================

============ QueryGroup 0 ============
------------ QueryGroup 0 summary ------------
Query count: 1
Query max time: 0.001 second(s)
Query min time: 0.001 second(s)
Query mean time: 0.001 second(s)
Query median time: 0.001 second(s)

------------ QueryGroup 0 SQL pattern ------------
SELECT users.id AS users_id,
users.email AS users_email,
users.hashed_password AS users_hashed_password,
users.is_active AS users_is_active
FROM users
LIMIT ?
OFFSET ?

------------ QueryGroup 0 breakdown ------------
Query 0:
Query duration: 0.001 second(s)
Query params:
param_1: 100
param_2: 0
Query rowcount: -1

------------ QueryGroup 0 stacks ------------
Total unique stack(s): 1

Stack 0:
1 call(s) from read_users @./fastapi_orm_example/app.py:64
Traceback:
File "~/.pyenv/versions/3.8.2/lib/python3.8/threading.py", line 890, in _bootstrap
self._bootstrap_inner()
File "~/.pyenv/versions/3.8.2/lib/python3.8/threading.py", line 932, in _bootstrap_inner
self.run()
File "~/.pyenv/versions/3.8.2/lib/python3.8/threading.py", line 870, in run
self._target(*self._args, **self._kwargs)
File "~/.pyenv/versions/3.8.2/lib/python3.8/concurrent/futures/thread.py", line 80, in _worker
work_item.run()
File "~/.pyenv/versions/3.8.2/lib/python3.8/concurrent/futures/thread.py", line 57, in run
result = self.fn(*self.args, **self.kwargs)
File "./fastapi_orm_example/app.py", line 64, in read_users
users = db.query(models.User).offset(skip).limit(limit).all()
File "~/.pyenv/versions/3.8.2/envs/fastapi_orm_example/lib/python3.8/site-packages/sqlalchemy/orm/query.py", line 3346, in all
return list(self)
File "~/.pyenv/versions/3.8.2/envs/fastapi_orm_example/lib/python3.8/site-packages/sqlalchemy/orm/query.py", line 3508, in __iter__
return self._execute_and_instances(context)
File "~/.pyenv/versions/3.8.2/envs/fastapi_orm_example/lib/python3.8/site-packages/sqlalchemy/orm/query.py", line 3533, in _execute_and_instances
result = conn.execute(querycontext.statement, self._params)
File "~/.pyenv/versions/3.8.2/envs/fastapi_orm_example/lib/python3.8/site-packages/sqlalchemy/engine/base.py", line 1011, in execute
return meth(self, multiparams, params)
File "~/.pyenv/versions/3.8.2/envs/fastapi_orm_example/lib/python3.8/site-packages/sqlalchemy/sql/elements.py", line 298, in _execute_on_connection
return connection._execute_clauseelement(self, multiparams, params)
File "~/.pyenv/versions/3.8.2/envs/fastapi_orm_example/lib/python3.8/site-packages/sqlalchemy/engine/base.py", line 1133, in _execute_clauseelement
self.dispatch.after_execute(self, elem, multiparams, params, ret)
File "~/.pyenv/versions/3.8.2/envs/fastapi_orm_example/lib/python3.8/site-packages/sqlalchemy/event/attr.py", line 322, in __call__
fn(*args, **kw)

The report will start with general information and stats about how many queries we performed overall and how much time it took:

========================================================================
====== SQLTap Profiling Report ======
========================================================================
Report Generated Time: 2020-09-06 20:31:16

========================================================================
====== Summary ======
========================================================================
Total queries: 1
Total time: 0.00 second(s)
Total profiling time: 0.00 second(s)

In our case, as expected, we performed only one query to our Users table.

The next part of the report will show the exact SQL query that was running behind the scenes:

========================================================================
====== Details ======
========================================================================

============ QueryGroup 0 ============
------------ QueryGroup 0 summary ------------
Query count: 1
Query max time: 0.001 second(s)
Query min time: 0.001 second(s)
Query mean time: 0.001 second(s)
Query median time: 0.001 second(s)

------------ QueryGroup 0 SQL pattern ------------
SELECT users.id AS users_id,
users.email AS users_email,
users.hashed_password AS users_hashed_password,
users.is_active AS users_is_active
FROM users
LIMIT ?
OFFSET ?

As expected, the server ran a SQL query to our Users table.

The next part will show the stack of where this query originated from:

------------              QueryGroup 0 stacks               ------------
Total unique stack(s): 1

Stack 0:
1 call(s) from read_users @./fastapi_orm_example/app.py:64
Traceback:
File "~/.pyenv/versions/3.8.2/lib/python3.8/threading.py", line 890, in _bootstrap
self._bootstrap_inner()
File "~/.pyenv/versions/3.8.2/lib/python3.8/threading.py", line 932, in _bootstrap_inner
self.run()
File "~/.pyenv/versions/3.8.2/lib/python3.8/threading.py", line 870, in run
self._target(*self._args, **self._kwargs)
File "~/.pyenv/versions/3.8.2/lib/python3.8/concurrent/futures/thread.py", line 80, in _worker
work_item.run()
File "~/.pyenv/versions/3.8.2/lib/python3.8/concurrent/futures/thread.py", line 57, in run
result = self.fn(*self.args, **self.kwargs)
File "./fastapi_orm_example/app.py", line 64, in read_users
users = db.query(models.User).offset(skip).limit(limit).all()
File "~/.pyenv/versions/3.8.2/envs/fastapi_orm_example/lib/python3.8/site-packages/sqlalchemy/orm/query.py", line 3346, in all
return list(self)
File "~/.pyenv/versions/3.8.2/envs/fastapi_orm_example/lib/python3.8/site-packages/sqlalchemy/orm/query.py", line 3508, in __iter__
return self._execute_and_instances(context)
File "~/.pyenv/versions/3.8.2/envs/fastapi_orm_example/lib/python3.8/site-packages/sqlalchemy/orm/query.py", line 3533, in _execute_and_instances
result = conn.execute(querycontext.statement, self._params)
File "~/.pyenv/versions/3.8.2/envs/fastapi_orm_example/lib/python3.8/site-packages/sqlalchemy/engine/base.py", line 1011, in execute
return meth(self, multiparams, params)
File "~/.pyenv/versions/3.8.2/envs/fastapi_orm_example/lib/python3.8/site-packages/sqlalchemy/sql/elements.py", line 298, in _execute_on_connection
return connection._execute_clauseelement(self, multiparams, params)
File "~/.pyenv/versions/3.8.2/envs/fastapi_orm_example/lib/python3.8/site-packages/sqlalchemy/engine/base.py", line 1133, in _execute_clauseelement
self.dispatch.after_execute(self, elem, multiparams, params, ret)
File "~/.pyenv/versions/3.8.2/envs/fastapi_orm_example/lib/python3.8/site-packages/sqlalchemy/event/attr.py", line 322, in __call__
fn(*args, **kw)

The SQL query originated from our code in the app.py file, specifically in this line:

File "./fastapi_orm_example/app.py", line 64, in read_users users = db.query(models.User).offset(skip).limit(limit).all()

This is very useful!

We were able to see exactly how many queries we ran, how much time it took, the raw SQL and where we ran it in our code.

This is a very powerful tool to debug and understand the relationship between the application and our database.

So far so good, we are only running one SQL query so there’s nothing to fix.

Supporting a new feature

After a while, we decide that we want to add another detail to our API response. For each user, we would also want to return the items that belong to each user.

In order to support this feature, we only need to change one line of code in our schemas.py file.

In our User class, we want to add another property for the items:

items: List[Item] = []

After this change, the schemas.py file will look like this:

from typing import List

from pydantic import BaseModel


class Item(BaseModel):
id: int
title: str
owner_id: int
description: str = None

class Config:
orm_mode = True


class User(BaseModel):
id: int
email: str
is_active: bool
items: List[Item] = []

class Config:
orm_mode = True

After performing this change and saving the file, let’s restart our web server and run our curl command again:

curl -X GET "http://0.0.0.0:8000/users/?limit=100" -H "accept: application/json" | jq
  • Now that we’re using sqltap, the report.txt file will get appended with a new report every time we send a new query to the API. In order to reduce the amount of noise in the file, you can delete it between runs so you’ll always have the most recent report.

Let’s take another look at the summary of our query in the report.txt file:

========================================================================
====== SQLTap Profiling Report ======
========================================================================
Report Generated Time: 2020-09-06 22:46:59

========================================================================
====== Summary ======
========================================================================
Total queries: 51
Total time: 0.01 second(s)
Total profiling time: 0.06 second(s)

Uh-oh, what’s going on here? We added one line of code, didn’t change any queries but somehow ended up adding 50 additional queries to our API call.

  • While we did add 50 additional queries, the overall time of the request didn’t take long, mainly because we don’t have a lot of data, our data model is very simple and our environment is running on our local machine. However, in a real-world system with more complexity and data, these issues can cause latency that will affect the overall performance of the system and cause unnecessary load on the database.

If we look in the report again, the interesting part about the number of queries, 51, is that we have 50 users in our database. This means that after our original query to the users table we made an additional query for each user. In this case, we had one query to fetch the user and then 50 additional queries where we queried the items for each of our users. Hence, the N+1 problem.

The report will then go into the details around the different queries that were used through the process and for each query, we will have details about query times, min, max, mean, median and number of times this exact query was used.

The report is built in a way that will show the queries in the order of the query count and this way we can handle our biggest problem first.

In our report, we will see the following:

========================================================================
====== Details ======
========================================================================

============ QueryGroup 0 ============
------------ QueryGroup 0 summary ------------
Query count: 50
Query max time: 0.001 second(s)
Query min time: 0.000 second(s)
Query mean time: 0.000 second(s)
Query median time: 0.000 second(s)

------------ QueryGroup 0 SQL pattern ------------
SELECT items.id AS items_id,
items.title AS items_title,
items.description AS items_description,
items.owner_id AS items_owner_id
FROM items
WHERE ? = items.owner_id

We can see the exact SQL query that was running behind the scenes and the number of times that it ran, which in our case is 50.

We can also see that what we actually queried was the items table and for each one we queried it with a different items.owner_id.

Next, in the report, there’s a breakdown of the different values that were used in this query, which in this case are the user ids that we have in our database:

------------             QueryGroup 0 breakdown             ------------
Query 0:
Query duration: 0.000 second(s)
Query params:
param_1: 50
Query rowcount: -1
Query 1:
Query duration: 0.000 second(s)
Query params:
param_1: 49
Query rowcount: -1
Query 2:
Query duration: 0.000 second(s)
Query params:
param_1: 48
Query rowcount: -1
Query 3:
Query duration: 0.000 second(s)
Query params:
param_1: 47
Query rowcount: -1
...

And after that, we have the QueryGroup stack which will help us to identify where these queries originated from in the code:

------------              QueryGroup 0 stacks               ------------
Total unique stack(s): 1

Stack 0:
50 call(s) from <lambda> @<string>:1
Traceback:
File "~/.vscode/extensions/ms-python.python-2020.1.58038/pythonFiles/ptvsd_launcher.py", line 43, in <module>
main(ptvsdArgs)
File "~/.vscode/extensions/ms-python.python-2020.1.58038/pythonFiles/lib/python/old_ptvsd/ptvsd/__main__.py", line 432, in main
run()
File "~/.vscode/extensions/ms-python.python-2020.1.58038/pythonFiles/lib/python/old_ptvsd/ptvsd/__main__.py", line 342, in run_module
run_module_as_main(target, alter_argv=True)
File "~/.pyenv/versions/3.8.0/lib/python3.8/runpy.py", line 192, in _run_module_as_main
return _run_code(code, main_globals, None,
File "~/.pyenv/versions/3.8.0/lib/python3.8/runpy.py", line 85, in _run_code
exec(code, run_globals)
File "~/work/test_fastapi/sql_app/app.py", line 67, in <module>
uvicorn.run(app, host='0.0.0.0', port=8000)
File "~/.pyenv/versions/test_fastapi/lib/python3.8/site-packages/uvicorn/main.py", line 346, in run
server.run()
File "~/.pyenv/versions/test_fastapi/lib/python3.8/site-packages/uvicorn/main.py", line 374, in run
loop.run_until_complete(self.serve(sockets=sockets))
File "~/.pyenv/versions/test_fastapi/lib/python3.8/site-packages/starlette/middleware/base.py", line 38, in coro
await self.app(scope, receive, send)
File "~/.pyenv/versions/test_fastapi/lib/python3.8/site-packages/starlette/exceptions.py", line 62, in __call__
await self.app(scope, receive, sender)
File "~/.pyenv/versions/test_fastapi/lib/python3.8/site-packages/starlette/routing.py", line 590, in __call__
await route(scope, receive, send)
File "~/.pyenv/versions/test_fastapi/lib/python3.8/site-packages/starlette/routing.py", line 208, in __call__
await self.app(scope, receive, send)
File "~/.pyenv/versions/test_fastapi/lib/python3.8/site-packages/starlette/routing.py", line 41, in app
response = await func(request)
File "~/.pyenv/versions/test_fastapi/lib/python3.8/site-packages/fastapi/routing.py", line 134, in app
response_data = serialize_response(
File "~/.pyenv/versions/test_fastapi/lib/python3.8/site-packages/fastapi/routing.py", line 66, in serialize_response
value, errors_ = field.validate(response, {}, loc=("response",))
File "~/.pyenv/versions/test_fastapi/lib/python3.8/site-packages/pydantic/fields.py", line 494, in validate
v, errors = self._validate_sequence_like(v, values, loc, cls)
File "~/.pyenv/versions/test_fastapi/lib/python3.8/site-packages/pydantic/fields.py", line 525, in _validate_sequence_like
r, ee = self._validate_singleton(v_, values, v_loc, cls)
File "~/.pyenv/versions/test_fastapi/lib/python3.8/site-packages/pydantic/fields.py", line 616, in _validate_singleton
value, error = field.validate(v, values, loc=loc, cls=cls)
File "~/.pyenv/versions/test_fastapi/lib/python3.8/site-packages/pydantic/fields.py", line 487, in validate
v, errors = self._validate_singleton(v, values, loc, cls)
File "~/.pyenv/versions/test_fastapi/lib/python3.8/site-packages/pydantic/fields.py", line 623, in _validate_singleton
return self._apply_validators(v, values, loc, cls, self.validators)
File "~/.pyenv/versions/test_fastapi/lib/python3.8/site-packages/pydantic/fields.py", line 630, in _apply_validators
v = validator(cls, v, values, self, self.model_config)
File "~/.pyenv/versions/test_fastapi/lib/python3.8/site-packages/pydantic/class_validators.py", line 306, in <lambda>
return lambda cls, v, values, field, config: validator(v)
File "~/.pyenv/versions/test_fastapi/lib/python3.8/site-packages/pydantic/main.py", line 539, in validate
return cls.from_orm(value)
File "~/.pyenv/versions/test_fastapi/lib/python3.8/site-packages/pydantic/main.py", line 444, in from_orm
values, fields_set, validation_error = validate_model(cls, obj)
File "~/.pyenv/versions/test_fastapi/lib/python3.8/site-packages/pydantic/main.py", line 815, in validate_model
value = input_data.get(field.alias, _missing)
File "~/.pyenv/versions/test_fastapi/lib/python3.8/site-packages/pydantic/utils.py", line 217, in get
return getattr(self._obj, key, default)
File "~/.pyenv/versions/test_fastapi/lib/python3.8/site-packages/sqlalchemy/orm/attributes.py", line 282, in __get__
return self.impl.get(instance_state(instance), dict_)
File "~/.pyenv/versions/test_fastapi/lib/python3.8/site-packages/sqlalchemy/orm/attributes.py", line 710, in get
value = self.callable_(state, passive)
File "~/.pyenv/versions/test_fastapi/lib/python3.8/site-packages/sqlalchemy/orm/strategies.py", line 746, in _load_for_state
return self._emit_lazyload(
File "<string>", line 1, in <lambda>
File "~/.pyenv/versions/test_fastapi/lib/python3.8/site-packages/sqlalchemy/orm/strategies.py", line 887, in _emit_lazyload
q(session)
File "~/.pyenv/versions/test_fastapi/lib/python3.8/site-packages/sqlalchemy/ext/baked.py", line 541, in all
return list(self)
File "~/.pyenv/versions/test_fastapi/lib/python3.8/site-packages/sqlalchemy/ext/baked.py", line 444, in __iter__
return q._execute_and_instances(context)
File "~/.pyenv/versions/test_fastapi/lib/python3.8/site-packages/sqlalchemy/orm/query.py", line 3392, in _execute_and_instances
result = conn.execute(querycontext.statement, self._params)
File "~/.pyenv/versions/test_fastapi/lib/python3.8/site-packages/sqlalchemy/engine/base.py", line 982, in execute
return meth(self, multiparams, params)
File "~/.pyenv/versions/test_fastapi/lib/python3.8/site-packages/sqlalchemy/sql/elements.py", line 287, in _execute_on_connection
return connection._execute_clauseelement(self, multiparams, params)
File "~/.pyenv/versions/test_fastapi/lib/python3.8/site-packages/sqlalchemy/engine/base.py", line 1104, in _execute_clauseelement
self.dispatch.after_execute(self, elem, multiparams, params, ret)
File "~/.pyenv/versions/test_fastapi/lib/python3.8/site-packages/sqlalchemy/event/attr.py", line 322, in __call__
fn(*args, **kw)

As we saw earlier, the stack is very useful in allowing us to find the exact location in the code that initiated all of these additional queries and fix the problem, similar to how we would do with an exception.

In this case, however, it’s a bit more difficult to identify the origin of the issue. Because it started when we added the items property to our User schema, it seems like the conversion to JSON that is done on the Pydantic level is what caused the issue.

The next query group in our report is the original query that we’ve seen before where we query our Users table:

============                  QueryGroup 1                  ============
------------ QueryGroup 1 summary ------------
Query count: 1
Query max time: 0.001 second(s)
Query min time: 0.001 second(s)
Query mean time: 0.001 second(s)
Query median time: 0.001 second(s)

------------ QueryGroup 1 SQL pattern ------------
SELECT users.id AS users_id,
users.email AS users_email,
users.hashed_password AS users_hashed_password,
users.is_active AS users_is_active
FROM users
LIMIT ?
OFFSET ?

Following the stack will bring us back to the following line in our app.py file :

users = db.query(models.User).offset(skip).limit(limit).all()

Fixing the issue

Now that we’ve analyzed our report, we know that there’s an N+1 problem in our query.

We want to be able to fix the issue that is causing 50 additional transactions to the database and ensure that we’re actually performing the least amount of queries, which in our case should only be one.

Looking back at our code, we’re making a query to our User model and then converting it to our schema User model with Pydantic:

@app.get("/users/", response_model=List[schemas.User])
def read_users(skip: int = 0, limit: int = 100, db: Session = Depends(get_db)):
users = db.query(models.User).offset(skip).limit(limit).all()
return users

Then, when we look back at the trace we can see another hint around the strategy of the query which is lazy_load:

File "~/.pyenv/versions/test_fastapi/lib/python3.8/site-packages/sqlalchemy/orm/attributes.py", line 710, in get
value = self.callable_(state, passive)
File "~/.pyenv/versions/test_fastapi/lib/python3.8/site-packages/sqlalchemy/orm/strategies.py", line 746, in _load_for_state
return self._emit_lazyload(
File "<string>", line 1, in <lambda>
File "~/.pyenv/versions/test_fastapi/lib/python3.8/site-packages/sqlalchemy/orm/strategies.py", line 887, in _emit_lazyload
q(session)

It seems that our query is using lazy loading which means that we’re issuing a SQL call each time we try to load a related object, which in our case is loading the list of items for each user.

We can change the default behavior of our queries to use a different strategy instead of lazy loading.

Going over the list of relationship loading techniques in the SQLAlchemy documentation, a possible solution is to use the joinedload loading technique. Using joinedload, we will fetch our users and items in one query.

In order to make this change in our code, we will first import the additional function in our app.py file:

from sqlalchemy.orm import joinedload

And in our read_users function, we will change our query so our function looks like this:

@app.get("/users/", response_model=List[schemas.User])
def read_users(skip: int = 0, limit: int = 100, db: Session = Depends(get_db)):
users = db.query(models.User).options(joinedload(models.User.items)).offset(skip).limit(limit).all()
return users

If we run our server again and run the same request we should get the same result as before:

curl -X GET "http://0.0.0.0:8000/users/?limit=100" -H "accept: application/json" | jq

Looking at the report, we can see that we’ve only made a single query and that our original query became a little more complex in order to ensure that we’re also fetching the items for each user:

========================================================================
====== SQLTap Profiling Report ======
========================================================================
Report Generated Time: 2020-09-06 23:02:31

========================================================================
====== Summary ======
========================================================================
Total queries: 1
Total time: 0.00 second(s)
Total profiling time: 0.00 second(s)

========================================================================
====== Details ======
========================================================================

============ QueryGroup 0 ============
------------ QueryGroup 0 summary ------------
Query count: 1
Query max time: 0.001 second(s)
Query min time: 0.001 second(s)
Query mean time: 0.001 second(s)
Query median time: 0.001 second(s)

------------ QueryGroup 0 SQL pattern ------------
SELECT anon_1.users_id AS anon_1_users_id,
anon_1.users_email AS anon_1_users_email,
anon_1.users_hashed_password AS anon_1_users_hashed_password,
anon_1.users_is_active AS anon_1_users_is_active,
items_1.id AS items_1_id,
items_1.title AS items_1_title,
items_1.description AS items_1_description,
items_1.owner_id AS items_1_owner_id
FROM
(SELECT users.id AS users_id,
users.email AS users_email,
users.hashed_password AS users_hashed_password,
users.is_active AS users_is_active
FROM users
LIMIT ?
OFFSET ?) AS anon_1
LEFT OUTER JOIN items AS items_1 ON anon_1.users_id = items_1.owner_id

------------ QueryGroup 0 breakdown ------------
Query 0:
Query duration: 0.001 second(s)
Query params:
param_1: 100
param_2: 0
Query rowcount: -1

------------ QueryGroup 0 stacks ------------
Total unique stack(s): 1

Stack 0:
1 call(s) from read_users @./fastapi_orm_example/app.py:65
Traceback:
File "~/.pyenv/versions/3.8.2/lib/python3.8/threading.py", line 890, in _bootstrap
self._bootstrap_inner()
File "~/.pyenv/versions/3.8.2/lib/python3.8/threading.py", line 932, in _bootstrap_inner
self.run()
File "~/.pyenv/versions/3.8.2/lib/python3.8/threading.py", line 870, in run
self._target(*self._args, **self._kwargs)
File "~/.pyenv/versions/3.8.2/lib/python3.8/concurrent/futures/thread.py", line 80, in _worker
work_item.run()
File "~/.pyenv/versions/3.8.2/lib/python3.8/concurrent/futures/thread.py", line 57, in run
result = self.fn(*self.args, **self.kwargs)
File "./fastapi_orm_example/app.py", line 65, in read_users
users = db.query(models.User).options(joinedload(models.User.items)).offset(skip).limit(limit).all() # The fix
File "~/.pyenv/versions/3.8.2/envs/fastapi_orm_example/lib/python3.8/site-packages/sqlalchemy/orm/query.py", line 3346, in all
return list(self)
File "~/.pyenv/versions/3.8.2/envs/fastapi_orm_example/lib/python3.8/site-packages/sqlalchemy/orm/query.py", line 3508, in __iter__
return self._execute_and_instances(context)
File "~/.pyenv/versions/3.8.2/envs/fastapi_orm_example/lib/python3.8/site-packages/sqlalchemy/orm/query.py", line 3533, in _execute_and_instances
result = conn.execute(querycontext.statement, self._params)
File "~/.pyenv/versions/3.8.2/envs/fastapi_orm_example/lib/python3.8/site-packages/sqlalchemy/engine/base.py", line 1011, in execute
return meth(self, multiparams, params)
File "~/.pyenv/versions/3.8.2/envs/fastapi_orm_example/lib/python3.8/site-packages/sqlalchemy/sql/elements.py", line 298, in _execute_on_connection
return connection._execute_clauseelement(self, multiparams, params)
File "~/.pyenv/versions/3.8.2/envs/fastapi_orm_example/lib/python3.8/site-packages/sqlalchemy/engine/base.py", line 1133, in _execute_clauseelement
self.dispatch.after_execute(self, elem, multiparams, params, ret)
File "~/.pyenv/versions/3.8.2/envs/fastapi_orm_example/lib/python3.8/site-packages/sqlalchemy/event/attr.py", line 322, in __call__
fn(*args, **kw)

We were able to reduce the amount of queries that we were making to the database and resolve the N+1 issue in our code! 🎉

Summary

In this post we’ve covered how to use the SQLTap package to profile and analyze the SQL queries that our code is running.

As a best practice, it’s recommended to use the profiler during local development to check the number of transactions that are being sent to the database behind the scenes.

Please reach out if there are any questions or comments.

The code in this blog post can be found in GitHub.

References

--

--

Isaac Vidas
IndigoAg.Digital

Software development, python, data, scraping, dinosaurs and pirates...