Understanding select_related and prefetch_related Methods in Django ORM.
Django web framework has a powerful object-relational mapping (ORM). ORM is a programming technique used in software development to bridge the gap between the object-oriented programming paradigm and relational database. The select_related and prefetch_related methods are used to boost the performance of your database query. To find out why these two methods are performance boosters we have to find out how Django’s ORM queries related objects in the absence of the two methods.
Consider the following piece of code. It is a block of code for a model that we will use in this article to demonstrate the use case of select_related and prefetch_related methods.
class Plan(models.Model):
"""user plan subscriptions"""
class Packages(models.TextChoices):
NOVICE = "novice", "Novice"
ELITE = "elite", "Elite" # lowercase, check celery task
package = models.CharField(verbose_name="plan", max_length=10, choices=Packages.choices,
default=None) # Novice, Elite, Legendary
level = models.PositiveIntegerField(default=1)
link = models.URLField(verbose_name="link", unique=True, editable=False, null=True)
commission = models.DecimalField(decimal_places=2, max_digits=10, editable=False, default=0.00)
referals = models.ManyToManyField("Account", related_name="invites")
is_paid = models.BooleanField(verbose_name="account validity", default=True, blank=True)
def __str__(self):
return self.package
class Account(models.Model):
"""Registered user accounts"""
user = models.OneToOneField(User, on_delete=models.CASCADE, related_name="account")
plan = models.ManyToManyField(Plan, related_name="subscription", blank=True)
def __str__(self):
return f"{self.user.username}'s account"
class Blog(models.Model):
"""Blog model"""
account = models.ForeignKey(Account, on_delete=models.SET_NULL, null=True, blank=True)
title = models.CharField(max_length=200)
slug = models.SlugField()
content = models.TextField()
created_at = models.DateTimeField(auto_now=True)
def __str__(self):
return self.title
Django keeps a cache of each query result in the QuerySet’s cache the first time it is evaluated and in subsequent evaluations of the QuerySet it will reuse the cached results however, it is also important to note that not all evaluated queries are cached. From the code snippet above let's consider the following query
# hits the database
blog = Blog.objects.get(id=1)
# hits the database to retreive the related account
print(blog.account)
# hits the database to get all Blog objects
queryset = Blog.objects.all()
# iterating the queryset
for qs in queryset:
print(qs.account) # hits the database to retreive all related accounts
When iterating through a Blog QuerySet, Django performs two queries to retrieve related objects. It is possible to reduce the number of database hits by using select_related and prefetch_related methods.
- Select_related(*fields)
This method returns a QuerySet that will “follow” foreign-key relationships, selecting additional related-object data when it executes its query. This is a performance booster that results in a single more complex query but later use of foreign-key or one-to-one relationships won’t require additional database hits. We can optimize the above queries and reduce the number of database hits by introducing the select_related method. It takes a string of the related field attributes. In our case, the relationship is established at the Blog’s account model attribute which is a foreign-key relationship.
# example 1
blogs = Blogs.objects.select_related("account")
for blog in blogs:
print(blog.account)
# example 2
blog = Blogs.objects.select_related("account__user").get(id=1)
print(blog.account)
print(blog.user)
# example 3
account = Account.objects.select_related("user").get(id=1)
print(account.user) # no database hit
From the above snippet of code, we can take a look at several use cases of select_related applied to make efficient queries.
In example 1, the select_related method is used to fetch all related account objects in a single query note that the account argument in select_related in the field with the Foreign-Key relation therefore if there’s a need to access the related objects from blog instances, Django ORM will not hit the database to query related objects because all related objects were queried initially and stored in the QuerySet’s cache. The method select_related can take the model attribute that has the foreign key defined or the related_name if defined in the foreign key field.
In example 2, It is possible to query objects that have foreign-key or one-to-one relationships and span multiple models. The query will cache the related Account and User object meaning if the objects are accessed they will be retrieved from the cache and not from the database. It does this by performing a SQL join and fetching the related objects in a single query, rather than issuing separate queries for each related object. Select_related is applied to mitigate a problem referred to as N + 1.
The N + 1 essentially means you have one main query to retrieve objects and N number of queries to retrieve related objects. As much as the select_related method is a query optimizer you should put in mind that overuse of it or performing complex queries of complex relationships may lead to large and more complex queries and fail to optimize your query. It is recommended for more complex relationships to use the prefetch_related method.
2. Prefetch_related(*lookup)
# example 1
accounts = Account.objects.prefetch_related("plan")
for account in accounts:
print(account.user)
for plan in account.plan.all():
print(plan.package)
# example 2
accounts = Account.objects.prefetch_related("blog_set")
for account in accounts:
print(account.user)
for blog in account.blog_set.all():
print(blog.title)
The prefetch_related method returns a QuerySet that will automatically retrieve, in a single batch, related objects for each of the specified lookups. This has a similar purpose to select_related, in that, both are designed to stop the deluge of database queries that are caused by accessing related objects, but the strategy is quite different.
select_related works by creating an SQL join and including the fields of the related object in the SELECT
statement. For this reason, select_related
gets the related objects in the same database query.
In example 1, in the snippet of code above, we are prefetching related objects in a Many-To-Many field. The account model has a plan attribute that defines Many-To-Many relationships with the Plan model. We are looping over all account instances while printing related user objects. In the inner loop, we are printing the plan package name while iterating through all plan instances of each individual account of the outer loop. Account instances will only be queried once without triggering additional queries during related object iteration.
In example 2, prefetch_related is used in the reverse relationship of account and blog. prefetch_related method takes a reverse manager as the lookup. Prefetch_related is ideal for the reverse relationship because it will perform fewer queries.
How prefetch_related works
Prefetch_related fetches multiple instances of related objects through a combination of database queries and in-memory processing. The following is the process used by prefetch_related methods to optimize a query.
a. Initial query — it fetches the main objects using a single query
b. Additional query — It generates additional queries to fetch related for all the main objects in a single batch. This helps minimize the number of database round-trips.
c. In-memory processing — After retrieving the related objects from the database, Django performs in-memory processing to associate the related objects with the main objects. This can involve creating dictionaries or lists that map main objects to their related objects.
The difference between prefetch_related and select_related methods
select_related and prefetch_related may seem similar however, it is not, but both methods are used to optimize queries. The following are some of the distinguishing factors between the two.
a. Type of relationships
prefetch_related — Best suited for optimizing queries involving reverse relationships (e.g., ForeignKey, OneToOneField, ManyToManyField).
select_related — Optimizes queries involving forward relationships (i.e., ForeignKey fields on the current model).
b. Number of queries
prefetch_related — can still result in multiple queries, especially when dealing with many-to-many relationships or complex queries but typically it will result in fewer queries as compared to when the method is not used
select_related — Aims to reduce the number of queries to 1 or a small number, as it uses SQL joins to fetch related data in a single query.