Enhance Your Routine DB Activities With Django’s ORM

Susmitha Gudapati · Published in Proximity Works · 8 min read · Jul 28, 2020

Database interactions have never been trivial in the routine of building efficient web applications. They play a major role, dictating performance almost single-handedly in data-intensive applications. There are, however, some nearly hidden approaches that aren’t part of our routine development yet. These can improve our interactions with the database and, in turn, let us unleash more of the power of Django’s ORM. I recently had the chance to play around with these simple yet interesting methods and made some interesting discoveries. Let’s dive in!

This is the sample model schema I will be referring to in this blog —
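A minimal sketch along these lines is what the later examples assume; all model and field names here are illustrative stand-ins.

from django.db import models


class Category(models.Model):
    name = models.CharField(max_length=100)
    # Self-referential FK, used later to demonstrate chained select_related.
    parent = models.ForeignKey('self', null=True, blank=True, on_delete=models.CASCADE)

    def __str__(self):
        return self.name


class Attribute(models.Model):
    name = models.CharField(max_length=100)

    def __str__(self):
        return self.name


class Product(models.Model):
    name = models.CharField(max_length=200)
    description = models.TextField(blank=True)
    category = models.ForeignKey(Category, on_delete=models.CASCADE)
    attributes = models.ManyToManyField(Attribute, related_name='products')
    price = models.DecimalField(max_digits=10, decimal_places=2)
    stock = models.PositiveIntegerField(default=0)
    is_active = models.BooleanField(default=True)

    def __str__(self):
        return self.name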

Profiling Query Operations First

How do you profile querysets in an application? There are a couple of tools that show how long each queryset takes to execute and whether the generated query is efficient. django-debug-toolbar is one widely used tool for this purpose; django-silk and django-live-profiling are two other great options for profiling.

The other way to check on querysets and their execution time is through the connection module.

connection.queries returns a list of dictionaries containing the raw SQL queries Django has run, along with their execution time (it is only populated when DEBUG is True). This helps in understanding precisely what the queries are doing and what they cost.
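For instance, a rough sketch of inspecting connection.queries from a shell, using the illustrative Product model:

from django.db import connection, reset_queries

reset_queries()                                        # clear previously logged queries
products = list(Product.objects.filter(is_active=True))
for query in connection.queries:                       # populated only while DEBUG=True
    print(query['sql'], query['time'])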

There is also a QuerySet.explain() method, applied to an individual queryset, which returns the database’s own description of how it would execute the query. This gives you a rough idea of the queryset’s efficiency, and using explain() to decide whether a queryset needs optimising or whether an index is worth adding is often a good way to improve performance.
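A quick illustrative use on the sketched Product model:

# Prints the database's EXPLAIN output for this queryset.
print(Product.objects.filter(is_active=True).explain())

# On PostgreSQL, explain(analyze=True) also runs the query and reports actual timings.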

Understanding QuerySets

The interesting trait of querysets in Django is that operations like creating, filtering, and slicing don’t require any database activity. Moreover, no matter how many operations are chained onto a queryset, the query isn’t executed until the queryset is evaluated. Because of this behaviour, querysets in Django are known as ‘lazy querysets’.
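For example, a sketch along these lines, using the illustrative Product model from above:

products = Product.objects.filter(is_active=True)   # no database activity yet
products = products.filter(price__gt=100)           # still nothing
products = products.exclude(stock=0)                # still nothing

print(products)                                      # the query runs here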

The results of a queryset are only fetched when they are actually asked for. For example, the block above builds the queryset in three steps. Though it looks like it hits the database three times, it actually executes the query and connects to the database only once, at the print(products) statement.

Caching in QuerySets

Django caches the whole queryset and all the results of non-callable attributes of the ORM once the queryset is executed. When a queryset is instantiated, the cache is empty. The first time the queryset is evaluated, the results are fetched from the database and saved in the queryset’s cache. Now, further evaluations of this queryset are served from the cache.

So, every queryset contains a cache to minimise the number of database connections. Being aware of this caching behaviour while building logic, and triggering attributes carefully from templates, improves performance by avoiding unnecessary database connections.

Although this seems pretty straightforward, querysets can be misleading if they aren’t defined correctly. Understanding when querysets populate the cache is really important to enhance the performance of an application. Results of the queryset are only cached when the entire queryset is loaded and evaluated. This means that evaluating only a part of the queryset — like in indexing or slicing — doesn’t populate the cache. Here are a few examples to help you better understand caching in querysets —
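The sketch below assumes the illustrative Product model from earlier; the comments mark which lines actually hit the database.

queryset = Product.objects.filter(is_active=True)

names = [product.name for product in queryset]    # evaluates the full queryset: one query
prices = [product.price for product in queryset]  # served from the queryset cache: no query

queryset = Product.objects.filter(is_active=True)
print(queryset[0])   # slicing an unevaluated queryset: runs a query, cache not populated
print(queryset[0])   # runs another query

list(queryset)       # full evaluation populates the cache
print(queryset[0])   # now served from the cache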

Indexing

This is a pretty standard technique to improve the speed of retrieval operations from the database. It is achieved either through Meta.indexes or Field.db_index on a model. Adding indexes to fields that are routinely queried in operations like filter(), exclude(), and order_by() improves the performance of the application, since indexes speed up lookups.
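A sketch of both options on a trimmed-down variant of the illustrative Product model; the index choices here are purely for demonstration.

from django.db import models


class Product(models.Model):
    # Field-level index on a column that is filtered on frequently.
    name = models.CharField(max_length=200, db_index=True)
    is_active = models.BooleanField(default=True)

    class Meta:
        indexes = [
            # Composite index covering a common filter + ordering pattern.
            models.Index(fields=['is_active', 'name']),
        ]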

Indexes should only be added to frequently queried fields, because the overhead of maintaining an index may outweigh any gains in query speed.

select_related

select_related enhances performance when accessing related-object data, such as foreign-key or one-to-one fields, on the results of a queryset. Adding select_related makes Django run a more complex query that also fetches the fields of the related objects, but it means you can then access those foreign-key relationships without hitting the database again.

This might seem like restricting just one hit to the database — but imagine when we try and access related-object data in loops!

To improve performance further, chained foreign keys can also be followed in select_related, much like a SQL join across multiple tables. Here’s an example —
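A sketch assuming the self-referential parent field on Category from the schema above (the id is illustrative):

# One query with joins across product -> category -> parent category.
product = Product.objects.select_related('category__parent').get(id=1)

category = product.category        # no extra query
parent = product.category.parent   # no extra query either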

Lastly, here are a few things to note while using select_related

  1. The order of select_related and filter doesn’t matter; both of the queries below are equivalent.

Product.objects.filter(is_active=True).select_related('category')
Product.objects.select_related('category').filter(is_active=True)

2. It can also be applied to the reverse side of a one-to-one relationship; in that case, pass the related_name of the reverse relation to select_related instead of the field name.

3. To clear the related field values that were stored from past select_related calls on a queryset, pass None as an argument.

new_queryset = queryset.select_related(None)

select_related is magic, right? But then why is this restricted to single-valued relationships — foreign-key and one-to-one?

That’s because of its execution model. select_related creates an SQL join and includes the fields of the related object in the SELECT statement, so it can only fetch related objects in the same database query. To avoid the much larger result set that would come from joining across a ‘many’ relationship (many-to-many, or a reverse foreign key), select_related is limited to single-valued relationships.

prefetch_related

This is very similar to select_related in that it cuts down the flood of database queries caused by accessing related objects, but it differs in how it executes.

prefetch_related performs a separate lookup for each relationship, and the ‘joining’ across relationships is done in Python rather than in SQL. This is why it can prefetch many-to-many and many-to-one objects in addition to the relationships supported by select_related. Below is an example of querying with prefetch_related —
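This sketch assumes a variant of the illustrative Product model whose __str__ touches the many-to-many attributes relation:

class Product(models.Model):
    # ...other fields as in the schema sketch above...
    name = models.CharField(max_length=200)
    attributes = models.ManyToManyField(Attribute, related_name='products')

    def __str__(self):
        # Touches the many-to-many relation; without prefetching, this would
        # run one query per product.
        return '%s (%s)' % (
            self.name,
            ', '.join(attribute.name for attribute in self.attributes.all()),
        )


# All related attributes are fetched in one extra query up front, so the loop
# reads them from the prefetch cache instead of querying per product.
for product in Product.objects.prefetch_related('attributes'):
    print(product)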

In the above sample code, when __str__ is called for each product, it loads data from the QuerySet’s cache instead of hitting and querying the database. That’s because attributes are prefetched when a query is run for products. This means that the attributes of all products are fetched using one single query!

Things to note while using prefetch_related

  1. To prefetch related objects on an iterable of model instances (rather than a queryset), use the prefetch_related_objects() function.
  2. Calling iterator() on a queryset that uses prefetch_related() causes the prefetching to be ignored, because these two optimizations don’t make sense together.
  3. Any subsequent query operation chained after prefetch_related implies a different database query, which ignores the previously cached results and fetches the data with a new query.

SQL is faster than Python

F() expressions:

The F() object represents the value of a model field or annotated column. It performs database operations on model field values without having to pull them out of the database into Python memory. Eliminating this one step from the queryset evaluation process increases its performance drastically.
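A minimal sketch, assuming the stock field from the illustrative schema (the id is illustrative):

from django.db.models import F

product = Product.objects.get(id=1)   # query 1: fetch the row
product.stock = F('stock') + 5        # no value is pulled into Python
product.save()                        # query 2: UPDATE ... SET stock = stock + 5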

When Django encounters F(), it overrides the standard Python operators and generates an SQL expression that describes the required operation at the database level. In the code above, for example, Django produces an SQL statement that increments the stock by 5 units.

F() can also be used directly on querysets to perform updates. It combines the get() and save() operations of the previous example, reducing two queries to one. Check out this one —
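A sketch of the same increment expressed as a single update():

from django.db.models import F

# One UPDATE query increments stock for every matching row.
Product.objects.filter(is_active=True).update(stock=F('stock') + 5)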

Therefore, F() has the ability to boost performance by getting the database to perform operations rather than Python. It can also reduce the number of queries in some cases.

Using annotations to perform aggregations in the database.

But what are annotation and aggregation methods?

Annotation is used to create dynamic fields on models by performing various query expressions. Each argument to this method is an annotation that will be added to each object in the queryset that is returned.

Aggregation is used to retrieve values that are derived by summarizing or aggregating a collection of objects.

Aggregation is a highly efficient way to perform arithmetic over model fields: since it is translated into SQL, everything is computed in the database itself. While aggregation works wonders on its own, it becomes even more useful when combined with annotation.
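A brief sketch against the illustrative schema, using Count and Avg; the reverse lookup name product and the price field are assumptions from that sketch.

from django.db.models import Avg, Count

# annotate: each category gets a dynamic product_count attribute, computed in SQL.
categories = Category.objects.annotate(product_count=Count('product'))

# aggregate: a single summary dict for the whole queryset, e.g. {'avg_price': ...}.
stats = Product.objects.filter(is_active=True).aggregate(avg_price=Avg('price'))

# Combined: the average product price per category, still computed in the database.
per_category = Category.objects.annotate(avg_price=Avg('product__price'))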

Other small but significant optimizations

Retrieving a single object using unique/indexed fields

When an individual object is retrieved through get(), querying on unique or indexed columns improves performance, because the database can resolve the lookup through an index. It also guarantees that only one object matches the query, which speeds up execution.
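For example (the lookup values are purely illustrative):

# Primary-key lookup: unique and indexed, resolved via an index with exactly one match.
product = Product.objects.get(id=10)

# Lookup on a non-unique, non-indexed column: a wider scan, and get() raises
# MultipleObjectsReturned if more than one row matches.
product = Product.objects.get(name__startswith='Django')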

Limiting fetch to exactly what is required

values() & values_list(): Queryset methods like get(), filter(), and exclude() return full ORM model objects in their results. When the whole object isn’t required, applying values() or values_list() to the queryset returns a dict or a list of just the required values instead of loading complete model objects. These methods can also replace model objects in template code, as long as they cover the use case. Check the block below to see how values_list() helps performance.
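A sketch with the illustrative Product model:

# Full model instances: heavier than needed if only names are used.
products = Product.objects.filter(is_active=True)

# A queryset of dicts with just the selected columns.
product_rows = Product.objects.filter(is_active=True).values('id', 'name')

# A flat list of names, handy for dropdowns or quick membership checks.
product_names = Product.objects.filter(is_active=True).values_list('name', flat=True)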

defer() & only(): defer() can be used to skip loading columns that aren’t required. It is most effective for columns holding heavy text data or fields that take significant processing to convert to Python. Note that accessing a field that was deferred earlier triggers an extra database query for each row it is read from, so profiling columns before deferring them is important.

only() is quite the opposite of defer(): it replaces the default set of fields to load immediately. It is best applied when almost all the fields need to be deferred, and specifying the complementary set of fields with only() can result in much simpler code.
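A sketch of both, assuming description is a heavy text column as in the illustrative schema:

# Skip a heavy text column until (and unless) it is actually accessed.
products = Product.objects.defer('description')

# Or load only the fields that are needed and defer everything else.
products = Product.objects.only('id', 'name', 'price')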

Size and Existence

To get the size of a queryset, count() is much faster than len(), because len() forces the whole queryset to be evaluated and loaded into memory, while count() asks the database to do the counting.

To check whether a queryset has any results, the exists() method is faster than evaluating the queryset in a boolean condition, because exists() issues a lightweight query that stops at the first matching row.
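A small sketch combining both:

active = Product.objects.filter(is_active=True)

total = active.count()       # SELECT COUNT(*) runs in the database
if active.exists():          # lightweight query that stops at the first match
    print('%d active products' % total)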

In general, SQL expressions are faster than Python operators or methods.

To sum it up —

Django has an excellent ORM layer, and incorporating these methods into your queries can give your application’s performance a real boost. They may not seem particularly effective in smaller applications, but as your application grows and more and more queries hit the database, you’ll start to see a huge difference in performance.

This article was originally published on the Proximity Blog, where we talk about UX, Performance & Scalability Engineering.

Sign up for more if you liked what you just read, or follow us on Twitter & Instagram!
