Fine-Tuning Django ORM: Proven Optimization Techniques

Published in

Simform Engineering

6 min readApr 2, 2024

Elevate Your Django ORM Game: Strategies for Efficient Queries

Django ORM, short for Object Relational Mapping, is a great way for Python developers to interact with databases. By translating Python code into SQL queries, it eliminates the need to write complex SQL statements directly, making database interactions more intuitive and efficient.

In this blog, we’ll learn all about Django’s ORM, exploring advanced techniques and best practices to optimize performance and streamline database operations in our Django projects.

What exactly does ORM do behind the scenes
Why do we need optimization
Techniques for ORM optimization
Database indexing
Queryset caching
Use iterator
⁠Mitigate the N+1 query problem
⁠Retrieve required data only
⁠Aggregate and Annotate
Use raw SQL
Conclusion

What exactly does ORM do behind the scenes?

Behind the scenes, Django ORM (Object-Relational Mapping) performs many tasks to enable seamless interaction between Python code and a relational database:

Model definition
Database schema generation
Query generation
Query execution
Object-relational mapping
Lazy evaluation
Caching and optimization

Simply put, Django ORM converts ORM queries into SQL queries and then transforms the SQL data into Python objects.

Why do we need optimization?

Before undertaking optimization, we should know what the query is doing and how much it costs us. We can check that by using Queryset.explain(). We can also use some external libraries like django-debug-toolbar to monitor the queries.

There are many benefits of the ORM query optimization:

Performance: It improves performance by reducing data fetching, manipulation, and database calls.
Scalability: It ensures that data can handle increased traffic and data volume.
Resource Usage: Insufficient queries can use server resources like CPU, memory, and database connections, which optimization can reduce.
Database Load: It reduces the load on the database server.
Code Maintainability: It makes the codebase more maintainable and easier to work on for future development.

Techniques for ORM optimization

Technique 1: Database indexes

Indexes are essential for quick searches in a database, improving query execution by avoiding a full scan of every row.

Here’s how to implement indexes in Django:

class School(models.Model):
    Name = models.CharField(max_length=100)
    affiliation_date = models.DateTimeField(auto_now=True)

    class Meta:
        indexes = [
            models.Index(fields=['name', 'affiliation_date'], name='name_afiliation_index'),  # Composite index
            models.Index(fields=['name'], name='name_idx'),  # Single-field index with a custom name
        ]

But on the flip side,

Indexing can slow down insert, update, and delete operations because the database must update the index data structure.
Having too many indexes can also complicate the database schema and query optimization.

Remember, indexes are beneficial for tables where fetch queries are frequent. more.

Technique 2: Queryset caching

There are many tools and libraries for caching the query, like:

Django’s built-in caching framework: including in-memory caching (LocMemCache), file-based caching (FileBasedCache), and database caching (DatabaseCache)
Redis
Memcached
django-redis, etc.

How to use queryset caching

Define the location and caching type we are using in settings.py

CACHES = {
    'default': {
        'BACKEND': 'django.core.cache.backends.locmem.LocMemCache',
        'LOCATION': 'unique-snowflake',
    }
}

By using the cache() method, we can initialize the caching of the queryset.

from django.core.cache import cache
from .models import School

def get_cached_schools():
    schools = cache.get('cached_schools')
    if not books:
        schools = list(School.objects.all())
        cache.set('cache_schools', schools, timeout=3600)  # Cache for 1 hour (3600 seconds)
    return schools

Technique 3: Use iterator()

When there is a large amount of data object, queryset.iterator() will help to fetch the data in chunks.

Technique 4: Mitigate the N+1 query problem

Fetching data from the database multiple times for different data of the same set can significantly slow down the process. To avoid this, use prefetch_related()and select_related().

select_related works by creating an SQL join and including the fields of the related object in the select statement. This means that select_related retrieves the related objects in the same database query. To prevent the substantial increase in the result set that would occur from joining across a ‘many’ relationship, select_related is restricted to single-valued relationships — foreign keys and one-to-one relationships.

from .models import School

# Fetch a queryset of School objects with related Class objects selectrelated
schools = School.objects.select_related('class').all()

for school in schools:
    print(school.name)
    print(school.class.title)  # No additional query is executed for each school

prefetch_related, on the other hand, performs a separate lookup for each relationship and does the ‘joining’ in Python. This allows it to prefetch many-to-many, many-to-one, and GenericRelation objects which cannot be achieved using select_related. prefetch_related complements select_related, which supports foreign key and one-to-one relationships.

from .models import School

# Fetch a queryset of School objects with related class objects prefetched
authors = School.objects.prefetch_related('class').all()

for school in schools:
    print(school.name)
    for class in school.classes.all():
        print(class.title)  # No additional query is executed for each school

Technique 5: Retrieve required data only

When the data requirement is clear, and we can manage data in a dictionary and list instead of an object, it’s beneficial to use values() and values_list().

values() returns dictionary data instead of the model object.

from .models import School

# fetch only "name" and "contact" value of school's object.
school_data = School.object.all().values("name", "contact")
print(school_data)

reponse: 
[{"name": "abc", "contact":"758373"}, {"name": "xyz", "contact": "87674643"}]

values_list() returns a list of values instead of the model object.

from .models import School

# fetch only "name" and "contact" value of school's object.
school_data = School.object.all().values_list("name", flat=True)
print(school_data)

reponse: 
["abc", "xyz"]

Technique 6: Aggregate and annotate

In Django ORM, the annotate() and aggregate() methods are used to perform complex queries on the database. The annotate() method adds extra fields to each object in the query set, while the aggregate() method performs calculations on the entire query set.

For example, if we have a model called “Order” with a field called “total”, you can use the annotate() method to add a field called "discounted_total" that subtracts 10% from the total:

from django.db.models import F

# will fetch data multiple of total with 0.9 if exists otherwise None
orders = Order.objects.annotate(discounted_total=F('total') * 0.9)

This will add a field called “discounted_total” to each order object in the query set.

Similarly, if you want to calculate the total of all orders, you can use the aggregate() method:

from django.db.models import Sum

#will fetch data value of sum if exists otherwise None
total = Order.objects.aggregate(Sum('total'))

We can use additional functionalities in the annotate function, such as conditionals (using Case and When), subqueries, outer references (OuterRef), F expressions, and more.

Technique 7: Use raw SQL queries

In certain scenarios, it may be necessary to write SQL queries to fetch or manipulate data, and raw SQL queries are faster than ORM queries. There are two ways to use raw SQL queries:

By establishing a connection with SQL and writing a complete SQL query inside the execute function.

from django.db import connection

def my_custom_sql():
    with connection.cursor() as cursor:
        cursor.execute("SELECT * FROM school_table where id in (12, 24, 36)")
        row = cursor.fetchone()
    return row

2. By using raw queries in the ORM query

from .models import School

# Write your raw SQL query
query = "SELECT * FROM school_table WHERE id = %s"

# Execute the raw query and map the results to model instances
results = School.objects.raw(query, [1])

# Iterate over the results
for result in results:
    print(result)

There are many other points that can help optimize our ORM queries:

Use bulk_create() to create multiple model objects in a single query.

from .models import School

# List of objects to create
objs = [
    School(name='abc', contact="52345245"),
    School(name='xyz', contact="5345623"),
]

# Bulk create the objects
School.objects.bulk_create(objs)

bulk_update()helps to update multiple model objects in a single query.

from .models import School

# List of objects to update
objs = [
    School(id=1, name='JKL'),
    School(id=2, name='MNO'),
]

# Bulk update the objects
School.objects.bulk_update(objs, ['name'])

Use of exists() and count()

The exists() function helps to check if data exists based on a defined condition.

from .models import School

# query to check data existence in true or false
school_exists = School.objects.filter(name="abc").exists()

The count() function is used to get the total count of instances.

from .models import School

# query to get total count of data
total_school = School.objects.filter(name="abc").count()

Use of defer() and only()

The defer() function is used to defer the loading of certain fields until they are accessed explicitly.

from .models import School

# query to get total count of data
total_school = School.objects.defer("name", "contact")

On the other hand, the only() function is used to restrict the loading of fields to only those specified.

from .models import School

# query to get total count of data
total_school = School.objects.only("name")

Conclusion

Optimizing Django queries is essential to building scalable and performant web applications. To write efficient code, you should know how these optimization concepts work in Django.

In this blog, we covered:

Use select_related to minimize the number of queries when accessing related objects.
Use prefetch_related to retrieve related objects in separate queries.
Use values and values_list to retrieve specific fields from QuerySet.
Use annotate and aggregate to perform calculations and aggregation on QuerySet.
Use Subquery and OuterRef to perform subqueries and reference values from the outer query.
Use Case/When conditional expression to fetch results based on some conditions.

For further reading:

Django

The web framework for perfectionists with deadlines.

docs.djangoproject.com

For more updates on the latest tools and technologies, follow the Simform Engineering blog.
Follow us: Twitter | LinkedIn