Unlocking 5 Invaluable Django ORM Insights: Gain an Early Learning Advantage from My Experience

Rahul Beniwal
Published in Django Unleashed
4 min read · Sep 15, 2023

Django’s standout feature as a Python backend framework is undoubtedly its powerful and highly flexible built-in ORM (Object-Relational Mapping). The ORM is so versatile that even with a solid understanding of SQL, you will rarely need to write raw SQL queries. I work with Django’s ORM daily, and I’ve uncovered some valuable insights that I’m eager to share with you.

Image credit: https://unsplash.com/photos/BI465ksrlWs

API performance is inversely proportional to the number of database queries.

Network latency can significantly impact API performance, whether it is measured in milliseconds or seconds. When an API makes multiple database queries, especially over a network, each query adds latency, and those delays add up quickly. Minimizing the number of queries and optimizing data retrieval and processing is crucial for keeping an API fast and responsive.

Tips to reduce database queries:

  1. Aggregate in the database: instead of looping through results in Python, use GROUP BY (exposed in Django via .values().annotate()). Aggregating directly in the database reduces post-processing.
  2. Leverage joins: opt for related lookups and joins between tables to fetch data efficiently. This avoids issuing a separate query for each piece of related information.
  3. Harness select_related and prefetch_related: select_related follows ForeignKey and OneToOne relations with a SQL JOIN, while prefetch_related handles reverse and many-to-many relations with one additional query. All three tips are sketched below.
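
Here is a minimal sketch of all three tips in Django ORM terms. The Author and Book models (with Book.author a ForeignKey and a price field) are assumptions for illustration:

```python
from django.db.models import Avg, Count

from myapp.models import Author, Book  # hypothetical app and models

# 1. Aggregate in the database (GROUP BY author_id) instead of looping in Python.
per_author = Book.objects.values("author_id").annotate(
    total_books=Count("id"),
    avg_price=Avg("price"),
)

# 2. Join via a related lookup instead of querying Author separately per Book.
indian_books = Book.objects.filter(author__country="IN")

# 3. select_related does a SQL JOIN for ForeignKey/OneToOne relations;
#    prefetch_related issues one extra query for reverse and M2M relations.
books_with_authors = Book.objects.select_related("author")
authors_with_books = Author.objects.prefetch_related("book_set")
```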

Two small queries are better than one large bloated query.

At its core, the ORM generates raw SQL through database adapters. For complex operations involving multiple joins and GROUP BY clauses, breaking one large query into two or more smaller ones can improve database performance, and combining the results afterward in application code can lead to more efficient execution overall.

How to combine the results of two queries:

  1. One effective approach I often use is to maintain a common numeric ID between the two queries. I build two dictionaries, one for each query’s results, with the numeric ID as the key in both. This makes it straightforward to merge, correlate, and analyze the data from both queries, as shown in the sketch below.
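
A minimal sketch of this pattern, assuming hypothetical User and Order models where Order has a user ForeignKey and an amount field:

```python
from django.db.models import Sum

from myapp.models import Order, User  # hypothetical app and models

# Query 1: one dictionary keyed by the shared numeric ID.
users = {row["id"]: row for row in User.objects.values("id", "name")}

# Query 2: a second dictionary keyed by the same ID.
totals = {
    row["user_id"]: row["total"]
    for row in Order.objects.values("user_id").annotate(total=Sum("amount"))
}

# Merge in application code instead of running one bloated join.
merged = [
    {**user, "order_total": totals.get(user_id, 0)}
    for user_id, user in users.items()
]
```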

Problems that you can encounter:

  1. Sorting, searching, and pagination work best when executed as a single database query. If you combine query results in application code and then apply sorting and filtering there, things slow down, especially with large result sets.

DISTINCT does not work with Window Functions

In the realm of SQL and database operations, you might be familiar with the DISTINCT keyword, which is used to eliminate duplicate rows from a result set. It's a handy tool when you want to retrieve unique values from a column in your table.

However, when it comes to using Window Functions in SQL, you might encounter a scenario where the DISTINCT keyword won't yield the expected results. Why is that?

Window Functions, such as ROW_NUMBER(), RANK(), and DENSE_RANK(), operate differently from traditional aggregate functions like SUM or AVG. These functions are designed to perform calculations across a set of table rows related to the current row, often referred to as a "window" or "frame."

It’s important to understand how the window frame is defined: the PARTITION BY and ORDER BY clauses determine which rows are included and how they are ordered within the frame. Instead of reaching for DISTINCT, focus on defining your window frame correctly to achieve the precise calculations and filtering you need.
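
For example, here is a minimal sketch of a window frame in the Django ORM, assuming a hypothetical Employee model with department and salary fields:

```python
from django.db.models import F, Window
from django.db.models.functions import Rank

from myapp.models import Employee  # hypothetical app and model

ranked = Employee.objects.annotate(
    dept_rank=Window(
        expression=Rank(),
        partition_by=[F("department")],  # PARTITION BY: which rows share a frame
        order_by=F("salary").desc(),     # ORDER BY: ordering within the frame
    )
)
```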

How to handle the above limitation?

  1. Use a Subquery to fetch the rows, because inside a subquery you can use DISTINCT (sketched below).
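
A minimal sketch of this workaround, assuming a hypothetical Order model; note that .distinct("customer_id") generates DISTINCT ON and works on PostgreSQL only:

```python
from django.db.models import F, Window
from django.db.models.functions import Rank

from myapp.models import Order  # hypothetical app and model

# Step 1: de-duplicate in a subquery, where DISTINCT is allowed.
distinct_ids = (
    Order.objects.order_by("customer_id", "-created_at")
    .distinct("customer_id")  # PostgreSQL-only DISTINCT ON
    .values("id")
)

# Step 2: apply the window function over the de-duplicated rows only.
ranked = Order.objects.filter(id__in=distinct_ids).annotate(
    rank=Window(
        expression=Rank(),
        partition_by=[F("status")],
        order_by=F("total").desc(),
    )
)
```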

Using subqueries with GROUP BY on large volumes can be an issue

It was only when I tackled a sizable dataset, where GROUP BY was applied to a field derived with CASE and WHEN, that I had an eye-opening realization. The initial query ballooned in complexity, becoming a significant challenge to maintain and alter. I found myself wrestling with OuterRef, Window, and the frustration of being unable to use DISTINCT in conjunction with window functions.

How to tackle this?

  1. As explained earlier, splitting a complex query into two simpler ones significantly improved query speed and accuracy in my case. There was a trade-off, though: I temporarily sacrificed the ability to perform searching, ordering, and precise pagination on specific fields. Despite this limitation, the approach proved valuable for my situation.

Proper Caching: The Ultimate Savior

Caching can be a game-changer, earning you appreciation. Retrieving data from cache is much faster than fetching it from a database or storage file.

How to do efficient caching?

  1. Cache complete API responses for a fixed time period.
  2. Reset the cache based on signals (invalidate it before the TTL expires when the underlying data changes).
  3. Set a TTL in every case: stale data can lead to incorrect and inconsistent results. Both patterns are sketched below.
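
A minimal sketch of both patterns with Django’s cache framework; the Article model, cache key, and 300-second TTL are illustrative assumptions:

```python
from django.core.cache import cache
from django.db.models.signals import post_delete, post_save
from django.dispatch import receiver

from myapp.models import Article  # hypothetical app and model

ARTICLE_LIST_KEY = "api:articles:list"  # illustrative cache key

def get_article_list():
    data = cache.get(ARTICLE_LIST_KEY)
    if data is None:
        data = list(Article.objects.values("id", "title"))
        cache.set(ARTICLE_LIST_KEY, data, timeout=300)  # TTL guards staleness
    return data

@receiver(post_save, sender=Article)
@receiver(post_delete, sender=Article)
def invalidate_article_cache(sender, **kwargs):
    # Reset the cache before the TTL expires whenever the data changes.
    cache.delete(ARTICLE_LIST_KEY)
```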

Things to Remember

  1. Implementing a proper TTL (Time To Live) is essential. You can achieve this either by setting the TTL when storing a key in the cache or by using signals to invalidate keys when the data changes.
