Django Annotations: steroids to your Querysets

Gautam Rajeev Singh
4 min readSep 24, 2020

--

Django querysets are very powerful. It can often reduce run-time for expensive iterative operations. In this article, we are going to learn about one of those powerful features.

Prerequisite: You don’t need to be an advance level coder, but if you are reading this article, I am assuming you already have some basic knowledge about Django, Querysets and Subquery.

All the examples we are going to cover, this is how our models will look like

"""
models.py file
"""

from django.db import models
from datetime import datetime

# Model for all the books
class BookData(models.Model):
title = models. CharField(max_length=100)
author = models.CharField(max_length=100)
published_on = models.DateField(default=datetime.now)

# Model for all the book chapters
class BookChapterData(models.Model):
# Note: enum choice classes like below are available only from django3.0
class DifficultChoices(models.TextChoices):
EASY = 'easy'
MEDIUM = 'medium'
HARD = 'hard'
book = models.ForeignKey (BookData, on_delete=models.CASCADE)
chapter_name = models.CharField(max_length=100)
is_mcq_available = models.BooleanField(default=False)
num_of_topics = models.IntegerField()
difficulty = models.CharField(max_length=10,
choices=DifficultChoices.choices,
default=DifficultChoices. EASY)

What is an annotation?

Annotate generate an independent summary for each object in a queryset. In simpler terms, annotate allows us to add a pseudo field to our queryset. This field can then further be used for filter lookups or ordering.\

Ex: Suppose you want to all the Books along with the total number of chapters present in each one, then you could use annotate as follows:

# Simple annotate example
from django.db.models import Count
book_qs = BookData. objects.annotate(chapters_cnt=Count('bookchapterdata'))
Simple annotate example

Note: that the parameter(‘bookchapterdata’) inside the Count function won’t be the same if you are using related_name in BookChapterData model.

How to filter or sort using annotated field?

One of the most powerful features of Django annotate is that you can perform lookups or even order_by to it like any other model fields.

Let’s understand this with the above example itself. Now we want to have all the books with at least 10 chapters, sorted with chapters count.

# Annotate with filter and order
book_qs = BookData.objects. \
annotate(chapters_cnt=Count('bookchapterdata')). \
filter(chapters_cnt_gt=10). \
order_by('-chapters_cnt')

Aggregation within Subquery() using annotate()

Using proper combination of filter(), values() and annotate(), we can use aggregate inside a Subquery.

The above examples were pretty simple and straight forward, likes of which can be found easily over the internet. Let’s turn up our difficult notch a bit. If you are familiar with Django, chances are that you might have heard about Subquery. It is used to add an explicit query into our expression (or queryset). To understand the use-case, let’s take an example.

We want to have the total number of topics from those ‘easy’ chapters which have MCQ’s available (Refer to the model image). To break it down, we want to calculate the sum of num_of_topics, with is_mcq_available=True and difficulty=’easy’

Let’s solve it using annotate and Subquery.

(Note: there can be multiple ways to come up to the solution of the above problem statement, but I am choosing to use Subquery to help you understand how it works)

# Annotate with Subquery
from django.db.models import OuterRef, Subquery, Sum

chapter_subquery = BookChapterData.objects. \
filter(book=OuterRef('id'), is_mcq_available=True, difficulty='easy'). \
order_by().values ('book_id'). \
annotate(sum_of_topics=Sum('num_of_topics')). \
values('sum_of_topics')

book_queryset = BookData.objects. \
annotate(easy_mcq_topics_cnt=Subquery(chapter_subquery))

Note: we cannot use aggregate() inside a Subquery() because aggregate() returns a dictionary, and Subquery is only made to handle a Django queryset object.

Breakdown of above chapter_subquey object

  1. filter(…..): limits the subquery to the relevant parameters.
  2. order_by(): removes the default ordering(if any) on the BookData model.
  3. values(‘book_id’): aggregates (group by) the BookChapterData by the BookData
  4. annotate(sum_of_topics=Sum(‘num_of_topics’)): performs the aggregation

Another point to note here while using subquery for the goal of aggregation, that the Subquery() is only built to handle the querysets which returns only one row. If we do not group by(i.e. use order_by() and values()) smartly, our Subquery will return multiple rows and an error will be thrown.

This is the only way to perform an aggregation within a Subquery, as using aggregate attempts to evaluate the queryset.

Before we finish

There are many operations you can perform using annotations, which can be best found out by practice. But here are some tips to move forward:

  1. Serializer Friendly: Annotated fields can be used inside a serializer just as aa normal model field.
  2. Handy Operations: All kinds of filter lookups, conditional statements and sorting are possible using an annotated field.
  3. Difference between annotate() and aggregate(): annotate() aims to evaluate every row in a queryset and create a pseudo field for the same, whereas aggregate() evaluates the queryset as a whole and brings out the final result.

--

--