Django ORM advanced tips and tricks

6 min readDec 4, 2021

Greetings!

Introduction

Doesn’t matter if you want it or not — you have to know ORM well if you want to master how to write a clean and productive layer to communicate with Database Management Systems (DBMS).
I do highly recommend getting acquainted with your primary DBMS to understand how it stores data, retrieves data for you, and how manipulates with data. What are indexes, aggregate functions, etc. For example, starting with PostgreSQL won’t be a miss, it’s free and widely used. Here is a link from the official PostgreSQL site, which they recommend, you can pick any:

Books

Title: PostgreSQL: The First Experience Author: Pavel Luzanov, Egor Rogov, Igor Levshin (translated by Liudmila…

www.postgresql.org

After that, I highly recommend spending some time and reading all sections from wonderful (really it’s the best doc I’ve ever read) Django documentation about ORM. Also, you can download it’s as PDF, ePub, or HTML and read on your reader or tablet with no connection (I spent a lot of time in a subway reading this in such a way)

On the right there is **Download** section

But I have some advanced tips for you, so you must have some background using Django ORM!

So let’s discover some of them!

#1 Use .values() with .get()

.values() method returns entities as Python native dictionary objects. You can use it with queryset objects only. But .get() method returns a model instance, so you can’t chain your call like this:

entity_dict = MyModel.objects.get(pk=1).values("pk", "name")

you will get AttributeError because .get() returns model instance which does not have .values() method. In case you want to return your instance as Python dictionary object use this:

entity_dict = MyModel.objects.values("pk", "name").get(pk=1)

Same applies for another methods of type QuerySet(such as values_list, only, etc.).

#2 Need a map from query? Use .in_bulk()

In my experience there are a lot of cases when I had to retrieve objects from a database and create map objects with this pattern:

entities = MyModel.objects.filter(id__in[1, 2, 3])
entities_map = {entity.pk: entity for entity in entities}

But Django has built in solution for you:

entities_map = MyModel.objects.in_bulk([1, 2, 3])

Will return map. pk — used as default for map keys, you can specify another field with field_name keyword argument which should be unique.

Agree that code much cleaner? And you have performance boost due to Django ORM being lazy here. In first case ORM executed immediately because dict comperhension triggers __iter__ magic method which makes actual DB calls.

#3 Stop importing User model directly!

Really! When starting a new project from scratch Django documentation recommends implementing a custom user model before any migration is applied so you can extend your user model in the future without any headaches. You can read more on this here:

Customizing authentication in Django | Django documentation | Django

The authentication that comes with Django is good enough for most common cases, but you may have needs not met by the…

docs.djangoproject.com

Who knows what happens in the future and what requirements are waiting for you? For this purpose, Django developers added a utility function that will return to you the active model used to manage users.

from django.contrib.auth import get_user_modelactive_user_model = get_user_model()

If you have any code related to this model (e.g. relation to the user in another model, some auth logic code which lays somewhere in your views) always use this util function, it will save you from headaches in the future refactorings.

#4 Complex querying with Q object and or_ operator

You can use query with Q object like this

tags = ["urgent", "can_forget", "unknown"]
names = ["one", "two", "three"]
MyModel.objects.filter(
    Q(tag__in=tags) | 
    Q(name__in=names)
)

This will query all items which tag is equal to one of [“urgent”, “can_forget”, “unknown”] OR which names are one of these [“one”, “two”, “three”].

Our intention is to make query which could be described like this:

(tag=”urgent” AND name=”one”) OR 
(tag=”can_forget” AND name=”two”) OR 
(tag=”unknown” AND name=”three”)

Then you can use something like this:

tags = ["urgent", "can_forget", "unknown"]
names = ["one", "two", "three"]
MyModel.objects.filter(
    Q(tag=tags[0], name=names[0]) | 
    Q(tag=tags[1], name=names[1]) | 
    Q(tag=tags[2], name=names[2])
)

We don’t want to hardcode indexes in our production code, but how do we chain here? Maybe like this:

tags = ["urgent", "can_forget", "unknown"]
names = ["one", "two", "three"]
MyModel.objects.filter(
    *(Q(tag=tag, name=name) for tag, name in zip(tags, names))
)

Not valid, it is equal to:

tags = ["urgent", "can_forget", "unknown"]
names = ["one", "two", "three"]
MyModel.objects.filter(
    Q(tag=tags[0], name=names[0]) & (should be |)
    Q(tag=tags[1], name=names[1]) & (should be |)
    Q(tag=tags[2], name=names[2])
)

functools.reduce with operator.or_ helps us to chain generator expression with OR operator.

Watch this:

from functools import reduce
from operator import or_tags = ["urgent", "can_forget", "unknown"]
names = ["one", "two", "three"]
MyModel.objects.filter(
            reduce(
                or_,
                (
                    Q(tag=tag, name=name) for tag, name in
                    zip(tags, names)
                 ),
            )
)

functools.reduce will accumulate each pair of tag, name with operator.or_ which will behave as expected. This is usefull when you want to chain using OR operator complex filters which involve more than one column in filter expression. Other way you should use simple query like in the beggining of this section:

tags = ["urgent", "can_forget", "unknown"]
names = ["one", "two", "three"]
MyModel.objects.filter(
    Q(tag__in=tags) | 
    Q(name__in=names)
)

Here filters depends on one column for each Q expression.

#5 Use of .only() and .prefetch_related() with .select_related()

Check out this snippet

class Post(models.Model):
    id = models.BigIntegerField()
    title = models.CharField()
    body = models.TextField()
    description = models.TextField()
    tags = models.ManyToManyField()
    created_at = models.DateField()
    updated_at = models.DateField()
    ....
    slug = models.CharField()
    owner = models.ForeignKey()
posts = Post.objects.filter(created=datetime.date.today())
api = ThirdPartyAPI()
service = InternalService()results = []for post in posts:
    data = api.fetch_data_for_post(
               id=post.id, 
               date_gte=post.updated_at,
           )
    results.append({post.id: data.parse()})service.provide_data(results)

Did you notice that we use only two fields from each post instance when fetching data? But when Django ORM casts Python code to sql it fetches each field from model as a result — we get data that we do not actually need. In this case use .only() method and mention what fields you actually need.

...
posts = Post.objects.filter(
                created=datetime.date.today()
        ).only("id", "updated_at")
...

Same when it comes to fetching data for related instances. This snippet will also work if you have to fetch data for particular owner of a post:

...
posts = Post.objects.filter(
                created=datetime.date.today()
        ).select_related("owner__email").only(
            "id", 
            "updated_at", 
            "owner__email",
        )
api = ThirdPartyAPI()
service = InternalService()
results = []for post in posts:
    data = api.fetch_data_for_post(
               id=post.id, 
               date_gte=post.updated_at,
               creator=post.owner__email,
           )
...

Use your chain efficiently, especially when it comes to million rows of data.

#6 Use of .iterator()

If you have to perform one time operation for many entities always use .iterator() method of QuerySet object. This prevents queryset from caching and fetch entites from database by chanks. You can set custom chunk size if you pass chunk_size argument to this method.

Please note, use only if you need to perform one time operation to each entity in chunk as any of the entity is going to be cached by Django and that means if you want to interact with entity again after iteration Django will make DB call again to fetch it.

Possible use case: if you need to update data in your table from some 3rd party API and calculations should be provided before saving each entity. In this case you don’t want to save too many rows in RAM. Iterator will provide you with each entity one by one, you do your calculations, save instance and forget about it; repeats till iterator gets exhausted and you have memory even for complex calculations.

Consider using .iterator() when data sets are huge, this absolutely reduces memory usage.

Conclusion

Mastering ORM could be challenging as it’s a really tough thing, its implementation is really hard as well. So, I believe that we should share advanced tips with each other to help master it because it happens that you spend a whole day thinking about your ORM optimizations and try to keep it clean as well.

I hope you discovered something new for yourself.

Please, subscribe if you share the same interests!
Thanks for your attention!