Django distinct() query method introduction

Hemanth S P
DjangoTube:
Published in
5 min readApr 9, 2022
Django distinct query method introduction
Photo by Ricardo Gomez Angel on Unsplash

In this article, we are going to see how the DISTINCT expression/query method works on Django queryset and with different databases.

Contents:

  1. What is a distinct expression?
  2. How to use this in Django?
  3. Code examples and some notes.

1. What is a distinct expression?

Distinct is an expression/query clause used in SQL to remove duplicated results and return unique results based on the column we select.

2. How to apply this in Django?

We can use a distinct() query method at the end or in between queryset and with other regular Django query methods like all(), values(), etc…

3. Let's see some examples

a). I created a simple model called user.

user model

b) Loaded the sample data to the user model and it looks like the below.

sample data

c) Excecute the distinct() query and printing its results and corresponding SQL query.

User.objects.distinct()==================SQL====================
SELECT DISTINCT "polls_user"."id",
"polls_user"."name",
"polls_user"."age"
FROM "polls_user"
==============Result======================
<QuerySet [<User: User object (1)>, <User: User object (2)>, <User: User object (3)>, <User: User object (4)>]>

In the above code, you can see the DISTINCT method is applied to all the available columns on the model(actually we never use this code because it returns all results like all(), at least as of now no use).

3. a. How to apply distinct on all columns or a few columns?

  1. what is applying distinct on all columns?

Applying distinct on all columns using the below format.

SELECT DISTINCT table_column;

User.objects.all().distinct()

distinct widely used while joining related fields.

2. what is applying distinct on a few columns?

Applying distinct on multiple columns using the below format and its works only in PostgreSQL database. in this case, returned results will be unique to the combination of the given columns.

SELECT DISTINCT ON table_column1, . . . . . table_column_n;

User.objects.using("postgress").distinct("name")

OR

User.objects.using("postgress").distinct("age", "name")

Note: the distinct method will also be considered the null as a unique value.

DISTINCT ON only applicable in PostgreSQL.

# How do get unique data?

a. one-way is using values()

User.objects.using(“postgress”).values(“name”).distinct()
=====================SQL===========================
SELECT DISTINCT "polls_user"."name" FROM "polls_user"

b. second way

User.objects.using(“postgress”).distinct(“name”)
=====================SQL===========================
SELECT DISTINCT
ON ("polls_user"."name") "polls_user"."id",
"polls_user"."name",
"polls_user"."age",
FROM "polls_user"

# Let’s apply the DISTINCT ON expression on the age column and check this with different databases.

a. let's check in the SQLite database.

User.objects.distinct(“age”)# you will get error saying.
# NotSupportedError: DISTINCT ON fields is not supported by this database backend

b. let's check in the PostgreSQL database.

User.objects.using("postgress").distinct("age")==============SQL========================
SELECT DISTINCT
ON ("polls_user"."age") "polls_user"."id",
"polls_user"."name",
"polls_user"."age"
FROM "polls_user"
================Results====================
<QuerySet [<User: User object (4)>, <User: User object (1)>, <User: User object (3)>]>

in the above results, user id-2 was removed because it had duplicated value of age 21.

c. let's check in the MYSQL database.

User.objects.using("mysql").distinct("age")
# you will get error saying.
# NotSupportedError: DISTINCT ON fields is not supported by this database backend

Note: While querying on related fields may produce duplicated requests, for example, isnull on reverse relation will always give duplicates results consider the below example with the ForeignKey model called Home,

The User and Home model

and added some sample data.

sample data for Home model, user ‘homy’
User.objects.filter(name=”homy”, home__isnull=False)==================Result===========================
<QuerySet [<User: User object (2)>, <User: User object (3)>]>
===============SQL================================
SELECT "polls_user"."id",
"polls_user"."name",
"polls_user"."age"
FROM "polls_user"
INNER JOIN "polls_home"
ON ("polls_user"."id" = "polls_home"."user_id")
WHERE ("polls_home"."id" IS NOT NULL AND "polls_user"."name" = 'homy')

Or more clearly.

User.objects.filter(name="homy", home__isnull=False).values("id", "name")
==================Result===========================
<QuerySet [{'id': 2, 'name': 'homy'}, {'id': 2, 'name': 'homy'}]>

so you can use a distinct query method to remove duplicates.

User.objects.filter(name="homy", home__isnull=False).distinct()
====================Result=========================
<QuerySet [<User: User object (2)>]>

Note: one caveat on distinct expression is it’s very slow, if you have millions of rows of data then you can easily notice the performance.

Note: order_by columns are always available to the distinct by default. Compare below two queries.

User.objects.only(“age”).filter(age__gte=20).order_by(“name”)==================Result================================
<QuerySet [<User: User object (3)>, <User: User object (2)>, <User: User object (1)>]>
==================SQL===================================SELECT "polls_user"."id",
"polls_user"."age"
FROM "polls_user"
WHERE "polls_user"."age" >= 20
ORDER BY "polls_user"."name" ASC

in the above code, I excluded the name field using the ONLY query from the SELECT statement. but the below code name field is included in the SELECT statement by order_by.

User.objects.only(“age”).filter(age__gte=20).distinct().order_by(“name”)==================Result================================
<QuerySet [<User: User object (3)>, <User: User object (2)>, <User: User object (1)>]>
==================SQL====================================
SELECT DISTINCT "polls_user"."id",
"polls_user"."age",
"polls_user"."name"
FROM "polls_user"
WHERE "polls_user"."age" >= 20
ORDER BY "polls_user"."name" ASC

Note: always use the order_by clause in combination with distinct to predict results otherwise it returns arbitrary results.

Note: If you get any error like the below,

SELECT DISTINCT ON expressions must match initial ORDER BY expressions
=============================================================
User.objects.using("postgress").all().distinct("age").order_by("name")
# this gives error.

that means distinct expression fields must start with order_by fields.

User.objects.using("postgress").all().distinct("age").order_by("age", "name")
# this is correct.
=============================================================
User.objects.using("postgress").all().order_by("age","name").distinct("age")
# this is correct too.

d. While using values() with distinct() together then please remember ordering by fields can affect the results, regrading there is a note in the Django doc please take look at it.

Conclusion

  1. distinct() query method used to remove duplicates.
  2. DISTINCT ON expression only applicable to PostgreSQL(i did not check in the oracle database please let me know in the comment section).
  3. order_by fields are always available in the select statement.
  4. use order_by with distinct to predict the results.
  5. be careful with model meta default ordering with related filed order in order_by (go through the Django website for more in-depth information).

******If you found this article helpful please follow me on MEDIUM.*******

--

--