Djaq: A Different Concept for Django Queries

Paul Wolf
10 min readJan 8, 2019

--

Writing Django Queries

The concept of this Python module has changed somewhat since this article was first published. For that reason, this is no longer a completely accurate view of how I would intend for Djaq to be used. For the most updated information on Djaq, see the github repo readme: https://github.com/paul-wolf/djaq. Paul Wolf

Djaq (pronounced “jack”) is the name of a Python module that provides Django with an alternative to QuerySets. It’s a proof-of-concept intended to demonstrate a different way to retrieve data using Django models by emphasising different feature priorities from QuerySets. Djaq departs from QuerySets by forcing explicit declaration of data to be retrieved, explicit calls to the db and a clearer, Python-like language to define filters and column expressions. Here’s an example using books as a domain model:

DQ("""(b.name,
b.price as price,
0.2 as discount,
b.price * 0.2 as discount_price,
b.price — (b.price * 0.2) as diff,
Publisher.name as publisher
) Book{b.price > 50} b""")

This will return results with name, price, discount, discount_price, diff and publisher fields for books that cost more than 50.

Often, when beginning Django developers encounter QuerySets for the first time they say “this is good but it’s not as powerful as SQL” or “I can’t use QuerySets for my query. I need SQL”. I find this sentiment is usually due to lack of awareness of QuerySet sophistication. The QuerySet API is incredibly powerful. By definition, QuerySets are not as expressive as SQL (they generate SQL), but it’s rare that you can’t do something with them. This is the result of a fundamentally sound design plus many years of development from many contributors to Django. It’s good practice in Django projects to resist the temptation to use raw SQL unless you really need to, a rare occurrence, in my experience.

This article is aimed at developers with experience of Django’s ORM. If you are new to Django, you would be better served becoming proficient in QuerySets than contemplating using a new tool like Djaq.

The Django ORM is a great tool for database access. But it is driven by a specific set of choices about how to use the ORM. The main thrust of QuerySets is automagic access to related data. It’s easy to perform many operations with little code and no knowledge of SQL. For these goals, it is excellent. But committing to a set of target behaviours entails tradeoffs:

  • Unnatural query formulation: The notation for QuerySets, like filtering, can appear obscure and verbose.
  • Inefficient queries: the default sql generated by simple QuerySets is not always the most efficient.
  • Inadvertently inefficient database usage: Because QuerySets do such a great job hiding complexity, they often hide important consequences of patterns that can produce massively slow code. Mainly this exhibits when iterating the result of a QuerySet when the developer does not notice that an operation on a Model instance is causing a new query for every iteration.
  • Obscure access to features like complex column expressions, subqueries, etc. F expressions, OuterRefs, and other features are powerful but they can seem arcane and hard to understand.

Djaq provides an alternative query language guided by a different set of targeted behaviours intended to force the developer to be explicit about what she wants to do and provide the fastest possible retrieval methods using the clearest possible expression.

Djaq is available here: https://github.com/paul-wolf/djaq. See the readme for more information on installation. The readme also provides far more information about available features that are not covered here.

Some important notes before you try it:

  • Djaq sits on top of Django Models. It is an alternative to the Django QuerySet class and it only works with Django Models.
  • It is entirely compatible with QuerySets being used in the same application.
  • Djaq only enables queries. It does not provide data update features, like creating new db records or modifying existing ones.
  • It’s a work-in-progress. It is not refined or tested enough to use in a production environment.

Reasons why Djaq might be a reasonable alternative to QuerySets for some use cases:

  • More natural query formulation using a Python-like syntax
  • More fine-grained control of result sets
  • Eschew laziness: don’t trigger queries on behalf of the user except just once in an explicit and predictable manner
  • Default to cursor semantics rather than sliceable lists of objects for better performance

It is an essential objective for Djaq to not return data that was not explicitly solicited. If the developer asks for id and name of a Book and tries to access pubdate, this will fail by design, the opposite of the design intent of QuerySets. The same goes for lazy loading of related data. Djaq only returns results with explicitly requested related data. It will never lazily fire off queries to the db to help you get data you forgot to ask for.

Here are some examples comparing Djaq to QuerySets. I use the models provided as examples here: https://docs.djangoproject.com/en/2.1/topics/db/aggregation/. For convenience:

from django.db import modelsclass Author(models.Model):
name = models.CharField(max_length=100)
age = models.IntegerField()
class Publisher(models.Model):
name = models.CharField(max_length=300)
class Book(models.Model):
name = models.CharField(max_length=300)
pages = models.IntegerField()
price = models.DecimalField(max_digits=10, decimal_places=2)
rating = models.FloatField()
authors = models.ManyToManyField(Author)
publisher = models.ForeignKey(Publisher, on_delete=models.CASCADE)
pubdate = models.DateField()
class Store(models.Model):
name = models.CharField(max_length=300)
books = models.ManyToManyField(Book)

While Djaq is currently a proof-of-concept, all examples provided below will work. Let’s compare getting all the books and iterating over them:

# get all books with a QuerySet
qs = Book.objects.all()
for book in qs:
print(book)

# get all books with Djaq
dq = DQ("(Book.id, Book.name)")
for book in dq.tuples():
print(book)

Notice, with Djaq we provided explicit fields that we want returned. It might look slightly more complicated but in fact most queries are less verbose in Djaq:

# Get average price of books 
from django.db.models import Avg
Book.objects.all().aggregate(Avg('price'))

{'price__avg': 18.466780449193628}

# average price using Djaq
DQ("(avg(b.price)) Book b").value()

Decimal('18.4667804491936282')

The full syntax for Djaq queries:

(<field_exp1>, …) <ModelName>{<filter_expression>} <alias> order by (<field_exp1>, …)

This expression is a string passed as the first argument when creating a DjangoQuery object. In our examples we import under the alias DQ:

from djaq import DjangoQuery as DQ

For QuerySet examples, we assume imports like this:

from django.db.models import Q, Avg, Count, Min, Max, Sum, \ 
FloatField, Subquery, OuterRef
from books.models import Author, Publisher, Book, Store

With Djaq, you don’t need to import Count, Sum, Min, etc. or the Model classes. Count books:

dq = DQ("(count(Book.id)) Book")

We can get the result as a single value:

DQ("(count(Book.id)) Book").value()

Or use an iterator:

for t in DQ("(count(Book.id)) Book").tuples():
print(t)

The reason for Djaq is to target a different set of concerns about data retrieval rather than try to be better than the QuerySet class. There are pros and cons to both. Very few queries in Djaq cannot be produced in QuerySets although mostly the Djaq syntax is more straightforward and unified whereas QuerySets require a somewhat sprawling set of techniques like aggregate(), annotate(), Q() expressions and F() expressions to give access to complex column expressions.

Get the average price of books:

DQ("(avg(b.price)) Book b")

compared to:

Book.objects.all().aggregate(Avg('price'))

Get the difference off the maximum price:

DQ("(Publisher.name, max(Book.price) — avg(Book.price) as price_diff) Book b")

compared to:

Book.objects.aggregate(price_diff=Max(‘price’, output_field=FloatField()) — Avg(‘price’))

Count books per publisher:

DQ("(Publisher.name, count(Book.id) as num_books) Book b")

compared to:

Publisher.objects.annotate(num_books=Count("book"))

Count books with ratings up to and over 5:

DQ("""(sum(iif(b.rating < 5, b.rating, 0)) as below_5, 
sum(iif(b.rating >= 5, b.rating, 0)) as above_5)
Book b""")

compared to:

above_5 = Count(‘book’, filter=Q(book__rating__gt=5))
below_5 = Count(‘book’, filter=Q(book__rating__lte=5))
Publisher.objects.annotate(below_5=below_5).annotate(above_5=above_5)

Get average, maximum, minimum price of books:

DQ("(avg(b.price), max(b.price), min(b.price)) Book b")

compared to:

Book.objects.aggregate(Avg(‘price’), Max(‘price’), Min(‘price’))

The DjangoQuery class comes with these generators:

  • tuples(): return tuples
  • json(): return json representation of each result row
  • dicts(): return a dict for each result row
  • csv(): return a csv respresentation of the result row

Just as there is a ModelInstance class in Django, we have a DQResult class:

objs(): return a DQResult for each result row, basically a namespace for the object:

dq = DQ("(b.id, b.name, Publisher.name as publisher) Book b")
for book in dq.objs():
title = book.name
publisher = book.publisher

Note that by default, you iterate using a generator. You cannot slice a generator.

DjangoQuery.value(): when you know the result is a single row with a single value, you can immediately access it without further iterations:

DQ("(count(Book.id)) Book{}").value()

will return a single integer value representing the count of books.

Comparing Features

Unsurprisingly, Django provides significant options for adjusting query generation to fit a use case. Here’s a point-by-point comparison with Djaq:

  • only(): Djaq always works in “only” mode. Only explicitly requested fields or expressions are returned.
  • select_related(): The select clause only returns those columns explicitly defined. This feature makes loading of related fields non-lazy. But all queries are non-lazy in Djaq.
  • prefetch_related(): This is not really a job for Djaq. Currently, there is no support for m2m fields but when this is implemented, it will be part of the instance manager.
  • F() expressions: These are workarounds for not being able to write expressions in the query for things like column value arithmetic and other expressions you want to have the db calculate. Djaq lets you write these directly and naturally as part of its syntax.
  • Subquery and OutteRef: Djaq does not require extra imports or (much) syntactic sugar to express subqueries.
  • aggregate() or annotate() are required for aggregations and non-field expressions in QuerySets whereas with Djaq such things are natively part of the result set.
  • Model instances with QuerySets exactly represent the corresponding Django model. Djaq has a different concept of a result instance (DQResult) that represents whatever is returned by the query even if it’s not a model field.
  • At this time, Djaq does not handle M2M fields or reverse relations. This is planned but not implemented at this time.

Djaq does not try to hide as much about the underlying data structures as QuerySets. I think there are two types of Django developer, those who design a data schema and then design Django models to embody that and those who write Django models and let Django take care of the underlying schema. It may be the former type of Django developer will be more drawn to Djaq.

Results vs Model Instances

The Djaq generator .objs() returns a DQResult class instance. This is just a namespace for results, not a model instance. Djaq produces ‘results’ in contrast to model instances. Depending on what methods you use on QuerySets you may get Django Model instances or a list or a dict, etc. Djaq never returns a model instance.

Likewise, you cannot slice a DjangoQuery object because this would require semantics that make queries to the db unpredictable or at least difficult to understand for the developer. We may provide a separate class to enable this kind of operation in the future.

Parameter Binding

Django QuerySets are built around the ability to parameterise queries via function calls. It’s also customary to build queries progressively with conditional logic.

Djaq takes a different approach. The query is essentially built once. You might find it requires having multiple queries for different conditional logic. This contributes to making it clear to the developer what query is sent to the database.

Parameters that go to a query can be put through a ContextValidator. You subclass this to perform validation and mutation of input parameters. See the readme for more information.

Djaq comparative weaknesses

Just as QuerySets exhibit weaknesses as a result of feature priorities, so does Djaq. But aside from incomplete features, the biggest shortcoming is that it will be missing many things present in the Django API, especially for edge cases and those myriad smaller cases that are only exposed through years of experience on real projects.

  • You might find that QuerySets are more “composable”. You can programmatically build queries based on conditions more easily. The classic use case is building a query based on form submission for search. You iterate over the Request dict, building the query step-by-step. You can do this with Djaq but you might find it more natural with QuerySets.
  • The DQResult class does not handle m2m fields in any way at this time. TBD. It also does not handle automagic relationship following (by design) or backwards relations. Not to be confused with relationship following in a single query, which Djaq can do effortlessly.
  • Currently, Djaq is a proof of concept. It has not undergone the rigorous testing required to give confidence it will work as described and many features are just not present. Exceptions are often mysterious. Some features are half-baked. Use sound judgement.
  • Djaq only handles SELECT queries. It has no functionality for creating or updating data.
  • Djaq does not fit into the Django ecosystem as well as QuerySets. For instance, “QuerySets” are really several kinds of class. The Django framework is made to work with QuerySets and therefore may offer more options for some operations. While it’s possible to use a QuerySet as a Djaq subquery, the converse is not true.
  • All the above examples have been tested and work with Postgresql. They might work with other databases, maybe. Obviously, the Django ORM has been extensively tested with all the databases it claims to support. Broad, well-tested support is an aspiration for Djaq.

Future

Aside from improving edge case behaviour and feature completeness, Djaq might look at some hopefully innovative ways to make the life of developers easier:

  • Mutable Djaq queries based on context data. For instance, removing elements of a query where context data is not provided
  • Direct support for GraphQL

Summary

“Since the Expression Language provides lower-level Python structures that mimic a backend-neutral SQL, it feels almost identical to writing actual SQL but in a Pythonic way.” https://www.pythoncentral.io/overview-sqlalchemys-expression-language-orm-queries/

Django makes similar claims about data access being “Pythonic” except Django tries exactly to not be like writing SQL. I struggle with both of these assertions. In my opinion, there is nothing especially “Pythonic” about the APIs of SQLAlchemy or Django. They are Python APIs but they are not exemplary in their application of Python which is what “Pythonic” means.

In contrast, Djaq is the exact inverse of those ORM APIs. It is a language more than an API, a domain specific language, if you will. That language is mostly directly expressed as pseudo Python statements. Djaq minimises automagicality to prevent code over the lifetime of a project from exhibiting behaviour detrimental to performance and readability. To my mind, QuerySets are unpythonic because they promote implicitness and hidden behavioural complexity. That should not be considered damning. QuerySets are highly practical and make getting started on projects easy and fast. But there are situations where QuerySets cause unwanted side-effects.

Djaq is not presented as a superior solution to QuerySets. The purpose of Djaq is to provide an optional alternative to QuerySets that provides more explicit control over query behaviour and a more natural, Python-based language that is more readable and learnable. Users don’t need to choose one or the other. Djaq is just a separate path that might be more beneficial depending on how specific developers wish to work or for specific application areas where the performance behaviour of Djaq is more suitable.

https://github.com/paul-wolf/djaq

--

--

Paul Wolf

Software technical and business exec; Python, Django, Microservices