Conditional Expressions in Django

Published in

IllumiDesk

5 min readDec 18, 2017

Our product allows users to create and run Jupyter Notebooks, among other things. Each user notebook instance is modeled in our Django application as a Server object. Of course, we must track how our users use these notebooks in order to bill them correctly, see what settings are the most popular, etc. To accomplish this, each time a notebook is started, a corresponding ServerRun record is created in the database. When a notebook is stopped for any reason, the record is updated accordingly. Here is the model definition for the ServerRun class:

We’ll focus on the scenario where we use these ServerRun records for billing purposes, as it provides a more interesting example. First, we must get all open invoices:

There is only one open invoice per customer at any given time.

Before we delve into the query used to calculate usage for a given invoice, an explanation of what the query must find: We need the sum of all server run time for a given user during their currently active billing period. Note that each user has their own billing period, dependent on when they signed up. Let’s consider the possible states of a ServerRun object, which directly correspond to the scenarios we have to account for in our query. In general order of complexity:

Started and stopped during the current billing period
Started before the current billing period began, but stopped during
Started during the current billing period, and still running at query time
Started before the current billing period, and still running

If we only had to concern ourselves with the first scenario, things would be pretty simple. Just calculate the run’s duration by subtracting the start timestamp from the stop timestamp, and multiply that by the server’s memory size (with some unit conversions of course). Then bam, you have the usage generated by that particular customer. Of course, the real world is never that simple. Here’s the query used:

This code has been modified slightly (variable names, indentation) for readability

This is best explained by examining the query at a high level, and zooming in on the details as we go.

From this perspective, things are simple. We are annotating a Sum with the key usage . However, the value being used in the sum is ultimately the combination of a couple fields. The values in question, run duration and the amount of memory the user has configured their notebook with, are of types DurationField and IntegerField respectively. When combining fields of different types like this, you must provide the output_field argument so Django knows what to return. Hence the output_field=DurationField() at the end of the Sum statement. Note that you must provide an instance, not a class.

This step is probably not too surprising. Here we’re telling Django to sum whatever the output of this Case statement is. The Case object is simple: Give it an arbitrary number of When objects (which we’ll discuss in a second), and perhaps a default value. The first When object whose condition is met is returned, or if none match, the default value is returned. The Value object, or DurationValue in this case, is simply a way to tell Django how to cast arbitrary python values to SQL. The Value class accepts output_field as an argument at creation time as well.

As you can see, the first argument to the When constructor is the boolean condition that will be used for comparison. This condition is allowed to be any (or almost any) valid field lookup, including Q objects. Of course, the value provided to then is used if the condition is matched.

This is essentially the entire query, but snipped for the sake of readability.

There is a lot to unpack from the previous step to this one. First, note that both conditions for the When statement are very similar. The first handles any ServerRun that has been stopped, while the second handles those that are still running. Let’s break down each then statement individually.

The Greatest object does exactly what you would think: it accepts an arbitrary number of expressions, and returns the greatest of their values. These expressions can be field names or python expressions. If you use the latter, you may have to wrap it in a Value object. Greatest is used here so that users are only billed for the usage in the current billing period. In other words, if the run in question began before the current billing period we are only interested in the run time that has occurred since the start of the current billing period. We subtract the start time from the stop time (note that this When only handles runs where stop__isnull=False so this is guaranteed to be a valid operation) and multiply it by the run’s server_size_memory .

Recall that when multiple expressions of varying types are combined in an annotate or aggregate statement, we must tell Django what the resulting type will be. This means that we must use output_field , however F expressions do not support output_field . If you think about it for a moment, it makes sense that this would be the case: If F supported output_field, which should be respected? The first F which defines output_field , or the last? Order of operations shenanigans could muddle what “first” and “last” even mean, and those are just a few considerations. To handle this exact scenario, Django helpfully provides the ExpressionWrapper . Simply wrap an expression in an ExpressionWrapper, and you can provide output_field , allowing Django to correctly cast the value in SQL.

The second When expression is essentially identical to the first one, except that instead of using the run’s stop value (since it’s null), timezone.now() is used.

For some context, here is the query in its entirety once again:

You may be wondering why we multiply a run’s duration (a timedelta object) with the notebook’s memory size (an int ). After we convert the timedelta object to hours, and server_size_memory to gigabytes, this value represents gigabyte hours which is one of the metrics we bill based on. The stoichiometry, while simple, is left as en exercise for the reader. 😉

Hopefully you find this real world example of conditional expressions in Django to be useful. If you have any comments or questions don’t hesitate to reach out!

Interested? Learn more by dropping us a note. Sign up today for a free account. No credit card required!

Conditional Expressions in Django

Written by John Griebel