Conditional Expressions in Django
Our product allows users to create and run Jupyter Notebooks, among other things. Each user notebook instance is modeled in our Django application as a
Server object. Of course, we must track how our users use these notebooks in order to bill them correctly, see what settings are the most popular, etc. To accomplish this, each time a notebook is started, a corresponding
ServerRun record is created in the database. When a notebook is stopped for any reason, the record is updated accordingly. Here is the model definition for the
We’ll focus on the scenario where we use these
ServerRun records for billing purposes, as it provides a more interesting example. First, we must get all open invoices:
Before we delve into the query used to calculate usage for a given invoice, an explanation of what the query must find: We need the sum of all server run time for a given user during their currently active billing period. Note that each user has their own billing period, dependent on when they signed up. Let’s consider the possible states of a
ServerRun object, which directly correspond to the scenarios we have to account for in our query. In general order of complexity:
- Started and stopped during the current billing period
- Started before the current billing period began, but stopped during
- Started during the current billing period, and still running at query time
- Started before the current billing period, and still running
If we only had to concern ourselves with the first scenario, things would be pretty simple. Just calculate the run’s duration by subtracting the start timestamp from the stop timestamp, and multiply that by the server’s memory size (with some unit conversions of course). Then bam, you have the usage generated by that particular customer. Of course, the real world is never that simple. Here’s the query used:
This is best explained by examining the query at a high level, and zooming in on the details as we go.
From this perspective, things are simple. We are annotating a
Sum with the key
usage . However, the value being used in the sum is ultimately the combination of a couple fields. The values in question, run duration and the amount of memory the user has configured their notebook with, are of types
IntegerField respectively. When combining fields of different types like this, you must provide the
output_field argument so Django knows what to return. Hence the
output_field=DurationField() at the end of the
Sum statement. Note that you must provide an instance, not a class.
This step is probably not too surprising. Here we’re telling Django to sum whatever the output of this
Case statement is. The
Case object is simple: Give it an arbitrary number of
When objects (which we’ll discuss in a second), and perhaps a default value. The first
When object whose condition is met is returned, or if none match, the
default value is returned. The
Value object, or
DurationValue in this case, is simply a way to tell Django how to cast arbitrary python values to SQL. The
Value class accepts
output_field as an argument at creation time as well.
As you can see, the first argument to the
When constructor is the boolean condition that will be used for comparison. This condition is allowed to be any (or almost any) valid field lookup, including
Q objects. Of course, the value provided to
then is used if the condition is matched.
There is a lot to unpack from the previous step to this one. First, note that both conditions for the
When statement are very similar. The first handles any
ServerRun that has been stopped, while the second handles those that are still running. Let’s break down each
then statement individually.
Greatest object does exactly what you would think: it accepts an arbitrary number of expressions, and returns the greatest of their values. These expressions can be field names or python expressions. If you use the latter, you may have to wrap it in a
Greatest is used here so that users are only billed for the usage in the current billing period. In other words, if the run in question began before the current billing period we are only interested in the run time that has occurred since the start of the current billing period. We subtract the start time from the stop time (note that this
When only handles runs where
stop__isnull=False so this is guaranteed to be a valid operation) and multiply it by the run’s
Recall that when multiple expressions of varying types are combined in an
aggregate statement, we must tell Django what the resulting type will be. This means that we must use
output_field , however
F expressions do not support
output_field . If you think about it for a moment, it makes sense that this would be the case: If
output_field, which should be respected? The first
F which defines
output_field , or the last? Order of operations shenanigans could muddle what “first” and “last” even mean, and those are just a few considerations. To handle this exact scenario, Django helpfully provides the
ExpressionWrapper . Simply wrap an expression in an
ExpressionWrapper, and you can provide
output_field , allowing Django to correctly cast the value in SQL.
When expression is essentially identical to the first one, except that instead of using the run’s
stop value (since it’s null),
timezone.now() is used.
For some context, here is the query in its entirety once again:
You may be wondering why we multiply a run’s duration (a
timedelta object) with the notebook’s memory size (an
int ). After we convert the
timedelta object to hours, and
server_size_memory to gigabytes, this value represents gigabyte hours which is one of the metrics we bill based on. The stoichiometry, while simple, is left as en exercise for the reader. 😉
Hopefully you find this real world example of conditional expressions in Django to be useful. If you have any comments or questions don’t hesitate to reach out!
Interested? Learn more by dropping us a note. Sign up today for a free account. No credit card required!