Django and its default values

David Wobrock
Published in Botify Labs
6 min read · Sep 16, 2020

A little story of a common Django pitfall

[Image: the book “Two Scoops of Django 1.11”]

At Botify, our monolith is built with Python and its web framework Django. It has withstood time, earned our trust to the point that we chose it over a microservices approach, and most importantly, it lets us collaborate seamlessly as a team. Along the way, we’ve learned about the things we love, and about the few quirks of the framework that we’ve had to change to better fit our needs.

This article dives into one of those pitfalls: how Django and its migration system handle default values on model fields. Our goal is to share the context in which Django’s behaviour caused us problems, and how we went about fixing them.

Django migrations: a chicken-and-egg situation

Django comes with a built-in migrations framework, a powerful system that describes your database schema with Python classes and leverages the ORM. Any change to the model classes is reflected in a new migration; applied sequentially, these migrations build the expected state of the database.

One inherent aspect of Django’s migration system is that it expects synchronicity between the code and the database.

The system expects these Python model classes and the database tables and columns to be consistent with each other in order to function correctly.

When a release includes a Django migration, deploying the new version to a set of machines can quickly become tedious. To avoid downtime, should you first deploy the new code or update the database? It’s a “chicken-and-egg” problem.

At Botify, we scale our data processing algorithms elastically on a large number of machines to handle the workload of some of the Web’s biggest sites. We quickly ran into issues when trying to deploy code to multiple machines while simultaneously migrating our database.

Let’s explore the specific case of adding a new field to a model and filling all existing rows with a default value.

How Django handles default values

The key behaviour is:

In Django, the default value is handled on the application side, not by the database.

When saving a new model instance using Django, it is the framework, meaning the Python code, that sets the default value in the generated INSERT SQL statement.

But when adding a new field to the model, it is the database that has knowledge of the default value and fills existing rows.
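To make this concrete, here is a minimal sketch of where the application-side default lives (the model below is purely illustrative):

from django.db import models

class User(models.Model):
    # The default lives in Python: Django injects 0 into the INSERT
    # statement itself; the column has no DEFAULT clause in the database.
    preference = models.IntegerField(default=0)

# Creating an instance without specifying the field, e.g. User.objects.create(),
# roughly issues: INSERT INTO ... ("preference", ...) VALUES (0, ...);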

As an example, take the following migration in which we add a preference field to our User model with the default value 0.

operations = [
    migrations.AddField(
        model_name="user",
        name="preference",
        field=models.IntegerField(default=0),
    ),
]

When we generate the corresponding SQL for PostgreSQL, through the Django management command sqlmigrate, we get:

BEGIN;
--
-- Add field preference on user
--
ALTER TABLE "auth_user" ADD COLUMN "preference" integer DEFAULT 0 NOT NULL;
ALTER TABLE "auth_user" ALTER COLUMN "preference" DROP DEFAULT;
COMMIT;

The first SQL statement adds the column and fills all existing rows with the default value. The second one drops the default so that the framework handles it on the application side for new rows.

Why is Django doing this?

Before finding ways around this issue, it’s important to understand why Django implements default values the way it does. Let’s explore the reasons behind this behaviour.

Default values are rather flexible. They can be any value, or any callable returning a value of the appropriate type, which makes it difficult to express arbitrary Python code as an SQL expression. Additionally, Django isn’t tied to one database: it supports several backends, as long as a driver exists for them. Therefore, not every piece of logic is easily expressed in SQL, and not every database supports the same SQL statements and dialects.

On a related note, when using a more complex Python function as the default value for a new column, Django will execute the function only once and use the result as the default value for every row. As a side effect, even if the function uses a non-deterministic element (like randomness or the current date), all existing rows will be filled with the same value. For more complex default values on new columns, data migrations are more appropriate.
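As an illustration, here is a minimal sketch of such a data migration using RunPython, giving each existing row its own value instead of the single value Django would compute once (app label and migration names are illustrative):

import random

from django.db import migrations


def fill_preference(apps, schema_editor):
    # Use the historical model and compute a per-row value,
    # something a one-shot application-side default cannot do.
    User = apps.get_model("auth", "User")
    for user in User.objects.all().iterator():
        user.preference = random.randint(0, 9)
        user.save(update_fields=["preference"])


class Migration(migrations.Migration):

    dependencies = [
        ("auth", "0002_user_preference"),  # the AddField migration
    ]

    operations = [
        migrations.RunPython(fill_preference, migrations.RunPython.noop),
    ]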

Let’s now focus on the issues we run into because the default value is known only to the application, not to the database.

What could go wrong?

The fact that Django drops the database default often creates a cognitive gap between the expected database schema and the actual one. In this section, we will explore the issues that arise, during development and deployment, when adding and deleting a NOT NULL column with a default value.

Intuitively, adding such a column is expected to work smoothly. However, the change is backward incompatible.

A backward incompatible migration means that a previous version of the code will not work with the new migration.

Imagine you are adding a new non-nullable field with a default value to a model:

  • if you deploy the code first, it expects a column that doesn’t exist in the database yet. You have probably experienced this during development, after forgetting to migrate the database: ProgrammingError: column auth_user.preference does not exist.
  • so you decide to deploy the database first. However, you are adding a NOT NULL column that, in the end, has no default value in the database. When the old code inserts a row, it doesn’t know about the new column and doesn’t provide a value for it: Column preference cannot be NULL.

In semantic versioning, this is what you would call an incompatible API change. This is problematic in multiple ways. It will hinder you:

  • during development. You add a column on one branch, migrate the DB, then switch to another branch where the code doesn’t know about the column.
  • during deployment. As explained in the example above, downtime is unavoidable unless you’re able to update both the code on all servers and your database(s) at the exact same time.

When dropping a column, similar concerns arise. At Botify, our infrastructure team doesn’t like to drop an unused column just after releasing code that stops using it. The DROP operation can require locking a critical table, and if we ever need to roll back, we save a lot of time by not having to restore data from a backup.

Nevertheless, by keeping an unused NOT NULL column with no database default, you can probably feel what is coming: you will get errors.

The reason is the same: the default value is handled at the application level, but the application no longer knows about the field. The code doesn’t specify a value for the forsaken column, the database cannot provide a default, so the value is interpreted as NULL, and the column has a NOT NULL constraint: column cannot be NULL.

What can we do about it?

One of the simplest solutions is to also set the default value in the database. If each field with a Django default translates to a database default, the cognitive load during development and deployment is reduced.

There are two rather straightforward ways to do this:

  • either you add a Django RunSQL operation that sets the default value in the database, but you’ll need to write SQL for the specific database you are targeting (see the sketch below),
  • or you can use a 3rd party library (such as django-add-default-value) which allows specifying the default value through an additional operation in the migration. This lifts the burden of writing database-specific SQL.
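As an illustration of the first option, here is a minimal sketch of a follow-up migration that re-adds, on PostgreSQL, the default that Django dropped (the dependency name is illustrative):

from django.db import migrations


class Migration(migrations.Migration):

    dependencies = [
        ("auth", "0002_user_preference"),  # the AddField migration
    ]

    operations = [
        migrations.RunSQL(
            sql='ALTER TABLE "auth_user" ALTER COLUMN "preference" SET DEFAULT 0;',
            reverse_sql='ALTER TABLE "auth_user" ALTER COLUMN "preference" DROP DEFAULT;',
        ),
    ]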

Non-trivial default logic will nonetheless remain complicated to express in SQL.

Ultimately, you don’t always think about adding a database default when adding or altering a Django field. And when you do remember, it’s often too late and you’re already deploying to production.

To make it easier on yourself, we recommend automating this process and using appropriate tooling. For instance, the Migration Linter automatically detects whether you are adding a backward incompatible migration. It integrates into your CI/CD pipeline, or directly into the makemigrations command, to warn about or refuse to generate a problematic migration.
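Wiring up the linter looks roughly like this; this is a sketch based on django-migration-linter’s documented usage, so double-check the settings against the version you install:

# settings.py
INSTALLED_APPS = [
    # ...
    "django_migration_linter",
]

# Assumption: this optional setting makes `makemigrations` run the linter
# automatically and refuse to generate a problematic migration.
MIGRATION_LINTER_OVERRIDE_MAKEMIGRATIONS = True

# Then, run it manually or in CI with:
#   python manage.py lintmigrations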

Conclusion

The behaviour of default values makes sense but has drawbacks developers should be aware of. Having a general idea of how a Django migration translates to SQL is definitely useful, for instance to understand why the default value is not reflected in the database schema. Automating the detection of backward incompatible migrations will help developers grasp what is happening under the hood, and in the end, avoid downtime during deployment and ensure a smooth development experience.

By understanding the fundamentals and choosing the appropriate tools, we overcome the little quirks and enable ourselves to work on a great stack for a smooth developer experience.

Interested in joining us? We’re hiring! Don’t hesitate to send us a resume if there are no open positions that match your skills, we are always on the lookout for passionate people.
