Django development workflow with diverged migrations in multiple dev branches

Bharath kotha
5 min read · Sep 20, 2019

We are a small team at Aibono working on big things using Django in a monolithic codebase. Because we are a small team working on multiple long-running batches, we switch between branches with diverged migrations all the time. The rest of this story explains the problem we faced, the existing solutions, and our thoughts on how we might solve it.

The Problem

Whenever we switch branches, we have to manually undo all the applied migrations that differ between the source branch and the target branch. If we don’t, and the migrations applied to the database are inconsistent with the migration files in the working tree, the interpreter complains with

IntegrityError: NOT NULL constraint failed: app1_samplemodel.attr2

or

django.db.utils.OperationalError: no such column: app1_samplemodel.attr2

Let me explain what these errors mean. For the first case, let’s create a model called SampleModel in app1 with the following code on the master branch

from django.db import models

class SampleModel(models.Model):
    attr1 = models.CharField(max_length=60)

After that, create and apply the migrations (using the makemigrations and migrate commands). Now that the DB is populated with the new tables, initialize an empty git repository, add all the files, and create the first commit. Then create and check out a new branch called b1 from master, change SampleModel (in the app1/models.py file) as below, and create and apply the new migrations.

from django.db import models

class SampleModel(models.Model):
    attr1 = models.CharField(max_length=60)
    attr2 = models.CharField(max_length=60, default='')

Now switch back to master (using git checkout master) and open the shell (python manage.py shell) and type the following commands

>>> from app1.models import SampleModel
>>> sample = SampleModel.objects.create(attr1='abc')

And you should see the first error. This happens because the SampleModel table in the DB contains two columns (attr1, attr2), whereas SampleModel in Django contains only a single attribute (because we switched to the master branch). Now, to get the second error, create a branch b2 on top of b1 (git checkout b1 and git checkout -b b2), change the app1/models.py file once again with the following code, and re-run the migrations.

from django.db import models

class SampleModel(models.Model):
    attr1 = models.CharField(max_length=60)

Check out branch b1, open the shell, and run the same commands as above, and you will see the second error. This happens because the SampleModel table in the DB contains a single column (attr1) while SampleModel in Django contains two attributes (attr1, attr2). When Django tries to access attr2, the django.db module throws an error saying the column doesn’t exist.
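Both errors come from the database rather than from Django itself, so they can be reproduced without a Django project. The following is only a minimal sketch using Python’s built-in sqlite3 module. It assumes the table layout Django would typically generate for SampleModel (a NOT NULL column with no database-level default, since Django normally applies default in Python), and reproduce_errors is a hypothetical helper name. The second case uses a SELECT to trigger the “no such column” message.

```python
import sqlite3


def reproduce_errors():
    """Return the messages of the two errors caused by a schema/model mismatch."""
    messages = []

    # Case 1: the table still has attr2 (schema from b1), but the INSERT
    # only knows about attr1 (model from master).
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE app1_samplemodel ("
        "id INTEGER PRIMARY KEY, "
        "attr1 VARCHAR(60) NOT NULL, "
        "attr2 VARCHAR(60) NOT NULL)"
    )
    try:
        conn.execute("INSERT INTO app1_samplemodel (attr1) VALUES ('abc')")
    except sqlite3.IntegrityError as exc:
        messages.append(str(exc))
    finally:
        conn.close()

    # Case 2: the table has only attr1 (schema from b2), but the query
    # references attr2 (model from b1).
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE app1_samplemodel ("
        "id INTEGER PRIMARY KEY, "
        "attr1 VARCHAR(60) NOT NULL)"
    )
    try:
        conn.execute("SELECT app1_samplemodel.attr2 FROM app1_samplemodel")
    except sqlite3.OperationalError as exc:
        messages.append(str(exc))
    finally:
        conn.close()

    return messages


if __name__ == "__main__":
    for message in reproduce_errors():
        print(message)
```

In both cases the code is internally consistent; it is only the schema left behind by another branch that no longer matches it.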

That seems like a common problem. Doesn’t a solution exist already?

There are indeed a few questions on Stack Overflow about this, and a couple of open-source projects (django-south-compass and django_nomad) try to solve this problem. Most of the solutions boil down to one of the following concepts

  1. Dropping all the tables and reapplying migrations in the target branch from scratch. When the tables are created from scratch, all the data is lost and needs to be recreated as well. This can be handled with fixtures and data migrations, but managing them will, in turn, become a nightmare, not to mention that it will take some time to make the
  2. Having a separate database for each branch and replacing the settings file with the target branch’s settings every time the branch is switched, using tools like sed. This can be done with a post-checkout hook. Maintaining one large database for each branch would be very storage-intensive. Also, checking out individual commit IDs might still produce the same errors.
  3. Finding the differences in migrations between the source and target branches and applying them. We can do so with a post-checkout script, but there is a small issue. This post explains the issue in detail. To summarize, the post-checkout hook runs after all the files in the target branch are checked out, including migration files. If the target branch doesn’t contain all the migrations in the source branch, then when we run python manage.py migrate app1, Django won’t be able to find the missing migrations it needs in order to reverse them. We have to temporarily check out the migration files from the source branch, run python manage.py migrate, and then check out the migration files from the target branch. django-south-compass does something very similar but is available only for up to Python 2.6.
  4. Using a management command (which uses the Python git module) to find all the differences in migration operations between the source branch and the merge base of the source and target branches, and notifying the user of these changes. If these changes don’t interfere with the reason for the branch change, the user can go ahead and switch branches. Otherwise, using another management command, un-apply all migrations back to the merge base, switch branches, and apply the migrations in the target branch. There will be a small data loss, which is manageable if the two branches haven’t diverged a lot. django_nomad does some of this work.
  5. Keeping track of applied and unapplied migrations in files and using this data to populate the tables when switching branches.
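The core of options 3 and 4 is working out, per app, which migration to roll back to before switching. The sketch below is one hedged way to express that, not what django-south-compass or django_nomad actually do. The function names and the input format (a dict mapping each app label to its ordered list of migration names) are assumptions made for illustration, and it assumes a linear migration history per app.

```python
def rollback_plan(source_migrations, target_migrations):
    """Return {app_label: migration name to roll back to}.

    Both arguments map an app label to its ordered list of migration
    names (hypothetical input format, linear history assumed).
    Apps with nothing to unapply are left out of the result.
    """
    plan = {}
    for app, source in source_migrations.items():
        target = target_migrations.get(app, [])

        # Count how many leading migrations the two branches share.
        common = 0
        for source_name, target_name in zip(source, target):
            if source_name != target_name:
                break
            common += 1

        if common == len(source):
            continue  # every source migration also exists in the target

        # "zero" is the name Django's migrate command uses for
        # "unapply everything in this app".
        plan[app] = source[common - 1] if common else "zero"
    return plan


def migrate_commands(plan):
    """Render the plan as migrate commands to run BEFORE switching branches."""
    return [
        f"python manage.py migrate {app} {name}"
        for app, name in sorted(plan.items())
    ]


if __name__ == "__main__":
    source = {"app1": ["0001_initial", "0002_samplemodel_attr2"]}
    target = {"app1": ["0001_initial"]}
    for command in migrate_commands(rollback_plan(source, target)):
        print(command)
```

These commands have to run while the source branch’s migration files are still on disk, which is exactly the ordering problem described in option 3. After the switch, a plain python manage.py migrate applies whatever the target branch adds.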

What do we want?

We are looking for some of the following features in the tool that we are going to use/implement.

  1. Switching branches should be fast
  2. Minimal or no human intervention
  3. No loss of data
  4. Considerate of unapplied migrations (Sometimes, though we have created migrations we might not have applied them)
  5. Automatically manage dependencies between migrations like how Django does
  6. Checking out commits should not break the code

What’s next?

We are going to try some of the solutions that already exist and see if they solve the problem. We believe that a combination of the 3rd and 4th solutions might solve our problem (but we will have to combine them ourselves and port them to Python 3.x). Automated testing will also solve most of the problems, as it creates a new DB and populates the necessary data. We are also looking into taking snapshots of the DB, or differences between snapshots, with each commit, saving them in a file, and applying the differences like patches (I have no idea if it can even be done. It might be DB dependent. I might be insane to think that something like this is possible). We are also thinking of saving the applied/un-applied migration information with each commit and using this information to migrate the database.
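To make the last idea a little more concrete, here is a rough sketch of what recording migration state per commit could look like. It is only an illustration under stated assumptions: the state is modeled as a set of (app_label, migration_name) pairs, similar to what Django keeps in its django_migrations table, and the helper names dump_state, load_state, and diff_state are hypothetical. How the state is read from the DB and where the file is stored are left out.

```python
import json


def dump_state(applied):
    """Serialize a set of (app_label, migration_name) pairs to JSON text."""
    return json.dumps(sorted(applied))


def load_state(text):
    """Parse JSON text produced by dump_state back into a set of pairs."""
    return {tuple(item) for item in json.loads(text)}


def diff_state(applied_in_db, recorded_for_commit):
    """Compare what the DB has applied with what a commit recorded.

    Returns (to_unapply, to_apply): migrations applied in the DB that the
    commit did not record, and migrations the commit recorded that the DB
    is missing.
    """
    to_unapply = sorted(applied_in_db - recorded_for_commit)
    to_apply = sorted(recorded_for_commit - applied_in_db)
    return to_unapply, to_apply


if __name__ == "__main__":
    in_db = {("app1", "0001_initial"), ("app1", "0002_samplemodel_attr2")}
    recorded = load_state(dump_state({("app1", "0001_initial")}))
    print(diff_state(in_db, recorded))
```

A diff like this only says what differs. Actually reversing a migration still needs its file on disk, so this would have to be combined with the checkout ordering described earlier.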

If we ever develop a tool to manage branch changes, and if the community needs it, we plan to make it open source.
