Migrations with django-tenants

A comprehensive guide to how migrations with django-tenants work and what we should keep an eye on.

Published in

thirty3

8 min read · Oct 9, 2020

Usually, we don’t pay much attention to migrating things in Django. As long as we keep everything simple, running the makemigrations and migrate commands is more than enough. Sometimes things get a little more complicated and we, developers, need to help Django do its job. This is usually done with the RunPython operation, which allows running some Python code within the migration. While it is pretty straightforward for standard Django migrations, things get more complicated when we use the multitenant architecture provided by the django-tenants package.

So here, I will dive into django-tenants migrations: describe how they work and inspect the most important points about using RunPython with tenants, all based on a simple business case.

At the bottom of the article, there is a small glossary that might be useful if you are not that familiar with all the terms. You can also check out the article I co-authored with my mentor about the basic usage of django-tenants and multitenant architecture in general.

Now let’s get started!

The case

First, I would like to introduce you to the problem we need to solve and its solution using standard Django migrations, just for the sake of comparison.

Take a look at these simple models:

# companies.models
class Company(models.Model):
    name = models.CharField(max_length=50)

and

# projects.models
class Project(models.Model):
    name = models.CharField(max_length=50)
    company = models.ForeignKey(
        Company,
        related_name="projects",
        on_delete=models.CASCADE,
    )

Imagine that the client requirement was to have a nice-looking project identifier in the application administration page, something like <Company name>-<Project name>. We could simply create a CharField with max_length=101 (as the field will consist of two strings of at most 50 characters each plus one - character) and unique=True (as it is supposed to be a unique identifier) to store the data.

class Project(models.Model):
    name = models.CharField(max_length=50)
    company = models.ForeignKey(
        Company, 
        related_name="projects",
        on_delete=models.CASCADE
        )
    identifier = models.CharField(max_length=101, unique=True)
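
As a quick sanity check on the max_length=101 figure, here is a minimal sketch in plain Python. The build_identifier helper and the MAX_NAME_LENGTH constant are hypothetical names used only for illustration; they are not part of the models above.

```python
MAX_NAME_LENGTH = 50  # assumed to match max_length of both name fields


def build_identifier(company_name: str, project_name: str) -> str:
    """Join the two names with a single '-' separator."""
    return f"{company_name}-{project_name}"


# Worst case: both names use all 50 characters
longest = build_identifier("c" * MAX_NAME_LENGTH, "p" * MAX_NAME_LENGTH)
print(len(longest))  # 50 + 1 + 50
```

Note that the identifier is only as unique as the data: two projects with the same name inside one company would still produce the same value.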

Now we could run the migrations, but there is a thing to be considered: What about objects already existing in the database?

Trying to create a migration for such a field will make makemigrations stop and ask for a default (the exact message might differ a little depending on the Django version you are using):

You are trying to add a non-nullable field 'identifier' to project without 
a default; we can't do that (the database needs something to populate 
existing rows).

Adding a default value will not save us, since the second object getting the default value will violate the unique constraint, leading to a database IntegrityError. Using null=True is not the solution we are looking for either, as already existing objects would end up without the identifier.
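
The clash between a shared default and a unique constraint can be reproduced outside Django. The sketch below uses SQLite from the Python standard library as a stand-in database; the table layout and the 'unknown' default are assumptions made for the example, and other databases will word the error differently.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE project (id INTEGER PRIMARY KEY, name TEXT)")
conn.executemany(
    "INSERT INTO project (name) VALUES (?)", [("website",), ("mobile",)]
)

# Every existing row receives the same default value...
conn.execute(
    "ALTER TABLE project ADD COLUMN identifier TEXT NOT NULL DEFAULT 'unknown'"
)

# ...so a unique index on that column cannot be built.
try:
    conn.execute("CREATE UNIQUE INDEX project_identifier_uniq ON project (identifier)")
    unique_index_created = True
except sqlite3.IntegrityError:
    unique_index_created = False

print(unique_index_created)
```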

The way to go is using RunPython, which is well described in the Django documentation. Without jumping into details, we would put a piece of Python code into a function in the migration file and pass that function by reference to the RunPython operation. This code would be executed while applying the migration.

We could also pass a function reference as the second argument, which would be called when unapplying the migration. This can sometimes be useful.

from django.db import migrations


def forwards_func(apps, schema_editor):
    # Do something
    pass


def reverse_func(apps, schema_editor):
    # Reverse what forwards_func did
    pass


class Migration(migrations.Migration):

    dependencies = []

    operations = [
        migrations.RunPython(forwards_func, reverse_func),
    ]

So, the naive approach (I will present some optimizations later in the article) would be to iterate over all the projects of each Company and assign the values in forwards_func.

for company in Company.objects.all():
    # Below we use related_name of the ForeignKey we have
    projects = list(company.projects.all())
    for project in projects:
        project.identifier = f"{company.name}-{project.name}"
    Project.objects.bulk_update(projects, ["identifier"])

Voilà, problem solved. It is pretty straightforward overall.

Tenants setup

Now it is time to jump to the main part.

If you are familiar with tenants, you probably know that our initial model setup needs to change a little.

We will not have a ForeignKey between the Project and Company models. Instead, Company will become a tenant model (by inheriting from TenantMixin) - for each company, there will be a separate schema created in the database.

#companies.models.py

class Company(TenantMixin):
    name = models.CharField(max_length=50)

The Project model will have only the name field, which was already there, and the newly created identifier, which we will try to migrate.

#projects.models.py
 
class Project(models.Model):
    name = models.CharField(max_length=50)
    identifier = models.CharField(max_length=101, unique=True)

Note that we got rid of the ForeignKey. We can do it because we add the companies application to SHARED_APPS in the settings file, which means its models will be stored in the public schema. The projects app, in turn, lands in TENANT_APPS, so its objects will be stored in each company's (tenant's) database schema. There is no need to create relations here; the data encapsulation is provided by the schemas.

SHARED_APPS = [
    ...
    "companies",
]

TENANT_APPS = [
    ...
    "projects",
]

The context in migration

The first thing you need to know is that with the django-tenants package installed you are using the migrate_schemas command, which overrides the standard Django migrate.

The purpose is basically the same — to make the database tables reflect the models we have in our Django application. The way it is achieved is slightly different or maybe I should say adjusted.

When running migrate_schemas we can pass some additional options to this command such as:

--tenant - to populate only tenant applications
--shared - to populate only shared (public) applications
--schema - to specify the schema we want to migrate
--executor - to specify whether we want to use standard executor or multiprocessing (which can greatly improve the performance in bigger systems)

Based on these options, the package identifies whether the user wants to migrate a specific type of app (by passing the tenant or shared option) or both of them (by not passing either option). Under the hood, it does so by setting the sync_public and sync_tenant boolean variables. Then django-tenants proceeds to run migrations separately for tenant and shared apps, without migrating anything the user doesn't want to.

A list of schemas to be migrated is built based on the mentioned sync_public and sync_tenant variables and the value of the schema option. Eventually, the list might contain either:

public schema
one schema specified by schema option
all tenants except public

It is worth noting that during one migration two separate lists can be built, e.g. one containing just the public schema and a second one containing all tenants besides the public. Migrations for these lists would be run one after another.

So in general migrate_schemas calls Django's migrate in two different ways. First, it calls migrate for the public schema, only syncing the shared apps. Then it runs migrate for every tenant in the database, this time only syncing the tenant apps.
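
The selection logic described above can be sketched in plain Python. This is a simplified illustration based only on the description in this article, not the actual django-tenants source; the plan_runs function and its parameters are hypothetical.

```python
def plan_runs(tenant_schemas, shared=False, tenant=False, schema=None,
              public_schema="public"):
    """Return the lists of schemas that would be migrated, in order."""
    if schema is not None:
        # A single schema was requested explicitly
        return [[schema]]

    # Passing neither option means "migrate both kinds of apps"
    neither = not shared and not tenant
    sync_public = shared or neither
    sync_tenant = tenant or neither

    runs = []
    if sync_public:
        runs.append([public_schema])
    if sync_tenant:
        runs.append([name for name in tenant_schemas if name != public_schema])
    return runs


all_schemas = ["public", "acme", "globex"]
print(plan_runs(all_schemas))               # [['public'], ['acme', 'globex']]
print(plan_runs(all_schemas, shared=True))  # [['public']]
print(plan_runs(all_schemas, tenant=True))  # [['acme', 'globex']]
```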

And now the magic happens.

Migrations are run in each of the tenant’s schema separately.

Depending on the executor being used, the mentioned list is either iterated over (standard executor) or split across multiple processes, the number of which can be configured (multiprocessing executor). What is the same for both executors is that the schema in the database is set similarly to what we would do in code using the schema_context context manager.

In other words, migration is run in isolation for each company.
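
To picture what the standard executor does, here is a toy model in plain Python. ToyConnection and toy_schema_context are hypothetical stand-ins that only mimic the idea of schema_context; they do not talk to a database and are not the package's implementation.

```python
from contextlib import contextmanager


class ToyConnection:
    """Stand-in for the database connection; only tracks the active schema."""

    def __init__(self):
        self.schema_name = "public"


@contextmanager
def toy_schema_context(connection, schema_name):
    """Switch the active schema for the duration of the block, then restore it."""
    previous = connection.schema_name
    connection.schema_name = schema_name
    try:
        yield
    finally:
        connection.schema_name = previous


def run_migrations(connection, schemas, migration):
    """Standard-executor style loop: one isolated run per schema."""
    results = []
    for schema_name in schemas:
        with toy_schema_context(connection, schema_name):
            results.append(migration(connection))
    return results


connection = ToyConnection()
seen = run_migrations(connection, ["acme", "globex"], lambda conn: conn.schema_name)
print(seen)  # ['acme', 'globex']
```

Each call of the migration function sees only the schema that was active for its own run, and the connection is back on the public schema afterwards.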

Back to code

Below is a simple migration file generated by running makemigrations. Notice that I added an item to the operations list, which tells Django to run the function I passed by reference as the argument (I will skip implementing the function that reverses the migration).

From now on we will extend only the provide_identifier() function as there is no point in copy-pasting the rest of the file.

It is time to make it work step by step.

def provide_identifier(apps, schema_editor):
    # Here we will put our code
    pass


class Migration(migrations.Migration):

    dependencies = [
        ('projects', '0001_initial'),
    ]

    operations = [
        migrations.RunPython(provide_identifier),
    ]

Let’s remember what we want to achieve: assign the value f"{company.name}-{project.name}" to the project.identifier field. We should start by getting the Company object to access its fields. Keeping in mind that the migration is run in the context of a tenant, the standard way of accessing the current tenant is django.db.connection.tenant, which should return the current tenant object.

from django.db import connection

def provide_identifier(apps, schema_editor):
    company = connection.tenant

It should work, but actually, it won’t. To be more precise, not as we expect. During the migration, database models cannot be imported (at least not in the standard way, with an import statement), so connection.tenant is an instance of FakeTenant, which wraps the schema_name in a tenant-like structure.

class FakeTenant:
    def __init__(self, schema_name):
        self.schema_name = schema_name
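
The practical consequence is easy to check in isolation. The sketch below reuses the simplified FakeTenant shown above (the real class may carry more attributes depending on the package version) and shows that model fields such as name are simply not there.

```python
class FakeTenant:
    """Simplified copy of the wrapper shown above."""

    def __init__(self, schema_name):
        self.schema_name = schema_name


tenant = FakeTenant("acme")

# Model fields such as Company.name do not exist on the wrapper
try:
    company_name = tenant.name
except AttributeError:
    company_name = None

print(tenant.schema_name, company_name)
```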

Fortunately, schema_name is more than enough to retrieve the needed information. We get our Company model using the apps.get_model() method and then query for the specific object.

from django.db import connection

def provide_identifier(apps, schema_editor):
    # First arg of get_model() is the application name, second is
    # the name of the model we are looking for
    Company = apps.get_model("companies", "Company")
    company = Company.objects.get(schema_name=connection.tenant.schema_name)

Wait a second. A few lines above I told you that during migration we are in the context of a specific tenant, which means we search for objects in its schema, and now we have queried for a model from SHARED_APPS.

Shouldn’t we switch the context?

In theory, running the query in the wrong schema should raise an exception. But in this case, it will not. It turns out that some magic is done behind the scenes: every time we set the context of a tenant, the public schema is silently added to the search path as well. So every time we are in the context of a specific organization, we can query for models from the public schema.
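
A search path can be thought of as an ordered list of schemas that is scanned until the table is found. The toy resolver below illustrates that idea in plain Python; the DATABASE dictionary and the resolve_table function are made up for this example, and the table names assume Django's default <app>_<model> naming.

```python
# Toy database: schema name -> set of table names it contains
DATABASE = {
    "public": {"companies_company"},
    "acme": {"projects_project"},
    "globex": {"projects_project"},
}


def resolve_table(table, search_path):
    """Return the first schema on the search path that contains the table."""
    for schema in search_path:
        if table in DATABASE.get(schema, set()):
            return schema
    raise LookupError(f"relation {table!r} does not exist")


tenant_path = ["acme", "public"]  # tenant schema first, public appended
print(resolve_table("projects_project", tenant_path))   # acme
print(resolve_table("companies_company", tenant_path))  # public
```

With only ["acme"] on the path, the lookup of companies_company would fail, which is the exception we expected in theory.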

Another point worth noting is that, simply because we migrate the projects application (it was the only change in our models), which belongs to TENANT_APPS, we will never get the public schema in connection.tenant.schema_name. So we can be sure that we will not accidentally try to run the migration in the public schema, where there are no tables for the Project model.

Finally, we can finish our little function by querying for all Project objects and providing the identifier in an optimized way. We avoid the most expensive work, which would be asking the database for the name of each Project in the queryset and saving the result one object at a time. Instead, it is done at the database level.

from django.db import connection
from django.db.models import F, Value
from django.db.models.functions import Concat

def provide_identifier(apps, schema_editor):
    Company = apps.get_model("companies", "Company")
    Project = apps.get_model("projects", "Project")

    company = Company.objects.get(schema_name=connection.tenant.schema_name)
    projects = Project.objects.all()
    # One UPDATE query builds "<Company name>-<Project name>" for every row
    projects.update(
        identifier=Concat(Value(f"{company.name}-"), F("name"))
    )
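
To see what "at the database level" means in practice, here is a rough equivalent written against SQLite from the standard library. The table layout and the company name are assumptions for the example, and the SQL that Django generates for your database will look different, but the idea is the same: a single UPDATE statement concatenates the prefix with each row's own name.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE project (id INTEGER PRIMARY KEY, name TEXT, identifier TEXT)"
)
conn.executemany(
    "INSERT INTO project (name) VALUES (?)", [("website",), ("mobile",)]
)

company_name = "acme"  # assumed tenant name for the example

# One statement updates every row; no Python loop over objects is needed
cursor = conn.execute(
    "UPDATE project SET identifier = ? || name", (f"{company_name}-",)
)
updated_rows = cursor.rowcount

identifiers = [
    row[0] for row in conn.execute("SELECT identifier FROM project ORDER BY id")
]
print(updated_rows, identifiers)
```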

That’s it. After the migration is performed, not only will we have a new field in the model, but also the values populated for each project.

To remember

We can use RunPython to edit the data during migrations
We can also specify the function called when unapplying migration
We can pass some useful options to the migrate_schemas command
Migrations are run in the context of each tenant (from the built list of tenants to migrate)
During migration, connection.tenant doesn't contain an actual tenant object, but a FakeTenant
The public schema is added to the search path whenever a tenant's database context is set
We can nicely optimize operations during RunPython

Glossary

tenant - basically a Django model that inherits from TenantMixin, in our case this is Company
schema_name - unique identifier of a tenant
public schema - database schema containing data available application-wide
tenant schema - database schema containing data for a specific tenant (Company)
SHARED_APPS - a list of Django applications whose models are stored in the public schema
TENANT_APPS - a list of Django applications whose models are stored in specific schemas corresponding to Company objects

P.S.
I am aware the problem, or at least part of it, could be solved using the model @property, but then there would be no article. :)