Migrations with django-tenants
A comprehensive guide to how migrations with django-tenants work and what we should keep an eye on.
Usually, we don't pay much attention to migrating things in Django. As long as we keep everything simple, running the makemigrations and migrate commands is more than enough. Sometimes things get a little more complicated and we, developers, need to help Django do its job. It is usually done with the RunPython operation, which allows running some Python code within the migration. While it is pretty straightforward for standard Django migrations, things get more complicated when we use the multitenant architecture provided by the django-tenants package.
So here, I will dive into django-tenants migrations: describe how they work and inspect the most important points about using RunPython with tenants. All based on a simple business case.
At the bottom of the article, there is a small glossary that might be useful for you if you are not that familiar with all the phrases. You can also check out the article I co-authored with my mentor about the basic usage of django-tenants and multitenant architecture in general.
Now let’s get started!
The case
First, I would like to introduce the problem we need to solve and its solution using standard Django migrations, just for the sake of comparison.
Take a look at these simple models:
# companies.models
from django.db import models

class Company(models.Model):
    name = models.CharField(max_length=50)
and
# projects.models
from django.db import models
from companies.models import Company

class Project(models.Model):
    name = models.CharField(max_length=50)
    company = models.ForeignKey(
        Company,
        related_name="projects",
        on_delete=models.CASCADE,
    )
Imagine that the client requirement was to have a nice-looking project identifier in the application administration page, something like <Company name>-<Project name>. We could simply create a CharField with max_length=101 (as the field will consist of two strings of at most 50 characters each and one - character) and unique=True (as it is supposed to be a unique identifier) to store the data.
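To double-check the arithmetic behind max_length=101, here is a minimal plain-Python sketch. The build_identifier helper and the MAX_NAME_LENGTH constant are hypothetical names introduced only for this illustration; they are not part of the models.

```python
MAX_NAME_LENGTH = 50  # mirrors max_length=50 on both name fields


def build_identifier(company_name: str, project_name: str) -> str:
    """Join the two names with a single "-" separator."""
    for value in (company_name, project_name):
        if len(value) > MAX_NAME_LENGTH:
            raise ValueError(f"name longer than {MAX_NAME_LENGTH} characters")
    return f"{company_name}-{project_name}"


# Worst case: two names of maximal length plus the separator.
longest = build_identifier("c" * MAX_NAME_LENGTH, "p" * MAX_NAME_LENGTH)
print(len(longest))  # 101
```

Two names of 50 characters and one separator give exactly 101 characters, so the field can never overflow as long as both name fields keep their limit.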
class Project(models.Model):
    name = models.CharField(max_length=50)
    company = models.ForeignKey(
        Company,
        related_name="projects",
        on_delete=models.CASCADE,
    )
    identifier = models.CharField(max_length=101, unique=True)
Now we could run the migrations, but there is a thing to be considered: What about objects already existing in the database?
Trying to generate a migration for such a field makes Django stop with a message like the one below (the exact wording may differ depending on your Django version):
You are trying to add a non-nullable field 'identifier' to project without
a default; we can't do that (the database needs something to populate
existing rows).
Adding a default value will not save us, since the second object getting the default value would violate the unique constraint, leading to a database IntegrityError. Using null=True is not the solution we are looking for either, as already existing objects would end up without the identifier.
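The clash between a shared default and the unique constraint can be reproduced outside Django. The sketch below uses SQLite from the standard library purely as a stand-in database; the table and index names are made up for the example, and other databases may report the failure with a different message.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE project (id INTEGER PRIMARY KEY, name TEXT NOT NULL)")
conn.executemany(
    "INSERT INTO project (name) VALUES (?)",
    [("Website",), ("Mobile app",)],
)

# Every existing row receives the same default value...
conn.execute("ALTER TABLE project ADD COLUMN identifier TEXT NOT NULL DEFAULT ''")

# ...so enforcing uniqueness on that column afterwards has to fail.
try:
    conn.execute(
        "CREATE UNIQUE INDEX project_identifier_uniq ON project (identifier)"
    )
    unique_index_created = True
except sqlite3.IntegrityError:
    unique_index_created = False

print(unique_index_created)  # False
```

Both rows end up with an empty identifier, which is exactly the situation described above: the default fills the column, but the values are not unique.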
The way to go is using RunPython, which is well described in the Django documentation. Without jumping into details, we put a piece of Python code into a function in the migration file and pass that function to the RunPython operation. This code is executed when the migration is applied.
We can also pass a second function, which is called when the migration is unapplied. This can sometimes be useful.
from django.db import migrations


def forwards_func(apps, schema_editor):
    # Do something
    pass


def reverse_func(apps, schema_editor):
    # Reverse what forwards_func did
    pass


class Migration(migrations.Migration):
    dependencies = []
    operations = [
        migrations.RunPython(forwards_func, reverse_func),
    ]
So, the naive approach (I will present some optimizations later in the article) would be to iterate over the projects of every Company and assign the values in forwards_func.
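def forwards_func(apps, schema_editor):
    # Use historical models instead of importing them directly
    Company = apps.get_model("companies", "Company")
    Project = apps.get_model("projects", "Project")
    for company in Company.objects.all():
        # Below we use related_name of the ForeignKey we have
        projects = list(company.projects.all())
        for project in projects:
            project.identifier = f"{company.name}-{project.name}"
        Project.objects.bulk_update(projects, ["identifier"])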
Voilà, problem solved. It is pretty straightforward overall.
Tenants setup
Now it is time to jump to the main part.
If you are familiar with tenants, you probably know that our initial model setup needs to change a little.
We will not have a ForeignKey between the Project and Company models. Instead, Company will become a tenant model (by inheriting from TenantMixin): for each company, a separate schema will be created in the database.
# companies/models.py
from django_tenants.models import TenantMixin

class Company(TenantMixin):
    name = models.CharField(max_length=50)
The Project
model will have only the name
field which was already there and newly created identifier
which we will try to migrate.
# projects/models.py
class Project(models.Model):
    name = models.CharField(max_length=50)
    identifier = models.CharField(max_length=101, unique=True)
Note that we got rid of the ForeignKey. We can do that because we add the companies application to SHARED_APPS in the settings file, which means its tables will be stored in the public schema. The projects app, in turn, lands in TENANT_APPS, so its objects will be stored in the database schema of each company (tenant). There is no need to create relations here; the data encapsulation is provided by the schemas.
SHARED_APPS = [
    ...
    "companies",
]

TENANT_APPS = [
    ...
    "projects",
]
The context in migration
The first thing you need to know is that once the django-tenants package is installed, you use the migrate_schemas command, which overrides the standard Django migrate.
The purpose is basically the same: to make the database tables reflect the models we have in our Django application. The way it is achieved is slightly different, or maybe I should say adjusted.
When running migrate_schemas we can pass some additional options to this command, such as:
- --tenant: to populate only tenant applications
- --shared: to populate only shared (public) applications
- --schema: to specify the schema we want to migrate
- --executor: to specify whether we want to use the standard executor or multiprocessing (which can greatly improve the performance in bigger systems)
Based on these options, the package identifies whether the user wants to migrate a specific type of app (by passing the tenant or shared option) or both of them (by passing neither). Under the hood, the package does this by setting the sync_public and sync_tenant boolean variables. Then django-tenants proceeds to run migrations separately for tenant and shared apps, not migrating anything the user doesn't want to.
A list of schemas to be migrated is built based on the mentioned sync_public and sync_tenant variables and the value of the schema option. Eventually, the list might contain either:
- the public schema
- one schema specified by the schema option
- all tenants except public
It is worth noting that during one migration two separate lists can be built, e.g. one containing just the public schema and a second containing all tenants besides public. Migrations for these lists are run one after another.
So in general migrate_schemas
calls Django's migrate
in two different ways. First, it calls migrate for the public schema, only syncing the shared apps. Then it runs migrate for every tenant in the database, this time only syncing the tenant apps.
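The way the lists are put together can be pictured with a small toy model. This is only a sketch of the behaviour described above, not the package's source code: the build_schema_lists function, its parameters and the schema names are all hypothetical.

```python
PUBLIC_SCHEMA = "public"


def build_schema_lists(all_schemas, shared=False, tenant=False, schema=None):
    """Return the lists of schemas to migrate, in the order they would run."""
    if schema is not None:
        # A single schema was requested explicitly.
        return [[schema]]
    if not shared and not tenant:
        # No option passed: migrate both kinds of apps.
        shared = tenant = True
    lists = []
    if shared:
        lists.append([PUBLIC_SCHEMA])
    if tenant:
        lists.append([name for name in all_schemas if name != PUBLIC_SCHEMA])
    return lists


schemas = ["public", "acme", "globex"]
print(build_schema_lists(schemas))                 # [['public'], ['acme', 'globex']]
print(build_schema_lists(schemas, shared=True))    # [['public']]
print(build_schema_lists(schemas, schema="acme"))  # [['acme']]
```

The first call shows the "two lists" case: the public schema is handled first, and all tenant schemas follow in a second pass.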
And now the magic happens.
Migrations are run in each of the tenant’s schema separately.
Depending on the executor being used, the mentioned list is either iterated over (standard executor) or handled by multiple processes, whose number can be configured (multiprocessing executor). What both executors have in common is that the database schema is set in a way similar to what we would do in code with the schema_context context manager.
In other words, migration is run in isolation for each company.
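To make the idea of isolation more tangible, here is a plain-Python imitation of the standard executor loop. Everything in it is a stand-in written for this article: toy_schema_context only mimics the behaviour of a schema-switching context manager, and the module-level current_schema variable plays the role of the schema active on the database connection.

```python
from contextlib import contextmanager

current_schema = "public"  # stand-in for the schema active on the connection


def active_schema():
    """Report which schema a query would run against right now."""
    return current_schema


@contextmanager
def toy_schema_context(schema_name):
    """Activate a schema for the duration of the block, then restore the old one."""
    global current_schema
    previous = current_schema
    current_schema = schema_name
    try:
        yield
    finally:
        current_schema = previous


def run_for_each_schema(schemas, migration):
    """Imitate the standard executor: one separate run per schema, in order."""
    results = []
    for schema_name in schemas:
        with toy_schema_context(schema_name):
            results.append(migration())
    return results


print(run_for_each_schema(["acme", "globex"], active_schema))
# ['acme', 'globex']
```

Each call of the migration function sees only its own schema, and the previously active schema is restored once the loop is done.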
Back to code
Below there is a simple migration file generated by running makemigrations. Notice that I added an item to the operations list which tells Django to run the function I passed by reference in the argument (I will skip implementing the function that reverses the migration).
From now on we will extend only the provide_identifier()
function as there is no point in copy-pasting the rest of the file.
It is time to make it work step by step.
from django.db import migrations


def provide_identifier(apps, schema_editor):
    # Here we will put our code
    pass


class Migration(migrations.Migration):
    dependencies = [
        ('projects', '0001_initial'),
    ]
    operations = [
        migrations.RunPython(provide_identifier)
    ]
Let's remember what we want to achieve: assign the value f"{company.name}-{project.name}" to the project.identifier field. We should start by getting the Company object to access its name. Keeping in mind that the migration is run in the context of a tenant, the standard way of accessing the current tenant is django.db.connection.tenant, which should return the current tenant object.
from django.db import connection


def provide_identifier(apps, schema_editor):
    company = connection.tenant
It should work, but actually, it won't. To be more precise: not as we expect. During a migration, database models cannot be imported (at least not in the standard way, with an import statement), so connection.tenant is an instance of FakeTenant, which wraps the schema_name in a tenant-like structure.
class FakeTenant:
    def __init__(self, schema_name):
        self.schema_name = schema_name
Fortunately, schema_name is more than enough to retrieve the needed information. We bring in our Company model using the apps.get_model() method and then query for the specific object.
from django.db import connection


def provide_identifier(apps, schema_editor):
    # First arg of get_model() is the application name, second is
    # the name of the searched model
    Company = apps.get_model("companies", "Company")
    company = Company.objects.get(schema_name=connection.tenant.schema_name)
Wait a second. A few lines above I told you that during a migration we are in the context of a specific tenant, which means we search for objects in its schema, and now we have queried for a model from SHARED_APPS.
Shouldn’t we switch the context?
In theory, running the query in the wrong schema should raise an exception. But in this case, it will not. It turns out that some magic is done behind the scenes: every time we set the context of a tenant, the public schema is silently added to the search path as well. So every time we are in the context of a specific organization, we can also query for models from the public schema.
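The effect of the search path can be imitated with a few lines of plain Python. This is a simplified picture of name resolution, not how the database actually implements it; the SCHEMAS layout, the table names and the resolve_table helper are made up for the example.

```python
# Hypothetical database layout: schema name -> tables it contains.
SCHEMAS = {
    "public": {"companies_company"},
    "acme": {"projects_project"},
}


def resolve_table(table, search_path):
    """Return the first schema on the search path that contains the table."""
    for schema_name in search_path:
        if table in SCHEMAS.get(schema_name, set()):
            return schema_name
    raise LookupError(f"relation {table!r} does not exist")


tenant_path = ["acme", "public"]  # tenant schema first, public appended
print(resolve_table("projects_project", tenant_path))   # acme
print(resolve_table("companies_company", tenant_path))  # public
```

With only the tenant schema on the path, the lookup of the companies table would fail; with public appended, it quietly falls through to the shared tables.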
Another point worth noting is that simply because we migrate a projects
application (it was the only change in our models) which belongs to TENANT_APPS
we will never get the public
schema in connection.tenant.schema_name
. So we are sure that we will not accidentally try to run the migration in public
schema where there are no tables for Project
model.
Finally, we can finish our little function by updating all Project objects and providing the identifier in an optimized way. We avoid the most expensive work, which would be asking the database for the name of each Project in the queryset and saving every result separately. Instead, it is all done at the database level.
from django.db import connection
from django.db.models import F, Value
from django.db.models.functions import Concat


def provide_identifier(apps, schema_editor):
    Company = apps.get_model("companies", "Company")
    Project = apps.get_model("projects", "Project")
    company = Company.objects.get(schema_name=connection.tenant.schema_name)
    # A single UPDATE builds "<Company name>-<Project name>" in the database
    Project.objects.update(
        identifier=Concat(Value(f"{company.name}-"), F("name"))
    )
That's it. After the migration is performed, not only will we have a new field in the model, but also the values populated for each project.
To remember
- We can use RunPython to edit the data during migrations
- We can also specify the function called when unapplying a migration
- We can pass some useful options to the migrate_schemas command
- Migrations are run in the context of each tenant (from the built list of tenants to migrate)
- During a migration, connection.tenant doesn't contain an actual tenant object, but a FakeTenant
- The public schema is added to the search path when a tenant context is set
- We can nicely optimize operations during RunPython
Glossary
- tenant: basically a Django model that inherits from TenantMixin, in our case this is Company
- schema_name: unique identifier of a tenant
- public schema: database schema containing data available application-wide
- tenant schema: database schema containing data for a specific tenant (Company)
- SHARED_APPS: a list of Django applications whose models are stored in the public schema
- TENANT_APPS: a list of Django applications whose models are stored in specific schemas corresponding to Company objects
P.S.
I am aware that the problem, or at least part of it, could be solved using a model @property, but then there would be no article. :)