Alembic Migrations Without Downtime

Stanislav Shkitin
Published in Exness Tech Blog · Mar 17, 2023

Alembic is a powerful and flexible database migration tool that you can use with a Python app. But if you don’t follow some basic best practices when using Alembic, things can go wrong very quickly and database migrations may lead to downtime of your application. In this article, we will cover some tips to keep your Alembic migrations safe. We’ll use Alembic with PostgreSQL in examples, and some tips and configurations provided below are specific to PostgreSQL.

Setting up Alembic

Running migrations against production can be a risky operation: it can cause application downtime, especially if a migration involves a long-running operation that locks tables. Try to avoid such operations. If you can't, break the migration into smaller, more manageable steps and perform it in stages. This minimizes the impact on the application and reduces the risk of downtime.
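
For example, a plain CREATE INDEX blocks writes to the table for the whole build. PostgreSQL can instead build the index with CREATE INDEX CONCURRENTLY, which cannot run inside a transaction. A minimal sketch of such a migration, with hypothetical table and index names:

from alembic import op


def upgrade() -> None:
    # CREATE INDEX CONCURRENTLY is not allowed inside a transaction,
    # so temporarily switch the migration connection to autocommit
    with op.get_context().autocommit_block():
        op.create_index(
            "ix_orders_customer_id",
            "orders",
            ["customer_id"],
            postgresql_concurrently=True,
        )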

Use transaction_per_migration=True to make migrations independent of each other. Transactions will become smaller and lighter, with less chance of long-term table locks.

# env.py
context.configure(
    connection=connection,
    target_metadata=target_metadata,
    transaction_per_migration=True,
)

Add timeouts for migrations. This prevents a migration from blocking user requests by waiting too long for a lock or running an expensive query.
PostgreSQL has two configuration options:

  1. lock_timeout: this parameter sets a timeout for acquiring locks on database objects. If a lock cannot be acquired within the specified time, an error is raised. Useful for limiting how long a migration waits for a lock on a table or index, and therefore how long user queries queue up behind it.
  2. statement_timeout: this parameter sets a timeout for individual SQL statements. If a statement takes longer than the specified time, it is canceled and an error is raised. Useful for limiting the time a migration spends on any single query.

By setting these timeouts, you can ensure that your Alembic migrations don’t hold up user requests for too long and that they don’t run expensive queries or hold locks on database objects for extended periods of time. This can help keep your application responsive and minimize the risk of downtime caused by migrations.

An example Alembic configuration:

# env.py

def run_migrations_online() -> None:
    """Run migrations in 'online' mode.

    In this scenario we need to create an Engine
    and associate a connection with the context.
    """
    connectable = engine_from_config(
        config.get_section(config.config_ini_section),
        prefix="sqlalchemy.",
        poolclass=pool.NullPool,
        connect_args={
            # both timeouts are in milliseconds
            "options": "-c lock_timeout=4000 -c statement_timeout=5000"
        },
    )

    with connectable.connect() as connection:
        context.configure(
            connection=connection,
            target_metadata=target_metadata,
            transaction_per_migration=True,
        )

        with context.begin_transaction():
            context.run_migrations()

Run sanity checks on your migrations

When several developers work on the same project, it is useful to check that Alembic has only one head: multiple heads can lead to conflicts and errors in the migration process. You can check this with pytest (shown below) if you run tests in CI, or with a dedicated script in your CI pipeline.

Here’s an example of how to check for multiple Alembic heads using pytest:

from alembic.config import Config
from alembic.script import ScriptDirectory

def test_only_single_head_revision_in_migrations():
    config = Config()
    config.set_main_option("script_location", "app:migrations")
    script = ScriptDirectory.from_config(config)

    # This will raise if there are multiple heads
    script.get_current_head()

You can use alembic merge to merge two revisions together if you detect multiple heads.
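
For example, to merge two specific heads (the revision identifiers here are placeholders):

alembic merge -m "merge heads" revision_a revision_b

or merge everything that is currently a head:

alembic merge heads -m "merge heads"
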
Also use alembic check to verify that your migrations are up to date. This command runs the autogenerate comparison and fails if the models contain changes that are not yet reflected in the migrations. However, it is important to note that alembic check and alembic autogenerate may not detect all changes accurately, especially complex changes such as database schema refactorings. Refer to the Alembic documentation on autogenerate for details.

Avoid migrations that cause table locks

Creating Alembic migrations for PostgreSQL can be difficult because PostgreSQL uses various locking mechanisms to prevent concurrent access to shared resources such as tables and indexes. If a migration operation acquires a lock on a shared resource for an extended period of time, it can prevent other operations from accessing or modifying that resource, leading to downtime for the application.

For example, when you change a table’s schema, such as adding a non-nullable field, a field with an index, a unique field, or a constraint, PostgreSQL acquires an ACCESS EXCLUSIVE lock on the table. This lock blocks every other transaction, including plain SELECTs, from touching the table while the schema change is in progress, so a prolonged schema change on a frequently accessed table quickly turns into a performance problem.

When you add a foreign key constraint to a table, PostgreSQL acquires a SHARE ROW EXCLUSIVE lock on both the referenced and the referencing tables. This lock ensures that no other transactions can modify either table while the constraint is being added. Since adding a foreign key constraint requires scanning the referencing table to validate existing rows, the lock can be held for a long time when the tables are large.
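
One way to shorten this lock, a standard PostgreSQL technique rather than anything Alembic-specific, is to add the constraint as NOT VALID (a quick metadata-only change) and validate it in a separate statement that takes a weaker lock, allowing concurrent reads and writes. A sketch with hypothetical table and constraint names:

from alembic import op


def upgrade() -> None:
    # NOT VALID skips scanning existing rows, so the ALTER TABLE
    # is a brief metadata-only change
    op.execute(
        "ALTER TABLE orders "
        "ADD CONSTRAINT fk_orders_customer_id "
        "FOREIGN KEY (customer_id) REFERENCES customers (id) NOT VALID"
    )
    # Validation scans the table but holds a weaker lock that
    # does not block concurrent reads and writes
    op.execute("ALTER TABLE orders VALIDATE CONSTRAINT fk_orders_customer_id")

Ideally, run the VALIDATE step in a separate migration so the lock from the first statement is released before the long scan begins.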

Use squawk to check migrations and avoid or reduce the impact of table locks during schema changes.

Install

npm install -g squawk-cli

Usage

alembic upgrade head --sql | squawk

or you can lint a specific range of revisions

alembic upgrade first_revision:last_revision --sql | squawk

Adding a step to your continuous integration (CI) pipeline to validate migrations can help catch any errors or issues with your Alembic migrations before deploying to production. See the documentation for examples of fixing Alembic migrations.
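
A minimal sketch of such a CI step: --sql puts Alembic in offline mode, so no live database is needed, and squawk exits with a non-zero status when it finds problems, which fails the build.

#!/bin/sh
set -e  # fail the job on the first error

# Lint the SQL that the pending migrations would execute
alembic upgrade head --sql | squawk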

Keep your migrations backward compatible

A migration is called backward compatible when it changes the database in such a way that the previous code version can still work with that database.

In other words, a backward compatible migration ensures that the changes made to the database schema do not break the existing codebase, and that the system can continue to operate smoothly even after the migration is applied. This is important to maintain the stability and reliability of the software system over time, especially in large and complex applications where frequent updates and changes are necessary.

Adding new fields

Adding a NOT NULL field to a database table can potentially cause errors in some cases:

  • If you run migration during deployment to production, any application instances still using the old code without the non-nullable field in the model may encounter errors — namely when trying to insert or update table data that requires the new field
  • If you roll back to a previous code version without a non-nullable field in the model, but do not downgrade the migration that added the field, you may encounter errors when trying to insert or update table data that requires the new field
  • If you deploy a new code version with the new field before running the migration, you may encounter errors when trying to insert or update table data without the new field

It’s worth noting that adding a NOT NULL field without a default value to an existing table causes an error if the table already has rows.
To avoid errors with the new field, use server_default. It specifies a database-side default value for the new field, applied to existing rows and whenever a row is inserted without the field explicitly set. Alternatively, you can add a NOT NULL field in five steps with two production deploys, as shown in the sketch after the list:

  1. Add a field with nullable=True and run the migration on production
  2. Deploy code to production
  3. Fill rows with values if you need to
  4. Change nullable=True to nullable=False and run the migration on production
  5. Deploy code to production
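
A sketch of the migrations involved, with hypothetical table and column names; each def upgrade() below belongs to a separate migration file. The server_default variant collapses everything into a single migration (and since PostgreSQL 11, adding a column with a constant default no longer rewrites the table):

import sqlalchemy as sa
from alembic import op


# Variant with server_default: a single migration.
def upgrade() -> None:
    # The database fills existing rows and any inserts coming
    # from old code that does not set the field
    op.add_column(
        "orders",
        sa.Column("status", sa.Text(), nullable=False, server_default="new"),
    )


# Five-step variant, migration for step 1: add the field as nullable.
def upgrade() -> None:
    op.add_column("orders", sa.Column("status", sa.Text(), nullable=True))


# Five-step variant, migration for step 4: backfill, then forbid NULLs.
def upgrade() -> None:
    op.execute("UPDATE orders SET status = 'new' WHERE status IS NULL")
    op.alter_column("orders", "status", nullable=False)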

Deleting old fields

There are some scenarios where errors can occur when running a migration that deletes a field from a database table.
A few examples:

  • If you run migration during deployment to production, any application instances still using the old code version with the deleted field may encounter errors when trying to query or modify data in a table without the field.
  • If you roll back to a previous code version that still uses the deleted field, but do not downgrade the migration that deleted the field, you may encounter errors when trying to query or modify data in a table without the field.
  • If you deploy a new code version without a NOT NULL field in a model, but do not run the migration that deletes the field before deploying the code, you may encounter errors when trying to insert or update data in a table that requires the deleted field.

When you delete a field from a database table and run the migration on production, the previous code version must not query the field or write values into it.
To avoid errors, first deprecate the field in the code. Change nullable=False to nullable=True. Use deferred() so the ORM stops loading the field from the database, and evaluates_none() so that an explicitly assigned None is persisted as NULL instead of being swapped for a column default. Since the deprecated field is no longer used anywhere in the code, SQLAlchemy simply omits it from INSERTs, and nullable=True lets the database accept that.

  1. Delete the usage of the field in the code and deprecate the field in the model

some_field = deferred(
    sa.Column(sa.BigInteger().evaluates_none(), nullable=True)
)

  2. If you changed nullable=False to nullable=True, create the migration and run it on production
  3. Deploy code with the deprecated field to production
  4. Delete the field in the model, then create the migration and run it on production
  5. Deploy code with the deleted field to production
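
The migration from step 4 then becomes a plain column drop (table and field names are hypothetical):

from alembic import op


def upgrade() -> None:
    # Safe at this point: no deployed code version reads or writes the field
    op.drop_column("orders", "some_field")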

Renaming fields

Avoid renaming fields if possible. If you must rename one anyway, combine the advice above for adding and deleting fields:

  1. Create a new field
  2. Migrate data from the old field to the new field
  3. Delete the old field
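
A sketch of steps 1 and 2 in one migration, assuming a hypothetical rename of customer_id to customer_ref; on large tables, run the UPDATE in batches to keep transactions short:

import sqlalchemy as sa
from alembic import op


def upgrade() -> None:
    # Step 1: create the new field (nullable, per the advice above)
    op.add_column(
        "orders", sa.Column("customer_ref", sa.BigInteger(), nullable=True)
    )
    # Step 2: copy data from the old field into the new one
    op.execute("UPDATE orders SET customer_ref = customer_id")

Step 3 then follows the field-deletion procedure described above.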

Conclusion

If you want safe database migrations with Alembic, there are several tips you can follow, including using a linter for your migrations, configuring Alembic correctly, and making sure your migrations are backward compatible.

Using a linter for your migrations can help catch potential issues and ensure that your migrations are following best practices. Use squawk or other linters.

Configuring Alembic correctly is also important for safe migrations. This includes per-migration transactions, lock and statement timeouts, and the correct database connection settings in your Alembic configuration.

Making your migrations backward compatible is also crucial for minimizing downtime during the migration process. This means ensuring that each migration can be applied while the previous code version is still running, and rolled back without breaking it, so existing functionality and data stay intact throughout a deploy.

By following these tips, you can ensure that your Alembic migrations are safe and reliable and that they can be run in production without causing downtime or other issues with your application.
