Zero Downtime Schema Migrations

As part of our Journey to Continuous Deployment at Invoca, one of the roadblocks to more frequent and flexible deploys was database schema migrations. We had to lock users out during a deploys that contained migrations because when users read or wrote to tables that were being migrated they would get errors. While effective at reducing schema incompatibility errors, our goal was to deploy daily and at any time of day. We needed a way to migrate our databases without having a maintenance window.

We solved the migration problem by using, and later extending, a tool for MySQL from Percona called Online Schema Change. Using triggers and atomic table swapping, Online Schema Change enables you to change table schemas without going out of service. We created a ruby wrapper around the tool and extended the Rails database rake tasks, which allowed us to continue using Rails migration files and reduce the amount of developer workflow changes needed for migrations.

Using the Online Schema Change tool to swap out tables atomically was only half the battle. Running such a migration while users are actively logged in and using the system requires careful management of the code using the database:

  • Columns must be added to tables before any of the code that expects them is deployed.
  • Conversely, code that uses deprecated columns must be deleted and deployed before the columns can be dropped.
  • Renames and re-typing are not possible. You must create a new column, copy data over, then drop the old column in a follow on deploy
  • The online migration process is slow for large tables. In some cases, it can take days to migrate a table.

We created a set of safety checks into our CI pipeline to automate the requirements above. They will fail the build and prevent a deploy if they find:

  • Migrations that cannot be supported by online schema changes. i.e. renames and type changes.
  • The existing production code cannot run using the new schema. To verify the compatibility we run the production unit tests against the migrated database. First the production branch is checked out and the sample database is created. Then, the migrations from the deploy branch are run on the sample database. Finally, the production branch’s test suite is run against the migrated sample database. Any failures that occur indicate a schema incompatibility.
  • Dropped columns that are still in use. Columns to be dropped must be annotated as deprecated using an ActiveRecord extension that we built. If the annotated column is accessed by any of the unit tests, the build will fail. If a migration attempts to drop a column that is not marked as deprecated, then the build will fail.
Example of declaring deprecated columns in a Rails model

After running the Online Schema Change tool for about a year we found that, on occasion, the application could not “find” a table that was being migrated. We traced this back to MySQL 5.6’s lack of support for atomic table renames. Support for atomic renames was added in version 5.7 and we were able to update Percona’s script to use the new features. By using this tool and following our strict guidelines around column addition and removal, we are successfully deploying daily without having to lock users out of the system or worry about MySQL schema errors from the application.