Testing database migrations

Published in

Ingeniously Simple

7 min readSep 18, 2020

Database migrations help to keep database schema versioned, by documenting database changes in source control. But how should you test if those migrations are correct and will work against production consistently? In this article we are going to explain how we test database migrations using Flyway as our database migration tool and Spawn as our database provisioner.

Spawn is a cloud service provided by Redgate Foundry. It allows you to create ephemeral databases on demand that can be used for both development and for testing in CI pipelines.

Spawn components:

Spawn uses 2 components to work, and these should be familiar to anyone who has worked with Docker before.

Data Image: the fundamental component that contains the necessary information to start a data container.

Data Container: created from the instructions described in the Data Image. It’s a running instance of your data-image; i.e. a database server.

Example of Azure Devops CI pipeline that uses Spawn to test database migrations

Test migration against an empty schema

The simplest way for us to get started is to test database migration scripts in CI pipelines on an empty database. This is achieved using the following steps.

Install Spawn.
Create an empty spawn data-image.
Create spawn data-container from image.
Run flyway migrations against that data-container.
Delete the data-container.

This allows us to consistently test that new migrations work against the previous schema. It also allows us to catch simple issues sooner. In general, these issues arise when multiple people work on the same project, e.g. two people creating the same versioned migration and having migrations running in the wrong order as a result.

Setup spawn

Spawn is managed using spawnctl, a command-line tool. To manage spawn resources in CI pipelines (non-interactive environments), the spawn service needs to be authenticated using an access-token.

spawnctl create access-token --purpose <Purpose>

Create spawn data-image

We create an empty data-image from a yaml source file and tag it empty. This data-image is created once, and it is shared by all the steps of the CI pipeline.

spawnctl create data-image -f demo-empty.yaml --accessToken <access-token-string>

With the content of demo-empty.yaml file being:

name: demo-empty
sourceType: empty
engine: mssql
initialDatabaseName: Customers
tags:
- empty

Test database migration using spawn data-container

Now that everything is set up, we can spin up an ephemeral database from the empty spawn data-image; The data-image created in the previous step already has the snapshot of the empty database, so now we can run our migrations against that database.

To create a data-container from the data-image that was created in previous step run this script:

spawnctl create data-container -i demo-empty --accessToken <access-token-string>

That command will give us back the connection details necessary for us to connect to that database and run our flyway migrations against.

Test database migration in CI pipeline

dataContainer=$(spawnctl create data-container -i demo-empty -q)json=$(spawnctl get data-container $dataContainer -o json)port=$(echo $json | jq -r .port)
host=$(echo $json | jq -r .host)
user=$(echo $json | jq -r .user)
password=$(echo $json | jq -r .password)docker run -v $PWD/database/sql:/flyway/sql \
    flyway/flyway:6 \
    migrate \
    -url=jdbc:jtds:sqlserver://$host:$port/Customers \
    -user=$user \
    -password=$passwordFlyway Community Edition 6.5.5 by Redgate
Database: jdbc:jtds:sqlserver://instances.spawn.cc:33333/Customers (Microsoft SQL Server 14.0)
Successfully validated 6 migrations (execution time 00:00.170s)
Creating Schema History table [Customers].[dbo]
[flyway_schema_history] …
Current version of schema [dbo]: << Empty Schema >>
Migrating schema [dbo] to version 1.0.001 — …
Migrating schema [dbo] to version 1.0.006 — …
Successfully applied 6 migrations to schema [dbo] (execution time 00:01.811s)

After all the migrations tests are complete, delete the data-container created for the test using its name

spawnctl delete data-container <container-name> --accessToken <access-token-string>

Running database migration tests against an empty database has some limitations, as this does not represent the production database. Some examples of this scenario are

Creating unique constraint in a column where production data might have duplicate data
Changing datatype of a column from varchar to int, where production data might have some text

In the next section we’ll explain how to run database migration tests against a production database.

Test migration against production data

Testing database migrations against an empty database was not good enough for us, as occasionally database migrations would pass CI tests but fail when the migration was applied to the production database. This was because the CI tests were using an empty database, not the production one. Even though both databases had the same schema, migration scripts failed because of the data in the production database. We can leverage Spawn to create an ephemeral database copy of production in our CI tests.

This helps us to catch issues early in our CI database migration tests that might occur during production rollout.

This results in the following:

Have a scheduled backup of the production database.
Create a data-image from a backup file.
Create a data-container from the latest data-image.
Run migrations tests against that newly created data-container.

Set up the CI pipeline

When testing against databases that hold more complex data it is necessary to have the right CI pipeline in place. In this case it means:

Having a backup of your production database
Creating a data-image from that backup

Now instead of using an empty database, as part of our build pipeline we run a scheduled job to take a backup of the production database and recreate a data-image from that backup every weekday at 3:00AM.

Create spawn data-image

Spawn has an option to create a data-image from a backup; data-images created using a backup will be a database snapshot with the data from the backup.

With the content of demo-backup.yaml file being:

name: demo-prod
sourceType: backup
engine: mssql
tags:
- prod
- latest
backupFolder: ./backup
backupFile: database.bak

And you can create the data-image like so

spawnctl create data-image -f demo-production.yaml

Test migrations using data-containers with production data

To test database migration scripts against a production database we need to provision a database with the latest production data and run the migration test on it. The data-image created in the previous step has the snapshot of the production database.

We start by provisioning a copy of the production database by creating a data-container from the data-image created using production backup. Typically, this step takes seconds. Mainly because the backup was restored when the data-image was created.

Run the flyway migration on the data-container and after the test is complete, clean-up the spawn resources by deleting the data-container created for this test.

The pipeline script looks something like this:

dataContainer=$(spawnctl create data-container -i demo-prod -q)json=$(spawnctl get data-container $dataContainer -o json)port=$(echo $json | jq -r .port)
host=$(echo $json | jq -r .host)
user=$(echo $json | jq -r .user)
password=$(echo $json | jq -r .password)docker run -v $PWD/database/sql:/flyway/sql \
    flyway/flyway:6 \
    migrate \
    -url=jdbc:jtds:sqlserver://$host:$port/Customers \
    -user=$user \
    -password=$password

Why not use Docker?

This is a very sensible question. Let’s see when it makes sense to use docker and when you start to see the limitations.

For testing database migrations against an empty database, you can use docker instead. You’ll have to have the docker engine present on the build agents and also some logic in order to wait for the database containers to be ready to use.

If the docker images are not present on the build agents they will get downloaded every time too.

For testing database migration against a copy of production, you will have to set up volumes and manage them as part of your pipeline. Something that can be quite daunting. This is where you really see the power of something like spawn, that creates an abstraction around those details that you don’t care too much about. Additionally, with spawn there is no need for extra infrastructure to run databases in your CI pipelines; that is also abstracted away.

spawn and docker comparison for testing database migrations in CI pipelines

Conclusion

We’ve shown how you can use technologies like Flyway and Spawn to provision ephemeral databases through your CI pipelines in order to test your database migrations. We test migrations using both empty databases and those with production data.

Testing against an empty database helps us to identify any error in the migration scripts which may fail due to errors in the scripts themselves.

Testing with production data helps us to capture any issue that might occur during production rollout.

Our database migration tests have led to the following improvements overall:

Test runs are fast as spawn provisions isolated ephemeral databases in seconds; these databases can be created per build or per build step/ stage.
No additional infrastructure is required to host test databases, which means the tests can run on any build agents (on-premise or cloud), this makes more build agents available for tests to run.

How are you testing your database migrations in your CI systems, is this enough for you? What other things do you check when testing migrations?

Join beta!

We are recruiting users for the Spawn beta program. Sign up here!

Testing database migrations

Spawn components:

Test migration against an empty schema

Setup spawn

Create spawn data-image

Test database migration using spawn data-container

Test database migration in CI pipeline

Test migration against production data

Set up the CI pipeline

Create spawn data-image

Test migrations using data-containers with production data

Why not use Docker?

Conclusion

Join beta!

Written by santiago arias