What Should a Migrator do Redux

In my last post I discussed the problems that occur during development when working on migrations. I proposed a design which at each step was in response to the issues that showed up. I want to propose an alternative design which is simpler.

Prod and Dev and Different

We are going to make the difference between prod and dev databases explicit. In production we will have the following migrations table:

CREATE TABLE migrations
( version TIMESTAMP PRIMARY KEY
, create TIMESTAMP WITH TIME ZONE NOT NULL DEFAULT clock_timestamp()
);

In staging and for dev databases we will have the following migrations table:

CREATE TABLE migrations
( version TIMESTAMP PRIMARY KEY
, create TIMESTAMP WITH TIME ZONE NOT NULL DEFAULT clock_timestamp()
, hash text
);

During development we can use the hash column to know when to rollback migrations as I discussed in the prior post. The hash will be NULL if the migration has already been applied in production.

So far this is very similar to designs I have already discussed.

No Looking Back

The major difference with this design is how it deals with previously applied migrations. Instead of leaving a list of applied migrations in source control, after each deploy the list of migrations is replaced with a single migration which is a schema dump of the current state of the production database. We are going “rebaseline” after each deploy.

This solves the problem of having to update our migrations because of bit rot. Past migrations are preserved in the source control history, but only a list of yet to be deployed migrations are stored in HEAD at any particular time.

This also solves the inconsistent ordering problem, because we only maintain a list of future migrations where the order is unambiguous.

The main downside is that we need to update source control based on the state of prod after each deploy, which is not a common workflow. Another downside is we will need to automate taking logical backups of staging and recreating staging using the newly created schema dump after each deploy.

The value in this approach is it vastly reduces the possibility of drift. Since we are using an up to date schema dump of prod, dev machines will begin in the same state. Additionally, the migration tables are simpler and easier to understand.