Breaking Free From the ORM: Decoupling Database Migrations
Defining expectations from a database migration system without an ORM
TL;DR, Removing the database lifecycle from your application lifecycle will reduce the complexity in the system.
In our previous article, we explored why the ORM is a detrimental construct. When we seek to reduce complexity by removing it, we need to replace three competencies bundled into one:
- Database migrations
- Database connection handling
- Database queries
Requirements for the database migration system
- Native scripting: Databases have a rich set of features accessible by their scripting language. There should be no need to learn a different DSL (Domain-specific Language) to make changes to our database.
- Proper order of execution: The system should enforce the application of database changes in the same predetermined order every time these changes run.
- Rollback: We always think about database migrations as a means to evolve the database, but sometimes we screw up and need to “rollback” to a previous version. We need this functionality in our database migration system.
- No dependencies on the application: ORMs are usually embedded within the language or framework that we use when writing our back-end applications (e.g., Django, Ruby on Rails, etc..). For our database migration system, we want to dissolve the bindings between the tools we use to write the application logic and the tools we use to manage our migrations. This allows us to manage the database lifecycle independently of our application lifecycle. Then we’ll be able to deploy database changes without having to deploy the entire application.
- Database changes should be in a single transaction: Either the entire migration succeeds or it fails but not something in between.
- Idempotency: We can run the migration command as many times as we want but changes to the database will only be applied once.
Tools that satisfy our database migration system requirements
There are some tools out there that satisfy all of the requirements. One that seems very popular, according to the number of docker images downloads, is Flyway. I personally have successfully used sqitch for the last couple of years.
While this is not a comprehensive tutorial on how to use any of the tools, for the sake of completing the argument here an example of a typical sqitch workflow. Once we have sqitch configured, imagine our first migration is named
# This will create a new migration named “users_table” that depends on a previous migration named “initial_migration”sqitch add users_table -r initial_migration -n “Create users table”
The previous command will create 3 files, a file under the
deploy folder, a file under the
revert folder and one under the
verify folder (we’ll ignore this last one).
Now we need to add a PostgreSQL deploy script:
deploy/users_table.sqlBEGIN;# Needed to generate UUIDs
CREATE EXTENSION IF NOT EXISTS pgcrypto;CREATE TABLE "users" (
"uuid" uuid DEFAULT gen_random_uuid() NOT NULL PRIMARY KEY,
"created" TIMESTAMP WITH TIME ZONE NOT NULL default NOW(),
"email" TEXT NOT NULL,
"password" TEXT NOT NULL,
"deleted" BOOLEAN NOT NULL DEFAULT false
);CREATE UNIQUE INDEX user_email_for_non_deleted ON "user"("email") WHERE (deleted is FALSE);COMMIT;
Now we need to add a PostgreSQL revert script:
revert/users_table.sqlBEGIN;DROP EXTENSION IF EXISTS pgcrypto;
DROP TABLE IF EXISTS "users";COMMIT;
Then to deploy the changes to an already configured database we run:
# deploys pending migrations to production_db (already configured)sqitch deploy -t production_db
And if we want to revert the change we just need to:
sqitch revert -t production_db --to-change initial_migration
The last command will run all the revert scripts until and without including
initial_migration. In our case, it will just run
Should we decouple database migrations from application deployments?
With ORMs when we deploy a database migration it usually occurs during an application deployment, giving a false illusion that both the database change and the code update happen at the same time.
Well, bad news, they happen sequentially, which means that there is a period of time in which the database has the new changes, but the application doesn’t have the new code.
The best case scenario, the one that we always have in mind, is this delay in time is just a couple of seconds.
In the worse case scenario, the database is updated, the application deployment fails for whatever reason, we don’t have the proper database rollback strategy and the new database has some breaking changes to the currently deployed application. Then our application remains broken until we rollback the database changes or we fix the deployed application.
It is better to isolate the deployment of database changes from the deployment of application changes to make it very explicit to everyone that database migrations and application deployments don’t happen at the same time. Then we will always remember to write non-breaking changes when creating migration scripts.
The 2 main benefits to decoupling database changes from application changes are:
- It’s explicit that both don’t happen at the same time, which makes every developer writing database migrations very aware of it.
- By decoupling the application lifecycle from the database lifecycle, the database schema will probably be managed and modified by fewer and more database-savvy developers than if everything is coupled and managed by the same group of full stack developers.
How to write non-breaking database migrations
When in doubt, ask Rich Hickey.
In his 2016 “Spec-ulation keynote” Rich addresses how to write non-breaking code in general. In particular, the implementation of non-breaking database migrations would be:
- Accretion is good. Adding new tables, columns, … are non-breaking changes.
- Deletion is bad. Only remove tables, columns, … when no other part of the system is using it. Rich Hickey will say never to delete anything.
- Updates are bad and tricky. When updating a column name, table name, … treat it as multiple steps in which we add the new desired element (keeping the old and new one), we port every other part of the system to use the new one and then we remove the old one. Rich Hickey would say to just alias the old construct and keep both.
Building the system
We’ve previously explained why ORMs are detrimental. In this article, we have explored why and how to decouple the database migration system from the ORM.
By using an ORM for database migrations, and migrating the database/deploying application updates “simultaneously” we are making our system easier to manage in straightforward cases but more difficult to maintain in the long run, especially when something fails.
This difficulty comes from multiple types of changes (database and application updates) happening in what’s erroneously thought of at the same time.
By decoupling the database migration system from the ORM and the application lifecycle, we make very explicit that database migrations and code updates don’t happen at the same time. So it becomes very evident that we shouldn’t introduce database breaking changes, which in turn makes the overall system stronger.