Start Treating Your SQL as First-Class Code with Idempotent SQL DDL

Safely rerun SQL DDL against an existing schema, quickly see historical changes in SQL resources in Git, and treat SQL DDL as first-class code.

The Opinionated TLDR

  • All SQL DDL must be idempotent. Only use migrations tools that support or can generate idempotent SQL DDL.
  • Checkout sql-watch, a command line utility that enables rapid SQL development by automatically applying idempotent SQL scripts to a PostgreSQL database on file change.

Advantages of Idempotent Designs

I greatly favor idempotent-centric solutions in software engineering. See Idempotent Focused Architecting for my thoughts around idempotence in software engineering.

This Article Is Not About

  • SQL migration tools or recommendations on a migration tool
  • Other database types like key/values stores
  • Schema designing
  • Writing idempotent DDL for a given relational database provider (the examples use PostgreSQL)
  • The advantage and disadvantages of handwritten SQL DDL versus using ORMs to generate SQL DDL

Prerequisites and Recommendations

  • Your SQL provider must support transactional SQL DDL. If there is an error in the DDL, then the server must be able to roll back that error.
  • Using a migration tool is recommended but not required when running idempotent SQL DDL.
  • Choose a strongly typed relational database: this is very useful in catching developer mistakes when creating idempotent SQL DDL.

SQL DDL — A Quick Introduction

The SQL Data Definition Language (DDL) is used to create, update, and delete SQL resources like tables, views, stored procedures, functions, etc. The following PostgreSQL DDL creates a schema named iso and a table named iso639_1.

-- File 01292022145902_iso_schema/up.sql
-- Creating a schema
CREATE SCHEMA iso;
-- Creating a table
CREATE TABLE iso.iso639_1 (
alpha2_id iso.alpha2 NOT NULL PRIMARY KEY,
label shared.label NOT NULL
);

The script is not idempotent. In PostgreSQL, running the script the first time results in the following output:

CREATE SCHEMA
CREATE TABLE

Rerunning the script results in an error:

ERROR:  schema "iso2" already exists

We want a way to rerun the SQL script without error.

Database Servers Are Always On

Relational database servers are an interesting beast because they are always on. Unlike a program that can update through rolling updates or restart, changes to a relational database can only happen while the database is running. Further, there is an expectation that a database server is always available in production environments.

Schema Migration Script: A History

It is possible to connect to a running instance of a database server and run SQL DDL directly. However, manually making changes to a running system, especially in production, is frowned upon for reasons not covered in this article. Available to us is a tool to support safely updating “always-on” database servers: schema migration scripts.

Schema migration refers to the management of incremental, reversible changes and version control to relation databse schemas. — wikipedia.org: Schema Migration

Instead of manually running SQL DDL on database servers, we can use Schema Migration. An example migration tool would:

  • Have a naming convention for directories and database schema files.
  • The sort order of file names determines execution order.
  • Require two script files [1]: an “up” file to create, update or delete resources and a “down” file to undo the creation, updating, or deletion.
  • Only runs those scripts that have not run on the database server already.

Continuing with the above example, we create a file 01292022145902_iso_schema/down.sql.

-- File named 01292022145902_iso_schema/down.sql
-- Undo what the up script did
DROP TABLE iso.iso639_1
DROP SCHEMA iso;

During migration, if file 01292022145902_iso_schema/up.sql has not run, then that script is run: a 2nd run ignores the file. The migration tool can undo changes by running the 01292022145902_iso_schema/down.sql.

The Problems With Run Once Migration Scripts

Migration scripts that run one time against a database instance have significant disadvantages:

Resources Defined Across Multiple Files

The table iso.iso639_1 needs a description column added by creating two new migration files:

-- File 03032022_iso_iso639_1_description/up.sql
-- Add a column to the iso.iso639_1 table
ALTER TABLE iso.iso639_1 ADD COLUMN
description varchar(1024) NOT NULL DEFAULT '';

and

-- File 03032022_iso_iso639_1_description/down.sql
-- Remove a column from the iso.iso639_1 table
ALTER TABLE iso.iso639_1 DROP COLUMN description;

With this approach, it isn’t uncommon for the definition of a SQL resource, such as a table, to end up in multiple files: of which there are hundreds. There is no central location for a “source of truth” for SQL resources.

Slows Down Early Development

Cycling a relational database server is doable if an application has yet to be productionalized. Creating up and down SQL will slow development and isn’t required if we can quickly cycle our database server.

Further, it is difficult to make significant breaking changes to SQL DDL. We must resolve breaking changes through a SQL migration script which slows down early development. Why write a script to move from one set of tables to another when we could delete the tables we never used and start using the new tables immediately?

We could bypass the run-once migration process by editing previous migrations scripts, rolling back to the beginning, and starting over. But this misses the intent of the run-once migration script (see above definition). If you choose a framework, you need to live with the pains of that framework.

Slower Migration Time

Some migration tools, especially those that use an ORM to define the SQL schema file, will run slower than running raw SQL DDL against a server.

Some tools will have an ORM SQL migration script and a generated SQL migration script. The generated migration script can result in faster migration times. For example, we may need to set up our database before every test run.

The problem is that generated migration scripts lead to duplicate code in a repository: the ORM migration script and a generated migration script. These scripts could, and do, end up out of sync.

On top of that, there is no assurance that the ORM migration script will result in the same database schema as the generated SQL file. The difference has led to inconsistencies between developers’ local development environments. The difference has led to tests passing locally but not in CI/CD.

Unused SQL DDL Stays Around

A table was created early in the development process but is no longer needed. A migration script eventually dropped the table, but the table definition still lives in the active codebase.

NOTE: In the case of idempotent SQL DDL, you may still have unused code sitting around, but it is much less: a DROP TABLE, for example, instead of the whole original table definition potentially spread across multiple files.

Squash to The Rescue?

Some migration tools support squashing the migration scripts to a single up and down file. This process loses even more historical information, defeats the underlying intent of migration scripts (see definition above), and is a bandaid at best.

No Easy Historical Insight

Considering that a single SQL resource may be defined across multiple files, rebuilding a picture of how any given SQL resource came about is difficult. When did a table get a new column? When was a column dropped? Is this table still used? What table replaced it?

SQL DDL Is Not a First-Class Citizen

A first-class citizen (also type, object, entity, or value) in a given programming language is an entity that supports all the operations generally available to other entities. — https://en.wikipedia.org/wiki/First-class_citizen

An operation generally available to other types of code is historical-comparison available with a tool like Git of how a resource (function/object/method/etc.) changes throughout the life of the code. This comparison is not available with “run once” migration scripts as changes to the same resource reside across multiple files: not within the same file as is the case with most code.

Creates Pain Points Which Affects Design Decisions

Run-once migration scripts make it more challenging to change a database.

For example, updating a stored procedure requires creating two new files: an up file with the new stored procedure’s code and a down file that contains a copy of the original stored procedure. Imagine making a few more changes to that stored procedure: duplicating it twice each time. A global codebase search now shows multiple results of the same stored procedure.

Duplicating code is a huge pain point that I don’t want to endure, and I’m now more inclined to take a different design approach to solve a problem: even if a stored procedure would have been a better solution.

The Solution is Idempotent SQL DLL Scripts!!!

Idempotent SQL DDL resolves all the issues of the “run once” migration script.

Idempotent SQL DDL can run multiple times against a running database instance without issue. Making a “run-once” migration script idempotent is generally straightforward (differences highlighted in bold):

-- File 10_iso_schema.sql
-- Creating a schema
CREATE SCHEMA IF NOT EXISTS iso;
-- Creating a table
CREATE TABLE IF NOT EXISTS iso.iso639_1 (
alpha2_id iso.alpha2 NOT NULL PRIMARY KEY,
label shared.label NOT NULL
);

Only one file is required, there is no longer a need to create a down file, and the file is named something a bit easier to read: 10_iso_schema.sql instead of 01292022145902_iso_schema/up.sql.

Running the script the first time results in the following output:

CREATE SCHEMA
CREATE TABLE

And running it a second time does not result in an error:

NOTICE:  schema "iso" already exists, skipping
NOTICE: relation "iso639_1" already exists, skipping
CREATE SCHEMA
CREATE TABLE

Resources Now Defined in One File

The table iso.iso639_1 needs a description column added by updating the existing migration file:

-- File 10_iso_schema.sql
-- Creating a schema
CREATE SCHEMA IF NOT EXISTS iso;
-- Creating a table
CREATE TABLE IF NOT EXISTS iso.iso639_1 (
alpha2_id iso.alpha2 NOT NULL PRIMARY KEY,
label shared.label NOT NULL
);
ALTER TABLE iso2.iso639_1 ADD COLUMN IF NOT EXISTS
description varchar(1024) NOT NULL DEFAULT '';

The table definition, and any changes, are now reflected in the same file, making it easier to see the change history of a table in Git.

Faster Early Development

There is no longer a need to create down scripts, and significant changes are possible: sometimes even after the productionalization of a program.

Unused SQL DDL Does Not Stay Around

Remove unused resources, like a table and all associated SQL DDL (constraints, indexes, etc.) from the migrations script: most likely, we only need to keep around a DROP TABLE command.

Easy Historical Insight

We can now use Git to see historical changes in a resource because all changes for the same resource are in one location.

SQL DDL Becomes a First-Class Citizen

For me, this is probably one of the most significant advantages. SQL DDL now becomes a first-class citizen. Any operation available to other types of code is now available to SQL DDL: historical viewing of changes via Git being an example.

No More Pain Points

Updating a stored procedure only involves changing existing code. There is no longer a need to duplicate code: a pain point we no longer have to endure.

I’m now more likely to use stored procedures and functions in my engineering solutions when it makes sense.

No Need To Squash

There is no longer a need to “squash” migrations scripts because they are already “squashed.”

Doesn’t Necessarily Support Undo

Note that it is more difficult to “undo” something using Idempotent SQL DDL because there aren’t any undo files.

Conclusion

Idempotent SQL DDL reduces redundant code and enables faster development turnaround. It allows us to apply better software engineering practices by removing unnecessary pain points.

Finally, we can now treat SQL DDL as first-class code.

[1] Some schema migration tools don’t necessarily require SQL but use Object-relational mapping (ORM).

--

--

Get the Medium app

A button that says 'Download on the App Store', and if clicked it will lead you to the iOS App store
A button that says 'Get it on, Google Play', and if clicked it will lead you to the Google Play store
Eric Hosick

Eric Hosick

Creator, entrepreneur, software architect, software engineer, lecturer, and technologist.