Unlocking Technical Debt, One Missing Foreign Key Constraint at a Time

or how we made our application more correct

Introduction

We all know what software development tends to look like. As an application organically grows, new tables are created, new relationships are added, new libraries are used while others are removed, etc… While there are many blog posts that will explain why having referential integrity is a good idea and others that will explain why DHH chose not to use database enforced referential integrity in rails, I want to focus on how we approached adding lots of missing foreign key constraints to our very large database.

The Problem

We’ve been building a great application for over 6 years, and since there is no enforcement for non-existent foreign key constraints, we ended up with about 100 or so foreign key constraints missing of the nearly 1200 foreign keys we have in this application’s database.

What could we do?

Since our application is so large (our database was around 1.4 TB at the time), doing any significant work touching schemas is painful. Over the years, we’ve taken steps to ensure that our lives are a bit easier with respect to safety and peace of mind around such migrations. For example, we don’t add indices in schema migrations, but instead in data migrations (using data_migrations). Doing this allows us to work on the database without having to wait for changes to be ready before booting up the application. We did all the changes we needed in both data and schema migrations.

In all, here are the steps we took:

1. Add missing indexes for constraints we’re adding
2. Make ruby changes to make adding foreign key constraints easier
3. Add missing foreign key constraints
4. Fix invalid data due to missing foreign key constraints
5. Validate new foreign key constraints

What was missing?

We knew there were constraints missing. However, we didn’t know exactly which constraints, and our system is big enough that doing it by hand seemed like a Herculean task. We used immigrant’s check_keys helper task ( rake immigrant:check_keys) to identify the missing constraints.

Once the missing constraints were identified, we added any indices that were missing to our database. We took advantage of data migrations, and added them concurrently:

After the indices were deployed and in use, we went through the classes whose relationships we’d be adding FK constraints for, and ensured that they were correctly handled in code. For example, for null: false columns that we’d be adding, we corroborated that the code had dependent: destroy decorations for the relationship macros. For columns that were nullable, whose FK constraints would be created with on_update: nullify, on_delete: nullify, we made sure that the relationships were declared using a dependent: nullify.

Once those changes were finished and we were confident things were going well, we were ready to actually add the missing foreign key constraints. We used the immigrant gem again to generate a migration for us ( rails generate immigration AddMissingFkConstraints). The resulting output consisted of many lines that read:

In previous work, we patched our project’s add_foreign_key to make the constraints deferrable. This forced us to always make these kinds of changes in two steps: first add the constraints in deferrable mode — which makes it so that only new records are checked, and then validate the constraints — which validates all the existing records.

At this point, we added a test that ensured our build would break if anyone added a new foreign key without its corresponding constraint — we leveraged immigrant once again to assert there weren’t any missing constraints for any newly added foreign keys. The test looked something like this:

(A closer representation of the test we are running now can be found here).

When the deferred constraints were out and we had test coverage that ensured no new foreign keys without corresponding constraints were added, we had to fix whatever records we found that violated the newly added constraints (that were still deferred!). Depending on the type of relationship (nullable or not), we either nullified the foreign key in the records or deleted them — we already knew that the records were invalid… we said so in our relationships. In this step, we also cleaned up a lot of tests that had not evolved as closely to reality as we would have liked.

Since we have certain logic that is triggered when there are changes to record in application code using rails after_* hooks, we could not write raw SQL queries to update the records we needed, but instead, needed to use ActiveRecord.

A down side of using ActiveRecord to generate the queries we wanted to write, is that default scopes are always applied. To get around default scopes, we ended up crafting something like this:

This class is used from a data migration that spawns a sidekiq job per type of record to be cleaned like so:

We could definitely see the light at the end of the tunnel. The only thing left to do was to validate the newly added foreign key constraints. This operation was performed using data migrations with Data Definition Language (DDL) transactions disabled:

Aftermath

We have seen some improvements around error handling (and error bubbling!). While operations that invalidated our foreign key constraints were not too common, they are handled much better now.

Resources