Backfilling data in Rails 5.0 & a few thoughts on writing safe migrations.

A Weird Craft
Jul 27, 2017 · 4 min read

Migrations in Ruby on Rails are a blessing and (at times) a curse. Rails makes it extremely easy and convenient to interact with your data and alter your database in a structured manner. It can also quickly lead to misery if you don’t know some of the nitty-gritty details.

One of the frustrating things can be backfilling data on your production server. To illustrate this with a very poor example, imagine you have a table. Let’s call it `cars`. When you first started working on cars, you decided (perhaps due to poor oversight) to add a column with the make of the tires, say `tire_make`. People have been filling in that information as you wanted, and you now have, say, 1,000 car records along with the make of their tires.

Then one day you decide you also want to store several other details about the tires. What you really want in your codebase is a model relationship saying that a car `has_many :tires`. That calls for a new `tires` table.

You write one migration to create the new table and another to drop the old `tire_make` column. Then you quickly realize that running them would lose all your existing data.

Hopefully you caught that. What you really want is to use the old data to fill in the new table, and only then get rid of the old column.

A few things to watch out for here:
1) Make sure your backfilling migration runs before the migration that removes the column (obviously).
2) Make sure they are both in the same PR! Why?
Imagine this: you merge your backfilling PR, and in the few minutes before you merge your other PR, someone creates a new record. The backfill has already run. If your second PR doesn’t backfill again, that row’s data is lost when the column is dropped. In this day and age, that is a nightmare.

Now, what does a backfill actually look like in code?

Try something like this:

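```ruby
# frozen_string_literal: true

class MyBackFillTask < ActiveRecord::Migration[5.1]
  # `up` rather than `change`: Rails cannot invert arbitrary data-copying code
  def up
    # Copy each car's make into the vehicles table through the app's models
    Car.find_each do |car|
      Vehicle.create!(type: "car", make: car.make)
    end
  end
end
```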

Cool, that was pretty easy.

There are some things to watch out for, however, when your migrations interact with models. On the surface the code above looks fine, but the big no-no is using the `Car` model class to reach the `cars` table. This creates a dependency you don’t want. The migration runs against whatever your models look like when it runs, not when you wrote it. Say that on the latest master you renamed the model from *Car* to *Automobile*. Any environment that runs this migration afterwards fails with a `NameError`. If the dependency were on the table name instead, you would be fine. Renaming a table requires its own migration, and since migrations run sequentially, we are good!

There are a couple of ways to avoid such dependency problems:

1. Migration models!

Consider writing the above code like this:
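```ruby
# frozen_string_literal: true

class MyBeautifulBackFillTask < ActiveRecord::Migration[5.1]
  # Migration-local models: they depend only on table names, not on app/models.
  class MigrationCar < ActiveRecord::Base
    self.table_name = :cars
  end

  class MigrationVehicle < ActiveRecord::Base
    self.table_name = :vehicles
    # A bare model treats a `type` column as single-table inheritance;
    # disable that so "car" is stored as plain data.
    self.inheritance_column = :_type_disabled
  end

  # `up`/`down` rather than `change`: rolling back a `change` that copies
  # data would not undo the copy.
  def up
    MigrationCar.find_each do |car|
      MigrationVehicle.create!(type: "car", make: car.make)
    end
  end

  # Assumes the vehicles table held no "car" rows before this backfill.
  def down
    MigrationVehicle.where(type: "car").delete_all
  end
end
```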

This removes every dependency on the application’s model names. The migration now depends only on table names. That is a win, because renaming a table also requires a migration, and migrations run sequentially. If the table was renamed in a migration that runs before yours, your migration fails right away (which is OK). You learn that the table name changed and update your migration accordingly. If the table is renamed in a later migration, nothing breaks either, because by then your migration has already run.

Compare this with depending on a model name. Renaming a class in Rails doesn’t require a migration (thankfully). So if someone on your team merges a pull request that renames the model, your migration will fail on master.

One gotcha, however: if your migration involves association logic, you can declare it on the migration models as well, for example a `has_many` relationship on `MigrationCar`. I will cover that in another article.

2. Executing SQL statements

The second option is to execute raw SQL statements that work directly on your database tables, bypassing the ORM (Object-Relational Mapping) layer.

I am not a big fan of this approach, mostly because you lose out on a lot of ActiveRecord functionality and your code doesn’t look as elegant as it could (and we like elegance). You also have to be extra careful, because you must write the matching `down` SQL yourself so you can roll the migration back when the need arises. That said, it is the fastest way to run such jobs, since the database does the work directly instead of Rails instantiating a Ruby object for every row. Read this article to see how to go about it: http://jacopretorius.net/2014/07/patterns-for-data-migrations-in-rails.html

A few other migration tips:

1. Never, ever modify an old migration. If you do, you are in for a very difficult time, because any database that has already run it will not run it again, so your change never reaches it. Unless you can wipe the database and reset it, don’t do this.
2. Always ensure that your migrations are rollback-able. Test your rollback before merging the PR wherever possible.
3. Incremental migrations: try to ensure that each migration has a single purpose (similar to classes). You should be able to describe in one sentence what the migration is supposed to do. If you can’t, break it up into multiple rollback-able migrations.


Software is strange. Random thoughts on product management, business, Ruby, Rails and whatever I feel like writing about. I use this as a personal journal :)
