Rails Database Migrations with No Down Time using Gh-ost

Lucas Moore
Building Ibotta
Published in
7 min readJul 10, 2018

At Ibotta, we have a big Ruby on Rails application that we lovingly call “The Monorail”, since it’s a large monolithic app and we also have other, smaller microservices. It’s been around since the beginning of Ibotta in 2012 and powers a large portion of our app experience. At this moment, we have 253 database tables in our Rails app, some with more than 200 million rows.

The schema of this database is always changing. We have migrations that modify the Monorail database about once a week. Sometimes that change is a new table to go along with a new feature, sometimes it adds or removes a column of an existing table.

But what happens when we need to change one of our biggest database tables? The normal “Railsy” way of handling migrations starts to break down.

The Problem

If you’re familiar with Ruby on Rails applications, you know to run database migrations with the rails db:migrate command. This will allow Rails to connect to the your database and execute the SQL statement that is part of the migration.

What can we do with a 9 GiB table?

To simplify things slightly, we can think of adding a column to a table like iterating through the rows of a table, adding in the new column one by one to each row. When all the rows have the changes, the whole table now has that column. For more details, you can read more about the types of locks here.

What a table might look like in the middle of a migration

While the migration is running, the database table is locked. This prevents data from being written over the changes that are in progress. If we tried to write to a row that didn’t have the changes yet, we would lose that data into the abyss, like row 3 above.

This means we need to take the table offline until the migration is done going through every single row. For a database table that has tens of millions of rows, this might take it offline for hours.

We can’t run our app without a database, and this big migration would then take our app offline for hours! Obviously this isn’t acceptable. This is where a handy tool called gh-ost can help!

Gh-ost to The Rescue!

Gh-ost is a tool developed by Github to change the schema of large MySQL tables without needing to take the table offline at all. That’s why it’s called the GitHub Online Schema Transmogrifier. What makes gh-ost amazing, above and beyond solving the problem mentioned above, is that it’s testable and pauseable.

Please note that the following slides are taken from my talk at Denver.rb, hosted at Ibotta, which covered this very subject. The gh-ost documentation is also solid and this information is largely from there.

A High Level Look at Using Gh-ost

This chart is directly from their docs

So how does this magical tool work? After downloading and installing it, you get the ./ghost executable command which we can run with many configurable - -flags to achieve the desired changes.

If we are running a gh-ost migration to add a column to our customers table, gh-ost makes a new “ghost table” called customer_gho. It slowly copies all the data over to the new table, record by record. As it does this, it also adds the new columns to each row.

What’s super cool is that gh-ost also captures the real time changes to the live customers table with a streaming log and applies those to the customers_gho table too. It’s all done asynchronously, which allows us to pause the migration if it’s causing any problems. We can then continue later effortlessly.

When gh-ost reaches the point of copying all the records from customers to customers_gho, it holds both tables in sync until we flag to cut over to the new table. To cut over, it swaps the names of both tables. When that’s done, we now have a customers table with our new column in production, all without taking it offline. Neat!

Nitty Gritty Details

Let’s dig a bit more into the details of how gh-ost truly works. As we said, it is testable and pauseable.

Gh-ost is testable

There are more configurable flags for gh-ost than I can cover in this article, but the testability of gh-ost is a pretty easy concept to grasp. As you’re configuring all the command flags you want to use, you can leave off the
- -execute flag and the migration will be a dry run.

When you’ve got the table connected, the columns vetted and everything seems to be ready, just add in one more flag to your migration: - -execute. That makes testing super simple.

Gh-ost is pausable

There are a few ways to pause a gh-ost migration. There is a panic flag file and two throttle flags. We can configure these flags like is shown in this slide above. While gh-ost is running, it’s listening on an open socket for these flag files to be created.

In the event something about the migration goes awry, simply create the file and the active migration will pause. When you’re ready to resume, remove the file and gh-ost picks up where it left off.

There are two throttle flags so that we could run two migrations at once, stopping either one or both if they shared an additional throttle file. Cool, huh? This level of detail is amazing. Great work, gh-ost folks!

Caveats About Gh-ost

It only works for MySQL tables and isn’t a panacea.

Gh-ost is specifically designed for MySQL databases with row based replication. If you have another database type, you will need to search for other solutions. Additionally, it’s worth noting that gh-ost might never finish on comically large tables or with certain flag configurations.

If you’re running it with gentle enough configurations, your migration might eventually slow to a crawl. Gh-ost isn’t a panacea for multi billion row table migrations or tables that are hundreds of gigs in size. It might suit you to create a new table in this case.

It does not work on tables with foreign keys.

This is something we’ve run into and for which there is currently no work-around. There is an interesting pull request that discusses this in depth, but it’s easy to understand why foreign keys would be an issue for gh-ost.

In an auto-incrementing table, as gh-ost is writing records asynchronously from the existing table to the ghost table, there is no guarantee of maintaining the order. We could end up with mismatched records between tables, which busts the whole purpose of foreign keys.

Avoid Plain Text Credentials in Shell History!

One of the default ways you can connect to your database is with the
- -username and- -password flags, which are in plain text. If someone got unauthorized access to your machine, a simple reverse look up on your shell history would cause a major headache for your security team.

A better solution is to securely create and destroy a cfn file with the credentials inside and use the conf flag to pass those credentials into gh-ost.

Keeping Rails Schemas in Sync

When running a gh-ost migration, we’ve started to manage the database directly instead of through Rails. This means that our MySQL schema in production and our Rails schema have started to drift apart.

Our solution at Ibotta is to run no-op production deploys that keep the database and our schema.rb file in sync in each environment.

A Full Gh-ost Migration Example

So that’s a simplified explainer of this lovely tool called gh-ost. There is much more detail we could explore, but I’ll leave that to you to do, as their documentation is quite good. To end, I’d like to show what a complete gh-ost migration with all the flags looks like.

More Information:

Questions? Thoughts?

Some further resources to check out to troubleshoot and learn about gh-ost:

If you’re trying to figure out gh-ost, feel free to reach out to me personally. My email is lucas.moore@ibotta.com.

Oh, We’re Hiring!

If you’ve read this far, you clearly are interested in working on interesting database problems at scale! We are hiring Platform and Full Stack Engineers at Ibotta. Check out our open positions here.

--

--