Making a change to a large database table

Published in

FutureLearn

6 min readApr 20, 2021

In this blog post, Jamie Cobbett, one of our Technical Leads, talks about a technique to make changes to very large database tables whilst minimising downtime.

Making changes to very large database tables comes with a few risks, and I want to share some things we’ve learned including a technique we’ve developed for doing this in an easy way in Ruby on Rails — though the patterns should be applicable elsewhere. Along the way, I’ll also talk a bit about how we work on features at FutureLearn.

The feature

FutureLearn has a big focus as a social learning platform — meaning you learn alongside and with other people. Our users can interact with their peers in different ways, including assessing each other’s work and exchanging ideas via comments on the course content.

Our research revealed that learners want better ways of sharing code in these comments — currently they use external sites like GitHub or Pastebin, or they post the code into their comment unformatted. Until this point, we’ve only supported plaintext (with a few extra features) and any leading whitespace would be removed — particularly problematic for code! It felt like a no-brainer to improve our commenting feature to allow code formatting, and make our users happier. We decided that Markdown would be the best way of doing this, and it would come with a huge bonus of giving users the ability to express themselves better by being able to format their comments.

We decided a couple of things up front:

We didn’t want to break existing comments by suddenly treating them as Markdown — if a learner happened to use some Markdown syntax, or something that looks like disallowed HTML, we didn’t want to change what they’d written.
If a learner edited a comment created when Markdown wasn’t available, we didn’t want it to suddenly be treated as Markdown when they saved it — and potentially mangle their comment or force them to rewrite it

Therefore, we knew we’d need to track against each comment in our database whether Markdown was enabled when it was written, so that we’d know whether to treat it as Markdown, or plaintext when rendering it.

Great, we thought, we can “just” add a column to our comments table, and set it to true when the feature is enabled, and use that when we come to displaying the comment.

A screenshot of a comment on the FutureLearn website, using Markdown. — Displaying Markdown in comments on FutureLearn courses

Getting started

We had a fair chunk of work to do to get this feature live, so we started by breaking the work down into useful, releasable chunks as separate stories that engineers could deliver. This helps with momentum, morale, speed of delivery and keeping the team resilient and flexible. It also helps when things that seem small balloon out — if the work was part of a bigger chunk, the rest would be blocked in progress by the problematic aspect. We used a feature flag so that we could merge to master, ship incrementally and test in staging and development.

One of the first stories was to track against each comment record whether Markdown was enabled for that particular comment. This story felt a bit on the small side to us, but reasonable. An engineer on the team picked it up and started work.

Problems

We quickly discovered that our users are so sociable that they have posted over 56 million comments (2 million more have been written in the few weeks since). Our deployment process is fully automated, including running database migrations. We knew that making a change to such a large table might cause problems when it came to applying the change. Specifically, we were worried about:

Performance degradation whilst the migration is running — the load on the database might slow our site down, perhaps even making it inaccessible.
Blocking other deploys — the production deploy process for this particular change might easily take tens of minutes, or hours — especially when you factor in the large numbers of writes and reads to the table that are going on all the time. It might leave us unable to ship changes and revert the change if it failed.

For both of these worries, we came up with a few approaches to mitigate the risks.

Avoiding performance degradation whilst the migration is running

There are a few things you should consider:

If you are making multiple changes, you probably want to make the changes in a single SQL statement so that the database server only has to do one pass through the table. Rails’ ActiveRecord has a change_table :bulk option to make that syntactically easier.
Review your traffic patterns so that you can make the change at a quieter time — the migration might well run faster due to less contention, issues are less likely and would affect fewer users. From a commercial point of view you probably want to avoid breaking the site when your users are more likely to purchase.

For MySQL, adding a column is thankfully an online operation, which means that:

In-place table alterations are possible and are the default from MySQL 5.6. This means less disk space usage and I/O overhead than the table-copy method.
Changes to data in the table are possible whilst the migration is running (as long as the right LOCK=none setting is used — MySQL 5.7 and above has this by default.

Avoiding blocking other deploys

In our case, we don’t want the migration to run automatically in production and staging. The migration running in development isn’t an issue, because in development we only have an anonymised subset of the comment records, and is useful because it means other engineers can build upon it. It also means that we avoid long-running branches, and all the headaches that come with maintaining them.

So, what did we do?

In order to manage these risks, we wanted to control when this database change was made so that we could release the code and then pick our time to manually run the migration — with the ability to cancel/retry/amend it as necessary.

The first thing we did was to break up the changes into two branches: one with just the schema change, and a second one that would make use of the new column. This would let us merge the schema change, let it deploy and then safely apply it when convenient. Once that was done we could safely merge the second branch.

For the migration itself, we wrote it this way:

class AddMarkdownEnabledToComments < ActiveRecord::Migration[5.1]
  def up
    return if (Rails.env.production? || Rails.env.staging?) && ENV[‘RUN_OUT_OF_BAND_MIGRATION’].blank?    add_column :comments, :markdown_enabled, :boolean, null: false, default: false  end  def down
  end
end

You can ship this safely, since Rails will run it without changing the database and record it as having run. Then you have a couple of options to actually make the schema changes manually:

Work out the SQL it would generate, deploy the change and then, ssh onto a production database console and manually run the SQL:

ALTER TABLE comments ADD markdown_enabled boolean NOT NULL default false.

This isn’t ideal as typos or omissions could creep in.

2. As long as the migration has an empty down method, you can trick Rails into thinking the migration hasn’t run and rerun it. Rails tracks which migrations have run by adding the timestamp ID to a schema_changes database table. So, if you manually delete the relevant row and then rerun the migration like this:

‘RUN_OUT_OF_BAND_MIGRATION’=true bundle exec rails db:migrate:redo VERSION=xxxxxx.

This is fairly manual and comes with a risk that if a deploy happens whilst the migration is running, the deploy could start running the same migration.

Both these options allow us to make the schema changes, but given how manual they both are, they come with some risks. However, we came up with a better technique that avoids all of these issues: a reusable script that directly runs the migration code and so dodges the Rails mechanism that would otherwise skip the migration as it considers it to have run already. Here’s the content of our script:

ENV[‘RUN_OUT_OF_BAND_MIGRATION’] = “1”timestamp = ARGV[0]fail ‘timestamp must be a number’ unless timestamp && timestamp.match?(/^\d{14}$/)file_names = Dir.foreach(“db/migrate/”).select { |f| f.starts_with?(timestamp) }fail “Expected 1 file, got #{file_names.inspect}” unless file_names.size == 1file_name = File.basename(file_names.first, ‘.rb’)
klass_name = file_name.sub(/\d+_/, ‘’).camelizerequire Rails.root + “db/migrate/#{file_name}”puts “Loading migration #{klass_name.inspect}”migration = klass_name.constantizemigration.new.up

Put this in your script folder — eg (script/out_of_band_migration.rb) and run it like this:

bundle exec rails runner script/out_of_band_migration.rb xxxxxx

We now have a clean, reusable technique that involves the minimal amount of manual work and allows us to minimise the other risks.

Conclusion

We’ve really enjoyed working on this feature, and we’re excited to release it to our learners. Support for Markdown in our comments will only be available on a limited number of courses to begin with, but we hope to roll it out to all users at a later date. You can read more about it here.

If you’re interested in working at FutureLearn, we’re hiring! Please take a look at our available roles.