Short tips on long migrations

Estimate migration time

Before anything else, check how many records will be affected and estimate the total migration time. Narrow down which records need to be processed and process only the essential ones, for example only the records of active accounts.
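A rough estimate can be obtained by timing a small sample and extrapolating linearly. The sketch below assumes the per-record cost stays roughly constant over the whole run, which is usually optimistic; the helper names estimate_migration_seconds and humanize_seconds are made up for illustration.

```ruby
# Extrapolates the total run time from a timed sample.
# Assumes the per-record cost stays roughly constant (a simplification).
def estimate_migration_seconds(total_records:, sample_size:, sample_seconds:)
  raise ArgumentError, 'sample_size must be positive' unless sample_size.positive?

  total_records * sample_seconds.to_f / sample_size
end

# Formats a duration given in seconds as days, hours and minutes.
def humanize_seconds(seconds)
  minutes = (seconds / 60).floor
  days, minutes = minutes.divmod(24 * 60)
  hours, minutes = minutes.divmod(60)
  "#{days}d #{hours}h #{minutes}m"
end

# Example: a sample of 1,000 records took 45 seconds, 2,000,000 records in total.
estimate = estimate_migration_seconds(total_records: 2_000_000, sample_size: 1_000, sample_seconds: 45)
puts humanize_seconds(estimate) # prints "1d 1h 0m"
```

If the estimate comes out in days rather than hours, that is a strong hint to narrow the record set or to plan for the resume and parallel-run tips further down.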

Reduce memory consumption

Avoid loading all records into memory by using ActiveRecord::Batches methods such as find_each and find_in_batches.

class Migrator
  BATCH_SIZE = 100

  # find_each loads BATCH_SIZE records at a time instead of the whole table
  def self.process
    Record.find_each(batch_size: BATCH_SIZE) { |record| ... }
  end
end

Encapsulate updates in a transaction

Instead of letting hundreds of queries each commit on their own, wrap them in a single transaction. This not only helps with data integrity, it is also quicker, because there is one commit per batch instead of one per record. You can also combine the transaction block with the find_in_batches method.

class Migrator
  BATCH_SIZE = 100

  def self.process
    Record.find_in_batches(batch_size: BATCH_SIZE) do |group|
      # one transaction per batch; update! raises on failure and rolls the batch back
      Record.transaction do
        group.each { |record| record.update!(...) }
      end
    end
  end
end

Supply a resume mechanism

If a migration lasts for several days, the process will probably hang or stop at some point, so it’s important to have a way to restart from the point where it stopped.

Migrator.process(start: <start_id>)
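The <start_id> has to come from somewhere. One option is to persist the id of the last processed record after every batch. The sketch below assumes a plain-text checkpoint file is good enough; the Checkpoint class and the file path are hypothetical and not part of Rails.

```ruby
# Persists the id of the last successfully processed record in a text file,
# so that a restarted run knows where to pick up. Hypothetical helper.
class Checkpoint
  def initialize(path)
    @path = path
  end

  # Returns the id to resume from (one past the last saved id),
  # or nil when no checkpoint has been written yet.
  def start_id
    return nil unless File.exist?(@path)

    File.read(@path).to_i + 1
  end

  # Records that everything up to and including `id` has been processed.
  def save(id)
    File.write(@path, id.to_s)
  end
end
```

Saving once per batch rather than once per record keeps the file writes cheap. A restarted run can then call Migrator.process(start: checkpoint.start_id); a nil start makes find_each begin from the first record.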

Run in parallel processes

You can split the records into non-overlapping id ranges and run one process per range to reduce the total time.

class Migrator
  BATCH_SIZE = 100

  # each process is given its own id range
  def self.process(start:, finish:)
    Record.where('id BETWEEN :s AND :f', s: start, f: finish).find_each(batch_size: BATCH_SIZE) { |record| ... }
  end
end

Note: As of Rails 5, ActiveRecord::Batches#find_each accepts both the :start and :finish options, so you can write directly:

class Migrator
  BATCH_SIZE = 100

  def self.process
    Record.find_each(start: <start_id>, finish: <finish_id>, batch_size: BATCH_SIZE) { |record| ... }
  end
end
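To give each process its own start and finish, the id space can be cut into contiguous ranges up front. This is a minimal sketch in plain Ruby; split_id_range is a hypothetical helper, and it assumes ids are integers spread fairly evenly between the minimum and the maximum.

```ruby
# Splits the inclusive id range min_id..max_id into at most `workers`
# contiguous, non-overlapping [start, finish] pairs of near-equal width.
def split_id_range(min_id, max_id, workers)
  total = max_id - min_id + 1
  return [] if total <= 0 || workers <= 0

  size = (total + workers - 1) / workers # integer ceiling division
  (min_id..max_id).step(size).map do |start|
    [start, [start + size - 1, max_id].min]
  end
end

split_id_range(1, 1000, 4).each do |start, finish|
  puts "process ids #{start}..#{finish}"
end
```

Equal-width id ranges do not guarantee equal record counts: if large stretches of ids have been deleted, some processes will finish much earlier than others.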

Have a progress indication

It is always good to know whether a script is still running or stuck.

# occupies just one line
print "\r #{record.id}"

Or have a Minitest::DefaultReporter-ish indication that prints one character per transaction.

# for successful transactions
print '.'
# for failed transactions
print 'X'
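When the total number of records is known up front, the single-line variant can also show how far along the run is. A small sketch, with progress_line being a made-up helper name:

```ruby
# Builds a single-line progress string. The leading "\r" moves the cursor
# back to the start of the line, so each print overwrites the previous one.
def progress_line(processed, total)
  percent = total.zero? ? 100.0 : processed * 100.0 / total
  format("\r%d/%d (%.1f%%)", processed, total, percent)
end

print progress_line(250, 1000) # shows "250/1000 (25.0%)"
```

Output may be buffered when it is redirected to a file or a pipe; setting $stdout.sync = true at the start of the script makes the indication appear immediately.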

Keep failed transactions

Have a database table that stores the failed records together with their error messages.

conn = ActiveRecord::Base.connection # quote the values: error messages often contain quotes
conn.execute("INSERT INTO failed_records (record_id, error_message) VALUES (#{conn.quote(record_id)}, #{conn.quote(error_message)})")
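For there to be anything to store, the loop has to rescue each failure instead of letting one bad record abort the whole run. The following is a minimal sketch of that pattern in plain Ruby; process_collecting_failures is a hypothetical helper, and records are only assumed to respond to id, as ActiveRecord models do.

```ruby
# Runs the block for every record and collects failures instead of aborting.
# Returns an array of [record_id, error_message] pairs for later storage.
def process_collecting_failures(records)
  failures = []
  records.each do |record|
    begin
      yield record
    rescue StandardError => e
      # keep going; remember which record failed and why
      failures << [record.id, e.message]
    end
  end
  failures
end
```

Each returned pair maps onto one row of the failed_records table, and the collected ids can later be fed back through Migrator.process(ids: [...]).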

Run only for specific records

Accept an array of identifiers as a parameter and process only the corresponding records; this is mainly useful for re-processing previously failed transactions.

Migrator.process(ids: [...])