Update your MySQL Schema & a Million Rows Headache-free: Part 2

Sylvain Fabre · Published in AssoConnect · 4 min read · Jan 26, 2020

Updating a million rows or your entire database schema doesn’t have to be a stressful all-nighter for you and your team! Lucky for you, it is easy to execute these changes during office hours with zero downtime.


The first part dealt with database schema changes, such as adding, renaming, and dropping a column. In this part, I’ll focus on how you can update a million rows without blocking the rest of your application.

Why Can Updating a Million Rows Create a Delay?

Running an UPDATE statement on a table that uses the InnoDB engine sets an exclusive lock on all related rows. Depending on the conditions you set in the WHERE clause, the entire table may end up being locked.

This is never good news, as all other connections will have to wait for the statement to complete: with a million rows in the table, that can mean waiting virtually forever.
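You can watch this happen from a separate connection. As a quick sketch, InnoDB exposes the transactions stuck waiting on a lock through a standard information_schema view:

-- Transactions currently blocked by a lock, with the query they are
-- trying to run and when they started
SELECT trx_id, trx_state, trx_started, trx_query
FROM information_schema.INNODB_TRX
WHERE trx_state = 'LOCK WAIT';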

How to Prevent Locking?

Actually, you cannot prevent your rows from being locked as this is how MySQL works.

But what about making this lock last as little time as possible?

To do so, you need to enforce two rules:

  • Batch the rows: update only a small number of them at a time
  • Ensure your row filtering is efficient

Batch Updates Using a Procedure

Let’s say you have two entities, Product and Order, and you are migrating from auto-incremented ids to time-ordered UUIDs, as recommended in this Percona article.
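As an aside, if you run MySQL 8.0, the built-in UUID_TO_BIN function can generate such time-ordered values for you: passing 1 as its second argument moves the time bits of the version-1 UUID to the front.

-- UUID() returns a version-1 (time-based) UUID as text; with swap_flag = 1,
-- UUID_TO_BIN reorders the time bits so values are roughly sequential,
-- which is much friendlier to InnoDB's clustered index
SELECT UUID_TO_BIN(UUID(), 1) AS ordered_uuid;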

Your tables’ schemas look like this, with a many-to-one relationship from Order to Product.

[Image: Product table’s schema]
[Image: Order table’s schema]
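For reference, here is a minimal sketch of what the two schemas might look like; the exact column types, and storing UUIDs as BINARY(16), are assumptions on my part:

CREATE TABLE product (
    id   INT UNSIGNED AUTO_INCREMENT PRIMARY KEY,
    uuid BINARY(16) NOT NULL  -- the new time-ordered identifier
);

-- ORDER is a reserved word in MySQL, hence the backticks
CREATE TABLE `order` (
    id           INT UNSIGNED AUTO_INCREMENT PRIMARY KEY,
    product_id   INT UNSIGNED NOT NULL,  -- legacy reference to product.id
    product_uuid BINARY(16) NULL,        -- the new column we want to fill
    FOREIGN KEY (product_id) REFERENCES product (id)
);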

Now, you’ll want to fill the new order.product_uuid column.

The most basic way to do this is to run the following query:

UPDATE `order`
INNER JOIN product ON product.id = `order`.product_id
SET `order`.product_uuid = product.uuid;

This works, but it also locks your entire table for the duration of the statement.

The trick here is to update rows by small batches:

  • MySQL will still lock some rows but the lock won’t last as long
  • MySQL will be able to run queries requested by other connections between two of your UPDATE statements

We use MySQL stored procedures at AssoConnect: a procedure is a kind of script, written in SQL, that runs directly on your server.

Following our use-case, the procedure is defined and called like this:

-- New delimiter
DELIMITER $$

CREATE PROCEDURE LoopSetUuid()
BEGIN
    SET @id_start := 0;

    loop_label: LOOP
        -- Find the id of the 1000th row after the current offset
        SELECT MAX(tmp.id) INTO @id_end
        FROM (
            SELECT id
            FROM `order`
            WHERE id > @id_start
            ORDER BY id
            LIMIT 1000
        ) AS tmp;

        -- No more rows found
        IF @id_end IS NULL OR @id_end = @id_start THEN
            LEAVE loop_label;
        END IF;

        -- Update the current batch of rows
        UPDATE `order`
        INNER JOIN product ON product.id = `order`.product_id
        SET `order`.product_uuid = product.uuid
        WHERE `order`.id > @id_start
        AND `order`.id <= @id_end;

        -- Increment the offset
        SET @id_start := @id_end;

        -- Monitor the progress
        SELECT @id_end;

        -- Give your server some time to process other queries
        DO SLEEP(0.5);
    END LOOP;
END$$

-- Set the delimiter back to its default value
DELIMITER ;

-- Execute the procedure
CALL LoopSetUuid();

-- Clean up
DROP PROCEDURE LoopSetUuid;

Let’s go through the different parts.

The first one is DELIMITER $$. By default, MySQL uses a semicolon as the delimiter between two queries. Here we change it because we need MySQL to process the whole procedure definition as a single statement. You can use whatever you want; I’ve picked a double dollar sign. We set it back to a semicolon once the procedure is defined.

Then, we need an upper bound for each batch of 1,000 rows. We don’t paginate with LIMIT offset, row_count, because MySQL still has to read and discard all offset rows before returning anything, which becomes very inefficient as the offset grows. Instead, we seek directly on the indexed id column.
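To see the difference, compare the two approaches (the boundary value 100000 is just an illustration):

-- Offset pagination: MySQL reads and discards 100,000 rows
-- before returning the 1,000 you asked for
SELECT id FROM `order` ORDER BY id LIMIT 100000, 1000;

-- Keyset pagination: the primary key index seeks straight to the boundary,
-- so the cost stays the same however deep into the table you are
SELECT id FROM `order` WHERE id > 100000 ORDER BY id LIMIT 1000;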

Lastly, we have:

  • a condition to exit the loop once all the data has been processed
  • the update statement
  • the offset increment.

What’s Next?

The procedure will run a lot of queries, which puts some load on your server. Try adjusting the duration of the SLEEP statement to better suit your needs. For example, a lower value will complete the procedure faster, but it can cause higher CPU usage, slower queries for your clients, and increased replication lag.
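While the procedure runs, you can keep an eye on these side effects from another session. A minimal sketch (on MySQL 8.0.22 and later, SHOW REPLICA STATUS is the preferred spelling):

-- Queries piling up behind your batches show up in the process list
SHOW FULL PROCESSLIST;

-- On a replica, Seconds_Behind_Master indicates how far it is lagging
SHOW SLAVE STATUS\G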

Sylvain Fabre, AssoConnect
Happy CTO at https://www.assoconnect.com, a SaaS product that helps non-profits focus on their missions. We’re hiring!