Update your MySQL Schema & a Million Rows Headache-free: Part 2

Sylvain Fabre · Published in AssoConnect · 4 min read · Jan 26, 2020

Updating a million rows or your entire database schema doesn’t have to be a stressful all-nighter for you and your team! Lucky for you, it is easy to execute these changes during office hours with zero downtime.


The first part dealt with database schema changes, such as adding, renaming, and dropping a column. In this part, I’ll focus on how you can update a million rows without blocking the rest of your application.

Why Can Updating a Million Rows Create a Delay?

Running an UPDATE statement on a table that uses the InnoDB engine sets an exclusive lock on all related rows. Depending on the conditions you set in the WHERE clause, the entire table may end up being locked.

This is never good news, as all other connections will have to wait for the statement to complete: with a million rows in the table, that can mean waiting virtually forever.
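You can watch this happen from a separate connection. As a quick sketch, InnoDB exposes the transactions stuck waiting on a lock through a standard information_schema view:

-- Transactions currently blocked by a lock, with the query they are
-- trying to run and when they started
SELECT trx_id, trx_state, trx_started, trx_query
FROM information_schema.INNODB_TRX
WHERE trx_state = 'LOCK WAIT';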

How to Prevent Locking?

Actually, you cannot prevent your rows from being locked as this is how MySQL works.

But what about making this lock last as little time as possible?

To do so, you need to enforce two rules:

  • Batch the rows: update only a small number of them at a time
  • Ensure your row filtering is efficient

Batch Updates Using a Procedure

Let’s say you have two entities, Product and Order, and you are migrating from auto-incremented ids to time-ordered UUIDs, as recommended in this Percona article.
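As an aside, if you run MySQL 8.0, the built-in UUID_TO_BIN function can generate such time-ordered values for you: passing 1 as its second argument moves the time bits of the version-1 UUID to the front.

-- UUID() returns a version-1 (time-based) UUID as text; with swap_flag = 1,
-- UUID_TO_BIN reorders the time bits so values are roughly sequential,
-- which is much friendlier to InnoDB's clustered index
SELECT UUID_TO_BIN(UUID(), 1) AS ordered_uuid;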

Your tables’ schemas look like this, with a many-to-one relationship from Order to Product.

[Image: Product table’s schema]
[Image: Order table’s schema]
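For reference, here is a minimal sketch of what the two schemas might look like; the exact column types, and storing UUIDs as BINARY(16), are assumptions on my part:

CREATE TABLE product (
    id   INT UNSIGNED AUTO_INCREMENT PRIMARY KEY,
    uuid BINARY(16) NOT NULL  -- the new time-ordered identifier
);

-- ORDER is a reserved word in MySQL, hence the backticks
CREATE TABLE `order` (
    id           INT UNSIGNED AUTO_INCREMENT PRIMARY KEY,
    product_id   INT UNSIGNED NOT NULL,  -- legacy reference to product.id
    product_uuid BINARY(16) NULL,        -- the new column we want to fill
    FOREIGN KEY (product_id) REFERENCES product (id)
);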

Now, you’ll want to fill the new order.product_uuid column.

The most basic way to do this is to run the following query:

UPDATE `order`
INNER JOIN product ON product.id = `order`.product_id
SET `order`.product_uuid = product.uuid;

This works, but it also locks your entire table for the duration of the statement.

The trick here is to update rows by small batches:

  • MySQL will still lock some rows but the lock won’t last as long
  • MySQL will be able to run queries requested by other connections between two of your UPDATE statements

We use MySQL stored procedures at AssoConnect: a procedure is a kind of script, written in SQL, that runs directly on your server.

Following our use-case, the procedure is defined and called like this:

-- New delimiter
DELIMITER $$

CREATE PROCEDURE LoopSetUuid()
BEGIN
    SET @id_start := 0;

    loop_label: LOOP
        -- Find the id of the 1000th row after the current offset
        SELECT MAX(tmp.id) INTO @id_end
        FROM (
            SELECT id
            FROM `order`
            WHERE id > @id_start
            ORDER BY id
            LIMIT 1000
        ) AS tmp;

        -- No more rows found
        IF @id_end IS NULL OR @id_end = @id_start THEN
            LEAVE loop_label;
        END IF;

        -- Update the current batch of rows
        UPDATE `order`
        INNER JOIN product ON product.id = `order`.product_id
        SET `order`.product_uuid = product.uuid
        WHERE `order`.id > @id_start
        AND `order`.id <= @id_end;

        -- Increment the offset
        SET @id_start := @id_end;

        -- Monitor the progress
        SELECT @id_end;

        -- Give your server some time to process other queries
        DO SLEEP(0.5);
    END LOOP;
END$$

-- Set the delimiter back to its default value
DELIMITER ;

-- Execute the procedure
CALL LoopSetUuid();

-- Clean up
DROP PROCEDURE LoopSetUuid;

Let’s go through the different parts.

The first one is DELIMITER $$. By default, MySQL uses a semicolon as the delimiter between two queries. Here we change it because we need MySQL to process the whole procedure definition as a single statement. You can use whatever you want; I’ve picked a double dollar sign. We set it back to a semicolon once the procedure is defined.

Then, we need an upper bound for each batch of 1,000 rows. We don’t paginate with LIMIT offset, row_count, because MySQL still has to read and discard all offset rows before returning anything, which becomes very inefficient as the offset grows. Instead, we seek directly on the indexed id column.
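To see the difference, compare the two approaches (the boundary value 100000 is just an illustration):

-- Offset pagination: MySQL reads and discards 100,000 rows
-- before returning the 1,000 you asked for
SELECT id FROM `order` ORDER BY id LIMIT 100000, 1000;

-- Keyset pagination: the primary key index seeks straight to the boundary,
-- so the cost stays the same however deep into the table you are
SELECT id FROM `order` WHERE id > 100000 ORDER BY id LIMIT 1000;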

Lastly, we have:

  • a condition to exit the loop once all the data has been processed
  • the update statement
  • the offset increment.

What’s Next?

The procedure will run a lot of queries, which puts some load on your server. Try adjusting the duration of the SLEEP statement to better suit your needs. For example, a lower value will complete the procedure faster, but it can cause higher CPU usage, slower queries for your clients, and increased replication lag.
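While the procedure runs, you can keep an eye on these side effects from another session. A minimal sketch (on MySQL 8.0.22 and later, SHOW REPLICA STATUS is the preferred spelling):

-- Queries piling up behind your batches show up in the process list
SHOW FULL PROCESSLIST;

-- On a replica, Seconds_Behind_Master indicates how far it is lagging
SHOW SLAVE STATUS\G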

Sylvain Fabre, AssoConnect
Happy CTO at https://www.assoconnect.com, a SaaS product that helps non-profits focus on their missions. We’re hiring!