Why Does Your MySQL Database Lock Up? Uncovering the Truth About Online DDL

Hamidreza Niazi
7 min readMay 30, 2024

--

Online DDL in MySQL offers concurrent DML operations, minimizing downtime. However, despite these advancements, running migrations in production can still lead to database locks and potential incidents. This article simulates one of the potential issues in high-traffic production environments and delves into understanding the underlying reasons.

Behind the Scenes of MySQL Online DDL Locks

Abbreviations

This article includes several abbreviations commonly used in database management and SQL. For clarity, these abbreviations are outlined below:

  • DDL — Data Definition Language
  • DQL — Data Query Language
  • DML — Data Manipulation Language
  • DCL — Data Control Language
  • TCL — Transaction Control Language

These commands are categorized as follows:

╔══════════════════════════════════════════════════════════════════════════╗
║ SQL Commands ║
╠══════════════╦══════════════════╦═══════════════════╦═════════════╦══════╣
║ DDL ║ DML ║ TCL ║ DQL ║ DCL ║
╠══════════════╬══════════════════╬═══════════════════╬═════════════╬══════╣
║ CREATE ║ INSERT ║ COMMIT ║ SELECT ║GRANT ║
║ DROP ║ UPDATE ║ SAVEPOINT ║ ║REVOKE║
║ ALTER ║ DELETE ║ ROLLBACK ║ ║ ║
║ TRUNCATE ║ CALL ║ SET TRANSACTION ║ ║ ║
║ ║ EXPLAIN CALL ║ SET CONSTRAINT ║ ║ ║
║ ║ LOCK ║ ║ ║ ║
╚══════════════╩══════════════════╩═══════════════════╩═════════════╩══════╝

Problem

MySQL can perform DDL operations while allowing concurrent DML, which is called online DDL. However, running migrations in production can still cause incidents by locking the database at the table level. To respect the burden of proof, I will simulate the issue first before describing why it occurs.

You can set up the simulation yourself by following the steps on your existing database or using this dummy Docker project I provided here.

Step One

Let’s create a simple table called foo with two columns, id and name:

CREATE TABLE foo (
id BIGINT AUTO_INCREMENT PRIMARY KEY,
name VARCHAR(255) NOT NULL
);

Step Two

Then, while a long-running DQL is in progress:

SELECT * FROM bar WHERE SLEEP(60);

or a short-running DQL inside an open transaction:

START TRANSACTION;
SELECT * FROM foo WHERE id = 1;

or both!

Step Three

Try to add a new column via a DDL statement using the INSTANT algorithm, also known as the most performant one, concurrently in another session:

ALTER TABLE foo ADD info VARCHAR(255) NULL, ALGORITHM=INSTANT;

You may notice that the DDL operation cannot be performed and is waiting, which might not seem like a major problem.

Step Four

Now open a third session and run a simple DQL statement:

SELECT * FROM foo;

or a simple DML statement:

UPDATE foo SET name = 'test' WHERE id = 1;
Three concurrent sessions to a single MySQL database
Three concurrent sessions to a single MySQL database

Despite the INSTANT algorithm, the table is locked, and you cannot perform any operations. It can be even worse if other related tables with foreign key constraints exist, as they will experience the same lock. The DDL operation will start as soon as all previous operations are handled. Once the DDL begins, you can observe the expected behavior, allowing concurrent DQL and DML operations. This cannot be considered a rare case scenario, as Step Two is likely to happen frequently on production servers, particularly with a sequence of running operations.

Now that we’ve identified and understood the issue, let’s build up some knowledge to see why this happens. We’ll delve into the underlying mechanisms of MySQL locks and explore the reasons behind this behavior.

InnoDB Locks

InnoDB uses different lock types, but here we only need to get familiar with two of them: row-level and table-level locks.

Row-Level

  • Shared (S) Lock: Allows multiple transactions to read a row simultaneously.
  • Exclusive (X) Lock: Allows only one transaction to update or delete a row.

Table-Level

  • Intention Shared (IS) Lock: Indicates a transaction plans to set a shared lock on some rows within the table.
  • Intention Exclusive (IX) Lock: Indicates a transaction plans to set an exclusive lock on some rows within the table.

Online DDL Locks

Online DDL operations consist of three phases, each acquiring a different type of lock.

╔═══════╦════════════════╦═══════════════════╗
║ Phase ║ Title ║ Lock ║
╠═══════╬════════════════╬═══════════════════╣
║ One ║ Initialization ║ SHARED-UPGRADABLE ║
║ ║ ║ ║
║ ║ ║ ║
║ ║ ║ ║
╠═══════╬════════════════╬═══════════════════╣
║ Two ║ Execution ║ SHARED/EXCLUSIVE ║
║ ║ ║ ║
║ ║ ║ ║
╠═══════╬════════════════╬═══════════════════╣
║ Three ║ Commit ║ EXCLUSIVE ║
║ ║ ║ ║
╚═══════╩════════════════╩═══════════════════╝

Metadata Locks

MySQL uses metadata locking to manage concurrent access to database objects and ensure data consistency. Metadata locking applies to tables, schemas, stored programs (such as procedures, functions, triggers, scheduled events), tablespaces, user locks from the GET_LOCK() function, and locks from the locking service. The metadata_locks table in the Performance Schema provides information about metadata locks, showing which sessions hold locks and which are waiting for locks. We will discuss this table more in the next sections!

Diagnosing the Problem

With our prior knowledge, we can better understand the underlying reasons for the simulated issue. To gain more insights, let’s enable the option that stores metadata locks in the performance schema. This can be done by updating the setup_instruments table:

UPDATE performance_schema.setup_instruments
SET ENABLED = 'YES', TIMED = 'YES'
WHERE NAME = 'wait/lock/metadata/sql/mdl';

Now, let’s follow the simulation steps again. This time, at each step, check the metadata locks to understand the locks acquired by MySQL:

SELECT * FROM performance_schema.metadata_locks;

I’m running the simulation again against the schema below:

Relationship between foo and bar tables
Simple relationship between foo and bar tables

Now, let’s see what kind of locks are proposed for each step:

Metadata locks on performance schema
Metadata locks on performance schema

As you can see, a variety of lock types are granted or pending for the DDL step. This operation is performed in three phases, and each phase obtains specific locks (as mentioned in the table-level locks section). Now, let’s answer the question: why is the DDL waiting for the ongoing operations, and why can’t any DQL operations against the footable or any DML operations against the fooor bartables (related via the foreign key constraint) be performed until the DDL starts its operation?

Due to the exclusive metadata lock requirements outlined in the DDL phases, an online DDL operation may have to wait for concurrent transactions or DMLs that hold metadata locks on the table to commit or rollback. While the DDL is waiting, based on metadata lock acquisition, if there are multiple waiters for a given lock, the highest-priority lock request is satisfied first (EXCLUSIVE in this case). This means all DQL/DML operations will pause and wait for the DDL process to start and perform its locking mechanism. However, this doesn’t necessarily mean that new processes will be unblocked immediately, as the DDL might still perform an EXCLUSIVE lock.

Based on the compatibility matrix for table-level locks, the SHARED lock is not compatible with any kind of EXCLUSIVE lock. In our simulation, the last step depends on a SHARED_READ lock, while EXCLUSIVE locks are already granted due to the DDL operation. A lock is granted to a requesting operation if it is compatible with existing locks. If it conflicts with existing locks, it will not be granted because it would cause a deadlock.

╔════╦══════════╦════════════╦════════════╦════════════╗
║ ║ X ║ IX ║ S ║ IS ║
╠════╬══════════╬════════════╬════════════╬════════════╣
║ X ║ Conflict ║ Conflict ║ Conflict ║ Conflict ║
║ IX ║ Conflict ║ Compatible ║ Conflict ║ Compatible ║
║ S ║ Conflict ║ Conflict ║ Compatible ║ Compatible ║
║ IS ║ Conflict ║ Compatible ║ Compatible ║ Compatible ║
╚════╩══════════╩════════════╩════════════╩════════════╝

Online DDL Concurrency

Now let’s discuss what occurs right after the DDL operation starts. What will happen to those waiting operations?

The MySQL server determines how much concurrency is permitted during the Initialization phase, taking into account storage engine capabilities, operations specified in the statement, and user-specified ALGORITHM and LOCK options. The aspects we can control while writing the command are:

  • Algorithm (INSTANT, INPLACE, and COPY)
  • Lock Type (for INPLACE)

Since the COPY algorithm does not support online DDL, let's focus only on the other algorithms. The table below shows different concurrency levels based on your explicit options in the DDL command:

╔═══════════╦══════════╦═════════════════════════╗
║ Algorithm ║ LOCK ║ Concurrency Permission ║
╠═══════════╬══════════╬═════════════════════════╣
║ INSTANT ║ DEFAULT ║ DQL, DML ║
╠═══════════╬══════════╬═════════════════════════╣
║ INPLACE ║ NONE ║ DQL, DML ║
║ ║ SHARED ║ DQL ║
║ ║ DEFAULT ║ ~ ║
║ ║ EXCLUSIVE║ NONE ║
╚═══════════╩══════════╩═════════════════════════╝

Note that:

  • with the INPLACE algorithm, it is not always permitted to use LOCK=NONE.
  • When using LOCK=DEFAULT with the INPLACE algorithm, it permits as much concurrency as possible (omitting the LOCK clause is the same as specifying LOCK=DEFAULT).
  • With the INSTANT algorithm, there is only the default option, and you cannot specify LOCK=DEFAULT.
  • The algorithm you choose must be compatible with the specific command. In some cases, online DDL is not supported at all.

Conclusion

Understanding the details of MySQL’s online DDL and its locking mechanisms is crucial for maintaining database stability in production environments. To mitigate the challenges associated with online DDL, consider the following best practices:

  • Avoid running long-running and statistical operations against your production database.
  • Ensure transactions are not left open in your application flows for a long time.
  • Utilize online schema transformation tools such as Percona Toolkit or GH-OST.
  • Leverage blue/green deployment strategies if your cloud provider supports them, such as AWS Aurora.
  • Consider database solutions that natively support zero downtime schema changes, like Vitess.

By implementing these strategies, you can effectively manage schema changes and minimize disruptions to your production database.

References

  1. MySQL Online DDL Overview
  2. MySQL Online DDL Operations
  3. MySQL Online DDL Performance
  4. MySQL Online DDL Limitations
  5. MySQL Metadata Locking
  6. MySQL Performance Schema Metadata Locks Table
  7. InnoDB Locks
  8. Percona Toolkit Online Schema Change
  9. GitHub’s Online Schema Transformation GH-OST
  10. Vitess

--

--