Update your MySQL Schema & a Million Rows Headache-free: Part 1
Updating a million rows or your database schema doesn’t have to be a stressful all-nighter for you and your team! Lucky for you, it is easy to execute these changes during office hours with zero downtime.
Here are some of the issues you may be facing:
- An
UPDATE
statement on a million rows will lock down the server for a long period of time - A code release that may require a schema update, at the same time, leading to a short delay if both events don’t happen simultaneously
I will show you what we do at AssoConnect, along with common use-cases (Doctrine Entities).
Part 2 will address the million-row update problem, while in this article I will focus on the following schema changes:
- Adding a new column to your table
- Dropping a legacy column from your table
- Renaming an existing column
Why does Changing a Schema take Time?
Different operations can change your schema:
- If you remove a column from a table, the code running on your server is not aware that the column has been dropped. In this case, you will get an error if this code still relies on the column to execute
SELECT
orINSERT
statements. - If you add a new column into the table, the code will continue to execute
INSERT
statements without a reference to this column. You will get errors if no default value for this column has been set in the table’s metadata. - Additionally, the renaming of a column can cause errors as it is the combination of adding & dropping a column.
Using the same logic, you’ll get errors when you release for production a new version of your code requiring a different database schema.
The last cause of delay regarding your database schema is locking. Write operations must wait for the schema update to be completed: you get an unacceptable delay when this process is too long.
Therefore, in order to prevent errors and delays, you must ensure that at any time:
- Your code will support both versions of the database schema (with and without the changes you need)
- A given database schema must be supported by different versions of your code
- All database schema updates must be as quick as possible
Being rigorous in these rules will reap several benefits:
- The developer producing new code does not have to take into account the database schema changes made by another colleague.
- You can run database migrations without waiting for the completion of a CI/CD process.
- You can run them during office-hours when your customers are actively using your service.
Prevent Locking when Updating the Schema
At AssoConnect, our code standard requires both clauses below in schema-changing statements:
ALGORITHM=INPLACE
LOCK=NONE
The first one prevents MySQL from reorganizing the data (which a fairly expensive operation) and the second, prevents MySQL from locking the table.
Without these clauses, MySQL will do its best to run the query with as little delay as possible. But depending on the MySQL server version, the operation you are running and the current table schema, MySQL may decide to use locking or another algorithm to complete your request.
So, we only use these clauses to prevent the query execution if it cannot be completed as expected. It is better to delay your work finding the cause than to create a delay in general.
The MySQL documentation shows you what you can and can’t do with in place DDL (Data Definition Language) operations. Be sure to read about the right version of your MySQL server!
Insider tip: use the InnoDB engine and the latest MySQL server version as they support the greatest number of use-cases you may have.
Another tip: the MySQL server executes very quickly metadata changes like the default value or a column name. But it may refuse to change the nullable status of a column with the previous clauses: so double-check your ALTER
statement before running it!
Adding a Column to your Table
Let’s go through the steps to complete your operation:
- First, add the column to your table with a default value
- Release an update of your code to fill this new column
- If necessary, run scripts to update old rows
- The next release of your code can now read from & write to this new column
I’ll now show you how it’s done with MySQL queries and a sample Doctrine entity.
- Add the column to your table with a default value
ALTER TABLE `user` ADD `lastname` varchar(50) COLLATE 'utf8mb4_general_ci' NOT NULL DEFAULT '';
<?php
/** @ORM\Entity */
class User { /** ORM\Column(type="string") */
private $firstname = ''; public function getFirstname(): string
{
return $this->firstname;
} public function setFirstname(string $firstname): self
{
$this->firstname = $firstname;
return $this;
}
}
The code above will execute this query when saving a new entry. The important part here is that there is still no reference to the new column.
INSERT INTO `user` (`firstname`) VALUES ('New user');
2. Release an update of your code to fill this new column
<?php
class User {
[...]
/** ORM\Column(type="string") */
private $lastname = ''; // Don't use me
public function getLastname(): string
{
return $this->lastname;
} public function setLastname(string $lastname): self
{
$this->lastname = $lastname;
return $this;
}
}
SQL queries will now use the new column:
INSERT INTO `user` (`firstname`, `lastname`) VALUES ('Another', 'user');
3. If necessary, run scripts to update old rows
4. Release a new version of your code reading from this new column
<?php
class User {
[…]
// You may now use this method
public function getFullname(): string
{
return $this->fullname;
}
[…]
}
Removing a Column from the Table
The steps are quite straightforward here:
1. Ensure the column to be removed has a default value
ALTER TABLE `user` CHANGE `firstname` `firstname` varchar(50) COLLATE ‘utf8mb4_general_ci’ NOT NULL DEFAULT ‘’ AFTER `id`;
2. Release a new version of your code that does not use the column: remove any usage of the related setter and getter, and do not forget occurrences in raw SQL queries or when you’re using the Doctrine Query Builder.
INSERT INTO `user` (`lastname`) VALUES (‘last user’);
3. Drop the column from your table
ALTER TABLE `user` DROP `firstname`;
Renaming a Column in your Table
This issue is the combination of three main steps: add a column with a new name, move your data, drop the column with the old name. All you have to do is follow the previous steps!
As you don’t plan to close your service for maintenance, your clients keep inserting new rows in your table so you end up with endless rows to manage in step two.
Our solution at AssoConnect is to follow the main steps according to the previous use cases:
- Add the column with the new name
- Release a new version of the code where the old setter calls the new setter
=> This step is critical: no more rows will be created without a value for the new column
3. Move existing data
4. Release a new version of the code using only the new column. This means changing all the usages of the old getter/setter to the new getter/setter. Don’t forget raw SQL queries and Doctrine query builder usages.
5. Drop the old column
Tip #1: Moving lots of data can be an issue, check out part 2 of this article
Tip #2: Don’t spend time replacing the usages of the old getter and setter in your codebase. PHPStorm has a built-in feature to rename a method: quite old but still valid.
Adding a Foreign Key Constraint
Sometimes the column you wish to add into your table has a relationship with another column. Even with NULL
as the default value for all the existing rows, the MySQL server can spend a lot of time creating the foreign key constraint as it goes through each existing record to ensure the value is either NULL
or exists in the referred table.
The trick here is to disable the foreign key checks before executing the ALTER
statement. This is simple:
SET FOREIGN_KEY_CHECKS=0;ALTER `my_table` ADD CONSTRAINT …
Usually, this only lasts for the current session of your connection to the MySQL server. You may want to run SET FOREIGN_KEY_CHECKS=1
to be 100% sure that foreign key checks are enforced again.
What’s Next?
Now you know how to manage schema changes without creating unnecessary delay!
In case the server refuses to execute your queries with the ALGORITHM=INPLACE
and LOCK=NONE
clauses, you’ll have to search the MySQL documentation to understand better why it isn’t working as planned, and find an alternative solution.
Happy querying!