Eliminating MySQL replication lag for schema changes
At Chargebee, our primary database technology is MySQL, hosted on Amazon RDS instances. Each of our microservices has a schema isolated from the others. All the data is further sharded horizontally, with each shard storing data for a set of merchant businesses. A “shard” consists of a Multi-AZ source and one or more replicas.
As evident in the diagram above, we use MySQL Replication to scale out our database reads, adding replicas as needed to serve requests. Currently, more than 70% of the read traffic is served by the replicas. Further, updating indexes during writes involves additional overhead, so the number of indexes on the source is kept to a minimum. Although all indexes present on the source are also present on the replicas, some indexes exist exclusively on the replicas and not on the source. These are typically indexes for fields that attract heavy read traffic, such as custom fields.
Large replication lags¹ make the replicas unusable for our services and increase load on the source. Consequently, ensuring that replication lag stays within acceptable limits is a critical SRE requirement.
ALTER TABLE for large tables
Our MySQL databases are configured such that while the source allows parallel transactions in many cases, all transactions on the replicas are serialized.
Consider the following transactions, including an ALTER TABLE, executing on the source:
The same transactions are executed in the replicas one after another, in order of their commit time on the source:
As indicated in the diagrams above, due to the serialization of transactions on the replicas, Txn-3, which was committed a few milliseconds after 05:00 hrs on the source, gets committed on the replica only after 05:20 hrs.
With this delay of more than twenty minutes, the data in the replica is already stale for most practical purposes. Even for the most forgiving of read operations, such as the export API, we expect replication lag to be no more than a few seconds. Moreover, since the same ALTER TABLE is executed on all replicas, every one of them will have a lag well beyond acceptable limits. This would overburden the source with read requests during the ALTER TABLE replication phase.
Solution: ALTER TABLE on replica(s) first
To eliminate this replication lag, we decided to execute ALTER TABLE statements on the replicas before doing so on the source. This can be safely done for all backward compatible² ALTER TABLE statements.
However, when the statement is executed again on the replica as part of the normal replication flow, it will fail. This is because MySQL does not have a DDL option such as add column if exists.
We considered a couple of approaches to mitigate this replication failure:
Approach 1: Skip binary logging
We first investigated whether we could disable the binary log for the ALTER TABLE statement alone on the source, thereby preventing replication of such statements and, as a result, circumventing the replication failure. This can be done by disabling the binary log for a particular session using the following statement before executing the ALTER TABLE scripts:
SET session sql_log_bin = OFF;
However, since Amazon RDS relies on the binary log for its automated backups/PITR, disabling the trusty binary log was not an option.
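For completeness, this is roughly how Approach 1 would have been scripted had it been viable: disable binary logging for one session only, run the ALTER there, and restore logging afterwards. This is a hypothetical sketch, not tooling from the article; `execute` stands in for a real MySQL client call.

```python
def alter_without_binlog(execute, alter_stmt):
    """Run alter_stmt in a session whose changes are excluded from the binlog.

    Discarded in practice: RDS automated backups and point-in-time recovery
    depend on the binary log being complete.
    """
    execute("SET SESSION sql_log_bin = OFF;")  # affects this session only
    try:
        execute(alter_stmt)
    finally:
        execute("SET SESSION sql_log_bin = ON;")  # restore before session reuse
```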
Approach 2: Watch for replication failure and skip over it
Once the first approach was discarded, we thought of the next best thing: what if we could monitor for replication failure right after an ALTER TABLE commit happens on the source? As soon as we see the failure happen, we could skip over it and resume replication from the next statement!
Since schema alteration can be executed using MySQL’s online DDL option, we keep the replicas in use, serving requests, while running the ALTER TABLE. Once the statement has been executed on all replicas, we execute the ALTER TABLE on the source. As soon as the statement is committed, we start polling the replication status using SHOW SLAVE STATUS to catch the inevitable replication failure. On catching sight of the error, we run the following statement to skip over the failure and start executing the next statement in the replication queue:
CALL mysql.rds_skip_repl_error;
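The end-to-end flow described above — online ALTER on each replica, then on the source, then watch for and skip the expected failure — can be sketched as follows. This is a simplified illustration rather than production tooling: `run_sql` is a hypothetical injected callable wrapping a MySQL client, and error 1060 ("Duplicate column name") is the failure expected when the replicated DDL is an ADD COLUMN.

```python
import time

DUPLICATE_COLUMN_ERRNO = 1060  # MySQL "Duplicate column name"

def apply_schema_change(alter_stmt, source, replicas, run_sql,
                        poll_interval=1.0, timeout=300.0):
    """Replicas-first schema change with skip-over of the expected DDL error."""
    # Step 1: online DDL on every replica while it keeps serving reads.
    for replica in replicas:
        run_sql(replica, alter_stmt)

    # Step 2: the same ALTER on the source. Its binlog event will fail when
    # replayed on the replicas, because the change is already applied there.
    run_sql(source, alter_stmt)

    # Step 3: poll each replica's status and skip the one expected failure.
    for replica in replicas:
        deadline = time.monotonic() + timeout
        while time.monotonic() < deadline:
            status = run_sql(replica, "SHOW SLAVE STATUS")[0]
            if status["Last_SQL_Errno"] == DUPLICATE_COLUMN_ERRNO:
                run_sql(replica, "CALL mysql.rds_skip_repl_error;")
                break
            if status["Last_SQL_Errno"] != 0:
                raise RuntimeError(f"unexpected replication error: {status}")
            time.sleep(poll_interval)  # no failure seen yet; keep watching
        else:
            raise TimeoutError(f"no replication failure observed on {replica}")
```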
As seen in the diagram below, the ALTER TABLE replication failure is detected almost immediately and skipped over, allowing Txn-3 to start early and commit by 05:00:05 hrs instead of having to wait for twenty minutes.
This way, we are able to successfully execute backward compatible MySQL schema changes² without any impact on our services.
We hope you found this article useful.
Notes
[1]: Replication lag, at any given moment for a replica, is the difference between the current timestamp at the replica and the timestamp logged on the source for the transaction that is currently underway at the replica. For fast networks, this is measured in seconds by the Seconds_Behind_Master field returned by the SHOW SLAVE STATUS statement.
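As a concrete illustration of how such a lag reading might gate replica use, here is a hypothetical helper. The 5-second threshold is an assumption for illustration, not a figure from the article; a NULL Seconds_Behind_Master indicates replication is not running.

```python
MAX_ACCEPTABLE_LAG_SECONDS = 5  # illustrative threshold, not from the article

def replica_is_usable(slave_status):
    """Decide whether a replica is fresh enough to serve reads.

    slave_status is one row of SHOW SLAVE STATUS, represented as a dict.
    """
    lag = slave_status.get("Seconds_Behind_Master")
    if lag is None:  # NULL: the replication SQL thread is not running
        return False
    return lag <= MAX_ACCEPTABLE_LAG_SECONDS

print(replica_is_usable({"Seconds_Behind_Master": 2}))     # True
print(replica_is_usable({"Seconds_Behind_Master": 1200}))  # False: ~20 min behind
print(replica_is_usable({"Seconds_Behind_Master": None}))  # False: thread stopped
```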
[2]: Some caveats to the solution proposed:
- If the ALTER TABLE is not backward compatible with the current schema (such as when dropping a column, changing the data type of a column, etc.), then this approach would fail.
- If the ALTER TABLE involves non-deterministic columns, such as timestamp columns with default values, then using this approach is not recommended, as the values for those columns on the source and replica(s) won’t be consistent.
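To make the caveats above operational, one could gate the automation with a rough compatibility check before choosing the replicas-first path. The sketch below is a hypothetical illustration; the pattern lists are assumptions, not an exhaustive rule set.

```python
import re

# Illustrative classification (assumed, not exhaustive): additive changes are
# treated as backward compatible; destructive or type-changing ones are not.
SAFE_PATTERNS = [
    r"\bADD\s+COLUMN\b",
    r"\bADD\s+(UNIQUE\s+)?(INDEX|KEY)\b",
]
UNSAFE_PATTERNS = [
    r"\bDROP\s+COLUMN\b",    # readers expecting the old schema would break
    r"\bMODIFY\s+COLUMN\b",  # data-type changes are not backward compatible
    r"\bCHANGE\s+COLUMN\b",
]

def is_backward_compatible(alter_stmt):
    """Return True only for statements we can positively classify as safe."""
    stmt = alter_stmt.upper()
    if any(re.search(p, stmt) for p in UNSAFE_PATTERNS):
        return False
    return any(re.search(p, stmt) for p in SAFE_PATTERNS)
```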
Technical Review & Edits
- John Machan, Senior Technical Writer
- Vignesh VB., Principal Engineer