Eliminating MySQL replication lag for schema changes

Adithya K S
Engineering @ Chargebee
5 min read · Nov 25, 2020

At Chargebee, our primary database technology is MySQL, hosted on Amazon RDS instances. Each of our microservices has a schema isolated from the others. All the data is further sharded horizontally, with each shard storing data for a set of merchant businesses. A “shard” consists of a Multi-AZ source and one or more replicas.

As evident in the diagram above, we use MySQL Replication to scale out our database reads, adding replicas as needed to serve requests. Currently, more than 70% of the read traffic is served by the replicas. Further, since updating indexes during writes involves additional overhead, the number of indexes on the source is kept to a minimum. All indexes present on the source are also present on the replicas, but some indexes exist exclusively on the replicas and not on the source. These are typically indexes on fields that attract heavy read traffic, such as custom fields.
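For illustration, a replica-only index might be created by connecting to the replica and running the DDL there directly, so the source never sees it (the table and index names below are hypothetical, and the replica is assumed to permit writes for such maintenance operations):

-- Run directly on the replica; because it never runs on the source,
-- it is not part of the replication stream.
CREATE INDEX idx_subscriptions_cf_region ON subscriptions (cf_region);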

Large replication lags¹ make the replicas unusable for our services and increase the load on the source. Consequently, ensuring that replication lag stays within acceptable limits is a critical SRE requirement.

ALTER TABLE for large tables

Our MySQL databases are set up such that, while the source allows transactions to execute in parallel in many cases, all the transactions on the replicas are serialized.

Consider the following transactions, including an ALTER TABLE, executing on the source:

A timing diagram showing several transactions happening simultaneously on the source database.
Transactions running in parallel on the source.

The same transactions are executed on the replicas one after another, in the order of their commit time on the source:

A timing diagram showing transactions happening serially on a replica database.
Serialized transactions on a replica

As indicated in the diagrams above, due to the serialization of transactions on the replicas, Txn-3, which was committed a few milliseconds after 05:00 hrs on the source, gets committed on the replica only after 05:20 hrs.

With this delay of more than twenty minutes, the data on the replica is already stale for most practical purposes. Even for the most forgiving of read operations, such as the export API, we expect replication lag to be no more than a few seconds. Moreover, since the same ALTER TABLE is executed on all replicas, every one of them will lag well beyond acceptable limits. This would overburden the source with read requests for the entire duration of the ALTER TABLE replication phase.
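While a long ALTER TABLE is replaying, the growing lag can be observed on a replica with the standard replication status command; the Seconds_Behind_Master field keeps increasing until the DDL finishes:

-- Run on the replica; Seconds_Behind_Master reports the current lag in seconds.
SHOW SLAVE STATUS\G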

Solution: ALTER TABLE on replica(s) first

To eliminate this replication lag, we decided to execute ALTER TABLE statements on the replicas before doing so on the source. This can be safely done for all backward compatible² ALTER TABLE statements.

However, when the statement is executed again on the replica as part of the normal replication flow, it will fail, because the change has already been applied there and MySQL does not support a DDL option such as ADD COLUMN IF NOT EXISTS to make the statement idempotent.
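As a hypothetical example, consider adding a column (a backward compatible change) on a replica first. When the same statement later arrives through replication, the column already exists, so the replica’s SQL thread stops with an error:

-- Step 1: run directly on each replica (hypothetical table and column).
ALTER TABLE subscriptions ADD COLUMN cf_region VARCHAR(64) NULL;

-- Step 2: the same statement, run later on the source, is written to the binary log.
-- When the replica replays it, the statement fails with
--   ERROR 1060: Duplicate column name 'cf_region'
-- and replication halts.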

We considered a couple of approaches to mitigate this replication failure:

Approach 1: Skip binary logging

We first investigated disabling the binary log for the ALTER TABLE statement alone on the source, thereby preventing such statements from being replicated and, as a result, circumventing the replication failure. This can be done by disabling the binary log for a particular session, using the following statement before executing the ALTER TABLE scripts:

SET SESSION sql_log_bin = OFF;
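In full, the approach would have looked roughly like the sketch below, run in a single session on the source (the ALTER TABLE itself is hypothetical, and changing sql_log_bin requires sufficient privileges):

SET SESSION sql_log_bin = OFF;   -- stop writing this session's statements to the binary log
ALTER TABLE subscriptions ADD COLUMN cf_region VARCHAR(64) NULL;
SET SESSION sql_log_bin = ON;    -- restore binary logging for the session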

However, since Amazon RDS relies on the binary log for its automated backups and point-in-time recovery (PITR), disabling the trusty binary log was not an option.

Approach 2: Watch for replication failure and skip over it

Once the first approach was discarded, we thought of the next best thing: what if we could monitor for replication failure right after an ALTER TABLE commit happens on the source? As soon as we see the failure happen, we could skip over it and resume replication from the next statement!

Since the schema alteration can be executed using MySQL’s online DDL option, we keep the replicas in use, serving requests, while running the ALTER TABLE. Once the statement has been executed on all replicas, we execute the ALTER TABLE on the source. As soon as the statement is committed, we start polling the replication status using SHOW SLAVE STATUS to catch the inevitable replication failure. On catching sight of the error, we run the following statement to skip over the failure and start executing the next statement in the replication queue:

CALL mysql.rds_skip_repl_error;
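Putting the steps together, the whole procedure looks roughly like the sketch below (the table, column, and ALGORITHM/LOCK choices are illustrative; mysql.rds_skip_repl_error is the Amazon RDS procedure shown above):

-- 1. On each replica: apply the backward compatible change as online DDL,
--    so the replica keeps serving reads while the change is applied.
ALTER TABLE subscriptions
  ADD COLUMN cf_region VARCHAR(64) NULL,
  ALGORITHM=INPLACE, LOCK=NONE;

-- 2. On the source: run the same statement once all replicas are done.
ALTER TABLE subscriptions
  ADD COLUMN cf_region VARCHAR(64) NULL,
  ALGORITHM=INPLACE, LOCK=NONE;

-- 3. On each replica: poll the replication status until the replayed DDL fails
--    (Last_SQL_Errno becomes non-zero), then skip over it.
SHOW SLAVE STATUS\G                -- repeat until Last_SQL_Errno is non-zero
CALL mysql.rds_skip_repl_error;    -- resume replication from the next event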

As seen in the diagram below, the ALTER TABLE replication failure is detected almost immediately and skipped over, allowing Txn-3 to start early and commit by 05:00:05 hrs instead of having to wait for twenty minutes.

A timing diagram showing how ALTER TABLE replication failures are ignored.
Serialized transactions on a replica with the ALTER TABLE failure skipped over.

This way, we are able to successfully execute backward compatible MySQL schema changes² without any impact on our services.

We hope you found this article useful.

Notes

[1]: Replication lag, at any given moment, for a replica is the difference between the current timestamp at the replica and the timestamp logged on the source for the transaction that is currently underway at the replica. For fast networks, this is measured in seconds by the Seconds_Behind_Master field returned by the SHOW SLAVE STATUS statement.

[2]: Some caveats to the solution proposed:

  • If the ALTER TABLE is not backward compatible with the current schema (such as when dropping a column, changing the data type of a column, etc.), then this approach would fail.
  • Also, if the ALTER TABLE involves non-deterministic columns, such as timestamp columns with default values, then using this approach is not recommended, as the values for those columns would not be consistent between the source and the replica(s) (see the sketch after this list).
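For instance, a change like the hypothetical one below would not be safe to apply on the replicas first: existing rows are back-filled with the timestamp at the moment the ALTER runs, which would differ between each replica and the source.

-- Hypothetical example of an ALTER that should NOT be applied replica-first.
ALTER TABLE subscriptions
  ADD COLUMN last_synced_at TIMESTAMP NOT NULL DEFAULT CURRENT_TIMESTAMP;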
