How to Fix a Lagging MySQL Replication
Fred de Villamil 🤔
463

Nice post, a few notes:

> Setting innodb_flush_log_at_trx_commit to 2 makes the flush happen every second (depending on the system load). This means that, in case of crash, innodb will have to replay all the non committed transactions (up you one second here).

Depending on the type of crash (MySQL only or complete OS crash) you may _lose_ up to one second’s worth of transactions. This isn’t about replaying them; the data would be lost.

Data not in the transaction log is guaranteed to not appear in your tables. That is how the premise of ACID is implemented. What you’d get upon recovery is a valid, consistent snapshot of your data, but a stale one.

Replicas of your crashed server may actually have that lost data, as they have the potential of retrieving it, or part of it, during that second where the transaction log is unflushed and just before the crash.

Setting `sync_binlog=0` means any replicas dangling from your crashed server are likely to not be able to resume replication. There’s workarounds to that so less of an issue.

The time it may take to recover a crashed master can range anywhere between seconds and hours, depending on your load and the reason for crash. Typical recovery times assuming hardware is intact range 10–30 minutes on busy installations. This is typically more than can be afforded. If the crashed server is the master/writable, most solutions involve failing over to a replica or using a Galera setup. If the server is a replica, then it’s typically less of a concern, assuming other replicas are around to take its place.

Kind regards

Like what you read? Give Shlomi Noach a round of applause.

From a quick cheer to a standing ovation, clap to show how much you enjoyed this story.