MySql Replication masterclass — Part 1

5 min readDec 23, 2022

Introduction

In today’s world as they say “Data is the new oil” and websites around the world generate humongous amounts of data every minute and this data needs to be stored and retrieved efficiently. Presently the databases are doing a great job in allowing us to manage our data efficiently but anytime anything can go wrong because of network outages, server crashes and what not. This is in accordance with Murphy’s law where he states “Anything that can go wrong will go wrong”. We as engineers need to ensure that our data is always safe and available to our users.

At some point every company starts realizing the amount of data they have accumulated over time and the amount of pressure the database has to undergo when it comes to query performance and scalability. Taking regular database dumps is a way to safeguard data but not the optimal one since it’s very time consuming. This is where database replication comes into play.

Replication is a vast and one of the very important topics when it comes to Database Engineering. It allows us to have multiple copies of the same data in different servers in different geographies. It sounds very simple but a lot goes behind the scenes to make it possible. Let’s take a deep dive into it using Mysql as the database management technology and try to understand what replication is and what are its different flavors and formats provided by MySql.

So what exactly is Replication?

In simple terms, Replication enables data from one MySql database(also called Master/source/Primary DB) to be copied to one or more MySql databases(also known as replicas/slave DBs). Replication is asynchronous by default and the replica servers do not need a permanent connection to the primary DB. We can replicate all databases, selected databases or selected tables within the database.

Fig: Database replication

What makes Replication so important?

Adding read replicas to scale reads — We can add any number of read replicas to the primary DB and scale read requests. In this type of setup the write and read requests are segregated. The write requests are sent only to the primary DB and the read requests are distributed between the replicas. This also increases write performance as the primary DB is only handling writes and doesn’t care about any read connections.
Analytics — We can configure a read replica to perform analytics without affecting the performance on the primary DB. This setup allows us to have a dedicated database server to perform just analytical queries.
Disaster Recovery — A replica in some other geographical region paves a proper path to configure disaster recovery.

Replication Synchronization Types:

Fig: Replication Synchronization types

Synchronous Replication: In synchronous replication, a commit performed on the source should be committed to all the replicas before the control passes back to the session that performed the transaction. The advantage here is that in case of a failover all the replicas will have the updated copies of the data. The biggest drawback is slow processing and delay to complete the transaction.

Asynchronous Replication — This is the default synchronization method used by MySql. Here the primary or the master database accepts the transactions and sends back the acknowledgement to the session. The master DB does not care whether the transactions have been written on any replica or not. With asynchronous replication, if the server crashes the transaction that it might have committed may not have reached the replica and a failover in such a scenario may result in some missing data on the replica w.r.t the primary DB.

SemiSynchronous Replication: It falls between synchronous and asynchronous replication. In this case only one replica should acknowledge the event changes from the source DB. This setup guarantees that all the transactions that have been committed to the primary DB are available in at least one of the replicas. Compared to asynchronous replication, semi-synchronous replication provides improved data integrity, because when a commit returns successfully, it is known that the data exists in at least two places. Until a semisynchronous source receives acknowledgment from the required number of replicas, the transaction is on hold and not committed

Delayed Replication: We can deliberately lag the replica in a typical MySQL replication by the number of seconds desired by the use case. This type of replication safeguards from severe human errors of dropping or corrupting the data on the primary, for example, in the above diagram for Delayed Replication, if a DROP DATABASE is executed by mistake on the primary, we still have x seconds to recover the data from Replica2 as that command has not been replicated on Replica2 yet.

Replication Types

This is a very fundamental aspect when it comes to replication as this is what defines how the actual replication takes place. For the sake of simplicity I will try to list the different types and explain about them in a nutshell and in the next part we will go over these topics in detail. There are two methods of replication:

Binary log file position based replication
Global transaction Identifiers (GTIDs)

Binary log file position based Replication: Before we dive into this type of replication we need to understand the binary log files(a.k.a binlogs). Binlogs are log files present in the data directory that record all the events about database changes like schema changes or database updates and records them in a specific format. The binlogs do not log any SELECT statements.

In this type of replication all replicas keep the record of the binlog coordinates of the Primary DB — current binlog and the positions in them till the replica has read and processed. It is quite possible that at any given time two replicas are reading two different events in the same binlog.

Fig: Binlog based replication

Global Transaction Identifiers(GTIDs): This method is transactional and does not require working with binlogs or positions in them. In this case all transactions performed on the source DB get a unique identifier or GTID(a typical GTID looks like server-uuid: positive integer). When using GTIDs, each transaction can be identified and tracked as it is committed on the originating server and applied by any replicas.Because GTID-based replication is completely transaction-based, it is simple to determine whether sources and replicas are consistent; as long as all transactions committed on a source are also committed on a replica, consistency between the two is guaranteed.

Conclusion

In modern tech solutions replication plays a very crucial role in system design as it not only reduces the risk of data loss but it also improves the performance of the overall solution to a great extent.

We will cover the actual database replication using Binlogs as well as GTIDs in detail in our coming parts. Stay tuned.

Part 2 here!

References: