MySql Replication masterclass — Part 3

10 min readJan 8, 2023

Introduction

In the previous articles we understood a lot about database replication basics, its synchronisation types and different replication formats. We also learnt about the binlogs in detail. We also introduced the different replication types Binlog position based replication and GTID replication. In this article we will try to understand these types in detail. We have already set the base of all pre requisites in our previous articles (part1 and part2).

Binlog position based replication:

Let us try to understand the binlog position based replication where the MySql server instance acting as the source or primary DB(this is where the database changes take place) writes the updates or changes as “events” to the binary log. All the replicas connected to this source receive these events and make appropriate changes on the replica DB based on these events.

Each replica has the information about the binary log coordinates; i.e, the binlog filename and the position within that file it has processed from the source. Its important to point here that the entire responsibility of maintaining the binlog coordinates is with the replicas and as such individual replicas can be connected and disconnected from the server without affecting the source’s operation.

Setting up Binlog Position Based Replication:

Let’s try to understand how to setup a MySql server to use binlog position based replication. There are different methods of setting up replication and the exact method to use depends on whether you already have data in the database on the source. We will setup replication for a database that already contains data and try to understand the steps involved.

The tasks to be followed are:

Setting the Replication Source Configuration
Setting the Replica configuration
Creating a user for Replication
Obtaining the Source Binlog coordinates
Creating the data snapshot for the source
Setting the source configuration on the replica

Setting the Replication Source Configuration

The first step towards this type of replication is to enable binary logging on the source. This can be done by either providing a value to log-bin variable during startup or by directly updating the MySql config file(my.cnf or my.ini).

Also each server with the replication topology must be configured with a unique non-zero server_id. This server_id is used to identify individual servers within the replication topology and must be a positive integer between 1 and (2³²)-1.

The default server_id is 1. It can be changed by issuing a statement like this:

SET GLOBAL server_id = 5

We also need to ensure that the skip_networking system variable is not enabled on the source as this controls the networking on the source. If this is enabled the replicas cannot communicate with the source and replication fails.

Setting the Replica Configuration

Each replica should be configured with a unique server_id that differs from that of the source and any other replicas. Note that a server_id value of 0(default in earlier versions of MySql) prevents a replica to connect to the source and must be updated with a new non-zero number for replication to work. Binary logging is not required to be enabled on the replicas. However it can prove to be beneficial for data backups and crash recovery. It can also be enabled for a more sophisticated replication topology where the replica can act as a source for another replica and thereby reduce connections on the source DB.

Creating a user for Replication

Each replica needs to connect to the source using credentials like username and password, so there must be some user account on the source that the replicas can use to connect. Any user account can be used provided it has been granted the REPLICATION_SLAVE privilege. We can create different accounts for each replica or use a common account for all the replicas.

Although it’s not mandatory to create a new account for replication specifically but its worth mentioning here that the replication username and password are stored in plain text in the replica’s connection metadata repository mysql.slave_master_info. Therefore it’s a better practice to always have a separate account that has privileges only for replication to minimize the possibility of compromise to other accounts.

We can create a new user as follows:

mysql > CREATE USER 'repl_user'@'host' IDENTIFIED by 'password';
mysql > GRANT REPLICATION SLAVE ON *.* TO 'repl_user'@'host';

Obtaining the Source Binlog coordinates

For the replicas to start the replication, we need to get the correct coordinates(for a fresh master DB its the start of the log file and for a DB with existing data its the position of the last update) within the binlog. To obtain these coordinates below steps can be followed:

Create a session on the source from the command line and execute the below command: This command first acquires the exclusive metadata locks for the tables and so it ensures that all transactions are complete. It then flushes the tables from the table cache and reopens them and acquires table locks. It also downgrades the metadata lock from exclusive to shared. After doing these changes other sessions can still read data but wont be able to modify the data.

mysql > FLUSH TABLES WITH READ LOCK;

Leave the above session open and in a different session on the source use the below command to get the details about binlog coordinates

mysql > SHOW MASTER STATUS;
+------------------+----------+--------------+------------------+
| File             | Position | Binlog_Do_DB | Binlog_Ignore_DB |
+------------------+----------+--------------+------------------+
| mysql-bin.000003 | 73       | test         | manual,mysql     |
+------------------+----------+--------------+------------------+

Here the File column shows the name of the binlog file and the Position column shows the position within the file. We need to record these values as these will allow us to set up the replicas later. These values represent the replication coordinates at which the replica should begin processing new updates from the source.

Creating the data snapshot for the source

For our example we need to create replication for the source that already contains data. We need to synchronise this data with the replicas before we can start the replicas to start reading events from the source. There are multiple ways to dump the data from the source database. The possible options are:

Using the mysqldump tool to create a dump of all the databases that are going to be a part of the replication process. This is the recommended method when using innodb
If the database is stored in binary portable files, we can copy the raw data files to the replica. This method skips the overhead of updating indexes as INSERT statements are replayed. This is not recommended for innodb engines.
Using MySql server’s Clone plugin to transfer the data from an existing replica to a clone.

Lets use the mysqldump utility to create a snapshot of the data below:

$> mysqldump --all-databases --master-data > dbdump.sql

The above statement dumps all databases to a file named dbdump.sql. The — master-data option automatically appends the REPLICATION SOURCE TO statement required on the replica to start the replication process.

Setting the source configuration on the replica

Since we are setting up replication with existing data, we need to transfer the snapshot created above from the source to replica before starting replication. Once the snapshot is copied we can import the dump file as below:

$> mysql < dbdump.sql

Let’s now configure the replica with the replication coordinates from the source. To do this we need to run the below command on the replica with the appropriate information as relevant to our source

mysql> CHANGE REPLICATION SOURCE TO
    ->     SOURCE_HOST='source_host_name',
    ->     SOURCE_USER='replication_user_name',
    ->     SOURCE_PASSWORD='replication_password',
    ->     SOURCE_LOG_FILE='recorded_log_file_name',
    ->     SOURCE_LOG_POS=recorded_log_position;

The CHANGE REPLICATION statement has other options as well. Details

This statement tells the replica DB which source to connect to and what user credentials to use. We also have to mention the log file name from where it needs to read events. The most important parameter is the SOURCE_LOG_POS which basically tells the replica to start reading from this position onwards. We got these details above when we ran SHOW MASTER STATUS command.

Start Replication and Check Status

Now that everything is configured, its time to start the replication on the replica via the below command

mysql> start replica;

To determine whether the replication is running successfully we can run the below command:

mysql> show replica status;

The output of the above command is as below:

mysql> show replica status\\G
*************************** 1. row ***************************
             Replica_IO_State: Waiting for master to send event
                  Source_Host: <primary IP>
                  Source_User: repl_user
                  Source_Port: <primary port>
                Connect_Retry: 60
              Source_Log_File: mysql-bin.000001
          Read_Source_Log_Pos: 852
               Relay_Log_File: mysql-relay-bin.000002
                Relay_Log_Pos: 1067
        Relay_Source_Log_File: mysql-bin.000001
           Replica_IO_Running: Yes
          Replica_SQL_Running: Yes
                 Skip_Counter: 0
          Exec_Source_Log_Pos: 852
              Relay_Log_Space: 1283
              Until_Condition: None
               Until_Log_File:
                Until_Log_Pos: 0
           Source_SSL_Allowed: No
        Seconds_Behind_Source: 0
Source_SSL_Verify_Server_Cert: No
  Replicate_Ignore_Server_Ids:
             Source_Server_Id: 1
                  Source_UUID: e17d0920-d00e-11eb-a3e6-000d3aa00f87
             Source_Info_File: mysql.slave_master_info
                    SQL_Delay: 0
          SQL_Remaining_Delay: NULL
    Replica_SQL_Running_State: Slave has read all relay log; waiting for more updates
           Source_Retry_Count: 86400
           Retrieved_Gtid_Set: e17d0920-d00e-11eb-a3e6-000d3aa00f87:1-3
            Executed_Gtid_Set: e17d0920-d00e-11eb-a3e6-000d3aa00f87:1-3
                Auto_Position: 1
        Get_Source_public_key: 0
            Network_Namespace:
1 row in set (0.00 sec)

GTID Based Replication

This type of replication is a straight forward approach to replication and does not need us to track the binlog positions at all times. In this case the master database maintains the transactions in its log and these transactions are copied and replayed on the replicas and on successful commit, it guarantees consistency among the master and all the replica servers. Here also we can use statement or row based format; however for best results its better to use row based format (explained here).

GTIDs are always preserved between the source and the replica and once a transaction with a given GTID is committed on a given server, any subsequent transaction with the same GTID is ignored by that server. This ensures that a transaction committed on a server cannot be applied multiple times which helps to guarantee consistency.

What are GTIDs?

A Global Transaction Identifier(GTID) is a unique identifier created and associated with each transaction committed on the server of origin(source DB). This id is unique not only on the source server on which it originated , but is unique across all servers in a given replication topology.

A GTID is represented as a pair of coordinates seperated by a colon(:) character as shown below:

GTID = source_id: transaction_id

The source_id identifies the originating server. The source’s server_uuid is used for this value. The transaction_id is a sequence number determined in order in which the transaction was committed on the source. For example, the first transaction to be committed on the server has 1 as its transaction_id and the tenth transaction to be committed on the same server is assigned a transaction_id of 10.

3E11FA47-71CA-11E1-9E33-C80AA9429562:23

The above example shows a transaction_id of 23 originating on the server with uuid 3E11FA47–71CA-11E1–9E33-C80AA9429562

The upper limit for sequence numbers for GTIDs on a server instance is the number of non-negative values for a signed 64-bit integer (2 to the power of 63 minus 1, or 9,223,372,036,854,775,807). In newer versions of MySql, a warning is issued when the server instance is approaching this limit.

Setting up GTID based Replication:

To setup replication using GTIDs some of the steps are similar to the binlog based replication as specified above. We will try to go through each step one by one:

Primary Host Configurations
Replica host configurations
Creating a replication user
Configure the replica to use GTID based auto positioning
Starting Replication and check status

Primary Host configurations:

The following config parameters should be present in the primary my.cnf file for setting up GTID based replication.

server-id - a unique ID for the mysql server
log-bin - the binlog location
binlog-format - ROW | STATEMENT (we will use ROW)
gtid-mode - ON
enforce-gtid-consistency - ON

Here the first three options are similar to what we enabled during binlog based replication as well.

gtid-mode: Enables GTID based logging on the server.

enforce-gtid-consistency: Depending on the value of this variable, the server enforces the GTID consistency by allowing execution of only statements that can be safely logged using a GTID. Setting this to ON means that no transaction is allowed to violate GTID consistency.

Replica Host Configurations

The following config parameters should be present in the replica my.cnf file for setting up replication.

server-id - different than the primary host
log-bin - (optional, if you want replica to log its own changes as well)
binlog-format - depends on the above
gtid-mode - ON
enforce-gtid-consistency - ON
log-slave-updates - ON

Here we have added an extra param log-slave-updates. Its an optional param that can be enabled only if we enable log_bin on the replica server. It allows the replica server to log the changes coming from the primary along with its own changes. It also allows us in setting up chain replication.

Creating a replication user

Since the replicas need a user to connect to the primary DB this step is exactly similar to what we did above. We need to create a replica user with only the replication slave privileges.

Configure the replica to use GTID based auto positioning

The replication setup must know about the primary host, the user and password to connect, the GTID auto-position parameter. The following command is used for setting up

CHANGE REPLICATION SOURCE TO
  SOURCE_HOST = host,
  SOURCE_PORT = port,
  SOURCE_USER = user,
  SOURCE_PASSWORD = password,
  SOURCE_AUTO_POSITION = 1;

The SOURCE_AUTO_POSITION tells the replica that the source’s transactions are identified by GTIDs. In the binlog based replication above we used SOURCE_LOG_FILE and SOURCE_LOG_POS as its dependent on the log file name and the position within that file.

Starting Replication and check status

Now that everything is configured, we just need to start the replication on the replica via the following command:

start replica;

Again, if we want to test if the replication is running successfully, we can determine that by issuing the below command:

show replica status;

Conclusion

There are a lot of intricate optimizations that MySql takes care of internally and it handles a lot of concurrent operations by running specialized threads on both the source and replica DBs. We may cover these also in future.

This completes our three part series of MySql database replication masterclass. I would highly appreciate your feedback.