Tales of Replication — Single Leader Replication

Uday BK
6 min readAug 24, 2024

--

Replication is copying data into multiple machines via network.

Replication has become so common that it’s now considered almost a default choice. But, it has lot of details that are hidden. In this and coming series of articles, we shall see some of the details in the process of replication.

There are some reasons for replicating the data.

  1. To make the system available.
    Replication helps our system to have the choice of promoting a follower to be a leader (because replica instance would have all the data), if leader is down.
  2. To tackle the scalability issues.
    If the system is read intensive, firing read queries on leader could choke leader’s resources. Replication helps us to differentiate the reads and writes. System writes to leader and reads from follower.
  3. To place a server geographically closer to user’s network.
    For example, If the user in India requests an Instagram post/comment of Robert Downey jr, it takes request to travel from US to India via multiple network hops, which increases the latency of the request. Instead, if the data is replicated and stored in India, Post/comment could be retrieved faster.

In this article, we will discuss the process of replication, synchronous and asynchronous replication, failovers of followers and leader.

This article mainly focuses on Single leader replication. Multi Leader and Leader less replications shall be discussed in upcoming articles.

leader: A leader is a replica that accepts writes and propagates the data for replication

replica/follower: A replica is a server that accepts data from leader and stores copy of
the data. Replica’s don’t accept writes.

Process of Replication

In a system that has a leader and followers, there can be many followers and a single leader. Data first gets written in the leader and then propagates to all the follower machines.

How does the system make sure that the data that is written in leader is certainly written in followers ?

Let’s look into the methods that we have.

  1. Synchronous Replication
  2. Semi-Synchronous Replication
  3. Asynchronous Replication

Synchronous Replication

In synchronous replication, a leader writes the data to it’s own storage and also sends the changes to it’s followers. Leader waits till all the followers writes the data. This ensures that leader and followers would have the same data, thus ensuring consistency. But, this method comes with challenges

  1. Increased latency
    -
    Leader waits till all the followers confirm that they have written the data. So, client that had issued a write has to wait for total time of Leader (t1) + slowest performing follower (t2). This slows down all the writes and updates. This also limits write throughput.
  2. Network latency
    -
    If the followers are spread out in different zones of the globe, network layer adds it’s own latency for overall transaction.
  3. Risk of Availability
    -
    Since Leader should always be in sync with all the followers, if any of the follower goes down, system becomes unavailable. If there are n nodes in the system, now the risk of availability increases as we add the follower nodes.

Semi Synchronous Replication

In semi-synchronous replication, Out of n followers, at least 1 (can be more than 1) is chosen to be in sync with leader, with synchronous replication. Other followers would be issued a write but confirmation is not awaited. This ensures that, if the leader goes down, system has 1 follower that has up-to-date information as leader and could be promoted without any issue.

This method also borrows similar challenges of synchronous replication.

Asynchronous Replication

In asynchronous replication, leader doesn’t wait for confirmation of writes from followers. It means that , it’s not guaranteed to be durable. Assume that you have written a row in leader and queried on follower, it’s not guaranteed that follower would have the information. If , for reason, leader becomes un-responsive and the writes are yet to reach followers and if leader is not recoverable, then those writes are simply lost. We shall discuss leader failovers in coming seciton. However, there are advantages of asynchronous replications.

  1. Performance
    -
    Since leader doesn’t wait for the followers, system is decoupled and can scale independently. Leader is performant now, because , leader is fairly relaxed without having overhead of waiting for confirmations from followers.
  2. Scalability
    -
    As the system is now decoupled, followers can be added without any overhead. Adding followers will not burden the leader.

Most of the systems now run with Asynchronous Replication by default. It’s always a game of trade-offs. Most of the DBs that we use , by default employ asynchronous replication. Yes, Very popular MySQL(Leader based replication) also sets asynchronous replication by default.

We certainly expect consistency (at least in RDBMS). DBMS try various techniques to make the systems eventually consistent (out of scope for this article). Followers can be at lag, but they eventually catch-up.

References:
- https://dev.mysql.com/doc/refman/8.4/en/replication-semisync.html
-
https://dev.mysql.com/doc/refman/8.4/en/replication-gtids.html

Failovers

In a system of nodes that are communicated via network, possibility of a node going down is higher. Node can be the leader or the follower. We will see the different scenarios and outcomes for both leader or follower failover.

A failover/unresponsiveness is detected using heartbeats between servers. Heartbeats would be bounced between servers. If any of the servers is not responding to pings, then a buffer time is considered before declaring a failover. Buffer time should also be calculated accurately.

Reasons

- If the buffer time is more and node is really unresponsive and if the node is the leader, system would loose more writes.
- If the buffer time is less, then, unnecessary transition of roles between leader-follower could take place. This could also lead to some data loss.

So, buffer-time should be engineered accurately. This calculation is done by DBMS.

Follower Failover

If the failover occurs at a follower, usually the replication-lag between follower and leader increases.

Replication lag is the number of seconds , the follower is lagging behind the state of leader

Follower would have a offset , upto which a successful commit has been made. Once the follower is restarted/back with operations, follower reads the changes from leader using the offset.
During this time, when the replication-lag is high, system becomes in-consistent. Users who would have updated their profile might not see the updates, depending upon the node for which request is forwarded.

Once the replication-lag is cleared, system behaves consistently.

Leader Failover

Declaring that the leader is unresponsive is done cautiously. As discussed earlier, system avoids unnecessary transitions between followers and leaders.
Once it is certain that leader is unresponsive, a follower has to be promoted to the leader. A small election process happens where other nodes would elect a new leader. Usually, node with most updated data is declared the new leader. All nodes agreeing to one new leader is a consensus problem.

The new leader that is elected might not have all the writes (due to asynchronous replication). When the older leader restarts and joins the network, the older leader would have extra-writes than the new leader.

What happens to those writes? It depends. But, mostly, a manual checking is required. We will have to check if there is any discrepancy between old leader and newer. If yes, copy the data manually.

The ID Problem is also important to keep in mind. If the auto-increment ids are missed between older leader and newer leader, then replication can break.

For example,

old leader had 1,2,3,4,5 ids. and new leader has 1,2,3,4. When a insert request comes, new leader would insert with record with id 5. This record id with 5 is then propagated to follower (old leader). But, old leader already has a record with 5. So, there is a error in replication and replication stops with error.

End

It is fascinating that how many things are being handled by the DBMS and they almost work error-free and does the heavy-lifting job. Hats-off to all the engineers.

That’s it for now. I have learnt a lot , especially from this book : Designing data intensive applications by Martin Kleppmann. Do check this out.

Thanks For Reading.

--

--