Data Replication in distributed systems (Part-2)

Sandeep Verma
8 min read · May 18, 2020


In Part-1 of this series, we explored data replication in single-leader distributed systems. In this part (Part-2), we will deep-dive into multi-leader data replication. In Part-3, we will cover leaderless data replication.

Introduction

Single-leader data replication provides read scalability, reduced latency (by keeping data closer to the user), and better availability compared to no replication, yet it has one pitfall: what if the leader is unreachable due to a network interruption? This flaw paves the way for multi-leader and leaderless replication.

In multi-leader replication, more than one node accepts writes. The replication process is still the same, but instead of followers getting updates from a single leader, each node forwards its writes to the others (this is also called active/active or master-master replication). Thus each node acts as a leader as well as a follower to the other leader nodes.
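The idea can be illustrated with a minimal sketch (the names `Leader`, `write`, and `replicate_from_peer` are illustrative, not a real library API): every node accepts writes locally and forwards them to its peers, using a set of already-seen write ids to ignore duplicates.

```python
import uuid

class Leader:
    """Toy multi-leader node: accepts writes locally and forwards
    them to its peers. Illustrative sketch only."""

    def __init__(self, node_id):
        self.node_id = node_id
        self.peers = []     # the other leaders
        self.data = {}      # key -> value
        self.seen = set()   # write ids already applied (ignore duplicates)

    def write(self, key, value):
        # A client write lands on *this* leader first...
        write_id = str(uuid.uuid4())
        self._apply(write_id, key, value)
        # ...then is forwarded to every other leader.
        for peer in self.peers:
            peer.replicate_from_peer(write_id, key, value)

    def replicate_from_peer(self, write_id, key, value):
        self._apply(write_id, key, value)

    def _apply(self, write_id, key, value):
        if write_id in self.seen:   # already applied; skip
            return
        self.seen.add(write_id)
        self.data[key] = value

# Two leaders peering with each other: a write accepted on A
# becomes visible on B after replication.
a = Leader("A")
b = Leader("B")
a.peers = [b]
b.peers = [a]
a.write("x", 1)
```

Note that the `seen` set is what lets every node act as both leader and follower without re-applying its own writes.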

Fig 1. Single Leader replication vs multi leader replication

Multi-Leader Replication

This approach is usually deployed for multi-datacenter operation (replicating data to different datacenters to tackle datacenter outages), for clients that operate offline (apps like Google Calendar), and for collaborative editing (like Google Docs).

Multi-leader replication across multiple datacenters

In this layout, each datacenter typically has one leader, and each leader replicates its data to the leaders in the other datacenters. Following is a comparison between single-leader replication and this multi-leader configuration:

  1. Latency: Any write over the internet always goes to the lone leader in single-leader replication. This can add notable latency to writes, especially when clients and the server are far apart on different continents. In a multi-leader layout, each continent/country/area can have a leader in a local datacenter that handles writes and then replicates the data to other datacenters, significantly reducing latency compared to single-leader replication.
  2. Availability: Multi-leader replication can smoothly handle datacenter outages. If a datacenter fails, another datacenter can serve requests until the failed datacenter comes back online; replication then ensures it catches up with the leaders in the other datacenters.
  3. Network interruption: Requests moving across continents over the internet are less reliable than those moving through a local network. Any interruption in the network link to the leader node in single-leader replication can bring the whole data system to a standstill. This issue is handled much better in a multi-leader configuration, as requests can be temporarily routed to other datacenters.

Tools like GoldenGate (for Oracle), Tungsten Replicator (for MySQL), and BDR (for PostgreSQL) are employed for multi-leader configurations.

Offline client operations

Another area where multi-leader replication is used is calendar apps. Each instance of the app on your mobile, desktop, laptop, or any other device acts as a local database (leader). Changes made in these apps (local leaders) are synced with the server (through a multi-leader replication process) and with your other devices when you come online. The replication lag can be hours or even days if the device where the changes were made stays offline for a longer period.

Editing collaboratively

Google Docs is another example of multi-leader replication. When one user edits a document, the changes are applied to their local replica, then asynchronously replicated to the server and on to other users who might be editing the document concurrently. To avoid editing conflicts, these apps usually lock the document for a very short duration (like a single keystroke).

Issues with multi-leader replication

Multi-leader replication is not a silver bullet. It has its own set of problems.

  1. Data consistency issues: Since there are many leaders, concurrent modifications of the same data in different datacenters can lead to conflicting writes. There are many techniques (like last write wins) to handle such write conflicts, but they can lead to data loss. Other techniques involve prompting the user to resolve conflicts either during reads or during writes. We will discuss all of these in the next section, “Resolving Write Conflicts”.
  2. Issues due to built-in database features: If a write involves triggers, random functions, auto-incrementing keys, or data integrity constraints, replicating it can produce unpredictable or different results on the leaders of different datacenters. For example, suppose there are two datacenters ‘A’ and ‘B’, and both have a leader database whose auto-increment counter currently stands at 100. If the leaders in ‘A’ and ‘B’ receive two different writes concurrently, both will store different data under counter value 101. Similar issues can arise with triggers.
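The auto-increment example above can be demonstrated with a toy sketch (the `LeaderDB` class is purely illustrative): both datacenters start from the same counter, each accepts one write concurrently, and they end up with different rows under the same id.

```python
class LeaderDB:
    """Toy leader database with an auto-incrementing primary key."""

    def __init__(self):
        self.counter = 100   # current auto-increment value
        self.rows = {}       # id -> row data

    def insert(self, value):
        self.counter += 1
        self.rows[self.counter] = value
        return self.counter

# Two datacenter leaders, both starting with counter = 100.
dc_a, dc_b = LeaderDB(), LeaderDB()

# Concurrent writes land on different leaders.
id_a = dc_a.insert("order from customer 1")
id_b = dc_b.insert("order from customer 2")

print(id_a, id_b)   # both assign id 101...
print(dc_a.rows[101] == dc_b.rows[101])   # ...to different data
```

When these writes are replicated, both leaders claim id 101 for different rows, which is exactly the unpredictable result the text describes.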

Resolving Write Conflicts

The major issue with multi-leader replication is write conflicts. For example, customer 1 and customer 2 concurrently debit amounts of 200 and 100 from the account ‘xyz’. The request for transaction 1 lands on leader A and transaction 2 on leader B. Both allow the transaction to proceed. However, when the changes are asynchronously replicated, a conflict is detected. This situation doesn’t arise in single-leader replication.

Fig 2. Write conflict in multi leader replication
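The scenario can be traced with simple arithmetic (a sketch of the example above, not a real database): each leader applies its own debit against the same starting balance, so when the writes meet during replication, the replicas disagree and neither holds the correct total.

```python
# Account 'xyz' starts with the same balance on both leaders.
balance_a = balance_b = 500

# Concurrent local writes: leader A debits 200, leader B debits 100.
balance_a -= 200   # A now believes the balance is 300
balance_b -= 100   # B now believes the balance is 400

# When the leaders exchange these writes asynchronously, they see
# conflicting balances (300 vs 400). Neither equals the correct
# result of applying both debits (500 - 200 - 100 = 200), so a
# conflict-resolution step is required.
print(balance_a, balance_b)   # 300 400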

Conflict resolution

The best way of handling a conflict is to avoid it altogether. When this is not feasible, try to converge the conflicting writes to a consistent state. Sometimes you also need manual intervention by the user to resolve the conflict.

  1. Avoiding conflicts: The issue above could have been avoided if transactions for a particular account always landed on the same leader. This can be done by hashing the account and routing all requests for that account to the same datacenter node. However, this logic fails when the leader that was serving the account goes down: user requests are then routed to some other datacenter, and the conflict-avoidance logic breaks down.
  2. Converging data to a consistent state: If the data being written is not critical (unlike financial data, account details, etc.), you have the following ways to converge conflicting writes:

a. Last write wins: If the written data carries a timestamp field, we can choose the write with the latest timestamp as the winner and drop the others. This technique is known as LWW (last write wins). It results in data loss.

b. Replica/leader precedence: We can assign each leader a unique id and a precedence. In case of conflict, the write from the leader with the highest precedence wins. This also results in data loss.

c. Merging the writes: We can merge the writes together (e.g. if both writes add items to an e-commerce cart, we can merge them by pushing the items from both writes into a single list).

d. Recording conflicting writes elsewhere: All conflicting writes can be preserved in a separate data structure and resolved later (either by manual user intervention or by some custom conflict-resolution logic).

3. Custom conflict resolution:

a. During writes: Whenever the system detects a write conflict, it can call a custom conflict handler. This handler can then resolve the conflict with predefined logic, as discussed in point 2.

b. During reads: We can prompt the user to choose between or merge the conflicting versions on the next read. This is similar to Git. CouchDB employs this approach.

4. Automatic conflict resolution techniques:

a. Operational transformation: This algorithm is used to resolve conflicts in Google Docs and Etherpad.

b. Conflict-free replicated data types (CRDTs): CRDTs are objects (like sets, maps, ordered lists, etc.) that can be updated without expensive synchronisation/consensus. They are guaranteed to converge eventually if all concurrent updates are commutative and every update is eventually executed by each replica.
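A classic CRDT example is the G-Counter, sketched minimally below (assuming the set of replica ids is known up front): each replica only increments its own slot, and merge takes the element-wise maximum. Because `max` is commutative, replicas converge no matter the order in which merges happen.

```python
class GCounter:
    """Minimal grow-only counter CRDT sketch."""

    def __init__(self, node_id, nodes):
        self.node_id = node_id
        self.counts = {n: 0 for n in nodes}  # one slot per replica

    def increment(self, amount=1):
        # A replica only ever increments its own slot.
        self.counts[self.node_id] += amount

    def merge(self, other):
        # Element-wise max: commutative, associative, idempotent,
        # so merges converge regardless of delivery order.
        for n in self.counts:
            self.counts[n] = max(self.counts[n], other.counts[n])

    def value(self):
        return sum(self.counts.values())

nodes = ["A", "B"]
a, b = GCounter("A", nodes), GCounter("B", nodes)
a.increment(2)   # concurrent update on replica A
b.increment(3)   # concurrent update on replica B
a.merge(b)       # exchange state in either order...
b.merge(a)
print(a.value(), b.value())   # 5 5
```

After both merges the replicas hold identical state with no coordination, which is exactly the convergence guarantee the text describes.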

Replication topologies in multi-leader replication

A replication topology is the configuration of leaders through which data propagates between them. With only two leaders, the topology is trivial: there is only one way for data to propagate. With more than two nodes, the following topologies are possible:

  1. Circular topology: Data moves from one node to another in a circular manner. Infinite loops are prevented by giving each node an identifier: when a change that was already applied by a node arrives back at that node, it is ignored. One issue with the circular topology is that if one node fails, replication between all nodes halts. MySQL supports circular topology by default.
Fig 3. Circular Topology
  2. Star topology: In a star topology, data moves through many nodes before it is replicated everywhere. The same loop-prevention logic used in the circular topology applies here. The star topology has the same single-point-of-failure issue as the circular topology.
Fig 4. Star Topology

3. All-to-all topology: In this topology, each leader sends data to every other leader in the system. This is better than the circular and star topologies as there is no single point of failure. However, it has other issues: some network links are faster than others, which can cause a dependent write to arrive late at another leader. For example, say a user creates a document and the write lands on leader 1. The user then updates the document and that write lands on leader 2. If the network link between leader 2 and leader 3 is faster than the one between leader 1 and leader 3, the update made on leader 2 may reach leader 3 before the insertion made on leader 1. This is a causality issue and can happen in the all-to-all topology.

Fig 5. All-To-All Topology
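The loop-prevention rule used by the circular and star topologies can be sketched as follows (a toy ring, with `RingNode` and the `passed` list as illustrative names): every replicated write carries the ids of the nodes it has already passed through, and a node that sees its own id in that list drops the write instead of forwarding it forever.

```python
class RingNode:
    """Toy node in a circular replication topology."""

    def __init__(self, node_id):
        self.node_id = node_id
        self.next = None   # next node in the ring
        self.data = {}

    def write(self, key, value):
        # Local write, then start forwarding around the ring,
        # tagged with this node's id.
        self.data[key] = value
        self.next.receive(key, value, passed=[self.node_id])

    def receive(self, key, value, passed):
        if self.node_id in passed:   # write has completed the circle: stop
            return
        self.data[key] = value
        self.next.receive(key, value, passed + [self.node_id])

# A -> B -> C -> A ring; the write stops once it returns to A.
a, b, c = RingNode("A"), RingNode("B"), RingNode("C")
a.next, b.next, c.next = b, c, a
a.write("x", 42)
```

The write propagates A → B → C and is silently dropped when it arrives back at A, so every node ends up with the value exactly once. It also shows the failure mode the text mentions: if any `next` node is down, the ring stalls.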

In the third part (Part-3) of this series, we will cover leaderless replication. Read Part-1 for single-leader replication.

Don’t forget to clap if you enjoyed reading this article!
