Replication and Linearizability in Distributed Systems

Chiranjeeb Buragohain
The Startup
Published in
13 min readJun 13, 2020

--

All distributed storage systems rely upon replication for durability and availability; but every system seems to have their own approach to replication: quorum replication, chain replication, primary/backup, Paxos/Raft and many others. This post is an attempt to understand the landscape of these algorithms through the lens of generating a unique sequence number. It was prompted by a question from a colleague who asked: how is it possible that Amazon Aurora can use simple quorum replication to achieve strong consistency? Doesn’t quorum replication provide only eventual consistency (e.g. Cassandra)? In this post I will start by describing simple quorum replication and show how it its shortcomings can be remedied by using sequence numbers and finally how sequence numbers are themselves generated in different systems to achieve the same final results.

None of the content in this post is novel: this post is just my attempt at understanding replication algorithms better.

Quorums and Their Troubles

The core purpose of replication is to increase availability and durability of data in the presence of server failure. So let’s assume that we have a set of servers among which we want to replicate data and we are willing to tolerate up to f server failures. We have a client that knows these servers and communicates with them using some reliable communication protocol. The set of servers together are known as the configuration and the process of changing the set…

--

--