Designing Data-Intensive Applications: Consistency with Linearizability

Adrian Booth
Published in The Startup · Jun 14, 2020

In my previous articles summarising Martin Kleppmann’s book, Designing Data-Intensive Applications, I guided readers through database transactions and the different isolation levels we can achieve with them.

A database transaction is nothing more than an abstraction: a way for applications to pretend that certain faults (network failures, disk corruption, race conditions) do not exist. Transactions let us greatly simplify the code we write, because we can treat several database write operations as a single logical unit of execution. Without them, the complexity involved in coding around partial failures would be unmanageable.

Before continuing this article, you should have at least a basic understanding of database replication, otherwise most of this will not make sense.

I’d like to move on now to a different abstraction that is commonly utilised in distributed systems: linearizability.

Linearizability is an abstraction that allows the application to pretend that there’s only one view of the data. In reality, “the data” will be replicated, partitioned, normalised and put through all sorts of transformations behind the scenes. To be “linearizable” is to hide all of the complications associated with data replication from the application.

Replication lag is a common problem in distributed systems, and it exists because keeping all of our data on a single machine cannot be made stable and resilient: we have to factor in machine faults and how a single node behaves under heavy load. We replicate our data across machines to get around one problem, but that introduces yet another: replication lag. The joy of distributed systems, eh?

Say you have three database machines that an application can execute queries on: one deals with writes, whilst the other two deal with reads. Replication lag occurs when we write to the leader and the write reaches Follower 1, but by the time we read from Follower 2 the replication hasn’t completed yet. This breaks the application’s assumption that there is a single view of the data (the sketch below makes the scenario concrete).
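Here is a minimal sketch in Python. The Replica and Leader classes, the delays and the score key are all hypothetical, invented purely to simulate one follower applying a write before the other; this is not how any real database implements replication.

```python
import threading
import time

class Replica:
    """A single database node holding a key-value store."""
    def __init__(self, name):
        self.name = name
        self.data = {}

    def read(self, key):
        return self.data.get(key)

class Leader(Replica):
    """Accepts writes and replicates them asynchronously, with a per-follower delay."""
    def __init__(self, name, followers_with_delay):
        super().__init__(name)
        self.followers_with_delay = followers_with_delay  # list of (replica, seconds)

    def write(self, key, value):
        self.data[key] = value
        for follower, delay in self.followers_with_delay:
            # Each follower applies the write only after its own replication delay.
            threading.Timer(delay, follower.data.update, args=({key: value},)).start()

follower_1, follower_2 = Replica("follower-1"), Replica("follower-2")
leader = Leader("leader", [(follower_1, 0.01), (follower_2, 0.5)])

leader.write("score", "2-1")
time.sleep(0.05)                     # Follower 1 has caught up, Follower 2 has not.
print(follower_1.read("score"))      # -> "2-1"
print(follower_2.read("score"))      # -> None: replication lag breaks the single-copy illusion
```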

Most replicated database systems operate under an eventual consistency model. This basically says that if you stop writing to the database and just wait for some unspecified length of time, the data on all replicas will eventually become up to date. In other words, “wait and refresh, and you’ll see what you expect to see after a short while”. In his book, Kleppmann gives this another name, convergence, because all replicas eventually converge to the same value.
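Reusing the hypothetical read() interface from the sketch above, the “wait and refresh” idea could be expressed as a polling loop like the one below (the timeout and poll interval are arbitrary illustrative values):

```python
import time

def wait_until_converged(replicas, key, expected, timeout=5.0, poll_interval=0.1):
    """'Wait and refresh': poll every replica until they all report the expected
    value, or give up after `timeout` seconds. Eventual consistency promises that
    this loop terminates once writes stop, but says nothing about how long it takes."""
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        if all(replica.read(key) == expected for replica in replicas):
            return True    # all replicas have converged to the same value
        time.sleep(poll_interval)
    return False           # still inconsistent when we gave up waiting
```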

Eventual consistency can be tricky for application developers because subtle bugs can occur that are very difficult to test in development environments. They usually occur when there’s additional network latency or high concurrency, which are typically issues that you’ll only see in a production environment. In an eventually consistent system, it’s up to the application developer to decide whether temporary inconsistency is a price worth paying (for better performance, perhaps), and if not, what workarounds can be achieved.

Now that we understand the problems typically encountered with database replication, let’s discuss one of the strongest consistency models: linearizability.

The idea behind linearizability (also known as strong consistency, atomic consistency, immediate consistency or external consistency) is that if a client asks two different database replicas for the same data at the same time, they should get the same result. In an eventually consistent system there is no such guarantee, which can often confuse users.

When a client operates in a world of strong consistency, the problems of replication lag disappear, which makes application code much simpler because replication lag is no longer something each client has to deal with.

Here’s an example, taken from Kleppmann’s book, of a system that violates linearizability and operates in a world of eventual consistency: Alice and Bob are sitting together, each checking the result of a football match on their own phone.

Alice checks first and tells Bob that she’s seen the final result of the game. Once Bob hears this, he hits the same site but receives a different result. It would be no surprise if this happened had they hit the site at the exact same moment, but because Bob visits the site right after Alice, he expects to see the same result. The truth is that his request went to a different replica that hadn’t yet “caught up”.

You can think of linearizability as a recency guarantee: an agreement between the database and the application that when any client queries the database, it won’t encounter a situation like the one above. As far as the application is concerned, there is, and always will be, just one copy of the data (even though behind the scenes we replicate it heavily for fault tolerance and performance).
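One way to picture the recency guarantee is as a check over a history of reads: once anyone has observed the new value, no later read may return the old one. The sketch below is just an illustrative checker over a hand-written history (the Read type and the match-result values are made up for this example), not a real linearizability verifier.

```python
from dataclasses import dataclass

@dataclass
class Read:
    timestamp: float   # wall-clock moment the read completed
    value: str         # what the replica returned

def violates_recency(history, old, new):
    """True if some read observed `new` and a *later* read still returned `old`.
    This is the Alice-and-Bob situation: once the final score has been seen by
    anyone, a linearizable store must never again serve the stale value."""
    seen_new_at = None
    for read in sorted(history, key=lambda r: r.timestamp):
        if read.value == new and seen_new_at is None:
            seen_new_at = read.timestamp
        elif seen_new_at is not None and read.value == old and read.timestamp > seen_new_at:
            return True
    return False

history = [
    Read(1.0, "in progress"),  # before the final whistle
    Read(2.0, "final: 2-1"),   # Alice's read hits an up-to-date replica
    Read(2.5, "in progress"),  # Bob's read hits a lagging replica
]
print(violates_recency(history, old="in progress", new="final: 2-1"))  # -> True
```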

The question you probably have in your mind right now is, “If linearizability leads to simpler assumptions from the application and less user confusion, why doesn’t everyone implement it?” Like every architectural decision in software, there are trade-offs.

You may have already heard of the CAP Theorem, and if you haven’t then don’t worry. It dates back to the late 1990s and states that it’s impossible for a distributed data system to provide all three of the following guarantees at once:

  • Consistency — Every read receives the most recent write (note: in the traditional CAP Theorem this means strong consistency, i.e. linearizability)
  • Availability — Every request receives a response (though not necessarily one reflecting the most recent write)
  • Partition Tolerance — The system as a whole continues to function in spite of network failures

Kleppmann makes it clear that the real choice is between the first two: network partitions are a kind of fault, not something you get to opt out of, so when a partition happens you must pick either consistency or availability. There’s a lot of controversy surrounding CAP and what its terms mean, so we won’t dwell on it in this article; there are plenty of resources online that you can indulge in.

The “cost” of having a linearizable system is that you often have to sacrifice Availability in order to get it. Let’s look at an example taken from Kleppmann.

The example is a multi-leader replication setup: two data centres, each with a database that can accept write requests, and a set of clients that are routed to only one of the two data centres. Whenever one of these databases processes a write, it replicates the data to the other one, keeping the two data centres in sync.

Now consider what happens when there’s a network interruption between these two data centres. Both can continue processing requests from users; it’s only the replication connection between them that is broken. Under a multi-leader setup (two databases accepting writes), each data centre carries on operating as normal: writes are simply queued up and exchanged once the network fault is repaired.

However, with a single-leader replication setup (only one database accepting writes), the leader has to live in one of the two data centres. That means some clients are making requests to the “follower” data centre, which relies on the replication connection to sync data from the leader. If that connection goes down, the follower data centre becomes effectively unavailable, at least in a system that insists on strong consistency / linearizability: we cannot allow a client to read from a database that might hold stale, and therefore wrong, data. So while the connection between the two is broken, we must block all requests to the follower data centre, because its data is not up to date with the leader. If we did not insist on a linearizable guarantee, we could let the follower keep serving requests, knowing that its data will eventually be synced up once the replication connection is repaired.
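A tiny sketch of that trade-off: the Follower class and its linearizable flag below are hypothetical, used only to show the two possible behaviours of the follower data centre while the replication link is down.

```python
class Follower:
    """Follower node in a single-leader setup. The `linearizable` flag decides
    what it does when the replication link to the leader is down."""
    def __init__(self, linearizable):
        self.linearizable = linearizable
        self.link_up = True
        self.data = {}

    def read(self, key):
        if not self.link_up and self.linearizable:
            # CP choice: refuse to answer rather than risk serving stale data.
            raise RuntimeError("unavailable: cannot guarantee an up-to-date read")
        # AP choice: answer from the local, possibly stale, copy.
        return self.data.get(key)

strict = Follower(linearizable=True)
relaxed = Follower(linearizable=False)
strict.link_up = relaxed.link_up = False   # simulate the inter-data-centre partition

print(relaxed.read("balance"))   # -> None (possibly stale), but the request is served
try:
    print(strict.read("balance"))
except RuntimeError as err:
    print(err)                   # availability is sacrificed to preserve consistency
```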

I hope this has clarified more about the subject of consistency and where it can go wrong for us. As with every decision we make when architecting applications, there will be trade-offs. Most services are fine with eventual consistency guarantees, but a good rule of thumb is that if you’re dealing with financial data, you should reach for linearizability.
