CAP Theorem

Impossibility proofs under different network models

Aditya Shete
5 min readMay 31, 2022

Introduction

A distributed system, especially web services, has certain requirements motivated by business considerations, legal or regulatory purview and myriad other concerns. A web service could be demanded to be available at all times, always to service user requests. It could be required that any transaction, be it from multiple nodes, result in consistent transactions. Further, under no condition, network outages or node failures should the service go down.

One wonders, how can we be sure that we can even guarantee such requirements in the first place?

Enter impossibility proofs, which state regardless of physical optimization or devising the cleverest algorithm possible, we cannot achieve specific properties in a system. This forces designers to understand the compromises and better understand their use cases.

What use is consistency if that data is never used again? What use is availability if there will only ever be user requests at a given time? Impossibility results in avoiding potentially unrealistic expectations from systems and better-designed solutions geared towards the problem space.

Let us understand the CAP Impossibility Theorem then:

Elements of the proof

Let us consider a network shared data structure D; it could be something as simple as a shared event counter. We can demand the following from an algorithm A on a distributed system:

  • Consistency: We consider algorithm A to be consistent if there is a total order of operations on D. Consistency can be understood as the distributed system acting as a single node, with each operation being atomic. If a read is executed after a write, then a consistent algorithm would return the latest write value globally.
  • Availability: We consider algorithm A to be available if there is a response given to each request from the user on a working node.
  • Partition Tolerance: We consider algorithm A partition tolerant if there is a response to all users’ requests under partial network failure. A network failure is modelled as the loss of messages from one partition to another.

(Note we cannot demand tolerance on total network failure or proper requests from a failing/failed node as those are impossible anyways)

Now that we have described the demands on algorithm A, we can move on to consider the environment in which it functions.

Asynchronous Network

An asynchronous environment is modelled by a distributed system in which:

Nodes do not have clocks. This implies time bounds on messages and heartbeats cannot be used to determine whether a node has failed or is simply processing at a leisurely pace. We cannot determine if the message has been successfully delivered or lost due to a network partition. Hence, nodes can only make decisions based on the messages received from the other nodes.

CAP Impossibility Theorem in asynchronous environments:

It is impossible to have an algorithm A such that it guarantees the following:

  1. Availability
  2. Consistency

Under all executions of the algorithm, including those under a partitioned network.

Proof:

The proof is straightforward; we assume an execution as follows in a partitioned network with partitions N, M.

  1. A user requests a write request on the data structure D on a node in partition N.
  2. Subsequently, a read request is executed on a node in partition M.

Availability constraints that all requests be given a response; hence we will have a response for the read request from M. The response cannot satisfy consistency during a network partition because there are no messages from partition N to M or vice versa. Hence the response cannot contain the latest write on D.

A stronger impossibility result:

It is impossible to have an algorithm A such that it guarantees the following:

  1. Availability in all executions, including those under a partitioned network.
  2. Consistency in all executions, in only those where all messages are delivered.

Proof:

Even if we could stop the network partition, it would be impossible to guarantee consistency. The proof depends on the nature of the asynchronous network model. The idea of the proof is an extension of the above proof:

  1. A user requests a write request on the data structure D on some node in the network.
  2. The node broadcasts the new value of D to all nodes in the network.
  3. A user subsequently requests a read on D on a different node in the network.

In such a scenario, a message might still be in transit or still being processed at some node that we cannot guarantee a consistent algorithm.

The central problem in the asynchronous network model is that it has minimal assumptions, i.e. we cannot synchronize events across nodes. If an algorithm works under these assumptions, there are minimal things it can do to know the system’s state. Under a partitioned network, an algorithm will have to choose between continuing accepting writes (available) or stopping writes until the network heals (consistent).

Partially Synchronous Model (PSM)

A partially synchronous system has an additional attribute to the asynchronous model; the nodes have clocks with the same tick speed. Different nodes will show different times at the same instant, but the rate at which they increase time is the same. This has particular implications for our algorithm; it now can use timings and heartbeats. A message on the network over a certain time-bound can be determined as lost.

CAP Impossibility Theorem in PSM:

It is impossible to have an algorithm A such that it guarantees the following:

  1. Availability
  2. Consistency

Under all executions of the algorithm, including those under a partitioned network.

Proof:

We can construct an execution as follows in a partially synchronous model under a network partition with partitions M, N:

  1. A user requests a write request on data structure D at a node in partition N.
  2. The system then receives no requests for a time interval greater than the time-bound for a message in the network.
  3. A user requests a read request on D in a node in partition M.

The read response cannot be consistent with the data structure in N as there are no messages that can travel from M to N and vice versa. If a node then decides that writes cannot happen, then it has to give up availability to maintain consistency.

The stronger impossibility result is not valid in a partially synchronous model; if we can guarantee that all message are delivered, we can have consistency and availability simultaneously. Hence we can see that certain models become available in PSM that are impossible under an async environment.

Weakened Requirements

Another way we can circumvent these theorems is to weaken our notions of consistency or availability. Our concept of consistency requires that the system should deliver the latest global write on D; instead, we can make best-effort guarantees. This could be formalized as such:

  • We consider algorithm A to be weakly consistent if there exists a partial ordering of operations on D. This partial order is consistent with local executions on any node.

Such a requirement would return the most recent write in the partition, or if there are no such writes, the latest write at the local node. Any read and write on the partition is totally ordered but writes across partitions cannot be reliably ordered even in PSM.

A weakened notion of availability is already present, as our demands are only for a valid response from the system. We can stop certain operations such as write but still be available for any read request on D. Such minimal availability guarantees make more sense than having the whole system available even under network outages.

There are merits to all system models; the idea is to build according to need and understand the necessary sacrifices in achieving those needs. A DB cannot be available under network partitions because its priority is consistent data; similarly, other systems can prioritize their availability requirements.

References:

[1] Brewer’s conjecture and the feasibility of consistent, available, partition-tolerant web service

--

--