CAP theorem?

Gaurav Mutreja
3 min readMar 29, 2020

--

If you are into distributed systems, you will often hear about CAP theorem . So it is important to understand what it means. While understanding CAP theorem you will also be forced to think about the potential problems that can occur in a distributed system and how system responds to it and the trade offs it has to make.

CAP theorem states that while building a distributed system, you can satisfy only two out of the following three attributes.

  • Consistency (C)— A read is guaranteed to return the most recent write for a given client.
  • Availability (A)— A non-failing node will return a reasonable response to all requests within a reasonable amount of time (no error or timeout).
  • Partition Tolerance(P) — The system will continue to function when network partitions occur.

But the question is why only two ?

Distributed systems are supposed to be comprised of multiple nodes over the network so that they all can contribute to handle multiple requests.

Now let’s say you want to attain Partition Tolerance(P) which means nodes cannot communicate with each other and system can still be functional.

In order for this system to respond to requests and be Available(A) it should sacrifice on consistency (C) because if nodes are not able to talk to each other they cannot share data with each other. What you may write at node 1 would be available with node 2. So different nodes might return different results on reads. So system can be available(A)but cannot guarantee consistency(P).

If you want to attain consistency (C), then you cannot be Available (A) because you need network to communicate first.

Note*: It should be noted one has to choose between Consistency and Availability only when there is a network partition or failure happens. In absence of network partition or network failures, both availability and consistency can be satisfied.

Lets try to analyze few distributed systems.

When you look at distributed systems like ElasticSearch and you try to think where does it lie in CAP chart, it seems not so apparent . The system is supposed to highly available with multiple shards and replicas, multiple master nodes over the network and returns consistent data.

If you are familiar with ES topology there n number of master nodes and system is only available if (n/2)+1 nodes are able to talk to each other. This is called split brain problem.

So clearly the system tries to handle network partition to some extent but it goes to a (C)(P) state and gives up Availability(A) after a certain threshold of failures.

Again Why?

I think most of the system like Redis, ElasticSearch etc which follow master slave topology where a group of node vote to select the master, it is important to maintain the consensus else there is a good chance data might get corrupted.

So for example if a ES cluster is separated into two partitions and it continues to work , there might be a possibility that you can create different version of the same record into each separate cluster. When the network resolves it would be very difficult to derive what is the correct status of the record.

In fact I think most of the system can support read in case of partition failure, it is the write which leads system to an recoverable state and that’s why they become unavailable to safeguard the system.

Conclusion

I think for any distributed system it is very difficult to adhere to CAP theorem perfectly. They all try to attain all three attributes to an extent in case of failures but ultimately gives up one over the another after a certain threshold.

--

--