What is CAP theorem? How is it applied to distributed databases?

Vivek Bansal
4 min readNov 3, 2018

--

Fact : You don’t want CP systems , don’t want AP systems and you can’t have CA systems.

CAP Theorem :

Choose any two among "Consistency, Availability and Partition Tolerance" and let go the third one.

Fact : You can’t design a distributed system which will not have partition in future . (it can occur due to network lag, high memory usage leading to box crashing etc … )

In other words, theorem says when there is network partition, you must choose between consistency and availability.

Consider the following system design in which there is a master-slave architecture. All read operations are done from slave and all write operations are done on master which asynchronously updates the slave.

Now, consider the case when master can’t connect to slave due to some network problem (partition). You have 2 options to choose from :
Option 1 : Availability over Consistency

Option 2 : Consistency over Availability

Choose accordingly from above situation which suits you best, there is no right/wrong design.

Meaning of Consistency and Availability in CAP :

1. CAP-Consistency : Extreme form of consistency(linearizable/Sequential consistency ) which states that even though multiple nodes are there in a system, and when an update comes, it should seem like that all nodes are updated at the same moment, therefore giving consistent data to the end users and thus called CAP-consistent.
You don’t require this much strong consistency in most real world distributed systems, what you want is Eventual Consistency.

2. CAP-Availability : Every request received by a non-failing node in the system must result in a response. Suppose 1 node goes down in a system out of total 4 nodes, the architecture is still able to respond and is called CAP-available.
You don’t require this form of availability. What you need is high availability or Less Latency, a architecture which results in a reasonable less response time.

Q. What would be a CP system i.e sacrifice availability over consistency?
A. CP systems are those that sacrifice availability only when there is network partition, otherwise they are CA.(Banking sector : You want to maintain consistency between master-slave replicas rather than being available to customer in the case of network partition).

Clearly, the CAP theorem doesn’t state anything about Eventual consistency, less latency and high availability which are the essence of building today’s distributed systems. So, it is of less use to us.

Q. Is there an Alternative tool to discuss distributed database?
Answer — Yes

PACELC Theorem :

If there is Network Partition :
Choose between Availability and Consistency
Else
Choose between Latency and Consistency

Consider the following diagram :
GREEN LINE : When there is no network partition .
RED LINE : When there is network partition .

Ultimate Goal : Make user experience rich and healthy when it comes to browsing website/app and doing payment.

What we actually desire in our distributed system is to find the sweet spot in different situations between when there is network partition and when there is not.
For instance, for web services like Flipkart, when there is no network partition, they would find a sweet spot between consistency(needed when sale is live and huge traffic hits their servers at the same moment) and latency(need to serve content in reasonable less response time). But, when there is network partition, they would definitely choose availability over consistency(instead of showing 500 on UI and making website/app unavailable to customer).

Similar to above, we can configure our databases to behave in a way that fits in our required use case and therefore lie in one of the four quadrants of PACELC.

Conclusion :

  • CAP is not the right tool to discuss today’s real world distributed systems because it discusses extreme form of consistency and availability and not eventual consistency and higher availability.
  • PACELC Theorem : Alternative to discuss distributed databases.
    Discusses about trade-offs between latency, availability and consistency when there is network partition or not.
  • No design is right/wrong. You have to choose from the spectrum what suits you best. Mostly, it’s a business team call on these situations.

Resources :
1. Problems with CAP
2. PACELC theorem discussion
3. A Critique of CAP Theorem & Original CAP theorem paper

Please feel free to share any comment regarding the topic.

--

--

Vivek Bansal

I write anything and everything around Software Engineering