CAP Theorem Explained: Distributed Systems Series

Published in

Distributed Systems Series

5 min readSep 18, 2023

Hi! In this article I’m going to be talking about a popular theorem in distributed systems called CAP theorem. CAP theorem introduces the characteristics and trade offs many distributed systems need to deal with in unreliable networks, and we’ll look into some simple examples of how these trade offs occur.

Before diving in, what is a distributed system? At a high level, a distributed system can be defined as a network of computers that work together to provide services or solve problems. These computers in the network are able to communicate with each other in order to execute tasks and applications.

Now onto CAP Theorem! CAP is an acronym that stands for

Consistency
Availability
Partition-tolerance

The theorem states: “ In a network subject to communication failures, it is impossible for any web service to implement an atomic read/write shared memory that guarantees a response to every request.”[ Gilbert and Lynch ]

This essentially means in an unreliable network your system cannot simultaneously guarantee consistency, availability, and partition-tolerance.

But why is that? Let’s explore this idea in a more familiar setting: reserving movie theater tickets.

Let’s say you want to book tickets to go see the newest movie that’s out in theaters. You go onto the theater website, look for a showing, reserve your seats, and buy the tickets. You then head over to the theater, and receive your ticket with seat number. But as you walk into the theater you find someone else is sitting in your seat. You ask them what their seat number is, they show you their ticket, and you find that their seat number is the same as yours!

So what happened? How did you both buy tickets for the same seat?

To help answer these questions let’s first breakdown what each term in CAP means, and what the varying systems can look like.

Consistency means that each node/server in your network will return the correct and most up to date response for a request. All nodes in the system will have the same data. In the diagrams, consistency is when all nodes share the same value for n, either n=0 or n=1.

Availability means if a request is made to the system a response will be returned. In the diagrams, a red node signifies that the system is currently unavailable for any requests.

Partition-tolerance means that in the event that communication between computers is down or lost the system is still operating. Naturally, this describes that the network is unreliable and subject to issues or failures. This is visualized with the connection between 2 nodes being dropped.

Based on the theorem we cannot maintain a system that fully satisfies all 3 properties(CAP) but we can come close or achieve a best-effort. What if instead we created a system that exercised 2 out of the 3 properties? CA, CP, or AP.

What would a CA system be? In a CA system we are maintaining both Consistency and Availability. This means we will have consistent data across our servers in a network and all servers will be available to respond to requests.

However, this is difficult to achieve in practice as you are assuming that your network is not subject to any crashes, data loss, or network failures. Now lets assume that your system is facing network partitions and we will need to prioritize either a fully consistent system or fully available system.

CP describes a system that is both consistent and partition-tolerant. In the event of a network partition, meaning two nodes lose communication with each other the system will become unavailable for further requests until the issue is resolved. Then once the partition is fixed the data is resynced and consistency is maintained.

AP describes a system that is both fully available and partition-tolerant. In the event of a network partition the system is available for reads and writes but what can happen is you will receive inconsistent or out-of-date data in the response.

In practice, you will find many networks or databases that will prioritize some characteristics over others, but will not fully sacrifice it as shown in these 3 scenarios. In this case many systems will exhibit all 3 properties(consistency, availability, partition-tolerance), but not fully guarantee it.

Now that we’ve defined each term and looked at each system, let’s revisit our movie theater seat example, and try to figure out what might have happened and why we lost our seat.

Here is some info about the diagrams being shown

n=1 means the seatA1 is available , n=0 means the seat A1 is unavailable
1 primary data node(available for writes), 2 secondary data nodes(available for reads)
2 client nodes

Diagram showing workflow for how both clients bought the same seat ticket, starting at top going left to right

In this scenario, we first see that the client application showed a ticket was available and were able to make a purchase. It then writes to the primary node that the ticket is no longer available. However, while this is happening the connection between the primary node and secondary node 2 goes down, creating a partition. Because of this the data is no longer synchronized between primary node and secondary node 2, losing consistency. As a result, when the second buyer goes to purchase a ticket from their end, the system thinks a ticket is still available because that is what the secondary data node 2 returns as a response. And it will then write to the primary node again that the ticket is unavailable. The partition is then fixed but by then the transaction has already completed.

This is a simple example of what may have happened during the transaction process to show how both buyers purchased the same movie seat ticket. The system is illustrative of an AP system that was maintaining both Availability and Partition-tolerance as a priority through the process.

Again not all systems will fully sacrifice one property for the others and there are many examples of distributed networks, databases, and other systems that weigh these trade-offs based on their own needs and requirements in order to maintain something that is both reliable and efficient. To learn more about cap theorem checkout Gilbert and Lynch’s paper where they provide a more in-depth discussion.

Thanks for reading!

CAP Theorem Explained: Distributed Systems Series

Written by Lohith Chittineni