The Journey to Distributed Systems: Part 2— CAP theorem
Do you know Eric Brewer? If not anything, he is known for the Brewer’s theorem, also called the CAP theorem. He is an expert in distributed systems. If you want to know more about him, check Wikipedia. Ok, enough already.
CAP is a summary of Consistency, Availability and Partition Tolerance. They essentially describe 3 attributes of a distributed system. What the CAP theorem is implying is that in the presence of a network partition, for example, one has to choose between consistency and availability. You just can’t have it all!
Are you confused? Many are after just reading the summary above. Let’s try with a simple example to illustrate what it all means. Imagine a bank with 2 ATM machines and one user account. We would like the user to access his bank account, perform withdrawals and deposits on both ATMs. A simple approach is to have the user perform the tasks on one ATM. In turn, that ATM machine updates the other one. This is consistency as the account balances would be the same across both ATM machines. Also, the machines are available, since they are working correctly and the customer can use any of them anytime. But what happens if we have a network partition (connection is messed up) between the 2 ATMs? Oh-oh, is our money safe?. According to the CAP theorem, in such a scenario, you either choose between having the customer perform some transactions on one ATM and the updating of the other ATM is done when its available or you don’t perform anything at all until the network partition problem is resolved.
The reality of network systems is harsh — we can’t trust computers or network connections. Often times, we take these for granted. These are called the fallacies of distributed computing. In distributed systems, network failures or partitions have to be tolerated. Hence, in the presence of a network partitioning, the only choice left is between consistency and availability. But in the absence of network failure or partition, both availability and consistency can be satisfied.
Looks like the model is clear enough. Any examples? Consider the use case of databases as a distributed system. When more than one computer (or system) is used to perform database operations and transactions, the CAP theorem cannot be escaped. The RDBMS types prefer consistency over availability whereas the NoSQL (non-relational) families generally prefer the opposite approach i.e. availability over consistency. I hope at this point the concept of ACID and BASE makes sense. The C in ACID is consistency and the A in BASE implies availability. This explains the tradeoffs when running databases as a distributed system.
CAP theorem is simple right? Well, not really. We can talk of degrees of availability and consistency. In that sense, we could have a partial implementation of availability and consistency in a distributed system architecture. Consider the earlier ATM example. In the presence of a network partition, if we were to introduce an invariant constraint like no balance should be lower than zero, we could allow only deposits and no withdrawals in the working ATM. Or we could even go with the approach of rate-limiting so that customers do not get a too negative balance.
Summary
Whew! I hope this is enough information for you to understand the CAP theorem. Don’t forget that the CAP theorem formalizes the trade-off between consistency and availability in the presence of network partitions.