Distributed Databases Concepts

Learn more about distributed databases in these articles written with Dr. Patrick Valduriez, one of the world’s leading experts on distributed databases.

Understanding the CAP Theorem and its No Relationship to Scalability

--

What is the CAP theorem? It is actually a misnomer and a poorly understood result of distributed systems theory. Let’s start with the story. In 2000, Eric Brewer from UC Berkeley gave a keynote talk at the ACM Conference on Principles of Distributed Computing (PODC) where he presented the conjecture that out of three properties, namely Consistency, Availability and Partition tolerance (CAP), only two could be achieved in a distributed system subject to partitions [Brewer 2000]. Later, Seth Gilbert and Nancy Lynch from MIT, instantiated the conjecture, which was very broad and general, for a particular case — a replicated read-write register, and came up with a theorem and proof [Gilbert & Lynch 2002]. More recently, Eric Brewer wrote an article discussing the misunderstandings on the CAP theorem and explaining in depth the technical implications of CAP [Brewer 2012].

Figure 1: CAP Theorem publication history

The CAP theorem talks about the tradeoffs if one wishes to provide partition tolerance in a distributed system with data replication (or a replicated system). It has been used by many NoSQL database vendors (mainly key-value data stores and document data stores, see our blog post on SQL, NoSQL & NewSQL) as a justification for not providing transactional ACID consistency (see our blog post on Understanding the ACID properties of transactions and underlying principles), claiming that the CAP theorem “proves” that it is impossible to provide scalability and ACID consistency at the same time. However, a closer look at the CAP theorem and, in particular, the formalization by Gilbert & Lynch, reveals that the CAP theorem does not refer at all to scalability (there is no S in CAP!), but only availability (the A in CAP).

Figure 2: The CAP theorem is unrelated to ACID transactions scalability

What the CAP theorem actually states is that a replicated system that tolerates partitions can only deliver CA or CP. Thus, the claim that it is impossible to provide scalability and ACID consistency is just false. It is quite interesting that most practitioners accepted the claim as ground truth without actually reading the original paper.

Let’s analyze each of the three properties in CAP. Consistency © is an overloaded term that means too many different things. The term is used to define the coherence of data in the presence of different problems: concurrent accesses (which requires what is termed isolation in databases or linearizability in distributed systems or safety in concurrent programming), failures during updates of persisted data (which requires atomicity, i.e. that all updates of a transaction are applied to persisted data or none in the presence of failures), node failures in a replicated system (which requires replica consistency such as 1-copy serializability), breaking integrity constraints, etc. Without a rigorous and precise definition, talking about consistency is useless. In the CAP theorem, which deals with data replication (the only way to attain A, Availability), consistency actually refers to data consistency across replicas. However, there are different consistency criteria for replicated data.

Figure 3: Meaning of CAP

In his presentation [Brewer 2000], for which only slides are available, Brewer does not define consistency, but only discusses two main approaches to consistency: ACID (Atomicity, Consistency, Isolation, Durability) in the database community and BASE (Basically Available, Soft state, Eventual consistency) in the distributed system community. It should be noted that he is comparing apples with pears, since BASE considers replication and ACID by itself alone, does not.

Figure 4: Meaning of ACID

Let’s look at different notions of consistency for data replication. In ACID databases, 1-copy consistency [Özsu & Valduriez 2020] states that a replicated database should behave as a non-replicated database, i.e. replication is transparent and does not introduce unexpected results such as inconsistencies. 1-copy consistency has been applied to serializability, a consistency criterion for concurrent execution of transactions over data. 1-copy serializability is the actual consistency criterion for replicated databases based on locking. While 1-copy snapshot isolation [Yin et al., 2009] is the criterion for replicated databases based on multi-version concurrency control (MVCC) isolation criterion, called snapshot isolation.

Figure 5: 1-Copy consistency is about the equivalent behavior of a replicated ACID database with respect a non-replicated one

In distributed systems, the notions of consistency for concurrent execution are much more relaxed. One commonly used notion is linearizability, which states that the concurrent execution of methods over an object should be equivalent to a linear (sequential) sequence of invocations of these methods over the object. Linearizability is often used for replicated objects and we could use the term 1-copy linearizability if necessary.

Continue reading on LeanXcale blog.

--

--

Distributed Databases Concepts
Distributed Databases Concepts

Published in Distributed Databases Concepts

Learn more about distributed databases in these articles written with Dr. Patrick Valduriez, one of the world’s leading experts on distributed databases.

Prof. Ricardo Jimenez-Peris, PhD in CS
Prof. Ricardo Jimenez-Peris, PhD in CS

Written by Prof. Ricardo Jimenez-Peris, PhD in CS

Professor and scientist on distributed databases. Founder and CEO of LeanXcale

No responses yet