CAP Theorem Considered Harmful: Data integrity in NoSQL & Alternatives to CAP

Zachary Ryan Smith
Published in Analytics Vidhya · 8 min read · Nov 6, 2019

“CAP is best avoided”. So states the acclaimed “Designing Data-Intensive Applications”, a best seller on distributed systems. Why?

(all quotes are from this book, unless stated otherwise)

Let’s make a bet: I’ll give you $42, and then you choose which of these data systems you want to run.

System 1:

A client browser sends a request to a web server, which sends this request to a SQL database:

BEGIN TRANSACTION;
UPDATE bank_account SET balance = balance + 40.00 WHERE account_id = 'Me';
UPDATE bank_account SET balance = balance - 40.00 WHERE account_id = 'You';
COMMIT;

System 2:

A client browser sends a request to a web server, which sends a request to a highly available NoSQL cluster distributed globally. In this cluster, my account is stored on partition A (which happens to reside in Hawaii), and your account is stored on partition B (which happens to reside in Antarctica). Each partition separately (without coordination) runs code to update our balances.

If you went with system #2, maybe you think that because system #2 uses a highly available cluster, it cannot be consistent during an inevitable network partition, per the CAP theorem. You’ve heard people warn about NoSQL losing and corrupting data. You’ve heard people say, “If you’re performing a financial transaction, then you better be using an ACID transaction!” In other words, maybe you think system #2 does not guarantee exactly-once processing of our balance transfer. Maybe you’re hoping the transfer would be lost, and you’d get to keep all $42. Maybe you’re afraid the transfer would be corrupted by being processed multiple times, and you’ll end up owing me.

If you went with system #1, you might think that system #1 guarantees I get back only $40 of my $42, leaving you $2 richer. After all, the system uses an ACID transaction, where C stands for Consistent.

I want you to choose system #1 because I know the ACID transaction does not protect you from being charged multiple times. And of course, there are details in system #2 I’ve withheld that guarantee you are charged exactly-once, even though the system is fault-tolerant and its performance horizontally scales.

How can system #2 guarantee you are charged exactly-once? The CAP theorem proves that it is not a consistent system during a network partition, right? Actually, the CAP theorem does not apply here: “Many so-called ‘highly available’ (fault-tolerant) systems actually do not meet CAP’s idiosyncratic definitions of availability”, and this is one of them. In fact, you can have a system that is both strongly consistent and “available” during network partitions as long as a majority of nodes are responsive. We could make system #2 strongly consistent, but we did not. It does, however, guarantee strong integrity.

Hmm…if we can guarantee strong integrity without strong consistency, then what added benefit does strong consistency give us? It gives us strong timeliness.

Strong timeliness means users read up-to-date state.

Strong integrity means no lost or corrupted data.

These two are often conflated into “consistency”.

There is a common misunderstanding that only strong consistency guarantees strong integrity. This is false.

Understanding how system #2 guarantees that you are charged exactly-once will be easier once we see how system #1 fails to guarantee it.

So let’s first examine system #1’s flaws.

The problem with system #1 is that it is a distributed system (the web browser, web server, and database are networked) that guarantees strong integrity only at the database level, not at the system level.

What could go wrong?

Let’s say the web server sends the transaction request to the database, but the network partitions before the web server receives the database’s response. The web server does not know whether the transaction was processed. If the web server reconnects and resends the transaction, the database may process two transactions. If instead the web server checks whether a $40 transfer occurred before resending, it cannot tell whether that transfer came from its own request. The problem is the 1:1 mapping between a network connection and the transaction, which makes the transaction intolerant of network partitions. We could remove this 1:1 mapping by using 2PC (2 Phase Commit), making the web server–database link tolerant of network partitions and guaranteeing exactly-once processing at this level. But we still have the client browser.

The client browser could send the transaction request to the web server, but the network partitions before the client browser receives the web server’s response. The client browser does not know if the transaction was processed or not, so it tells me there was an error. I send another request. The browser warns me, “Are you sure you want to re-submit this form?” and I say yes, because I don’t know that my previous submission worked, and I want my money. To the web server, this is a new request, and to the database it’s a new transaction. (Yes, I wouldn’t be the one submitting this form in real life. If you must imagine a realistic scenario, you can imagine that you want to transfer rent money to your roommate.)
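The failure mode above can be sketched in a few lines of Python (a toy model: a balances dict stands in for the database, and a list of response outcomes stands in for the flaky network):

```python
balances = {"Me": 2.00, "You": 100.00}

def process_transfer(amount):
    # The database side: the transfer is always applied.
    balances["Me"] += amount
    balances["You"] -= amount

# The client's view of each attempt: the database processed the request
# every time, but the first two responses were lost to a partition.
responses = ["lost", "lost", "ok"]

for outcome in responses:
    process_transfer(40.00)  # the database did its work...
    if outcome == "ok":
        break                # ...but the client only learns of it once

print(balances)  # {'Me': 122.0, 'You': -20.0} — charged three times
```

Each naive retry re-runs the whole transaction, so "You" ends up $120 poorer instead of $40.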

How can we guarantee exactly-once processing at the system level? We need an end-to-end operation id.

The client browser could include a UUID in a hidden form field, so that it’s sent with both the initial request and any retry request. The web server passes this UUID to the database, which guarantees that it will process each UUID exactly-once, possibly by using this SQL in place of the SQL above:

CREATE TABLE requests (
    request_id uuid PRIMARY KEY,
    from_account varchar(40),
    to_account varchar(40),
    amount decimal(19, 4)
);

-- The above table was created before our request.
BEGIN TRANSACTION;
INSERT INTO requests
    (request_id, from_account, to_account, amount)
    VALUES ('totally-a-valid-uuid', 'You', 'Me', 40.00);
UPDATE bank_account SET balance = balance + 40.00 WHERE account_id = 'Me';
UPDATE bank_account SET balance = balance - 40.00 WHERE account_id = 'You';
COMMIT;

With these changes, system #1 now guarantees strong integrity. If we don’t need better performance or fault tolerance, this is probably a better solution than system #2. But it does not have a better integrity guarantee.
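A minimal sketch of the fix, using Python’s sqlite3 in place of a full SQL database (table and column names follow the article; error handling is simplified):

```python
import sqlite3

db = sqlite3.connect(":memory:")
db.executescript("""
    CREATE TABLE requests (
        request_id TEXT PRIMARY KEY,
        from_account TEXT,
        to_account TEXT,
        amount REAL
    );
    CREATE TABLE bank_account (account_id TEXT PRIMARY KEY, balance REAL);
    INSERT INTO bank_account VALUES ('Me', 2.00), ('You', 100.00);
""")

def transfer(request_id, from_acct, to_acct, amount):
    """Apply the transfer exactly once per request_id.

    A retry with the same id violates the PRIMARY KEY, so the whole
    transaction rolls back and the balances are untouched."""
    try:
        with db:  # one atomic transaction: commit on success, rollback on error
            db.execute(
                "INSERT INTO requests VALUES (?, ?, ?, ?)",
                (request_id, from_acct, to_acct, amount),
            )
            db.execute(
                "UPDATE bank_account SET balance = balance + ? WHERE account_id = ?",
                (amount, to_acct),
            )
            db.execute(
                "UPDATE bank_account SET balance = balance - ? WHERE account_id = ?",
                (amount, from_acct),
            )
    except sqlite3.IntegrityError:
        pass  # duplicate request_id: already processed, safe to ignore

transfer("totally-a-valid-uuid", "You", "Me", 40.00)
transfer("totally-a-valid-uuid", "You", "Me", 40.00)  # the retry is a no-op

print(db.execute(
    "SELECT account_id, balance FROM bank_account ORDER BY account_id"
).fetchall())
# [('Me', 42.0), ('You', 60.0)]
```

The retry hits the `request_id` PRIMARY KEY constraint, the transaction rolls back, and the balances change exactly once.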

Now that we understand why the original system #1 did not guarantee integrity, and how to fix it, let’s look at how system #2 guarantees strong integrity.

Here is an outline of the processing steps:

  1. An operation id is used as in the fixed system #1. The transfer request is sent from client browser to web server to database partition based on the operation id, creating a new transfer request record.
  2. This new transfer request record leads to 2 new transfer records: 1 to my account, 1 to your account. All records include the operation id.
  3. When either of our accounts is read, transfer records are deduped by operation id so they are processed exactly-once (a better term for exactly-once is effectively-once, but unfortunately exactly-once is more commonly used).

Note that any step can send messages multiple times, because at the end operation ids are deduped, leading to exactly-once processing. This makes the system highly fault-tolerant. Because everything is partitioned, the system can horizontally scale. No multi-node commits means higher availability and lower latency, even when our database cluster is globally distributed.
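The steps above can be sketched as follows (a toy model, not the real system: each partition is an append-only log of transfer records, and balances are derived at read time):

```python
# Each partition stores an append-only log of transfer records.
# Messages may be delivered more than once, so the same operation id
# can appear multiple times in a log.
partition_a = []  # my account (Hawaii)
partition_b = []  # your account (Antarctica)

def record_transfer(op_id, amount):
    # Step 2: the request fans out into one record per account.
    # A redelivery of the same op_id just appends a duplicate record.
    partition_a.append((op_id, +amount))
    partition_b.append((op_id, -amount))

def read_balance(log, opening_balance):
    # Step 3: dedupe by operation id at read time, so each transfer is
    # applied effectively-once no matter how often it was delivered.
    seen = {}
    for op_id, delta in log:
        seen[op_id] = delta  # later duplicates overwrite, not add
    return opening_balance + sum(seen.values())

record_transfer("op-1", 40.00)
record_transfer("op-1", 40.00)  # duplicate delivery: harmless

print(read_balance(partition_a, 2.00))    # 42.0
print(read_balance(partition_b, 100.00))  # 60.0
```

Because dedup happens at read time, no step needs to coordinate with any other, which is what lets each partition proceed independently.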

Given these benefits, why would we want strong timeliness (the only added benefit strong consistency gives us here)? Because it simplifies systems.

We want simple systems, so we might say, “CAP says we pay for strong consistency by losing availability during network partitions, but we’re not Google, so network partitions in our system are so rare and brief that we can afford that cost”. That may be true, but CAP doesn’t warn that strong consistency costs higher latency even when there are no network partitions. Unfortunately, strongly consistent transactions “typically only work in a single datacenter” and they “limit the scale and fault-tolerance properties, especially in heterogeneous storage systems”. As multiple datacenters, scale, and heterogeneous storage systems become more and more common business requirements, we will need to guarantee strong integrity without strong consistency, even when our systems can be unavailable during network partitions.

Again, why is CAP best avoided?

“In discussions of CAP…the formalization as a theorem does not match its usual meaning. [Eg,] Many so-called “highly available” (fault-tolerant) systems actually do not meet CAP’s idiosyncratic definitions of availability. All in all, there is a lot of misunderstanding and confusion around CAP, and it does not help us understand systems better, so CAP is best avoided.”

The only thing the CAP theorem tells us is that a system has a choice of how to handle network partitions in a cluster: either stop processing writes, or risk out-of-date reads from nodes that are not in a majority. This is only one tradeoff of many, and not as important as most people think: If we have a 5-node cluster where node 1 cannot communicate with the rest, nodes 2–5 can continue processing reads/writes while maintaining strong consistency. Furthermore, “CAP has now been superseded by more precise results, so it is of mostly historical interest today”.
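The 5-node point can be made concrete with majority quorums: with n = 5 nodes and a quorum of 3, the majority side of a partition can keep serving reads and writes, and any two quorums must share a node, which is what preserves strong consistency. A quick check (hypothetical node ids):

```python
from itertools import combinations

n = 5
quorum = n // 2 + 1  # majority: 3 of 5

# Node 1 is partitioned away; nodes 2-5 remain mutually reachable.
reachable = {2, 3, 4, 5}

# Reads and writes can still proceed: a full quorum fits in the
# majority side of the partition.
assert len(reachable) >= quorum

# Any read quorum intersects any write quorum: two sets of 3 drawn
# from 5 nodes must share at least one node, so a quorum read always
# sees the latest quorum write.
nodes = {1, 2, 3, 4, 5}
assert all(
    set(w) & set(r)
    for w in combinations(nodes, quorum)
    for r in combinations(nodes, quorum)
)
print("quorum reads/writes survive a single-node partition")
```

This is why "loses availability during any partition" is too coarse a summary: only the minority side stops making progress.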

If CAP is best avoided, then why do people still use it (if incorrectly)? Because it was historically important, and because “CAP” used to be an informal term to start discussions on the tradeoffs among the various types of consistency and availability. Unfortunately, the CAP theorem changed the meaning of the term, and that confusion, together with the overloaded terms “consistency” and “availability”, leads to misunderstandings.

Because CAP is best avoided, what should we think about instead?

Instead of “consistency” or even “strong consistency”, think about “integrity” and “timeliness”: What data timeliness do we need? What data integrity do we need (the answer is surprisingly often not strong integrity)? If strong integrity, then system #1 shows we should be able to guarantee this at the system level, not just the database level.

Instead of the CAP theorem’s “total availability” and cluster network partitions, think about how performant, scalable, and fault tolerant the system needs to be. When people say “high availability”, they usually mean “tolerant to network partitions”, but we’re often more interested in faults other than network partitions, like nodes timing out or being turned off, especially in a cloud architecture.

Conclusion: As our business requirements more and more frequently make strong consistency too expensive, we must learn to make tradeoffs without it when designing systems. In these tradeoff discussions with ourselves and especially with others, CAP does not help, but rather harms.

PS:

  • Strong consistency is NOT a SQL -vs- NoSQL property: strong consistency is the default in some NoSQL databases, and any asynchronously replicated database (SQL or NoSQL) lacks strong consistency.
