CAP Theorem and Blockchain
When you read about Blockchain, one of the first explanations might include something along the lines of a “distributed network of nodes.” But Blockchain isn’t the first such network. Distributed systems existed long before it. A distributed system is a network that stores data on more than one node in the traditional sense. The node can be hardware or a virtual machine.
Interestingly, distributed systems face one constraint. Out of the following three desired characteristics, they can only deliver two:
- Partition Tolerance
Sounds slightly familiar? If yes, then that’s because the CAP theorem is the basis for the famous Blockchain trilemma. Nevertheless, the CAP theorem also applies to the Blockchain, making it worth further coverage.
When we hear of distributed systems, we tap into the fallacy of believing that they will be reliable. However, it’s pretty common for networks (traditionally) to go down entirely or have parts go offline. Distributed systems are complex and have inevitable tradeoffs. The theorem describes the biggest one of them.
Professor Eric A. Brewer first mentioned it in 2000 during a talk on distributed computing. It’s therefore also called Brewer’s theorem and was scientifically proven in 2002 by Seth Gilbert and Nancy Lynch of MIT.
So what exactly do the characteristics mean?
- Consistency: all nodes in the system have the same data simultaneously, regardless of which node or peer they connect with. Any data newly added is immediately replicated and broadcast to the entirety of nodes in the system. Only then is it considered a successful entry.
- Availability: as one can guess, this describes that the network is always available. In other words, whenever someone requests data, they will receive a response from any of the working nodes.
- Partition Tolerance: is one way of talking about how many breakdowns in communication between nodes a network can deal with. A partition tolerant network continues functioning, even if the communication between individual nodes or an entire subset of nodes is delayed or entirely broken down.
How does this show in traditional distributed databases?
Broadly, we class databases into SQL and no-SQL databases.
SQL = Structured Query Language
SQL databases were developed in the 1970s and are best for relational data that fit into rigid table schemes. It focuses on reducing data duplication, and to scale a SQL database, one usually has to add additional hardware.
On the other hand, No-SQL databases were developed in the late 2000s and offer apps fast queries, support frequent app changes, and scale vertically. They also work in a more distributed manner by having nodes replicating logs from other nodes in the system to ensure that, if one node goes offline, others will be able to provide the data needed.
Non-blockchain databases are classified according to the theorem regarding which features they provide.
Note that in a distributed system, it’s inevitable that at some point, communication breaks down between nodes due to internet connectivity issues or other external factors. That’s why in practice, we will only find the following:
This database offers consistency and partition tolerance. But to achieve that, it sacrifices availability. If communication breaks down, users won’t be able to access the network (availability) until the problem is resolved and all nodes have synced again.
Availability and partition tolerance are upheld at the cost of consistency. Whenever a break in communication happens, the network continues working, but users might receive out-of-date or inaccurate data (consistency). As soon as all nodes are communicating again, they will re-sync and get back to a state of consistency.
In theory, one could imagine a distributed database that provides consistency and availability by sacrificing partition tolerance.
Developers looking for such a database can use a non-distributed database such as PostgreSQL and replicate it across nodes for backups if something goes wrong with their primary node.
What about Blockchain?
It might seem as if Blockchain violates the theorem at first glance. A blockchain is available, consistent, and partition tolerant up to a certain degree.
It’s established that Blockchains are fault-tolerant, meaning they can deal with the failure of nodes and even nodes acting maliciously. The number that they can withstand depends on their consensus algorithm. Generally, it’s assumed to attack the network; an attacker would need to control:
- Proof-of-Stake: 33% of the stake
- Proof-of-Work: 51% of the computing power
This leaves Blockchains with the question of prioritizing availability or consistency to fall in line with the CAP theorem.
If a blockchain chose consistency over availability, it would mean that it’d become unavailable to users whenever it had to resolve an issue with consistency.
Therefore, blockchains fall under AP databases. They prioritize availability and partition tolerance.
Food for thought: Even though, looking at some of the blockchains that struggle with outages, one might well be tempted to question…if they are actually a blockchain? That’s a question for yourself.
What about consistency?
Blockchains are not always consistent. Don’t panic just yet; your Bitcoin transactions are very likely all just fine. But by looking at how it’s achieved in blockchains, you can understand that they might not always be consistent or up-to-date.
Since there is no central authority establishing truth, nodes come to a consensus through algorithms such as Proof-of-Stake or Proof-of-Work. Once they agree that one block is valid, they add it to the chain. In the context of Blockchain, you can also think of consistency in tandem with finality.
Finality, as Vitalik Buterin put it, is always “probablistic.” And so is consistency in blockchains. They are eventually consistent; the deeper your transaction is in the chain, the less likely it is that there will ever be a change.
Yet there are a few events that can impact consistency in a blockchain.
Malicious block creation
If nodes in a blockchain follow the longest chain rule, and an attacker decides to create a second chain, split off the original chain, and manages to create a longer chain, technically, they could get all nodes to move to their chain, which might disrupt consistency. Most chains have specifications that will prevent that from happening. Minima, for example, uses the GHOST protocol to ensure that the majority of the network ignores such attacks.
Soft Forks are incremental updates to a blockchain that still allow nodes to use the previous rules. Put another way, they are backward compatible. Technically, such a fork could reject the validity of the previous protocol, but usually, it will accept the old protocol until the point of the fork. Therefore, it tends not to have any implications for consistency.
Nodes have to upgrade to the newest rules whenever a blockchain undergoes a Hard Fork, or the blocks they create will be rejected. This type of fork is not backward compatible and often leads to problems within ecosystems, as the split between Bitcoin and Bitcoin Cash demonstrated. It can impact consistency when the underlying rules of the protocol change dramatically.
Ethereum Hard Fork
One example is the Ethereum Hard Fork in 2016 that split Ethereum into Ethereum Classic and Ethereum.
After the DAO hack, during which attackers managed to steal $150 million worth of Ether, the community was torn between restoring funds and just continuing with the new status.
After weeks of heated debates, miners and node runners eventually voted to roll back the chain to the state before the funds were stolen. Those not on board with the decision continued running the previous Ethereum protocol, now known as Ethereum Classic.
While it’s noble to refund investors their money, it does mean that the Blockchain was not consistent.
Later it became known that the decision to hard fork was heavily driven by a few individuals in the Ethereum community. We believe that to be another vector of centralization when a few have the power to roll back the chain and make users question the consistency and integrity of an entire blockchain.
That is why Minima will be complete at launch, and no one will have more power than others in the network. Thanks to thousands, and by then a million node runners, the network will stay available and can efficiently deal with communication breakdowns between nodes.
Everyone will store their own transaction history and a hashed version of the entire ledger. After 2 minutes and 30 seconds, blocks are considered final, and you can rest assured that your transaction won’t undergo any changes.
You can find out more about finality here.