In 2018, a pseudonymous group calling itself Team Rocket introduced the Avalanche Consensus Protocol. This post describes the protocol with seemingly magic (but real) properties, puts it in context compared to other approaches, and describes the technical underpinnings that make it unique.
What is Consensus?
Consensus is the means by which a series of independent voters (often called “validators”) come to an agreement on a decision. At a glance, the process sounds straightforward: each validator figures out whether a piece of data is consistent with the rules of the system, and whether the rest of the network agrees with that assessment.
Consensus protocols (like Avalanche) facilitate agreement between all of the validators, ensuring that the network has a synchronized view on the data. Robust consensus protocols work even if some validators in the network are faulty or malicious.
At the end of this process, all nodes will share the same data, referred to as “state”, that was agreed on in this decision-making process. If a conflicting transaction goes through on any node that is not in full agreement with the rest of the network, this is called a “safety violation”: at least one machine is inconsistent with the other machines in the network. The purpose of consensus protocols is to minimize the chance of safety violations, ideally to the point where they are vanishingly unlikely.
Simple enough concept, right? Yet the topic of consensus protocols is one of the hardest areas in computer science.
History of Consensus
Over the 45-year history of distributed systems, there have been only three approaches to the consensus problem: Classical, Nakamoto, and Avalanche. Let’s discuss these three approaches, to illustrate what sets them apart from each other and determine their respective strengths and weaknesses.
Classical consensus protocols, such as Practical Byzantine Fault Tolerance (PBFT) and HotStuff, are based on all-to-all voting. This means that a validator needs to hear from a vast set of nodes that comprise the network to come to a decision. In addition, they are “probability of 1”, or P=1, protocols, which means that the network agrees on a decision with absolute certainty. Transactions finalize immediately after a node has received responses from the requisite fraction of the nodes that comprise the system.
Classical consensus has two major problems in open, Internet-like settings. The first problem is that these protocols are fragile: their correctness depends heavily on everyone in the system knowing and agreeing on the identities of the validators that comprise the system at any one point. As a result, any errors in maintaining membership in the system, or any difference in the views of the network can lead to safety violations. Further, an attacker only needs control of 33% of the network in order to launch a double-spend attack that is guaranteed to succeed.
Second, these systems, while fast, do not scale well in the number of participants. The most scalable Classical protocol, HotStuff (incidentally designed by Ted Yin, who now works on the Avalanche protocol, and used by Facebook’s Libra), supports only around 100 validators before performance begins to suffer.
These two flaws make Classical consensus an unviable candidate for open, permissionless networks where nodes may join and leave at will and may behave adversarially to the network’s intent.
For over a decade, practical Byzantine fault-tolerant systems were a novelty because Classical consensus couldn’t scale to handle global needs. Then, out of nowhere, Satoshi Nakamoto came along and dropped the Bitcoin paper and showed the world it was possible to create an adversary-resistant consensus protocol that worked at a global scale. Nakamoto accomplished this by redefining the consensus problem and making the correctness definition probabilistic.
With Nakamoto-based protocols, instead of waiting for absolute certainty across all nodes in the network, Nakamoto trades off a vanishingly small difference in probability for greater scalability. Where Classical protocols must reach consensus with a probability of 1 (P = 1), Nakamoto reaches consensus with a probability of 1 minus some tiny chance of error (P = 1 − ε). In Nakamoto, this error value shrinks over time as more blocks are produced: with each additional block, the chance of a reorganization drops exponentially.
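To make the exponential decay of ε concrete, here is a minimal sketch using the simplified catch-up bound from the Bitcoin whitepaper: an attacker controlling a fraction q of the hash power (with honest fraction p = 1 − q) has probability roughly (q/p)^z of ever overtaking a chain that is z blocks ahead. The values chosen below are purely illustrative.

```python
def reorg_probability(q: float, z: int) -> float:
    """Rough upper bound on a z-block reorg by a minority attacker (q < 0.5).

    Simplified from the random-walk analysis in the Bitcoin whitepaper:
    each extra confirmation multiplies the attacker's odds by q / (1 - q).
    """
    p = 1.0 - q
    return (q / p) ** z

# With 10% of the hash power, each confirmation cuts the attacker's
# chances by a factor of 9, so waiting a handful of blocks makes a
# reorg astronomically unlikely.
for z in (1, 3, 6, 12):
    print(f"z={z:>2}  P(reorg) <= {reorg_probability(0.1, z):.2e}")
```

This is why exchanges historically waited for around six confirmations: the residual error after six blocks is already negligible against a minority attacker, but accumulating those blocks takes on the order of an hour.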
Nakamoto is undeniably a breakthrough as a robust, global protocol, but it does have drawbacks. It’s slow, consumes a lot of energy, and takes a very long time to feel confident in a transaction’s finality. While those are acceptable downsides for an asset that moves infrequently or acts as a reserve, they are too much of a burden for use cases like peer-to-peer payments and decentralized finance.
In the wake of Nakamoto’s work, the world still wanted a protocol with all of the benefits of Nakamoto consensus (robustness, scale, decentralization) and all the benefits of Classical consensus (speed, quick finality, and energy efficiency). Crypto’s own version of the proverbial, “have your cake and eat it too.”
Then, in May of 2018, a paper was distributed by a pseudonymous group named Team Rocket that proposed a third class of consensus protocol they dubbed “Avalanche.” With all the benefits of Classical consensus and the massive decentralization of Nakamoto consensus, it proved that Classical protocols can be generalized to behave probabilistically and gain massive performance improvements as a result.
This article contains high-level explanations. If you want to read detailed information on Avalanche, please check out https://avalabs.org/whitepapers.
What is Avalanche Consensus?
Avalanche consensus, like Nakamoto consensus, is a probabilistic protocol. Just as Nakamoto traded off a small chance in probability for performance, Avalanche embraces probability to make the chance of error just as microscopic (and better yet, like all parts of Avalanche, configurable by validators on custom subnets).
Avalanche can make the error so small that a safety violation on an Avalanche node is even less likely than finding a SHA-256 hash collision. To put this in greater perspective, it is dozens of orders of magnitude more likely that a life-ending asteroid will collide with the Earth in the next hundred years than that a SHA-256 collision will be detected in the next thousand years by a network computing 1 quintillion hashes a second. That’s really safe!
Avalanche also finalizes all transactions immediately, without waiting for confirmations. Avalanche is able to do so because it’s a generalization over Classical consensus, so it inherits this very desirable property. In fact, Avalanche sees transactions finalize in less than a second on average. This is blazingly fast compared to existing decentralized networks.
As a generalized Classical consensus with probabilistic models factored in, Avalanche also has the property of being both CPU-bound and high-throughput. It doesn’t require specialized (read: expensive) hardware to reach high-performance numbers in excess of 4,500 transactions per second. That means the computer you already have (or even the one collecting dust in your closet) is powerful enough to run Avalanche. The combination makes Avalanche nodes both extremely green and economical.
Better yet, like Nakamoto consensus, there’s no known cap on how many individuals can participate in the network at its deepest levels. The same can’t be said for Classical protocols, whose message overhead grows quadratically with greater participation (more in the advanced note below). For all the talk about making these networks accessible to the masses, Avalanche is peerless in this regard.
Avalanche does not depend on Proof of Work like Nakamoto consensus. In protocols like Bitcoin, Proof of Work is necessary for both organization and security against dishonest actors. Avalanche could have used Proof of Work, but it instead uses Proof of Stake, which requires validators to hold (and stake) tokens before they are allowed to vote on transactions.
Finally, unlike Bitcoin and other systems based on Nakamoto’s work that require perpetual action, Avalanche nodes only work when there is work to be done. There’s no mining or polling to get new blocks. Transactions are broadcast to the wider network, whose nodes hear them and begin voting. If there are no transactions to vote on, the nodes in the network don’t do anything except listen until new transactions are heard. Simply put, Avalanche works smarter, not harder.
Advanced note: The most important property of Avalanche, which really sets it apart from existing Classical protocols, is that, like Nakamoto consensus, it can operate with no known upper bound on the number of participants in the network. Decisions in Avalanche can be made with O(1) (a constant number of) messages per node. Compare this with Classical protocols, which take O(n²) messages to reach consensus, and the scaling issues that plagued Classical networks disappear in Avalanche.
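The gap the note describes can be sketched numerically. The sample size k and round count m below are illustrative placeholders, not the network's actual parameters; the point is only that per-decision cost is quadratic in n for all-to-all voting but linear in n for subsampled voting.

```python
def classical_messages(n: int) -> int:
    """All-to-all voting: every node queries every other node once."""
    return n * (n - 1)

def avalanche_messages(n: int, k: int = 20, m: int = 15) -> int:
    """Subsampled voting: each node sends k queries per round for m
    rounds, regardless of how large the network is (k, m illustrative)."""
    return n * k * m

# The constant per-node cost loses at small n but wins as n grows,
# e.g. n=1000: 999,000 classical messages vs 300,000 subsampled.
for n in (100, 1_000, 10_000):
    print(f"n={n:>6}  classical={classical_messages(n):>12,}  "
          f"subsampled={avalanche_messages(n):>12,}")
```

Note that the subsampled total is simply n times a constant, which is exactly the O(1)-messages-per-node property: adding validators never increases any individual node's workload.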
How Avalanche Consensus Works
To start, let’s talk about what a validator does in Avalanche consensus. Avalanche is a voting protocol. Validators listen for transactions. When a validator hears a transaction, it votes on whether the transaction should be “Accepted” or “Rejected”: transactions that appear virtuous are voted “Accepted”, while conflicting transactions are voted “Rejected”. That’s the sum of it. A validator sees a transaction, makes an initial decision, and then collaborates with the rest of the network to determine whether the network agrees with that decision. This is the same expectation as in Classical voting protocols, but in Avalanche it happens at much higher validator counts.
This voting process is what makes Avalanche so unique. Each validator is a completely independent decision-maker; there is no leader election. However, by protocol, each node uses the same process to determine whether a decision is virtuous and how likely it is to be in consensus with the rest of the network. Once it sees a high probability of agreement across the network, the node locks in its vote and accepts the transaction as final.
The process used to determine whether a transaction is preferred and that the rest of the network agrees on this decision is referred to as “repeated random subsampling”.
At a high-level, this means a validator randomly selects other validators to ask them what they like. It does this over and over again on new randomly selected nodes until it builds up enough data to determine that its probability of being correct is so high you may as well consider it impossible that it’s wrong.
In more detail, it works as follows:
- A validator is presented with a set of transactions that have been issued and asked to decide which transactions to “Accept.”
- The node client presents the virtual machines (“VMs”) with their transactions, and the VMs will provide information to help the client determine which transactions are acceptable.
- The validator selects a subset of these transactions that do not conflict, marks them as preferred, and attempts to accept them over the network.
- Any node that queries this validator receives its latest preferred choice for this decision.
- This validator selects K nodes from the entire validator list (probability of selection is weighted by stake amount) to query for their preferred decision.
- Each queried validator responds with their preferred decisions, the validator’s votes are updated accordingly, and confidence is built.
- Meanwhile, other nodes are also selecting random validators from the validator set to be queried for their preferred response and updating their answers accordingly.
- This continues for at least M rounds or until a sufficient confidence threshold is reached, whichever comes last, with K validators selected randomly each round (without replacement).
- Once a confidence threshold is reached, the decision on the transaction is locked in as final.
- If “Accepted”, the transaction is sent to the VM to be handled; if “Rejected”, the transaction is removed from consensus.
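The loop described in the steps above can be sketched as a toy simulation of a single validator's view. This is a minimal sketch in the spirit of the published Snowball mechanism, not the production implementation: the parameter values (K, ALPHA, BETA) are illustrative, the conflict is reduced to two options, and peers are sampled with replacement for brevity where the protocol samples without replacement.

```python
import random

# Illustrative parameters (not mainnet values):
# K = sample size per round, ALPHA = quorum within a sample,
# BETA = consecutive agreeing rounds needed to finalize.
K, ALPHA, BETA = 20, 15, 15

def subsample_loop(preferences, stakes, rng, max_rounds=1000):
    """Simulate one validator repeatedly subsampling a static network.

    `preferences[i]` is validator i's current preferred decision and
    `stakes[i]` its stake weight. We play the role of validator 0.
    """
    my_pref = preferences[0]
    confidence = 0
    for _ in range(max_rounds):
        # Stake-weighted random sample of K peers (with replacement
        # here for simplicity; the protocol samples without).
        sample = rng.choices(preferences, weights=stakes, k=K)
        counts = {}
        for p in sample:
            counts[p] = counts.get(p, 0) + 1
        winner, votes = max(counts.items(), key=lambda kv: kv[1])
        if votes >= ALPHA:
            if winner == my_pref:
                confidence += 1              # another agreeing round
            else:
                my_pref, confidence = winner, 1  # flip preference
            if confidence >= BETA:
                return my_pref               # decision locked in as final
        else:
            confidence = 0                   # quorum missed; streak resets
    return None                              # undecided within max_rounds

rng = random.Random(42)
# 100 equal-stake validators, 90 of which prefer "accept".
prefs = ["accept"] * 90 + ["reject"] * 10
print(subsample_loop(prefs, [1] * 100, rng))
```

With a strong initial majority, each K-sized sample clears the ALPHA quorum for “accept” with high probability, so the confidence counter climbs to BETA within a handful of rounds and the decision finalizes; this is the “build up enough data that being wrong is effectively impossible” behavior described above.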
The Team Rocket whitepaper showed that, with correct parameterization, all correct nodes in a network using this process come to the same decision, with a failure probability that is itself a tunable parameter.
Scientists have a need for certainty in their models. It’s very nice and neat to be able to say, “this is absolutely true beyond a shadow of a doubt,” when describing a system or process. Classical protocols strive toward this idyllic model, but in the real world, nothing is certain.
With 100 machines in a Classical network, there’s a chance that 33% of them could all go offline at once, or that someone takes over 33% of them through social engineering and tries to impose their will on the network. There’s also a non-zero probability that all the oxygen in a room will jump to one side through random collisions, suffocating anyone unfortunate enough to be on the wrong side in a pool of nitrogen. It’s possible, but the probabilities are so low that they aren’t a concern for anyone.
Nakamoto continues to demonstrate to the world that probabilistic finality is acceptable, as long as the chance of a safety failure is so astronomically remote that, were one to occur, it would be more a fascinating curiosity than a crisis, a little hiccup for the network. The world finds that an acceptable service level. Heck, that’s way better than the five nines (99.999%) you get with carrier-grade SLAs.
By accepting this same minuscule margin of error, Avalanche accepts a chance of error of roughly once in 20,000 years under the model with the right parameters. It’s more likely that aging internet infrastructure causes massive network outages. It was this key insight that helped Team Rocket pave the way for a new consensus protocol that scales tremendously better than existing Classical networks.
A Scalable, Decentralized Future with Avalanche
Classical practical Byzantine fault-tolerant protocols showed that a network can come to consensus despite adversary participants. Nakamoto protocols showed that probabilistic protocols are safe in practice and can provide unprecedented decentralization and robustness.
Avalanche takes these revelations and combines them into a new protocol, showing that you can have the best of both Classical and Nakamoto without the downsides that come with either class of protocols.
By having a validator randomly select other validators to ask what their preferences are, participants in Avalanche build confidence in the correct decision shared by all nodes in the network. With enough confidence, a decision is finalized immediately. This process happens so quickly that Avalanche rivals major payment systems in its ability to process and clear transactions in a network.
Avalanche is an open-source platform for launching decentralized finance applications and enterprise blockchain deployments in one interoperable, highly scalable ecosystem. Developers who build on Avalanche can easily create powerful, reliable, and secure applications and custom blockchain networks with complex rulesets or build on existing private or public subnets.