Blockchain Consensus Algorithms: What are their Properties?

Zak Ayesh
10 min read · May 3, 2019



One of the biggest innovations of Bitcoin was the use of the now well-known Proof-of-Work (PoW) consensus algorithm. A consensus algorithm is what allows different nodes in a distributed network to come to an agreement on the state of the system (a blockchain in this case) despite not knowing or trusting each other. Since the advent of Bitcoin and PoW, there has been an explosion of different consensus algorithms, each with their own trade-offs.

I plan on creating a series of articles going into detail on some of the biggest blockchains and their various trade-offs, including their consensus algorithms. Before that, however, I would like to cover some foundations. In this article I will give a quick overview of the theory of consensus algorithms and their properties. Consensus algorithms are not the only important component of blockchains, so do not think understanding them is the only thing you need to make informed decisions on blockchains; they are simply one piece of the puzzle.

Distributed and Decentralized Computing

Consensus algorithms are a topic in the field of distributed computing. This is a field that studies computers and programs that must communicate to achieve a goal. The components can be separated but still in close proximity to each other (e.g. the many computers in an airplane), although in common usage "distributed systems" refers to components separated by large distances. One example of this type of distributed computing system is a massively multiplayer online video game (MMO). In an MMO there are various servers, usually physically separated, that must work together to balance the game traffic, store the game state, and connect players together. We also have the computers the players use to actually play the game, which could be located all over the planet. All of these computers must communicate to create the experience the designers wanted for the players.

Almost all computer systems involve some form of distributed computing; after all, the internet itself is simply computers that are geographically distributed and communicating with each other to share information. Thus blockchains are definitely considered distributed computing systems. However, a key feature of public blockchain infrastructure is that it is not just physically distributed; it tends to be politically decentralized (in various degrees, which I will discuss later). Referring back to the previous MMO example, a game has several distributed servers communicating with each other and many users' computers connecting and disconnecting from these servers. However, the game itself, and all of the data relating to it, is stored in servers controlled by the company that owns the game. It is politically centralized.

This results in the state of the game and all of its properties being under the control of the game makers, not the players. The users collectively put trust in the game's owner or owners (a middleman) to maintain the game, handle their personal data, and moderate when there are problems between players. The big value proposition of Bitcoin was that it removed that very middleman, so users could transact with each other without relying on a trusted intermediary. However, in truth, we are replacing the middleman with tens of thousands of independent "middlemen", all competing with each other and following the same algorithmic process to come to the same truth. We call these "middlemen" nodes in the network. Depending on the circumstances of these nodes, they may also be called "full nodes", "miners", or "validators". What are the algorithmic processes I was referring to? Consensus algorithms.

Consensus Algorithm Properties

Below are the important properties of consensus algorithms:

1. Fault Tolerance and Economic Security

Security in a consensus algorithm typically refers to how fault tolerant it is. Blockchain security is actually a composite of many different properties, but that's an article for another day. Fault tolerance is the property of a system that can keep functioning when some of its components malfunction. Referring back to my MMO example, the servers are usually designed with fault tolerance in mind: if one server goes down, traffic is re-routed to other servers and the entire game doesn't shut down. We often store our data in a fault-tolerant way by backing it up in multiple locations (internal hard drive, external hard drive, flash drive, cloud, etc.) to ensure that one device failing doesn't cause permanent loss of our files. There are many types of faults that could cause a system to malfunction, such as a software bug like a stack overflow, or an important hardware component failing due to fatigue. If you've been in the blockchain space for any amount of time, you've surely heard the term Byzantine Fault Tolerance (BFT).

A Byzantine fault is one that is unique to a distributed system attempting to reach consensus among its nodes. If a node/component in a distributed system produces false information, inaccurate information, or even no information at all, that can be considered a Byzantine fault. Essentially, if a node exhibits any behavior that deviates from the desired behavior, it is considered a Byzantine fault. As you can see, this type of fault covers a huge range of possible failures, making them extremely difficult to contend with. Also, how do we know if the information is bad if we haven't achieved consensus? This is what makes designing a BFT algorithm non-trivial. BFT derives its name from the famous "Byzantine Generals Problem".

A more traditional BFT algorithm requires that greater than two thirds of the nodes in the system be "non-Byzantine", or acting desirably (properly, honestly, and accurately). Examples include Practical Byzantine Fault Tolerance (pBFT), Tendermint consensus, MinBFT, and more. Not all algorithms require this, though; Bitcoin's and Ethereum's PoW consensus algorithms require that greater than 50% of the hash rate be non-Byzantine to maintain security. With proper design considerations and assumptions, we can even create an algorithm that is 99% fault tolerant, as discussed in Vitalik's excellent blog post (the question then becomes what trade-offs we made to achieve that level of fault tolerance).
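The "greater than two thirds non-Byzantine" requirement is often written as n ≥ 3f + 1, where n is the total node count and f the number of tolerated faults. A minimal sketch of that relationship (function names are my own, purely illustrative):

```python
def max_byzantine_faults(n: int) -> int:
    """Maximum number of Byzantine nodes a classical BFT protocol
    (e.g. pBFT) can tolerate with n total nodes: it needs strictly
    more than 2/3 of nodes honest, i.e. n >= 3f + 1."""
    return (n - 1) // 3

def min_nodes_for_faults(f: int) -> int:
    """Minimum total nodes needed to tolerate f Byzantine faults."""
    return 3 * f + 1

# With 4 nodes a pBFT-style network tolerates 1 faulty node;
# with 100 nodes it tolerates 33.
print(max_byzantine_faults(4))    # 1
print(max_byzantine_faults(100))  # 33
print(min_nodes_for_faults(1))    # 4
```

Note how 4 nodes is the smallest network that tolerates even a single Byzantine node, which is why pBFT examples almost always start with 4 replicas.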

As a quick note, economic security is how expensive it is for a bad actor to create a fault in the system. Examples: how much money would it take to bribe 1/3 of the EOS validators to act in your favor, or how much would it cost to buy 51% of the hash rate on Bitcoin/Ethereum? The more expensive the attack relative to the possible reward an attacker may receive, the greater the economic security, and thus the lower the probability of nodes acting Byzantine. It could be argued this is actually a macro property of blockchains themselves, and not just the consensus algorithm; however, I feel it's coupled tightly enough to consensus algorithms to be considered one of their properties.
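The cost-versus-reward comparison can be sketched as a toy back-of-the-envelope calculation. Everything here is hypothetical: the function names are mine, and `cost_per_hash` bundles hardware and electricity into one made-up number; real attack economics also involve hardware availability, rental markets, and opportunity costs.

```python
def attack_cost(total_hash_rate: float, cost_per_hash: float,
                fraction: float = 0.51) -> float:
    """Toy estimate of the cost to acquire `fraction` of a PoW
    network's hash rate at a hypothetical per-unit cost."""
    return total_hash_rate * fraction * cost_per_hash

def is_economically_secure(cost: float, reward: float) -> bool:
    """An attack is deterred when it costs more than it pays."""
    return cost > reward

# Purely illustrative numbers, not real network figures:
cost = attack_cost(total_hash_rate=1e6, cost_per_hash=2.0)
print(is_economically_secure(cost, reward=1e5))  # True: attack not worth it
```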

2. Consensus Node Permissioning (The Number of Decision Making Nodes and The Degree of Political Centralization)

Who is allowed to become a node and participate in the consensus algorithm? One node is completely centralized, twenty-one nodes slightly less so, and hundreds of thousands would be considered decently decentralized by most. All else in a blockchain being equal, the more consensus nodes in a network, the lower the transactions-per-second (tps) throughput, due to the increased number of messages that need to be sent. Let's look at a quick example using pBFT (again, I don't want to go into many specifics in this article, so we will keep it brief):

Assume a minimum amount of messages, m, of:

m = (3f + 1)(3f)

where f is the number of faulty nodes. This simplifies to:

m = 9f² + 3f

We then relate f to the total amount of nodes, n, in the network:

f = ⌊(n − 1) / 3⌋

rounded down to the nearest integer; the number of faulty nodes must be less than 1/3rd of the total to achieve BFT. Substituting back in:

m = n(n − 1)

Using Big-O notation we can say the algorithm scales at:

O(n²)

where n is the number of nodes.
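The derivation can be checked numerically with a quick sketch (the function name is mine; it uses the all-to-all n(n − 1) estimate):

```python
def pbft_min_messages(n: int) -> int:
    """Approximate minimum message count for one pBFT-style
    consensus round with n nodes, using the all-to-all
    m = n(n - 1) estimate."""
    return n * (n - 1)

for n in (10, 100, 1000):
    print(n, pbft_min_messages(n))
# 10 -> 90 messages, 100 -> 9,900, 1000 -> 999,000:
# each 10x increase in nodes costs roughly 100x in messages.
```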

As you can see, if you increase the number of nodes by a factor of 10, the number of messages increases by a factor of roughly 100! Thus some blockchains choose to permission their consensus nodes and keep the number small in order to increase tps capability (Ripple), while others choose permissionless nodes in order to prevent cartels and censorship (Bitcoin and Ethereum). Others, in order to reduce the power of permissioned nodes, allow non-consensus-contributing nodes to vote for who they would like to contribute to consensus (EOS). I will discuss my personal opinion on these trade-offs in a future article.

3. Finality (Latency for Users)

Finality is a property that describes the irreversibility of a consensus decision. How can we trust a decision if it constantly changes? If a customer is at a store and pays with a VISA card, once the transaction is approved (which takes about 3 seconds), you know the sale occurred and that fact will not change (barring the customer filing a complaint with VISA later on). If vendors knew that VISA had a tendency to approve a transaction and then, five minutes later, reverse that decision, no vendor would accept VISA for fear of selling something without receiving payment. Some algorithms guarantee absolute finality, while others only guarantee finality probabilistically. This results in a trade-off in how fast a transaction can be considered final, and appears as latency to the users of the network. Of course, we could design an algorithm that never reaches finality (that doesn't sound very useful, though). Overall, finality is an extremely important property of a consensus algorithm.
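Probabilistic finality in PoW can be made concrete with the attacker catch-up calculation from section 11 of the Bitcoin whitepaper: given an attacker controlling a fraction q of the hash rate, what are the odds they ever rewrite a transaction buried z blocks deep? A sketch of that formula:

```python
import math

def catch_up_probability(q: float, z: int) -> float:
    """Probability an attacker with hash-rate fraction q ever
    overtakes the honest chain once a transaction is z blocks
    deep (Bitcoin whitepaper, section 11). Shows why PoW
    finality is probabilistic rather than absolute."""
    p = 1.0 - q  # honest fraction
    if q >= p:
        return 1.0  # a majority attacker always catches up
    lam = z * (q / p)  # expected attacker progress while z blocks confirm
    prob = 1.0
    for k in range(z + 1):
        poisson = math.exp(-lam) * lam**k / math.factorial(k)
        prob -= poisson * (1.0 - (q / p) ** (z - k))
    return prob

# With 10% of the hash rate, the attacker's odds fall off
# rapidly as confirmations accumulate:
for z in (0, 2, 6):
    print(z, catch_up_probability(0.1, z))
```

This falling curve is exactly why exchanges wait for some number of confirmations before treating a deposit as "final".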

4. Synchronicity

Synchronicity describes the timing of how consensus nodes actually reach consensus. As an example, let's imagine we are one node out of a total of 4, all connected to each other. The nodes are connected in such a way that we know that when messages are broadcast to each other, they will take no longer than 10 seconds to be received. If we then send out a message to the other 3 nodes simultaneously, we know that after 10 seconds all of them will have received it. We can then give them 10 seconds to process/think about the message and then send out another one. In other words, the network works in defined and discrete chunks of time: T = message sent, T+10 = message received and processing, T+20 = new message sent, etc.
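The lock-step rounds above can be sketched as a toy model (names and structure are mine, purely illustrative): in a fully synchronous network, every message broadcast in round r is guaranteed to arrive before round r + 1 begins, so knowledge spreads in discrete steps.

```python
def synchronous_round(known: dict[int, set[str]]) -> dict[int, set[str]]:
    """One lock-step round: every node broadcasts everything it
    knows to every other node, and by the synchrony assumption
    every message arrives before the next round begins."""
    everything = set().union(*known.values())
    return {node: set(everything) for node in known}

# Four fully connected nodes, each starting with only its own value.
# After a single bounded round, every node knows all four values:
known = {i: {f"value-{i}"} for i in range(4)}
after = synchronous_round(known)
print(all(len(v) == 4 for v in after.values()))  # True
```

The whole point of the synchrony assumption is that one round is enough here; the asynchronous case, where no such delivery bound exists, is what makes real consensus hard.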

The above example is called a fully synchronous timing model. This is an idealistic model in most scenarios; it is almost impossible to guarantee messages will be received by a specific time in every scenario. What if the physical wires connecting the computers get re-routed, increasing the delay? Or what if the nodes are connected over the internet, and a node's connection slows down, or even goes offline entirely, leaving the node disconnected? Thus this is not a very realistic timing model outside of very small, meticulously designed, and highly controlled scenarios. For my thermodynamics fans out there, this is very similar to the Carnot engine: it would be convenient if our designs worked as close to this ideal as possible, but physics dictates it is impossible to achieve (although we can get close).

There are several timing models, ranging from fully synchronous to asynchronous. Depending on which model we pick for our algorithm, we can make different trade-offs (to be discussed in future articles). Almost all public blockchains would fall under the asynchronous assumption. You can see a chart I re-created from this Medium article (which is an excellent read for a little more detail).

Based on a chart from: https://medium.com/mechanism-labs/synchrony-and-timing-assumptions-in-consensus-algorithms-used-in-proof-of-stake-blockchains-5356fb253459

5. Safety and Liveness

Safety and liveness, together with fault tolerance, form a trade-off trifecta in an asynchronous network: you can only have two of the three properties. Almost all blockchains need some amount of fault tolerance, so it usually comes down to favoring either safety or liveness. This is proven by the famous FLP impossibility result. Safety, in terms of a blockchain consensus algorithm, is essentially resistance to a fork occurring. Liveness is the ability of an algorithm to continue the consensus process no matter the disturbance. A safety-favoring algorithm will never cause an in-protocol fork in the network, but consensus may halt (Tendermint consensus can be used as an example). A liveness-favoring network will continue running even if the occasional in-protocol fork occurs (Ethereum and Bitcoin PoW as examples).

6. Energy Consumption

This wasn’t an important property of consensus algorithms until Bitcoin and PoW became popular. Integral to PoW was the use of energy intensive calculations, and as the number of consensus nodes grew exponentially so too did the energy consumption. Many consensus protocols that came after Bitcoin PoW specifically sought to solve this problem, and thus energy consumption has become an important consideration for blockchain consensus design.

Conclusion

There are many trade-offs that can be made when designing just the consensus protocol of a blockchain. Fault tolerance, economic security, node permissioning, finality, synchronicity, safety, liveness, and energy consumption are some of the most important properties of a blockchain consensus algorithm. In future articles I will go into more detail on specific blockchains, their corresponding consensus algorithms, and the decisions their developers made regarding these properties. Understanding these properties, and their trade-off spaces, can help designers build the best blockchain for their users, and also help curious users make informed blockchain decisions.

Resources

http://allquantor.at/blockchainbib/pdf/vukolic2015quest.pdf


Zak Ayesh

Engineer, Chainlink community advocate and blockchain enthusiast. Lover of traveling, piano, and doggos.