Byzantine Fault Tolerance

How the problem of a few hypothetical generals turned into an important concept

Published in

FMFW.io

8 min readJul 20, 2021

Once you start reading about Blockchain and maybe even venture on to digesting a few Whitepapers of promising blockchain protocols, the chances are that sooner or later, you will stumble across the Acronym “BFT”, Byzantine Fault Tolerance.

You might think, great, yet another technical term I don’t understand. Or maybe you’re already knowledgeable about all things DLT (distributed ledger technology) and BFT. If the latter is the case, you might want to skip this one or read on to check for correctness. 😉

Byzantine Fault Tolerance is one fundamental property when it comes to creating a reliable distributed ledger. Most cryptocurrencies operate on top of blockchain technology (with the exemption of cryptocurrencies such as IOTA that run on a Directed Acyclic Graph).

When Satoshi Nakamoto envisioned Bitcoin, he/she/they (sorry Craig Wright, we still think the Bitcoin creator remains anonymous) came up with a distributed network to which all nodes have equal access and rights. Not only that, all nodes would have the possibility to communicate with each other directly. No one node would exercise more power in a truly decentralised network than the next creating a censorship-resistant, trustless platform.

Reaching consensus

The “problem” with decentralised networks is that unlike in networks that a more prominent company operates, there is no single authority to enforce rules or offer dispute resolution. Consensus, defined as a general agreement, has to be achieved differently. In first-generation blockchains, the consensus is implemented by Proof-of-Work algorithms. Proof-of-Work relies on all nodes racing to solve a mathematical problem (basically doing some work) to verify blocks. The first one to solve the equation is awarded a block reward.

Block rewards incentivise miners (the process of solving the problem is called mining, the machines doing that work, therefore miners) to provide computing power to the network continuously. When most nodes in the network agree that transactions in a block are valid, it’s added to the blockchain and remains there forever.

Byzantine Generals Problem

As long as most nodes always agree on the correctness of transactions happening, there aren’t any issues. However, whenever it comes to networks that transfer value, there is a significant risk of malicious actors trying to target it (google cryptocurrency hack). The more extensive the network, the higher the risk of some nodes simply failing due to technical issues.

A Byzantine fault-tolerant system can tolerate such failures and malicious actors as long as they make up less than 1/3 of all participants in the network.

Did you know that Byzantine comes from Byzantium, a colony founded by a man called Byzas long before anyone even thought about blockchain. It came to fame when in 330 A.D the Roman Emperor Constantine I chose it to build a “New Rome” — better known as Constantinople. From there, he spearheaded the expansion of the Byzantine Empire which survived up until 1453.

Byzantine Fault Tolerance is based on the idea of the Byzantine Generals Problem. When building an Empire, one cannot just peacefully advance and hope anyone in the way will surrender. Just like other Empires, the Byzantine Empire used Force to expand. And hundreds of years later, Byzantine Generals took the main stage in a thought experiment by Leslie Lamport, Robert Shostak, and Marshall Pease.

They thought about what Byzantine Generals would do if they were trying to conquer a new city. The generals would position at strategically important attack points around the city. They’d then have to communicate with each other because if just a few of them attacked simultaneously, chances are they’d be defeated, as the city can sound the alarm and send its defence.

Another, less glorious option would be to retreat. Retreating would have to happen in a coordinated fashion because you wouldn’t want the people in the city to take note and take a few generals with their respective armies hostage.

Obviously, back then, before Christ, there were no mobile phones nor other ways to communicate with each other simultaneously. And had you tried what the Native Americans did with smoke signals, you’ll be noticed before launching an attack, and all left is to run. 💨

The three scientists thought the realistic scenario would be for the general to have a messenger to send one at a time to communicate. This works perfectly, assuming that all generals are honest. Leslie Lamport, Robert Shostak, and Marshall Pease wondered how many dishonest participants this setup would tolerate.

This is the Byzantine Generals Problem: having to communicate just one message at a time with the risk of having a dishonest general ruin the agreement.

One Lieutenant is a traitor.

In this first scenario, we assume that the second lieutenant is a traitor who doesn’t want to help the rest achieve consensus. All participants have agreed only to make their decision on what to do after they’ve received and evaluated all messages. (One can only hope that the city wasn’t that big and the messenger had a fast horse. Else this process can easily take a day). The commander orders all lieutenants to attack. Lieutenant 1 and 3 are loyal and relay the message delivered by the Commander. Only the second lieutenant is a chicken and relays retreat and acts accordingly. However, despite that one disloyal member, 3 out of 4 ultimately agree to attack. They’ve reached a consensus.

The commander is a traitor.

Shockingly, even high-ranking commanders in Byzantine could be disloyal. In that case, the commander will order L1 and L3 to attack while he’d tell L2 to retreat. Luckily all the Lieutenants are loyal and will therefore all pass on exactly the order they received.

In the end, each lieutenant has received 2 orders to attack and only one to retreat. Based on that, they’ll attack — with or without the commander; they reached consensus again.

If there were 2 traitors and 2 loyal generals, there is no chance that they can reach an agreement.

Any system or algorithm that solves the Byzantine Generals Problem by ensuring that out of 3X total generals, at most, X can be traitors is known as Byzantine Fault Tolerant.

Byzantine Fault Tolerance and Blockchain

Byzantine Fault Tolerance is applicable not only in Blockchain but also in aeroplane engine systems, nuclear power plants and other systems that depend on many sensors and need a tolerance for a certain amount of sensors failing.

In Blockchain, the use of asymmetrical cryptography, digitally signing transactions and transmitting them to all nodes simultaneously plays a big role in creating a Byzantine Fault Tolerant system. While Blockchains might rely on different consensus algorithms (mainly Proof-of-Work and Proof-of-Stake), they all have some element of Byzantine Fault Tolerance. While a Proof-of-Work chain can’t be tampered with unless more than 51% of the computing power is in malicious actors hands, Proof-of-Stake chains might be more vulnerable if their respective token loses in value, making attacks more economical.

practical BFT

As attacks grow more and more sophisticated, just BFT might not always be sufficient in protecting a network. In 1999 Castro (not Fidel) and Liskow released a paper introducing practical BFT to solve problems associated with traditional BFT.

Unlike in Proof-of-Work scenarios where all nodes are equal, in pBFT, nodes are in sequential order, with one node being the primary node and the rest functioning as backup nodes. All nodes can communicate with each other to ensure that all honest nodes come to a consensus. Every consensus around consists of 4 phases:

The client sends a request to the leader node to perform a specific task
Leader node relays the task to the backup nodes
Backup nodes perform the task and send a reply to the client
Tasks are only completed if the client receives n +1 replies from different nodes reporting the same result. N is the maximum number of nodes that can be faulty.

Note that the leader in pBFT is changed during every view and can also be removed if they fail to broadcast for a pre-defined time. In emergencies, the majority of honest nodes can vote to replace a leading node with the next one in line.

Benefits of pBFT include:

Transaction Finality: unlike in PoW protocols where blocks need confirmation, transactions can be finally approved and finalized directly because all honest nodes agree on the system's state after communicating with each other.
Energy Efficiency: pBFT doesn’t need a high amount of computing power.

However, the system also has downsides. While pBFT works well for smaller groups of nodes, it can be a challenge scaling the network as all nodes have to communicate with each other leading to high overhead.

Additionally, pure pBFT models are vulnerable to Sybil attacks. During a Sybil attack, one party can manipulate the outcome by joining a big set of own nodes. The bigger the network, the smaller the threat. But as mentioned above, scaling pBFT is complicated.

Consequently, blockchain protocols working with pBFT deploy hybrid setups that combine pBFT with Proof-of-Work to reduce the risk of attacks. Zilliqa and Hyperledger Fabric are two examples that put pBFT into practical blockchain applications.

In conclusion, BFT is a property of any system that can reach consensus even when 1/3 of its members are acting maliciously. It was inspired by the Byzantine Generals’ problem that describes a group of generals trying to coordinate an attack on a city, one message at a time.

All blockchain networks inhibit some element of byzantine fault tolerance, making them more resilient. There is also practical BFT which provides high energy efficiency and transaction finality but is not scalable. Therefore, blockchain protocols such as Zilliqa use hybrid approaches combining pBFT with Proof-of-Work to scale.

Hopefully, you now know a little more about how some generals centuries ago conquering cities for the Byzantine Empire are still relevant for today’s distributed networks.