Asynchronous Byzantine Fault Tolerance: A Time-independent & Future-proof Byzantine Fault Tolerance for the Brave New World

Crypto Insights
10 min read · Dec 29, 2019


Summary: This is a light-speed review of the history of distributed computation, intended to give the reader an intuitive understanding of asynchronous Byzantine Fault Tolerant (aBFT) consensus algorithms: how they came to be and why they are better suited as consensus mechanisms for permissionless blockchains than their ancestors, the classical Byzantine Fault Tolerant (BFT) consensus algorithm and its many variants. Among the upcoming projects that employ aBFT consensus algorithms is Fantom, which, with its Lachesis consensus algorithm (alongside partnerships with entities such as the Afghan Ministry of Health, the Korean food industry and Dubai's smart-city initiatives), is a serious contender in the blockchain sphere from a technological point of view.

1. A Quick Review of the History of Centralized Computing: Breakthroughs & Problems

Since the conceptualization of the first mechanical computer, the Difference Engine, by Charles Babbage in 1822 (with Ada Lovelace later contributing what is widely regarded as the first program, for his Analytical Engine), through the creation of the first programmable computer, the Z1, by Konrad Zuse in 1936–1938 and Alan Turing's proposal of the Turing machine around the same time, the trend has seemingly been to centralize computing power through the centralization of computing hardware¹. An early example of this trend in the digital age came when the first mainframes, such as the IBM 700 Series², arrived on the scene, resulting in an ecosystem where computation was limited to well-off corporations and governments that could buy and maintain the equipment, which was used mainly for business purposes and scientific research.

The Defense Calculator

The IBM 701 (“Defense Calculator”) was the first production computer developed by IBM and was designed primarily for scientific computing.

As the technology evolved, it became evident that centralized computing systems had inherent problems resulting from their centralized nature. One of the main problems was the possibility of complete failure due to accidents (e.g. fire or flooding). Another was the continuous exposure of centralized servers to hacks. A further problem was the inefficient allocation of resources due to the localized nature of the hardware, resulting in scalability issues³.

2. The Emergence of Distributed Computing Systems: from PCs to the World Wide Web

These known problems of centralized computing gradually led to the creation of the first distributed computing systems, such as ARPANET⁴ in 1969: a network of interconnected computers that could communicate for research purposes and that later adopted the TCP/IP protocol, which would become the foundation of the internet (the biggest distributed system of all).

Additionally, the computer revolution of the 1970s resulted in the creation and proliferation of personal computers (PCs), which gave us a glimpse of the power of distributed computing: a single computer is itself a small distributed system if we regard the central processing unit, the memory units and the input-output channels as separate processes⁵. However, these personal computers did not interconnect with each other to form a distributed network, so no real distributed computing at scale was experienced at that time. Early applications of the personal computer were video games, but without the interconnectivity among players and computers that we all know today⁶. So even though the personal computer was useful from the start, the problems and limitations resulting from its centralized nature gradually became clear, as the internet began to show us, little by little, the future of distributed computing that lay ahead.

Altair 8800, the first personal computer.

3. Initial Fault Tolerant Systems…but Permissioned

The initial distributed computing systems were for the most part limited to highly centralized governance, where malicious attacks were not a concern because only permissioned access was allowed, normally granted by an administrator. For example, a corporation could claim fault tolerance by utilizing a distributed network of computers (even located in different geographies) connected securely, while full administrative control over the network was preserved by a central authority, thus avoiding malicious attacks. A well-known example of such a system is Google's fault-tolerant lock service, Chubby, which runs five nodes and tolerates up to two crashes⁷. In this scenario, adversarial attacks were not even considered, since Chubby only allowed permissioned access; the fault tolerance referred to was only against crashes due to malfunctioning or damaged nodes. In real-world scenarios, however, there is always the possibility of failure due to other factors, such as nodes acting maliciously (Byzantine nodes).

4. The Byzantine Generals Problem: Byzantine Fault Tolerant (BFT) Consensus Algorithms

Therefore, systems resistant to malicious behavior, such as Byzantine Fault Tolerant⁸ (BFT) consensus algorithms, were created. BFT systems were derived by Lamport and his co-authors as a generalization of the "Two Generals Problem"⁹ (which has been proven unsolvable) to the case where communication occurs over unreliable links (e.g. the internet), known as the "Byzantine Generals Problem". Mathematically, Lamport proved that such systems can reach deterministic consensus only if fewer than 1/3 of all nodes are malicious (Byzantine); equivalently, a network of n nodes tolerates at most f Byzantine nodes when n ≥ 3f + 1 (the short sketch after the figure below makes this bound concrete). As a result, BFT systems and variants such as practical Byzantine Fault Tolerance (pBFT), which makes consensus practical with a reported latency overhead of only about 3%, proliferated over roughly three decades and found applications in settings as diverse as aviation, spaceflight and nuclear power plants¹⁰. In aeronautics, for example, the correct functioning of an airplane depends on the correct transmission of information relayed by many thousands of independent sensors, so the system must be BFT in order to hold its flight course even when many sensors fail to communicate correctly, thus avoiding the kind of systemic failure that could lead to a crash¹¹.

Byzantine Fault Tolerant (BFT) Airplanes
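
To make the 1/3 bound concrete, here is a minimal Python sketch (illustrative only, not taken from any of the cited systems): for a network of n nodes it computes the largest number of Byzantine nodes f that classical BFT can tolerate (n ≥ 3f + 1) and the corresponding pBFT-style quorum of 2f + 1 matching replies.

```python
# Minimal illustration of the classical BFT bound: a network of n nodes can
# tolerate at most f Byzantine nodes when n >= 3f + 1 (i.e. f < n/3).

def max_byzantine_faults(n: int) -> int:
    """Largest f such that n >= 3f + 1 still holds."""
    return (n - 1) // 3

def quorum_size(n: int) -> int:
    """pBFT-style quorum: 2f + 1 matching replies are needed before a replica
    can be sure that a majority of the honest nodes agree."""
    return 2 * max_byzantine_faults(n) + 1

if __name__ == "__main__":
    for n in (4, 7, 10, 100):
        f = max_byzantine_faults(n)
        print(f"n={n:>3}  tolerated Byzantine nodes f={f}  quorum={quorum_size(n)}")
    # n=4 -> f=1, quorum=3;  n=100 -> f=33, quorum=67
```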

5. Nakamoto’s Consensus: A New Solution to the Byzantine Generals Problem

Bitcoin White Paper

Nakamoto’s type of consensus (as used in Bitcoin¹²) was the first solution to the Byzantine Generals Problem that probabilistically guaranteed trustless consensus in a permissionless setting as long as less than 50% of the hash power is Byzantine (a huge improvement over the less-than-1/3 bound of BFT systems and their variants at the time). This was a real breakthrough and works as a two-step process: first, an anti-Sybil mechanism called Proof of Work (PoW) is used in a computational competition to select the node that mines the next block; second, the other nodes validate the block and adopt the chain with the most accumulated proof of work (the "longest" or heaviest chain) as the real chain, giving rise to the blockchain era. However, even though Nakamoto consensus was a significant breakthrough for the Byzantine Generals Problem in fully permissionless settings with no central authority, it has its own problems. One is that the PoW mechanism is not environmentally friendly and consumes energy inefficiently¹³. Another is the inability to scale, which results from the fact that the ledger must be replicated among all nodes to reach consensus; with Bitcoin's roughly ten-minute block interval, this makes it unsuitable for applications that require instant finality and high throughput. A further potential weakness is the synchronous nature of blockchains such as Bitcoin, which can be slowed or halted by Distributed Denial of Service (DDoS) attacks. Apart from this, a cryptocurrency such as Bitcoin is just one application on top of a blockchain, while smart contracts are another application that in itself allows the creation of a myriad of decentralized applications (dApps) in finance, insurance, the Internet of Things (IoT), and so on. Unfortunately, smart contracts do not scale on linear, synchronous blockchains such as Bitcoin or Ethereum, because every single node is needed to arrive at consensus, which renders the network impractical for decentralized applications that require high throughput and fast finality while scaling globally.
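
As an illustration of the second step, here is a simplified, hypothetical sketch of Nakamoto-style fork choice (the names and the per-block difficulty stand-in are assumptions made for illustration): among competing chains, a node adopts the one with the most accumulated proof of work rather than simply the one with the most blocks.

```python
# A simplified, hypothetical sketch of Nakamoto fork choice: among competing
# chains, a node adopts the chain with the most accumulated proof of work
# (often loosely called the "longest" chain).

from dataclasses import dataclass
from typing import List

@dataclass
class Block:
    height: int
    difficulty: float  # stand-in for the work proven by this block's hash

def accumulated_work(chain: List[Block]) -> float:
    """Total work committed to a chain (sum of per-block difficulty)."""
    return sum(block.difficulty for block in chain)

def fork_choice(chains: List[List[Block]]) -> List[Block]:
    """Pick the chain with the most accumulated work, not merely the most blocks."""
    return max(chains, key=accumulated_work)

if __name__ == "__main__":
    chain_a = [Block(h, 1.0) for h in range(6)]   # 6 blocks, total work 6.0
    chain_b = [Block(h, 2.5) for h in range(4)]   # 4 blocks, total work 10.0
    winner = fork_choice([chain_a, chain_b])
    print("adopted chain length:", len(winner))   # chain_b wins despite being shorter
```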

6. Alternatives to Nakamoto’s Consensus

Therefore, many researchers have tried to implement more efficient solutions to the Byzantine Generals Problem than Nakamoto's consensus while still guaranteeing trustless consensus in a permissionless setting. The most prominent solutions combine an anti-Sybil mechanism (e.g. PoW, PoS) with one of the many variations of Byzantine Fault Tolerant (BFT) consensus algorithms. Examples of these BFT-based variants include an optimized version of practical Byzantine Fault Tolerance (optimized pBFT) as used in Zilliqa¹⁴, delegated Byzantine Fault Tolerance (dBFT) as used in the Neo blockchain¹⁵, and DPoS+BFT as implemented in the EOS blockchain¹⁶. Nonetheless, all these protocols employ at their core a synchronous (or partially/weakly synchronous) BFT consensus algorithm, which makes them unsuitable for truly permissionless blockchain settings where "network links can be unreliable, network speeds change rapidly, and network delays may even be adversarially induced"¹⁷. These BFT variants may tolerate faults due to adversarial collusion or damaged nodes in the classical sense of a Byzantine attack, but the internet has taught us that Byzantine behavior can evolve beyond what a legacy understanding of such attacks could have predicted. For instance, botnets¹⁸ and Distributed Denial of Service (DDoS) attacks can easily flood a node heavily involved in consensus with packets, delaying the messages that node must send and receive for the network to reach consensus and potentially halting the entire network. In this scenario it is evident that synchronous consensus algorithms are susceptible to time-delay attacks, because the transmission of the messages needed to achieve consensus is time-bounded. Synchronous BFT consensus algorithms are therefore not the strongest form of BFT: they make timing assumptions, which makes it provably impossible to guarantee liveness in adversarial settings where time-based attacks are constant, as can certainly happen on the open internet¹⁹ (the toy simulation after the figure below illustrates this liveness failure).

Botnet multi-server topology, by Mohammed Zahid
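
To see why timing assumptions translate into a liveness problem, here is a toy simulation (not modeled on any specific protocol; all names and numbers are illustrative assumptions): a round commits only if 2f + 1 votes arrive within a fixed timeout, so once an adversary can push network delay past that timeout, the round stalls.

```python
# Toy illustration (not any specific protocol) of why timing assumptions hurt
# liveness: a round that waits `timeout` seconds for 2f+1 votes never commits
# if an adversary can delay honest messages beyond that timeout.

import random

def round_commits(n: int, f: int, timeout: float, network_delay: float) -> bool:
    """A round commits only if at least 2f+1 votes arrive within `timeout`.
    `network_delay` models congestion (e.g. a DDoS flood) added to every message."""
    quorum = 2 * f + 1
    # Only the n - f honest nodes vote; each vote arrives after a small base
    # latency plus whatever delay the adversary injects into the network.
    arrivals = [random.uniform(0.01, 0.05) + network_delay for _ in range(n - f)]
    on_time = sum(1 for t in arrivals if t <= timeout)
    return on_time >= quorum

if __name__ == "__main__":
    n, f, timeout = 10, 3, 0.5
    print("calm network:   ", round_commits(n, f, timeout, network_delay=0.0))  # commits
    print("flooded network:", round_commits(n, f, timeout, network_delay=5.0))  # stalls, round after round
```

An asynchronous protocol, by contrast, simply waits for messages that are guaranteed to arrive eventually, so adversarial delay costs time but not liveness.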

7. Enter the world of Asynchronous Byzantine Fault Tolerant Systems

Only BFT consensus protocols that are truly asynchronous make no timing assumptions to achieve consensus and can therefore guarantee the liveness and robustness needed to thrive on the internet. There are not many aBFT consensus algorithms suitable for permissionless blockchains, but among the first is the HoneyBadgerBFT protocol, which uses a novel atomic broadcast mechanism to achieve liveness without making timing assumptions²⁰. This was certainly an improvement over any previous BFT consensus protocol applied to blockchains. As seminal research, HoneyBadgerBFT proved the potential of asynchronous BFT consensus algorithms for upcoming cryptocurrencies that want to achieve both high throughput and liveness without compromising on decentralization, censorship resistance and security²¹ (a simplified sketch of its batching idea follows the figure below). In the same vein, it is critical to emphasize that the choice of architecture for the underlying blockchain, the type of application layer, and the choice of middleware can limit or augment the potential benefits of any asynchronous BFT consensus algorithm.

The HoneyBadgerBFT Protocol Whitepaper
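
The following is a heavily simplified sketch of the batching idea described in the HoneyBadgerBFT paper, with the asynchronous common subset (ACS) and threshold-encryption machinery stubbed out; the function names are hypothetical. Each node proposes a random slice of its transaction buffer, and the committed block is the union of whichever proposals the ACS delivers.

```python
# A heavily simplified sketch of the batching idea behind HoneyBadgerBFT (the
# ACS and threshold-encryption machinery are stubbed out): each node proposes
# a random slice of its transaction buffer, and the committed block is the
# union of whichever proposals the asynchronous common subset delivers.

import random
from typing import Dict, List, Set

def propose_batch(buffer: List[str], batch_size: int, n_nodes: int) -> List[str]:
    """Each node proposes roughly batch_size/n_nodes transactions picked at
    random, so the nodes' proposals overlap as little as possible."""
    k = min(len(buffer), max(1, batch_size // n_nodes))
    return random.sample(buffer, k)

def commit_block(delivered: Dict[int, List[str]]) -> Set[str]:
    """The block is the union of all proposals delivered by the (stubbed) ACS."""
    block: Set[str] = set()
    for batch in delivered.values():
        block.update(batch)
    return block

if __name__ == "__main__":
    n, batch_size = 4, 8
    buffers = {i: [f"tx{j}" for j in range(20)] for i in range(n)}
    proposals = {i: propose_batch(buffers[i], batch_size, n) for i in range(n)}
    # Pretend the ACS delivered proposals from nodes 0, 1 and 3 (node 2 was slow).
    block = commit_block({i: proposals[i] for i in (0, 1, 3)})
    print(sorted(block))
```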

8. Enter Fantom: The First Scalable & Asynchronous BFT DAG-based Smart Contract Platform

Fantom White paper

A recent leap of innovation in asynchronous BFT consensus protocols came with the work done by Fantom, specifically its Lachesis consensus protocol: a Directed Acyclic Graph (DAG) based, asynchronous, non-deterministic protocol that guarantees pBFT-style consensus by creating blocks asynchronously, with the OPERA chain (the DAG) used to reach consensus faster by confirming how many nodes share each block²² (a toy illustration of such a DAG follows the figure below). Essentially, Fantom does not require every node in the network to validate each decision, as in typical BFT schemes. Instead, it reaches probabilistic, asynchronous consensus by leveraging the concepts of distributed common knowledge, dominator relations from graph theory, and broadcast-based gossip, so that each node's local view is, with high probability, also the global view, increasing throughput near-linearly as nodes join the network²³ while securing a probabilistically high enough certainty about the finality of transactions. In a nutshell, Fantom achieves probabilistic consensus designed for high throughput, and the state of the transactions is appended to the underlying Opera chain, which is not a linear chain as in Bitcoin or Ethereum but a DAG, making scalability possible at the protocol level. This is a critical advantage that puts Fantom on a different level: Lachesis leads to probabilistic consensus with no need to wait for all the nodes to agree (hence it is fast), and once consensus is reached, the state of the transactions is stored in the scalable DAG-based Opera chain. These structural elements of the Fantom architecture make it suitable for smart contracts with high throughput and fast (probabilistic but fast) finality, and a prime choice, if not the prime choice, for dApps that need to scale as the masses are onboarded onto Fantom.

The Opera DAG constructed through the Lachesis protocol. Scalability at its finest!
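
As a purely hypothetical, heavily simplified illustration (this is not Fantom's actual Lachesis implementation, and all names are invented for the sketch), the snippet below builds a small DAG of event blocks, where each event references parent events, and counts how many nodes' latest events can reach ("see") a given event. DAG-based protocols rely on this kind of reachability information when deciding finality.

```python
# A hypothetical, heavily simplified sketch (not Fantom's actual Lachesis code)
# of a DAG of event blocks: each event references parent events, and we count
# how many nodes have a latest event that can reach ("see") a given event.

from dataclasses import dataclass, field
from typing import Dict, List

@dataclass
class Event:
    id: str
    creator: str
    parents: List[str] = field(default_factory=list)

def can_reach(dag: Dict[str, Event], start: str, target: str) -> bool:
    """Depth-first search over parent links."""
    stack, seen = [start], set()
    while stack:
        current = stack.pop()
        if current == target:
            return True
        if current in seen:
            continue
        seen.add(current)
        stack.extend(dag[current].parents)
    return False

def nodes_seeing(dag: Dict[str, Event], heads: Dict[str, str], target: str) -> int:
    """How many nodes' latest events can reach `target` through the DAG."""
    return sum(1 for head in heads.values() if can_reach(dag, head, target))

if __name__ == "__main__":
    dag = {e.id: e for e in [
        Event("a1", "A"), Event("b1", "B"),
        Event("a2", "A", ["a1", "b1"]),   # node A gossips with node B
        Event("c1", "C", ["a2"]),          # node C builds on A's event
        Event("b2", "B", ["b1", "c1"]),
    ]}
    heads = {"A": "a2", "B": "b2", "C": "c1"}
    print("nodes whose latest event sees b1:", nodes_seeing(dag, heads, "b1"))  # 3
```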

References:

  1. https://samsungnext.com/whats-next/a-brief-history-of-decentralized-computing/
  2. https://www.ibm.com/ibm/history/ibm100/us/en/icons/ibm700series/transform/
  3. https://itstillworks.com/disadvantages-centralized-network-scheme-12213044.html
  4. https://en.wikipedia.org/wiki/ARPANET
  5. https://lamport.azurewebsites.net/pubs/time-clocks
  6. https://medium.com/the-challenge/a-brief-history-of-decentralized-computing-d0d665783bcf
  7. M. Burrows. The Chubby lock service for loosely-coupled distributed systems. In Proceedings of the 7th symposium on Operating systems design and implementation, pages 335–350. USENIX Association, 2006.
  8. https://people.eecs.berkeley.edu/~luca/cs174/byzantine.pdf
  9. Akkoyunlu, E. A.; Ekanadham, K.; Huber, R. V. (1975). “Some Constraints and Trade-offs in the Design of Network Communications.” pp. 67–74. doi:10.1145/800213.806523.
  10. https://captainaltcoin.com/what-is-practical-byzantine-fault-tolerance-pbft/
  11. https://elaineou.com/2017/02/12/byzantine-fault-tolerant-airplanes/
  12. https://bitcoin.org/bitcoin.pdf
  13. https://www.vox.com/2019/6/18/18642645/bitcoin-energy-price-renewable-china
  14. https://docs.zilliqa.com/whitepaper.pdf
  15. https://docs.neo.org/developerguide/en/articles/consensus/consensus_algorithm.html
  16. https://medium.com/eosio/dpos-bft-pipelined-byzantine-fault-tolerance-8a0634a270ba
  17. https://eprint.iacr.org/2016/199.pdf (Part 1.1)
  18. https://www.whiteops.com/blog/9-of-the-most-notable-botnets
  19. https://www.youtube.com/watch?v=mm40Kkgzw-A
  20. https://eprint.iacr.org/2016/199.pdf (Abstract, Part 3 & 4)
  21. https://eprint.iacr.org/2016/199.pdf (Part 6)
  22. Sang-Min Choi, Jiho Park, Quan Nguyen, Kiyoung Jang, Hyunjoon Cheob, Yo-Sub Han, and Byung-Ik Ahn. Opera: Reasoning about continuous common knowledge in asynchronous distributed systems, 2018.
  23. https://fantom.foundation/contents/data/2018files/10/TP_arXiv_v51.pdf
