
Last week we covered the technical preliminaries for understanding consensus in depth. This week, we’ll give a high level overview of the PBFT (Practical Byzantine Fault Tolerant) consensus algorithm. While we consider newer protocols including PaLa a strict upgrade over PBFT, understanding PBFT is still crucial. Many of the terminology and ideas behind newer classical consensus protocols come from this protocol.
We use PBFT as a learning tool from which we will establish the language and ideas that PaLa and Thunderella protocols will build on. We do not cover other seminal classical consensus protocols such as 3 phase commit protocols and Paxos which employ similar ideas. Finally note that the consensus protocol presented here is slightly modified to be applicable for blockchain consensus.
Terminology
The algorithm is modeled as a state machine in a distributed system that is replicated across different replicas, among which one of the replicas is the primary and the rest are backups.
Normal-case operation
On a very high level, PBFT consists of the following steps in the normal-case operation:
- Request: the user sends transactions to the primary.
- Pre-prepare: the primary produces a proposal containing transactions and forwards to all replicas.
- Prepare: Upon receiving a proposal, backups will verify it, and if it succeeds, they will broadcast prepare message to all other replicas. Backups do nothing if verification fails. This is the first round of voting.
- Commit: Upon receiving prepare messages from ⅔ of all backups, replicas will now broadcast commit messages. This is the second round of voting.
- Reply: the client sees the result of consensus.
In most circumstances, normal-case operation requirements are met and the primary is able to rapidly process new transactions. Since only ⅔ of backups need to agree in the prepare and commit phase, PBFT can already tolerate up to ⅓ of backups going or even acting byzantine. But what happens if the primary goes down?
View-change
If the primary has failed, a subroutine called view-change will be carried out. On a high level, the view-change protocol provides liveness by allowing the system to designate a new primary when the former fails. When replicas see no new progress after some time, they broadcast a view-change message indicating that they want to advance to the next view. The view-change begins when view-change messages from ⅔ of all replicas are collected.
During a view-change, replicas may not agree on the latest state of consensus — remember that the primary was responsible for leading consensus up until now. Since we are in the partially synchronous network setting, we can not guarantee that all replicas will see a ⅔ majority vote in the first round if it exists. A replica who sees a ⅔ majority vote for a proposal in the first round can not assume that others have seen it as well and therefore there is no agreement.
Thankfully, PBFT required 2 rounds of voting where replicas do not vote in the second round until they’ve seen results from the first round. Thus, upon seeing ⅔ majority vote in the second round, the replica is guaranteed that at least ⅔ of replicas have seen results from the first round. This knowledge about knowledge replaces the synchrony assumption where nodes are simply assumed to have seen a message (a round of votes in this case) after a certain amount of time has passed.
With this mechanism in place, the new primary can continue from the last proposal that received ⅔ majority vote in the second round. View-change messages also include the last proposal a backup has seen ⅔ majority vote in the first round. Assuming fewer than ⅓ nodes are byzantine, one can show that if anyone has seen a ⅔ majority vote in the second round, there must be at least 1 honest backup who successfully broadcasted a view-change message indicating that they have seen ⅔ majority vote in the first round for that proposal. The proof is just an application of the pigeonhole principle however the reasoning is more complex and described in detail in 4.5.1 of the PBFT paper.
Takeaway
PBFT is a seminal classical consensus protocol employing 2 voting rounds for finality. It is in the partially synchronous network and achieves the optimal ⅓ fault tolerance. Understanding at a high level how 2 rounds of voting enable safe view-changes is the basis of all classical consensus protocols. The details of PBFT are not so important and we’ll see soon how PaLa is a strict improvement over the PBFT protocol.
PBFT is a message heavy algorithm. There are 3 rounds of messages and each message must be sent to all other replicas. It thus has O(n2) message overhead where n is the number of replicas.
Alternatives
PBFT employs the view-based approach to consensus where each view is run by a primary leader who is in charge of creating new proposals. The ballot-based approach is based on the Paxos consensus protocol and used in Federated Byzantine Agreement (FBA) protocols such as the Stellar Consensus Protocol (SCP). In the ballot based approach, there is no explicit leader. Instead anyone may share a proposal for a slot and, after consensus, the final value for that slot is created from all agreed upon proposals (e.g. the union of all transactions of all proposed blocks). This approach avoids censorship issues of having a single node responsible for creating new proposals in the short term. The ballot based approach still employs two rounds of voting to finalize a decision. As ThunderCore prioritizes the performance of view-based approaches, we do not go into details of the ballot based approach. For more information on FBA and ballot based approach, please see the SCP paper and the timeless Paxos historical document.

