Implementation of SBFT : Scalable Byzantine Fault Tolerance Protocol

Yathesh Lekkalapudi
Dec 11, 2021


By Aadarsh Venugopal, Bharath Kinnal, Yathesh Lekkalapudi and Sadaf Arshad
Link to Github repository

SBFT is a state-of-the-art Byzantine Fault Tolerance state machine replication protocol that can support a large number of replicas in a world-scale deployment. SBFT enhances traditional BFT protocols such as PBFT with several key design features.

One of the main features of the SBFT protocol is the use of collectors, which act as message aggregators for the network of replicas. Instead of all-to-all communication between replicas (as seen in PBFT), replicas send their messages to these collectors, which verify and aggregate them before broadcasting the result back to the replicas. This use of collector replicas is how SBFT achieves linear communication complexity, an improvement over traditional BFT protocols such as PBFT.

Another feature of the SBFT protocol is that it accounts for crashed nodes. Given f Byzantine nodes and c crashed nodes, the SBFT protocol works as long as there are at least n = 3f + 2c + 1 replicas in the network. This makes SBFT more resilient than traditional BFT algorithms such as PBFT, which do not consider crashed nodes in the network.

Another improvement introduced by SBFT is that, unlike protocols such as Zyzzyva, it reduces the number of messages a client needs to receive and process from the replica network. While Zyzzyva requires a client to process f+1 messages, SBFT requires the client to process only one message when a transaction is executed.

In this project, we have implemented the SBFT protocol as described in the following paper¹.

Our Implementation

In our implementation of the SBFT protocol, we assumed the model to consist of multiple clients, as well as multiple replicas (or nodes). Each replica will also contain a persistent storage unit (the blockchain) in order to store the transactions.

SBFT assumes that the number of Byzantine nodes is f and the crash tolerance of the network is c, which means that there have to be a total of n = 3f + 2c + 1 nodes in order for the protocol to run effectively. For the agreement protocol, we consider one node for the role of primary, and c + 1 nodes each for the roles of C-collectors and E-collectors respectively.
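As a quick sanity check, the quorum sizes above can be expressed as two small helper functions (the function names here are our own, not from the paper):

```python
def min_replicas(f: int, c: int) -> int:
    """Minimum network size to tolerate f Byzantine and c crashed replicas."""
    return 3 * f + 2 * c + 1

def commit_quorum(f: int, c: int) -> int:
    """Number of sign-share messages a C-collector waits for before committing."""
    return 2 * f + c + 1
```

For example, with f = 1 and c = 1 the network needs at least 6 replicas, and a C-collector waits for 4 sign-shares.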

Our implementation of the protocol consists of two parts: the agreement protocol and the view-change protocol. In the agreement protocol, a client sends a transaction request message to the replicas, which collectively validate the request and execute the transaction. In the view-change protocol, the replicas collectively decide on the assignment of a new primary in case the current primary is faulty. With this implementation of the SBFT protocol, we aim to achieve linear communication complexity and high scalability.

Modules in SBFT

Message Queue

Each replica stores a message queue consisting of messages sent by the client, the primary, or the collectors relating to a transaction request during the agreement protocol. Once a request message has been processed, the replica signs the message and sends it to the respective collector (depending on the phase of the protocol).
A non-primary replica can also send a message relating to a view-change request. If f+1 view-change requests are received by a replica, the view-change protocol is triggered: the request is processed and sent to all other replicas.
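A minimal sketch of this per-replica queue, assuming f = 1 and a simple dict-based message format (both our own choices for illustration):

```python
from collections import deque

F = 1  # assumed number of Byzantine faults for this sketch

class ReplicaQueue:
    """Per-replica queue for agreement and view-change messages."""

    def __init__(self):
        self.messages = deque()
        self.view_change_requests = set()  # ids of replicas requesting a view change

    def enqueue(self, msg):
        self.messages.append(msg)

    def process_next(self):
        """Process one message; trigger a view change after f+1 distinct requests."""
        msg = self.messages.popleft()
        if msg.get("type") == "view-change-trigger":
            self.view_change_requests.add(msg["sender"])
            if len(self.view_change_requests) >= F + 1:
                return "trigger-view-change"
            return "waiting"
        return "agreement"
```

Signing and forwarding to the collectors are omitted; the sketch only shows the queueing and the f+1 trigger condition.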

Replica State Machine

As the two parts of the SBFT protocol, agreement and view-change, execute as mutually exclusive events, each replica stores a mode indicating which protocol should currently run, which is one of the following:

  • Agreement protocol (A)
    The replica has to process the transaction requests present in the message queue.
  • View-change protocol (VC)
    The replica changes to this mode once it receives a message from another replica to trigger the view-change protocol. Once in VC mode, it executes the view-change protocol, and reverts to executing the agreement protocol after a new primary is assigned.
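These two modes can be captured with a small state machine; a sketch (class and method names are ours):

```python
from enum import Enum

class Mode(Enum):
    AGREEMENT = "A"     # process transaction requests from the message queue
    VIEW_CHANGE = "VC"  # elect and install a new primary

class Replica:
    def __init__(self):
        self.mode = Mode.AGREEMENT

    def on_view_change_trigger(self):
        # another replica asked us to start the view-change protocol
        self.mode = Mode.VIEW_CHANGE

    def on_new_view(self):
        # a new primary has been assigned; resume the agreement protocol
        self.mode = Mode.AGREEMENT
```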

Blockchain

The replica stores the blockchain containing the set of executed transactions. The blockchain is implemented as a linked list, where each element in the list is a block containing the following:

  • Transactions in current block
    In our implementation, we will be considering one transaction per block.
  • Header of current block
    This will be calculated as the hash of the contents of the previous block, which includes the block header and transaction.
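A minimal sketch of this linked-list blockchain, assuming SHA-256 for the block header and a fixed genesis header (both our own illustrative choices):

```python
import hashlib

class Block:
    def __init__(self, header: str, transaction: str):
        self.header = header            # hash of the previous block's contents
        self.transaction = transaction  # one transaction per block
        self.next = None                # linked-list pointer

class Blockchain:
    GENESIS_HEADER = "0" * 64  # placeholder header for the first block

    def __init__(self):
        self.head = Block(self.GENESIS_HEADER, "genesis")
        self.tail = self.head

    def append(self, transaction: str) -> Block:
        # the new header is the hash of the previous block's header and transaction
        prev = self.tail
        header = hashlib.sha256((prev.header + prev.transaction).encode()).hexdigest()
        block = Block(header, transaction)
        prev.next = block
        self.tail = block
        return block
```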

Transaction State Machine

The agreement protocol in SBFT works in three phases: the pre-prepare phase, the commit phase, and the execute phase. The working of the agreement protocol is illustrated in Figure 1.

Figure 1 : The Different Phases of the Agreement Protocol

In our implementation of the SBFT protocol, when a replica reads a message from the message queue, it stores the state of this particular transaction as one of the following:

  • Pre-prepared State (PP)
    A replica checks the validity of a pre-prepare message from the primary. If it is valid, it then computes a cryptographic hash h from the pre-prepare message, signs it with a verifiable threshold signature σ(h) to create a sign-share block, and propagates the message to the C-collectors. The transaction is then stored in the replica memory in the pre-prepared (PP) state.
  • Committed State (C)
    A replica checks the validity of the commit message from the C-collectors. If it is valid, it then computes a cryptographic hash h from the commit message, signs it with a verifiable threshold signature π(h) to create a sign-state block, and propagates the message to the E-collectors. The transaction is then stored in the replica memory in the committed (C) state.
  • Executed State(E)
    A replica checks the validity of the execute message from the E-collectors. If it is valid, it executes the transaction and stores it in the replica memory in the executed (E) state. It then stores the executed transaction in the current head of the blockchain, and finalizes the block by generating a new block header from the current block header and the current transaction.
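The three per-transaction states and their ordering can be sketched as a small table-driven state machine (names are ours; the validity checks and threshold signatures are omitted):

```python
from enum import Enum

class TxState(Enum):
    PRE_PREPARED = "PP"
    COMMITTED = "C"
    EXECUTED = "E"

# allowed transitions of a transaction through the agreement protocol
NEXT_STATE = {
    None: TxState.PRE_PREPARED,               # valid pre-prepare from the primary
    TxState.PRE_PREPARED: TxState.COMMITTED,  # valid commit from the C-collectors
    TxState.COMMITTED: TxState.EXECUTED,      # valid execute from the E-collectors
}

class TxRecord:
    def __init__(self, tx_id: str):
        self.tx_id = tx_id
        self.state = None  # not yet pre-prepared

    def advance(self) -> TxState:
        self.state = NEXT_STATE[self.state]
        return self.state
```

Because the table has no entry for `TxState.EXECUTED`, advancing past the executed state raises an error, matching the fact that E is terminal.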

Primary and collector logic

A replica in the network can be voted as the primary during the view-change protocol. Furthermore, some of the non-primary replicas are assigned as C-collectors and E-collectors in a round-robin fashion. Since any replica can become the primary or a collector, all replicas must implement the functionality of the primary, the C-collectors, and the E-collectors.

In our implementation of the SBFT protocol, the roles of the primary and the collectors are stored in a shared network configuration file that is accessible by all replicas.

  • Primary
    The replica stores a state variable signifying whether it is the primary or not. If it is the primary, the replica has to store an additional message queue for all client requests. When a client request is processed, the primary broadcasts the message to all other replicas.
  • C-Collector
    The replica stores a state variable signifying whether it is a C-collector or not. If it is a C-collector, then the replica has to store an additional message queue containing the sign-share blocks sent by the replicas. In our implementation of the protocol, if at least 2f+c+1 sign-share messages are received, then the C-collector combines all the messages, signs it with a threshold signature σ to create a commit message, and broadcasts the signed message to all replicas.
  • E-Collector
    The replica stores a state variable signifying whether it is an E-collector or not. If it is an E-collector, then the replica has to store an additional message queue containing the sign-state blocks sent by the replicas. If at least f+1 sign-state messages are received, then the E-collector combines all the messages, signs the result with a threshold signature π to create an execute message, and broadcasts the signed message to all replicas.
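Both collector roles share the same pattern: gather signature shares until a quorum is reached, then emit an aggregate. A sketch of that pattern (threshold-signature combination is replaced here with simple string concatenation, purely for illustration):

```python
class Collector:
    """Gathers signature shares; emits an aggregate once a quorum is reached."""

    def __init__(self, quorum: int):
        self.quorum = quorum
        self.shares = {}  # sender id -> signature share

    def receive(self, sender: int, share: str):
        self.shares[sender] = share
        if len(self.shares) >= self.quorum:
            # stand-in for combining threshold-signature shares into one signature
            return "|".join(self.shares[s] for s in sorted(self.shares))
        return None

F, C = 1, 1  # assumed fault parameters for this sketch
c_collector = Collector(quorum=2 * F + C + 1)  # waits for 2f+c+1 sign-shares
e_collector = Collector(quorum=F + 1)          # waits for f+1 sign-states
```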

View-change logic

If there are f+1 votes, or if there is a publicly verifiable proof that a primary is faulty, then a replica will trigger the view-change protocol. The working of the view-change protocol is illustrated in Figure 2.

Figure 2 : The Different Phases of the View-Change Protocol

The different phases of the view-change protocol are as follows:

  • View Change trigger
    When a primary is found to be faulty, a non-faulty replica sends a ‘view-change-trigger’ message to other replicas in order to request them to elect a new primary candidate.
  • Leader election phase
    Each replica votes for a new leader and sends its vote to all other replicas. Each replica then tallies the votes received, and the new primary candidate is the replica with the most votes. Note that not all replicas may vote for the same candidate, and Byzantine nodes may vote for a different candidate. However, this does not impact the protocol, since it is guaranteed that at least 2f+2c+1 honest nodes pick the same primary candidate.
  • View-Change Phase
    Each replica sends a ‘view-change’ request to its new primary candidate to update the view of the network. Updating the view of the network includes changing the view number and making the elected candidate the new primary of the network.
  • New-View Phase
    Once the primary candidate receives a view-change message from 2f+2c+1 replicas, it initiates a new view by broadcasting a ‘new-view’ message to all replicas.
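The leader-election tally and the new-view quorum check above can be sketched as follows (assuming votes arrive as a dict from voter id to candidate id; the names and message format are ours):

```python
from collections import Counter

def tally_votes(votes: dict) -> int:
    """Return the replica id with the most votes (the new primary candidate)."""
    counts = Counter(votes.values())
    candidate, _ = counts.most_common(1)[0]
    return candidate

class PrimaryCandidate:
    """Tracks view-change messages and decides when to broadcast new-view."""

    def __init__(self, quorum: int):
        self.quorum = quorum  # 2f+2c+1 in the protocol description
        self.view_change_senders = set()

    def on_view_change(self, sender: int) -> bool:
        self.view_change_senders.add(sender)
        # True once enough replicas have confirmed: broadcast 'new-view'
        return len(self.view_change_senders) >= self.quorum
```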

[1]: Guy Golan Gueta, Ittai Abraham, Shelly Grossman, Dahlia Malkhi, Benny Pinkas, Michael K. Reiter, Dragos-Adrian Seredinschi, Orr Tamir, Alin Tomescu. SBFT: a Scalable and Decentralized Trust Infrastructure.
https://arxiv.org/abs/1804.01626
