NEO’s dBFT 2.0 — Single Block Finality with Improved Availability
NEO's innovation lies in the features that target its vision of a digital smart economy, primarily fast single block finality and support for easily programmable contracts. To achieve single block finality, NEO employs a protocol it describes as dBFT (Delegated Byzantine Fault Tolerance), an adaptation of pBFT (Practical Byzantine Fault Tolerance). Arguably the most challenging aspects of successfully implementing a BFT consensus protocol are the edge-case failures that can occur due to network latency and node restarts.
For a blockchain aimed at facilitating financial transactions suitable for everyday business use (the smart economy), network availability is paramount. In this article I will describe the operational issues that were encountered with dBFT 1.0 and discuss how these issues are eliminated with the improvements made in the dBFT 2.0 consensus protocol recently deployed to the NEO MainNet.
In order to understand the benefits of dBFT 2.0, it is necessary to understand some details of how the consensus protocol operates. BFT algorithms such as Neo’s dBFT require signatures from approximately 2/3 of the validator nodes in order to come to consensus. For dBFT, where F is the number of allowed failed nodes, the following equations give the number of required validator nodes and the number of required signatures.
F = number of allowable failed nodes
- Validator Nodes = 3F + 1
- Signatures Required = 2F + 1
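As a quick worked example of these thresholds, the sketch below assumes the standard BFT relations of 3F + 1 validators and 2F + 1 signatures. The function names and the 7-validator network are illustrative assumptions, not taken from the NEO code base.

```python
def validators_needed(f: int) -> int:
    """Validators required to tolerate f failed nodes (assumes N = 3F + 1)."""
    return 3 * f + 1


def signatures_required(f: int) -> int:
    """Signatures required to reach consensus (assumes M = 2F + 1)."""
    return 2 * f + 1


def max_failed(validators: int) -> int:
    """Largest F that a network of the given size can tolerate."""
    return (validators - 1) // 3


# Hypothetical 7-validator network: tolerates 2 failures, needs 5 signatures.
f = max_failed(7)
print(f, validators_needed(f), signatures_required(f))
```

With 7 validators this gives F = 2 and 5 required signatures, which is roughly the 2/3 share mentioned above (5/7 is about 71%).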
NEO's dBFT 1.0 consensus had the following stages for the happy path:
- PrepareRequest: The validator nodes take turns being the primary node to propose the transactions that should be included in the current block. A simple modulus of the block number is used to determine the first validator of a block.
- PrepareResponse: The validator nodes that receive the PrepareRequest verify that all the transactions in it are valid and respond with their signature in a PrepareResponse message.
- Once 2F + 1 PrepareResponses have been received, the new block is created with the signatures.
If the required number of signatures is not obtained within the current timeout, then nodes will send a ChangeView message to select the previous block’s primary validator to be the primary. The timeout is initially 2x the block time, and doubles with each timeout that occurs until a block is generated. Once 2F + 1 ChangeView messages are received, the process repeats from the PrepareRequest step.
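The primary rotation and the doubling timeout can be sketched in a few lines. This is a minimal illustration that assumes the primary index is the block height minus the view number, modulo the validator count, so that a view change falls back to the previous block's primary as described above. The function names, the 7 validators, and the 15-second block time are assumptions made for the example.

```python
def primary_index(block_height: int, view: int, validators: int) -> int:
    """Index of the primary validator for a block height and view.

    View 0 uses a simple modulus of the block height; each view change
    steps back one validator (assumed formula, for illustration only).
    """
    return (block_height - view) % validators


def view_timeout(block_time_s: float, view: int) -> float:
    """Timeout for a view: 2x the block time at view 0, doubling per view."""
    return block_time_s * 2 ** (view + 1)


# Hypothetical network: 7 validators, 15-second block time.
for view in range(3):
    print(view, primary_index(100, view, 7), view_timeout(15, view))
```

For block 100 the primary at view 0 is validator 2. After one view change it is validator 1, which is also the view-0 primary of block 99, and the timeout grows from 30 to 60 to 120 seconds.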
NEO’s dBFT 1.0 algorithm was susceptible to a single block fork in rare cases of network latency. This problem has been known for quite some time and has been documented (such as in this article). The forked block could occur because nodes were allowed to time out after having sent a PrepareResponse message. Since the consensus node (CN) clocks were never (and never will be) 100% synchronized, nodes would time out and move to the next validator at slightly different times. If all validators but one timed out, and that one validator had already received 2F PrepareResponse messages, it would generate a valid signed block, while the others would have switched to the next primary, where they could come to consensus and sign yet another block at the same height. At this point, the consensus nodes would build on only one of the forks, so the problem was sometimes termed a block spork. While this scenario could occur without stalling consensus, many of the Neo full network nodes could accept the forked block and become stalled, leading to operational issues with the network nodes that end users ultimately rely on.
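To make the race concrete, the following sketch counts signatures in a hypothetical 7-validator network (F = 2). It is a simplified model of the scenario described above, not the NEO node code, and `fork_possible` and its parameters are names invented for the example.

```python
F = 2
N = 3 * F + 1  # 7 validators (assumed network size)
M = 2 * F + 1  # 5 signatures needed for a valid block


def fork_possible(responses_seen_by_slow_node: int, nodes_that_changed_view: int) -> bool:
    """Return True if two valid blocks could be signed at the same height.

    Block A: the node that did not time out adds its own signature to the
    PrepareResponses it has already received.
    Block B: the nodes that timed out are free to sign again in the next
    view, because dBFT 1.0 did not lock a node after its PrepareResponse.
    """
    block_a_signed = responses_seen_by_slow_node + 1 >= M
    block_b_signed = nodes_that_changed_view >= M
    return block_a_signed and block_b_signed


# One node holds 2F responses while the other N - 1 nodes change view.
print(fork_possible(2 * F, N - 1))
# With one response fewer, the slow node cannot complete its block.
print(fork_possible(2 * F - 1, N - 1))
```

The first call reports that a fork is possible and the second that it is not, which shows how narrow the timing window was: the fork needed a node sitting exactly at the signature threshold while everyone else moved on.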
dBFT 2.0 fixes this problem by adding a commitment phase similar to that described in the pBFT (Practical Byzantine Fault Tolerance) paper. In order to prevent network stalls, dBFT 2.0 also adds a recovery message to the consensus protocol. The recovery mechanism has the added benefit of improving block times when consensus nodes run into operational issues. Performance is improved for situations such as:
- Poor network connectivity such as due to a network outage or network attack targeted at a consensus node.
- Consensus node process restart or system restart due to hardware failure, power outage, or other system issue.
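The interaction between the commitment phase and the recovery message can be sketched as a small state machine. This is a minimal illustration under the assumption that a node which has sent its commit no longer asks for a view change on timeout and instead requests recovery from its peers. The class, method names, and returned action strings are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class ConsensusNode:
    """Toy model of one consensus node's state for a single block height."""
    height: int
    view: int = 0
    commit_sent: bool = False

    def send_commit(self) -> None:
        # Once committed, the node is locked to this block proposal.
        self.commit_sent = True

    def on_timeout(self) -> str:
        """Decide what to broadcast when the view timer expires."""
        if self.commit_sent:
            # Locked: stay in the current view and ask peers to resend
            # the messages needed to finish the block.
            return "recovery"
        # Not committed yet: it is safe to ask for a new primary.
        return "change_view"


node = ConsensusNode(height=10)
print(node.on_timeout())  # change_view
node.send_commit()
print(node.on_timeout())  # recovery
```

Because a committed node never signs a competing proposal, the race behind the single block fork cannot produce two signed blocks, and the recovery path keeps a locked or restarted node from stalling.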
The dBFT 2.0 implementation also makes it possible to audit misbehavior by consensus nodes. CNs (consensus nodes) keep track of all the commitments that have occurred and do not allow any other CN to commit to signing more than one potential block at a given height. The CN consensus logs make this information readily available both to CN operators and to any full nodes that enable consensus watch-only mode.
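The commitment tracking described here can be illustrated with a small bookkeeping class. It is only a sketch of the idea, assuming commitments are identified by block height, validator, and block hash. `CommitAudit` and its methods are hypothetical names and do not correspond to the NEO implementation or its log format.

```python
from collections import defaultdict


class CommitAudit:
    """Tracks which block hash each validator committed to at each height."""

    def __init__(self) -> None:
        # height -> {validator id -> committed block hash}
        self._commits = defaultdict(dict)
        self.violations = []

    def record(self, height: int, validator: str, block_hash: str) -> bool:
        """Record a commitment; return False if it conflicts with an earlier one."""
        previous = self._commits[height].get(validator)
        if previous is not None and previous != block_hash:
            # Same validator, same height, different block: reject and log it.
            self.violations.append((height, validator, previous, block_hash))
            return False
        self._commits[height][validator] = block_hash
        return True


audit = CommitAudit()
print(audit.record(100, "cn-1", "0xaaa"))  # True: first commitment
print(audit.record(100, "cn-1", "0xaaa"))  # True: repeat of the same commitment
print(audit.record(100, "cn-1", "0xbbb"))  # False: second block at the same height
print(audit.violations)
```

Keeping the rejected commitment in `violations` mirrors the auditing role of the consensus logs: the conflicting pair of hashes is evidence that can be inspected later.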
Testing and Quality Assurance
In order to ensure quality for the dBFT 2.0 implementation, code changes went through multiple phases of testing. NEO Core Developers performed extensive testing in private networks simulating network failures using NEO’s P2P plugin. Similar automated and manual testing was also performed in private networks managed by NGD. Finally, the code was tested on the NEO public TestNet.
With Neo 2.10.2 running on the MainNet consensus nodes since June 3rd, the improvements of dBFT 2.0 can be seen in practice today in the form of reduced block times, as shown in the graphs in Figure 1 and Figure 2 below. Additionally, with this upgrade the Neo consensus nodes are now running the code that improves memory pool performance, which further reduces the operational burden on the network and helps keep block times low during periods of higher transaction volume.
Adoption and Use at the point-of-sale
NEO dBFT 2.0 was developed as one of the improvements of the NEO 3.0 initiative and has been made available ahead of time on NEO 2.x. Now that dBFT 2.0 is in production on MainNet, the vast majority of the operational issues that businesses once faced when adopting NEO in production have been resolved. With this blocker to adoption gone, the NEO network’s use for point-of-sale transactions is poised to flourish.
In order to use a cryptocurrency most effectively at the point of sale, it is necessary to have both high availability and a short time to finality. Even with second layer solutions such as payment channels that make near-instant point-of-sale transactions possible, it is important to be able to open a channel quickly when one is not already established. With the improvements from dBFT 2.0 consensus, NEO should now possess the reliability necessary for more businesses to begin using the NEO blockchain for point-of-sale solutions.