Synchrony and Timing Assumptions in Consensus Algorithms Used in Proof of Stake Blockchains
Co-authored with Alexis Gauba.
All blockchains and cryptocurrencies require consensus, or agreement on the state of the ledger between all parties in the system. However, achieving consensus is non-trivial. One of the things that makes consensus most difficult to achieve is the underlying network on which the protocol runs. As you can see in this post about the blockchain stack, the blockchain contains more than the just the protocol and its participants. It also includes the links between the participants — the network communication channels themselves — which usually run over the internet.
Unfortunately, the internet doesn’t always work smoothly. Often, it is painstakingly slow. Other times, it stops working altogether. Messages and information can get dropped in transmission and never arrive at their desired destination. Participants can drop out of the network entirely if they crash or lose connection with the network.
The Network Model
Consensus protocols, whether used in cryptocurrencies or more traditional distributed systems, have to take account of these potentially faulty communication channels when they construct their model of the network. Both the traditional distributed systems literature and the literature exploring blockchain consensus must define a setting by which nodes talk to each other, called the network model. This refers to how nodes communicate by sending messages between bidirectional communication channels. In blockchain consensus protocols, these nodes form a peer-to-peer network, meaning that every message is propagated to every node in the network via a gossip protocol.
Different protocols adopt different network models based on the environment in which they are designed to function. In other words, some protocols are designed to work in unreliable networks that drop messages and may cause arbitrary message delay, like the Internet. Other protocols are optimized for extremely reliable channels, like permissioned company intranets. These protocols are said to be operating under differing assumptions of synchrony.
As transactions, signatures, and blocks are passed around in the protocol, they take some time to be delivered from one participant to another. If messages can be arbitrarily delayed or even dropped altogether, the consensus protocol will have to be designed with that in mind.
Given that heavy research efforts and a significant amount of capital are being put behind consensus mechanisms for use in blockchains and cryptocurrencies, it is important to look into the proposed models and explore how their network assumptions might affect their real-world viability.
Types of Network Models
Below we explore some of the main network models and their assumptions:
The synchronous setting assumes that there is a known upper bound on all message delay. That is, all messages must be delivered in some pre-determined amount of time, and all participants in the network know how long it takes for messages to be delivered.
Let’s say that we have a very predictable and reliable network. Let’s imagine that all messages in our reliable network are delivered to their desired destination 10 seconds after they are sent. That means that if some participant sends a message to another participant, they know that the other participant will receive it at most 10 seconds later.
As a result, the synchronous model provides protocols that proceed in lockstep in discrete rounds. All the participants in this model know exactly how long all messages will take to get delivered (say 10 second upper bound). In this case, they can all synchronize in deciding to send messages at the same time T, and then wait until time T+10 seconds at which point they know all messages in the network have been delivered and received. This process constitutes one round and then repeats as participants take into account the new information they have received at time T+10, then sending out new messages which will be received at time T+20. The protocol continues with these discrete rounds.
Though the synchronous setting is the easiest in which to create consensus, some consider the synchronous setting unrealistic in practice for cryptocurrencies due to the internet’s unreliable nature. It is a strong assumption that all messages have some maximum delay to reach their destination, and protocols that will stop working when synchrony is violated are clearly not as robust as a result. Therefore, often protocol designers want to include weaker assumptions about the network, which we explore below.
Semi-Synchrony is similar to full synchrony in that there is a predetermined maximum network delay. The difference is that while in full synchrony participants know exactly how long this network delay is, in semi-synchrony, the message passing time can be modeled by a random variable whose probability distribution is known to participants in the protocol. For further analysis, look here.
- The first is that there is a predetermined maximum network delay, but this network delayed is unknown to the network’s participants. That is, all messages might be delivered within 10 seconds like above, but participants in the system aren’t aware of that; for all they know the upper bound could be 2 years before their messages reach all desired recipients. But, participants do know that all messages will eventually reach their intended recipients.
- The second is that after a certain time called the Global Standardization Time (GST), the network will then become fully synchronous. Until that time, which participants don’t know is coming, no messages may be received.
Consensus protocols in this setting tend to be more robust than their fully synchronous counterparts, but as a tradeoff they are generally more complicated. This is a better representation of the internet but still not perfect. Therefore, we await the holy grail of consensus protocols: complete asynchrony.
The asynchronous setting is the most difficult setting in which to achieve consensus. In this setting, there is no maximum network delay. That is, messages may take infinitely long to reach their desired recipient.
There are a number of famous distributed systems results showing just how difficult it is to achieve consensus in asynchrony, such as the FLP impossibility result. Intuitively speaking, it’s easy to understand the difficulty. If messages may never reach their destination, then how do all nodes in the network communicate adequately to make sure that they’ve come to agreement. However, one advantage of this setting is that it is a good representation of the internet and protocols built in this way are extremely well formed and robust.
Another way of thinking of synchrony was inspired by Bitcoin. This more recent way to think about network timings comes from Professors Rafael Pass and Elaine Shi, in the form of Sleepy Consensus. Bitcoin is permissionless and allows nodes to arbitrarily drop out and rejoin the network. In the traditional literature, this would be seen as a failure of synchrony as a node dropping out could be simulated as all messages to and from them being delayed indefinitely.
Moreover, in their recent work, Rethinking Large Scale Consensus, Pass and Shi show that for Nakamoto consensus (like Bitcoin) protocols to retain security, the block interval must be set to a be a constant factor larger than the network’s maximum delay.
As new consensus models have been developed, each has had to make particular decisions about their network model, keeping in mind safety and liveness as first outlined by Leslie Lamport. Safety properties in distributed systems require that “something undesirable (bad) will not happen,” and different protocols specify the unwanted behavior they avoid. Liveness in the blockchain setting, according to Pass and Shi, can refer to a property by which “valid transactions submitted by an honest user get incorporated into the ledger sufficiently fast.”
Network Model in PoS Consensus
In our Meta-Analysis of Alternative Consensus Protocols, we considered the network model in which certain major PoS platforms operate. We considered Tendermint, Thunderella, Algorand, Dfinity, Ouroboros Genesis, Casper FFG, and Casper TFG. Let’s look into how these algorithms see the networks in which they operate.
Tendermint operates in partial synchrony in the proposal step and asynchrony in rounds once a block is proposed. This is because if ⅓ or more validators become unresponsive, the network halts, meaning that block may never get validated and added to the blockchain.
Thunderella has an asynchronous optimistic fast path paired with a synchronous underlying blockchain, making the protocol ultimately synchronous.
Algorand provides safety under partial synchrony such that after an asynchronous period, the network must be strongly synchronous for a reasonably long period of time again to achieve agreement on the ordering of blocks.
Dfinity is formally proven in the synchronous model, but practically operates in semi-synchrony, with safety and liveness timeout clocks triggered based on received messages. The protocol does not depend on a global time or assume synchronized clocks.
Ouroboros Genesis guarantees safety under semi-synchrony with all nodes able to access a global clock setup, meaning that each node can signal the clock that is done with the current round. Once all honest nodes have pinged the clock that they are done with the correct round, it will advance by one tick. Any node can also read from the clock — this time that the clock outputs is known as logical time.
Casper FFG doesn’t explicitly state the synchrony model in which the protocol operates, but dictates that all clients have local clocks that are perfectly synchronized with any discrepancy treated as part of a known communication delay, thus we consider the protocol as synchronous.
Casper TFG (CBC) provides asynchronous safety in that blocks won’t be reverted due to the arbitrary timing of future events and provides liveness given some synchrony assumptions.
There are certain tradeoffs in building protocols with various network model assumptions, and as we’ve seen above, different models serve different needs. However, It’s important to examine these protocols as a whole in deciding which to employ versus just considering one parameter (in this case network model). The viability of blockchain systems to operate in real-world settings, which are not bound by theoretical ideals, is a crucial consideration in developing these systems.
Special thanks to Aparna Krishnan and Peter Lu for discussions and feedback that contributed heavily to this post.