On consensus

Jordan Clifford
Scalar Capital
Oct 3, 2018

Achieving and enforcing consensus is at the heart of what makes cryptocurrencies tick. What does it mean and how is it achieved? Let’s start with a dictionary definition.

Consensus: (noun) general agreement.

This appears simple enough. We can understand consensus as a general or global agreement. Consensus is having everyone on the same page. The opposite of consensus is a state of dispute or disagreement.

Offline Consensus

In computer science and cryptocurrencies, consensus occurs within networks of computers on behalf of their users. Before we dive into how computers achieve consensus, let’s consider consensus in the physical world.

As a society, having consensus makes our lives more efficient. Ownership is one important example. There’s no disputing who owns 123 Main St. in your town. Why? We all defer to a local government that handles recording titles. This is known as a central authority and is a very popular way to maintain consensus. When one entity is in charge of the record keeping, disputes are easy to resolve.

Our financial and legal systems are riddled with centralized authorities doling out consensus. Banks, county recorder’s offices, brokers, payment processors, etc. all act as a source of truth and serve to resolve disputes. This works well enough in developed nations, but it has limitations. The authorities must be trusted and can be manipulated. These intermediaries have a cost of doing business and often opt out of handling very small or risky transactions.

Historical Limitations

With respect to ownership, two critical problems lie within consensus systems reliant on authorities: opacity and lack of immutability. Authorities keep their internal records without public reporting, forcing trust in them. Lacking immutability means there’s nothing to stop the editing of history.

These privileged systems cannot handle all transactions, be easily audited, or be used without trust. It is these limitations that long prevented the creation of an internet money open to everyone.

Early Distributed Systems

Byzantine Fault Tolerance (BFT)

A robust consensus algorithm must handle misbehaving actors, whether their misbehavior stems from malicious intent or from a fault in hardware or software. The problem was formally stated in 1982 by Leslie Lamport, Robert Shostak, and Marshall Pease in their paper “The Byzantine Generals Problem.”

The paper proves that, given communication constraints, more than 2/3 of the participants must be honest for the system to converge. To understand this intuitively, imagine that exactly 1/3 of the actors were dishonest. They could collude to tell one half of the honest participants one thing and the other half something different.
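
To make the failure mode concrete, here is a toy Python simulation (purely illustrative; the node names and messaging scheme are invented, and this is not the protocol from the paper). Six participants, two of them traitors, are enough to split the four honest nodes into camps that decide differently:

```python
# Toy model: four honest nodes split 2-2 on an initial value, plus two
# traitors (exactly 1/3 of the six participants). Every node broadcasts a
# value, and each honest node decides by simple majority of what it heard.

honest_values = {"h1": "attack", "h2": "attack", "h3": "retreat", "h4": "retreat"}

def traitor_message(recipient):
    # Traitors equivocate: they echo back whatever the recipient believes.
    return honest_values[recipient]

decisions = {}
for node in honest_values:
    inbox = list(honest_values.values())                       # honest broadcasts
    inbox += [traitor_message(node), traitor_message(node)]    # two traitor votes
    decisions[node] = max(set(inbox), key=inbox.count)         # majority decision

print(decisions)  # h1/h2 decide "attack", h3/h4 "retreat": agreement fails
```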

[Figure: simplified view of a Byzantine agreement failure]

It is from this seminal work that the consensus problem is formalized. Consensus in the face of potentially dishonest actors is also known as Byzantine agreement. It is formalized as having the following properties:

  1. termination: every correct process eventually decides on a value
  2. validity: the value decided on must have been proposed by some process
  3. agreement: all correct processes decide on the same value

Distributed systems algorithms can be categorized into two buckets, synchronous and asynchronous, depending on the assumptions they make about message delivery. In the real world, unless all participants are colocated, communication requires a message to travel, which takes time and can fail; real networks are therefore asynchronous. However, if communication is fast and reliable enough, algorithm designers can assume synchronous message passing to simplify their designs. Synchronous algorithms assume messages are always delivered, within a known time bound.

All algorithms are forced to balance safety and liveness, two properties of distributed systems. Safety means that all answers are consistent and an incorrect response will not happen. Liveness means that the algorithm is making forward progress towards an answer, and that the answer will eventually be returned.

In 1985, researchers Fischer, Lynch, and Paterson showed that it is impossible for a deterministic asynchronous consensus algorithm to guarantee both safety and liveness in the presence of even one faulty process.

In the late 1980s, Lamport devised Paxos, a seminal consensus algorithm; the paper describing it was eventually published in 1998. Paxos is uncompromising on safety, and is architected to make it difficult to impede progress. The algorithm is not guaranteed to make progress, but it converges in practice. Paxos remains an important algorithm for many systems in use today. Google famously uses Paxos within its Chubby lock service.
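
As a rough illustration of how Paxos reaches agreement, here is a heavily simplified single-decree sketch in Python. The class and method names are invented for this example, and networking, retries, and failure handling are omitted; read it as a sketch of the two phases rather than a faithful implementation:

```python
# Minimal single-decree Paxos sketch: one proposer, five in-process acceptors.

class Acceptor:
    def __init__(self):
        self.promised = -1      # highest ballot number promised so far
        self.accepted = None    # (ballot, value) last accepted, if any

    def prepare(self, ballot):
        # Phase 1b: promise to ignore lower ballots, report any prior acceptance.
        if ballot > self.promised:
            self.promised = ballot
            return ("promise", self.accepted)
        return ("reject", None)

    def accept(self, ballot, value):
        # Phase 2b: accept unless a higher ballot has been promised meanwhile.
        if ballot >= self.promised:
            self.promised = ballot
            self.accepted = (ballot, value)
            return "accepted"
        return "rejected"

def propose(acceptors, ballot, value):
    # Phases 1a and 2a, run by a proposer. A value is chosen once a
    # majority of acceptors accepts it.
    responses = [a.prepare(ballot) for a in acceptors]
    granted = [prior for status, prior in responses if status == "promise"]
    if len(granted) <= len(acceptors) // 2:
        return None                       # no quorum; retry with a higher ballot
    # Safety rule: adopt the value of the highest-ballot prior acceptance, if any.
    prior_accepts = [p for p in granted if p is not None]
    if prior_accepts:
        value = max(prior_accepts)[1]
    votes = [a.accept(ballot, value) for a in acceptors]
    return value if votes.count("accepted") > len(acceptors) // 2 else None

acceptors = [Acceptor() for _ in range(5)]
print(propose(acceptors, ballot=1, value="X"))   # "X": chosen by a 3-of-5 majority
```

The key safety rule lives in the proposer: if any acceptor has already accepted a value, the proposer must adopt the highest-ballot prior value rather than its own, which is what keeps every possible outcome consistent.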

Mining

Until this point, consensus algorithms were largely executed by friendly parties, and they generally assume the set of participants is known in advance. Data structures and message propagation were glossed over above, but they are inextricably linked to the construction of cryptocurrencies.

Proof of Work (PoW)

Proof of Work marked a new era in consensus. PoW creates consensus in rounds known as blocks. Consensus participants are called miners or block producers. Creating a new block requires solving a mathematical puzzle that is difficult to solve but easy to verify. This puzzle acts as an ongoing lottery for the right to append to the ledger. Each valid block that makes it into the chain earns its producer a block reward, and for the first time, consensus is paid for.
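
A minimal Python sketch of the puzzle may help. This is a simplification (Bitcoin actually uses double SHA-256 over a structured header, and encodes its target differently), but the asymmetry between mining and verifying is the same:

```python
import hashlib

def mine(header: bytes, difficulty_bits: int) -> int:
    # Try nonces until the block hash falls below the difficulty target.
    target = 2 ** (256 - difficulty_bits)
    nonce = 0
    while True:
        digest = hashlib.sha256(header + nonce.to_bytes(8, "big")).digest()
        if int.from_bytes(digest, "big") < target:
            return nonce                     # a winning lottery ticket
        nonce += 1

def verify(header: bytes, nonce: int, difficulty_bits: int) -> bool:
    # Verification is a single hash, no matter how hard the puzzle was.
    digest = hashlib.sha256(header + nonce.to_bytes(8, "big")).digest()
    return int.from_bytes(digest, "big") < 2 ** (256 - difficulty_bits)

nonce = mine(b"block header bytes", difficulty_bits=16)  # ~65,000 tries on average
print(verify(b"block header bytes", nonce, 16))          # True, checked instantly
```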

Miners each selfishly work to extend the chain and earn a block reward. Game theory suggests that if a majority of the hash power is honest, the longest valid chain will keep being extended, and consensus will drive forward without any participants ever needing to cooperate or trust each other. Rather than passing messages until a consensus is reached, PoW relies on being eventually consistent.

Prior to Proof of Work, consensus was done with a known set of participants. Allowing anyone to contribute to a consensus process left the system vulnerable to a Sybil attack. A Sybil attack is when a single entity creates many identities on the network to overwhelm the system. Consensus algorithms that are open to public participation must have a way to prevent users from overwhelming the process with fake identities. Proof of Work solves this problem by requiring physical resource expenditure to participate.

Pros: Very simple and secure. Novel approach to Sybil resistance allowing open participation

Cons: So far economies of scale have resulted in centralization. Consumes large quantities of physical resources.

Notable Examples: Bitcoin, Ethereum (current), Litecoin, Monero, ZCash

Proof of Stake (PoS)

Building on the concepts behind Proof of Work, Proof of Stake aims to be faster, more environmentally friendly, and more amenable to sharding: the division of labor among subgroups of a network. Rather than solving a mathematical puzzle for the privilege of producing a block, block producers put their stake behind the blocks they produce. Because no puzzle must be solved, a production schedule can even be drawn up in advance, allowing quite fast block times.
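
Here is a toy sketch of stake-weighted producer selection. The validator names are invented, and `random.choices` stands in for the protocol-level randomness a real chain derives from itself:

```python
import random
from collections import Counter

# Stake-weighted selection: win probability is proportional to stake.
stakes = {"alice": 60, "bob": 30, "carol": 10}

def next_producer(stakes):
    validators = list(stakes)
    weights = [stakes[v] for v in validators]
    return random.choices(validators, weights=weights)[0]

# Over many rounds the schedule tracks stake: ~60% alice, ~30% bob, ~10% carol.
print(Counter(next_producer(stakes) for _ in range(10_000)))
```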

Two challenges with proof of stake systems are the problem known as nothing-at-stake and the related long-range attack. In a naive implementation of PoS, a block producer who witnesses a chain split is incentivized to build on both chains. This hedges their bets: they will collect their reward no matter which chain endures. Unlike in PoW, where extending a chain costs energy, there is no opportunity cost to voting on all chains. This can prevent the network from converging on a single consensus.

Strategies exist for handling the nothing-at-stake problem. Ethereum is pursuing a strategy called slashing, which punishes block producers proven to have willfully created a fork by signing two competing versions of history.
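
A sketch of what a slashing condition can look like: two signed votes from the same validator, at the same height, for different blocks constitute a proof of equivocation. Signature verification is omitted here, and the field names are invented:

```python
# Two signed votes by one validator, at one height, for different blocks
# form a slashable proof of equivocation. Signature checks are omitted.

def is_slashable(vote_a, vote_b):
    return (vote_a["validator"] == vote_b["validator"]
            and vote_a["height"] == vote_b["height"]
            and vote_a["block_hash"] != vote_b["block_hash"])

vote_a = {"validator": "v1", "height": 100, "block_hash": "abc"}
vote_b = {"validator": "v1", "height": 100, "block_hash": "def"}
print(is_slashable(vote_a, vote_b))  # True: v1 signed two competing histories
```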

The long-range attack tries to reorganize significant chunks of the blockchain to alter transaction history. Provided the nothing-at-stake incentives described above hold, an attacker may succeed in reversing transactions that occurred quite some time ago.

A distinct advantage of Proof of Stake is the ability to create real finality: a group-enforced guarantee that history prior to a given checkpoint will never be changed. A checkpoint is created by having a sufficient quorum of block producers vouch for a block, then announcing and confirming across the group that the block has been accepted as final.
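
A sketch of a stake-weighted finality check under these assumptions (validator names invented; a real protocol counts signed votes, not set membership):

```python
# A block is finalized once validators controlling at least 2/3 of the
# total stake have vouched for it, mirroring the BFT bound discussed earlier.

def is_finalized(voters, stakes):
    voted = sum(stakes[v] for v in voters)
    return 3 * voted >= 2 * sum(stakes.values())   # integer-safe 2/3 check

stakes = {"a": 40, "b": 35, "c": 25}
print(is_finalized({"a", "b"}, stakes))   # True: 75 of 100 stake vouched
print(is_finalized({"a"}, stakes))        # False: only 40 of 100
```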

Real finality is important for combining shards: the complexity and consequences of history rewrites would be amplified across different shards.

Pros: Offers finality enabling sharding possibilities. Can offer fast block times.

Cons: More complex than PoW. Nothing-at-stake is a theoretical problem.

Notable Examples: Ethereum (future), Decred (hybrid PoW/PoS), Dfinity

Delegated Proof of Stake (dPoS)

Delegated Proof of Stake is a specialized form of PoS. The difference is that in dPoS, the majority of token holders are expected to delegate their block production responsibility to a small set of delegates. Limiting the number of participants makes latency less of a problem, which speeds up consensus. If the delegates are severely limited in number, they may also be expected to run premium hardware; a small number of block producers with top-notch equipment allows a network to run at higher throughput.
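
A toy sketch of a dPoS election: token holders delegate stake to candidates, and the top N candidates by delegated stake become the block producers. All names and numbers are invented:

```python
from collections import Counter

# Token holders delegate stake to candidates; the top N candidates by
# delegated stake produce blocks for the next schedule.
delegations = [("alice", "node1", 50), ("bob", "node2", 30),
               ("carol", "node1", 20), ("dave", "node3", 15)]

def elect(delegations, n):
    totals = Counter()
    for _holder, candidate, stake in delegations:
        totals[candidate] += stake
    return [candidate for candidate, _ in totals.most_common(n)]

print(elect(delegations, n=2))   # ['node1', 'node2'] produce blocks in rotation
```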

When throughput is the primary consideration, and some centralization can be accepted, it may be desirable to limit the set of block producers or validators and use a dPoS solution.

Pros: Faster consensus since latency is less of an issue; more throughput

Cons: Less decentralized

Notable Examples: EOS, BitShares, Tron

Proof of Space Time (PoST)

A clever alternative to PoW is PoST. In PoW, miners expend energy trying to solve a hard mathematical puzzle. PoST instead awards block rewards to consensus participants at a rate proportional to the storage they have allocated for participation.

One of the most promising advantages of PoST is the possibility that it will remain more decentralized. A problem with PoW algorithms is that they require cutting-edge specialized hardware to participate profitably: Application Specific Integrated Circuits (ASICs) are effectively mandatory to mine Bitcoin, and very cheap or free electricity is necessary as well. This concentrates mining in certain geographies and hardware providers. Centralization is a problem for censorship resistance, as it creates points of control that are easy targets for regulators.

The goal of Proof of Space Time is to have regular people allocate their excess hard drive capacity to earning some cryptocurrency. This should keep the number of participants high, avoiding the concentration of power. Hard drives are filled with seed data that is used, in combination with the passage of time and network messages, to determine which participant has their block accepted by the network.
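
A highly simplified lottery in that spirit (inspired by, but not identical to, any real PoST scheme; the names and parameters are invented). The farmer with three times the "disk" wins roughly three times as often:

```python
import hashlib
import random
from collections import Counter

def plot(farmer_id, num_hashes):
    # "Fill the drive" with seed data derived from the farmer's identity.
    return [hashlib.sha256(f"{farmer_id}:{i}".encode()).digest()
            for i in range(num_hashes)]

farmers = {"small": plot("small", 100), "big": plot("big", 300)}  # 1x vs 3x disk

def round_winner(challenge):
    # The stored hash closest (by XOR distance) to the challenge wins the round.
    def best(hashes):
        return min(int.from_bytes(h, "big") ^ challenge for h in hashes)
    return min(farmers, key=lambda f: best(farmers[f]))

wins = Counter(round_winner(random.getrandbits(256)) for _ in range(1000))
print(wins)   # "big" wins roughly three times as often as "small"
```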

It is quite possible, however, that engineers will find specialized efficiencies in Proof of Space Time systems, eroding this advantage.

Pros: Potentially ASIC resistant. More environmentally friendly than PoW

Cons: Unproven and complex. ASICs or hard drive farms could diminish the benefit

Notable Examples: Chia, SpaceMesh

Alternatives to Mining

Practical Byzantine Fault Tolerance (pBFT)

When parties are familiar with each other and cooperative, it can make sense to abandon PoW or PoS models and use traditional consensus algorithms. One such algorithm is an offshoot of the original BFT family of algorithms known as pBFT, proposed in 1999 by Miguel Castro and Barbara Liskov at MIT. It requires a lot of communication overhead between participants, so it is only practical for small groups.

The algorithm works with a fixed set of actors whose identities must be known in advance to prevent Sybil attacks. Since the participants must be identified, pBFT is not open and permissionless, but rather a permissioned system.
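
The arithmetic behind the small-group limitation is easy to sketch: tolerating f Byzantine replicas requires n = 3f + 1 replicas and quorums of 2f + 1 matching messages, and the all-to-all phases mean message counts grow quadratically with the replica count:

```python
# pBFT sizing: n = 3f + 1 replicas tolerate f Byzantine faults, each phase
# waits for a quorum of 2f + 1, and all-to-all communication scales ~n^2.

def pbft_sizes(f):
    n = 3 * f + 1
    return {"replicas": n, "quorum": 2 * f + 1, "messages_per_phase": n * (n - 1)}

for f in (1, 3, 10):
    print(f, pbft_sizes(f))
# f=10 already means 31 replicas and ~930 messages in every phase.
```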

Pros: Reliable consensus amongst known parties

Cons: Only supports small number of participants; not a trustless system due to users having to trust validators

Notable Examples: Hyperledger, Ripple, Stellar

Directed Acyclic Graphs (DAGs)

Requiring all participants to come to consensus within a certain period of time limits the throughput of a system. One novel approach is to not require global consensus at regular intervals. Rather than batching transactions into blocks that are agreed on globally, transactions are individually added to the history.

As transactions are added, they reference prior transactions, and that gives some confidence that the prior transactions are accepted. If enough transactions are chained together on top of a given transaction, it’s increasingly probable that the network will accept the original transaction.
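
A toy DAG ledger makes the idea concrete (the structure is invented for illustration; real systems such as Iota's Tangle weight and select tips far more carefully):

```python
# Toy DAG ledger: each transaction approves earlier ones, and a transaction's
# confidence is the number of later transactions referencing it, directly or
# indirectly (a crude "cumulative weight").

graph = {}   # tx_id -> list of approved (parent) tx_ids

def add_tx(tx_id, approves):
    graph[tx_id] = list(approves)

def ancestors(tx_id, seen=None):
    seen = set() if seen is None else seen
    for parent in graph[tx_id]:
        if parent not in seen:
            seen.add(parent)
            ancestors(parent, seen)
    return seen

def cumulative_weight(tx_id):
    return sum(tx_id in ancestors(t) for t in graph)

add_tx("genesis", [])
add_tx("a", ["genesis"])
add_tx("b", ["genesis"])
add_tx("c", ["a", "b"])          # c approves both tips, raising their confidence
print(cumulative_weight("a"))    # 1: only c references a so far
```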

Methods vary for deterring users from creating conflicting histories. Iota has each transaction include a small PoW in an attempt to deter spam. It’s not known whether this will work in practice, so for the time being all Iota transactions are stamped by a “coordinator,” which leaves the system highly centralized.

Pros: Fast “confirmations” and high throughput

Cons: Unproven in practice. Suspect immutability.

Notable Examples: Iota, Byteball, Nano

Forks (aka Consensus Failure)

Methods for keeping a computer network or a blockchain in agreement can only work if their human users agree on the method of consensus. As early researchers concluded, consensus requires termination, validity, and agreement. Key here is validity: only certain results are allowable under the protocol rules.

The rules for validity are often ripe for social disagreement. If two or more factions within a community hold differing opinions on what is valid on their network, then the network cannot be expected to stick together. There will be a divergence in the network. Two high profile cases of social disagreement causing network fragmentation are the creation of Ethereum Classic and Bitcoin Cash.

Conclusion

In order to have a digital currency, it’s necessary for the system to agree on who owns what. A global consensus removes any chance of ambiguity or forgeries. The method chosen to foster the consensus dictates the structure of the network, its throughput and how much trust is built into the system.

The consensus algorithm acts as the technological glue that keeps the system together. Innovations to consensus algorithms can be expected to improve speed, throughput and reduce environmental impact. However, developers must also carefully balance costs of increased complexity against the benefits. Complexity is at odds with robustness in distributed systems, and digital currency is a race for network effect.

Disclaimer: Jordan Clifford is a managing director of Scalar Capital which holds positions in some, but not all, of the aforementioned assets. This post is not investment or legal advice.
