The DLT consensus ecosystem

33 min readJan 3, 2019

The DLT consensus ecosystem

A DLT/Blockchain is an append-only, sequential, chained, data structure replicated over a peer-to-peer network, where transactions are stored and grouped to form new blocks.

Participants to the networks (peers) achieve distributed consensus on the validity of the transaction and on the ordering.

Blocks are data structure comprising transactions and header that includes a link to the previous block through a hash.

More generally nowadays blockchain networks are part of a wider family named Distributed Ledger Network or DLT.

There are several implementations of DTLs and some of them share the same consensus mechanism, the engine that allows such solutions to be temper-resistant.

In this article I want to give a high level but complete overview of the various consensus mechanisms running within DLTs and of the advantages/disadvantages of each of them, without anyway the aim to provide a ranking

Byzantine Fault Tolerance

Byzantine Fault Tolerance is the dependability of a fault-tolerant computer system, particularly distributed computing systems, where components may fail and there is imperfect information on whether a component has failed.

The term is derived from the Byzantine Generals’ Problem, where actors must agree on a concerted strategy to avoid catastrophic system failure, but some of the actors are unreliable [Wikipedia].

This concept is key in a DLT/Blockchain as in such a distributed ecosystem rogue or unreliable nodes could cause disruption or complete failure in the system in case their decisions/proposals for new transactions/blocks are included in the underlying data structure (ledger).

It is therefore important to identify and isolate such nodes in order to keep consistency and reliability in the stored transactions, and this task is generally achieved through a consensus mechanism.

Consensus in distributed computing

Consensus is a well-known problem of distributed computing. It consists in achieving an agreement among a distributed number of processes [2].

Consensus in DLT is implemented with the aim to achieve fault-tolerant systems though the involvement of multiple nodes (voting nodes) that must agree on proposed transactions or in a specific outcome (e.g. in the result of an execution of a smart contract). Once they reach a common decision, that decision is considered final and cannot be reversed.

In general consensus algorithms can reach a decision when majority of their nodes is capable of taking part of the decision process; for example, a DLT running with N nodes can continue to operate even if N/2–1 servers fail.

More specifically, in a DLT each node has a state machine and a log: each state machine takes as input, commands from its log and thanks to the consensus, each state machine processes the same series of commands and thus produces the same series of results and arrives at the same series of states for each step.

It is thanks to the consensus mechanism, together with the chained structure, that the key property of a ledger i.e. its tamper-resistance is enforced, as transactions or smart contract execution results, once agreed and therefore recorded into the ledger, cannot be changed without such change being evident to participating nodes, whatever would be the reason of such event.

Consistency Availability Partition-tolerance — CAP

In theoretical computer science, the CAP theorem, also named Brewer’s theorem after computer scientist Eric Brewer, states that it is impossible for a distributed data store to simultaneously provide more than two out of the following three guarantees [Wikipedia]:

Consistency ©: Every read receives the most recent write or an error
Availability (A): Every request receives a (non-error) response — without the guarantee that it contains the most recent write
Partition tolerance (P): The system continues to operate despite an arbitrary number of messages being dropped (or delayed) by the network between nodes

In the paper” PBFT vs Proof-of-Authority: Applying the CAP Theorem to Permissioned Blockchain” CAP theorem is applied to blockchain/DLT as follows:[1]

Consistency. A blockchain achieves consistency when forks are avoided. This property is referred to as consensus finality, which, in the standard distributed system corresponds to achieving the total order and agreement properties of atomic broadcast.

Availability. A blockchain is available if transactions submitted by clients are served and eventually committed, i.e. permanently added to the chain.

Partition Tolerance. When a network partition occurs, authorities are divided into disjointed groups in such a way that nodes in different groups cannot communicate each other.

It is responsibility of consensus mechanism to enforce CAP properties in order to guarantee the proper functioning of the overall system.

Permissioned/Private DLTs

As the number of consensus mechanisms is steadily growing we start in this section with the type of consensus mechanisms that are mainly targeting private/permissioned implementations.

These environments require generally different types of consensus algorithms vs permissioned environments where nodes and/or users have to be allowed to join the network by a gatekeeper.

The algorithms described in this section are the following:

Practical Byzantine Fault Tolerance — PBFT
Democratic Byzantine Fault Tolerance — DBFT
Istanbul-BFT
Redundant Byzantine Fault Tolerance — RBFT
delegated Byzantine Fault Tolerance — dBFT
Raft
QuorumChain consensus

BFT

The objective of Byzantine Fault Tolerance is to be able to defend against Byzantine failures, in which components of a system fail with symptoms that prevent some components of the system from reaching agreement among themselves, where such agreement is needed for the correct operation of the system. [11]

There are many algorithms developed to guarantee Byzantine fault tolerance within a distributed service as the one represented by a DLT, some of them are described in the following chapters.

PBFT

The Practical Byzantine Fault Tolerance (PBFT) [3] is one of the most well-established BFT algorithm.

PBFT is a form of state machine replication: the service is modeled as a state machine that is replicated across different nodes in a distributed system.[4]

Each state machine replica maintains the service state and implements the service operations. The node replicas move through a succession of configurations called views. In a view one node is the primary and the others are backups.

In PBFT applied to blockchain, all the nodes within a consensus group are ordered in a sequence, and there is one primary node while the others may be referred to as backup nodes.

In a view ‘v’, the chosen node will be c=v mod R where R is the total number of nodes.

Specifically PBFT rests on three phases (pre-prepare, prepare, and commit) and therefore three rounds of message exchange before reaching agreement.

In pre-prepare phase, the primary announces the next piece of record that the group should agree on. This is achieved by sending a “pre-prepare” message.
After receiving the pre-prepare message, every node in the group validates the correctness and validity of the record and multicasts a “prepare” message to all the other nodes.
After receiving the prepare messages from a large majority, each node multicasts a commit message to the group.
Eventually each node waits for commit messages from a large majority to ensure that a sufficient number of nodes have agreed on the record proposed by the leader.

The pre-prepare and prepare phases of the algorithm guarantee that non-faulty replicas agree on a total order for the requests within a view: pre-prepare emphasize on certainty of sequence number regardless of which view, as long as commit matches the pre-prepare. In few words: ‘prepare’ gives certainty on ordering within a view while ‘pre-prepare’ gives certainty of sequence number regardless of which view, as long as commit matches the pre-prepare.

The commit phase ensures that non-faulty replicas agree on the sequence numbers of requests that commit locally even if they commit in different views at each replica.

PBFT ensures that 3f + 1 nodes can achieve consensus also in presence of f Byzantine nodes: this is proved to be optimal.

The overall process will involve a client sending a request for a new transaction to the primary. At that moment:

The primary multicasts the request to the backup nodes
Replicas execute the request and send a reply to the client
The client waits for replies from different backup nodes with the same result; this is the result of the operation as at most f nodes can be faulty.

In case a PBFT leader would be malicious PBFT implements a view change protocol to replace the malicious leader with another one: if the nodes do not see any progress for a pre-defined timeframe, they can autonomously broadcast the desire to change the leader. If majority of nodes decides that the leader is faulty, then the next leader in a well-known schedule (generally based on round-robin) takes over.

PBFT guarantees finality to the consensus decision as if consensus is achieved on the next block, then such block is final and the agreement is exact. In this case no further confirmations would be needed.

The main drawback of PBFT is the exponentially increasing message count as nodes (replicas technically) are added to the set that limits greatly its usage to small networks where the number of nodes is under control (from 20 nodes the efficiency is greatly compromised, at 50 nodes would be a reasonable limit for any practical implementation).

Democratic Byzantine Fault Tolerance (DBFT)

While most blockchain consensus protocols rely on a correct leader or coordinator to terminate, DBFT algorithm can terminate even when its coordinator is faulty.

The key idea is to allow processes to complete asynchronous rounds as soon as they receive a threshold of messages, instead of having to wait for a message from a coordinator that may be slow.

The resulting decentralization is particularly appealing for blockchains for two reasons:

each node plays a similar role in the execution of the consensus, hence making the decision inherently “democratic”;
decentralization avoids bottlenecks by balancing the load, making the solution scalable.

DBFT is deterministic, assumes partial synchrony, is resilience optimal, time optimal and does not need signatures. [5]

In a typical message-based consensus algorithm a faulty coordinator can dramatically impact the algorithm performance by leveraging the power it has in a round and imposing its value to all.

DBFT uses a ‘weak coordinator’ that does not impose its value. On the one hand, this allows non-faulty processes to decide a value quickly without the help of the coordinator. On the other hand, the coordinator helps the algorithm terminating if non-faulty processes know that they proposed values that might all be decided. Furthermore, having a weak coordinator allows rounds to be executed optimistically without waiting for a specific message.

The nodes communicate by exchanging messages through an asynchronous reliable point-to-point network. “Reliable” means that the network does not lose, duplicate, modify, or create messages. Hence, when a node receives a message, it can identify its sender.

DBFT relies on a reduction from the binary Byzantine consensus Psync to the multivalue consensus and is also time optimal, resilience optimal and does not use classic (strong) coordinator, which means that it does not wait for a particular message. In addition, it finishes in only 4 messages delays in the good case, when all non-faulty processes propose the same value.

A variant of the classical Byzantine consensus problem, called the Validity Predicate-based Byzantine Consensus (denoted VPBC) is considered within the DBFT. Its validity requirement relies on an application-specific valid() predicate that is used by blockchains to indicate whether a value is valid. Assuming that each non-faulty process proposes a valid value, each of them has to decide on a value in such a way that the following properties are satisfied.

VPBC-Termination. Every non-faulty process eventually decides on a value.
VPBC-Agreement. No two non-faulty processes decide on different values.
VPBC-Validity. A decided value is valid, i.e., it satisfies the predefined predicate denoted valid(), and if all non-faulty processes propose the same value v then they decide v.

In the context of blockchains, a proposal is not valid if either it does not contain an appropriate hash of the last block added to the Blockchain or it contains invalid transactions.

DBFT uses a new binary Byzantine consensus algorithm [13] that enables the non-faulty processes to find an agreement on the value proposed.

Istanbul BFT (Byzantine Fault Tolerance)

Istanbul-BFT consensus is inspired by PBFT algorithm and indeed is based on the same three-phase consensus: pre-prepare, prepare and commit.

Differently from PBFT, there is no specific “client” which sends out requests and waits for the results. Instead, all of the validators can be seen as clients. Furthermore, to keep the blockchain progressing, a proposer will be continuously selected in each round to create block proposal for consensus. For each consensus result, it is expected the generation of a verifiable new block rather than a bunch of read/write operations to the file system.

IBFT uses a similar validator voting mechanism as Clique and copy most of the content from Clique EIP. Every epoch transaction resets the validator voting, meaning if an authorization or de-authorization vote is still in progress, that voting process will be terminated.

As PBFT, this consensus can tolerate up to one-third of all block ‘validator’ nodes in the network being faulty, yet still ensure transaction finality.

Nodes which are elected as block ‘validators’ pick a ‘proposer’ node to propose a new block in a consensus round.

To select a proposer two policies are available: round robin and sticky proposer.

Round robin: in a round robin setting, proposer will change in every block and round change.
Sticky proposer: in a sticky proposer setting, propose will change only when a round change happens.

The proposer will then propose a new block proposal and broadcast it along with the PRE-PREPARE message. Upon receiving the PRE-PREPARE message from the proposer, validators enter the state of PRE-PREPARED and then broadcast a PREPARE message.

This step is to make sure all validators are working on the same sequence and the same round.

After receiving the PREPARE message from two-thirds of the network nodes, the validator enters the state of PREPARED and then broadcasts a COMMIT message. This step is to inform its peers that it accepts the proposed block and is going to add the block to the chain.

Lastly, validators wait for two thirds of the COMMIT messages to enter the COMMITTED state and then insert the block to the chain.

Blocks committed using this consensus protocol are final which means no forking. To prevent a faulty node from generating a totally different chain from the main chain, each validator appends the received commit signature to an ‘extradata’ field in the block header thereby making all blocks self-verifiable. [7]

However, the dynamic extraData would cause an issue on block hash calculation. Since the same block from different validators can have different set of commit signatures, the same block can have different block hashes as well. To solve this, IBFT calculates the block hash by excluding the commit signatures part. This solution keeps the block/block hash consistency as well as put the consensus proof in the block header.

Locking mechanism is introduced to resolve safety issues. In general, when a proposer is locked at certain height H with a block B, it can only propose B for height H. On the other hand, when a validator is locked, it can only vote on B for height H.

Traditionally, validators need to be strongly connected in order to reach stable consensus results, which means all validators need to be connected directly to each other; however, in practical network environment, stable and constant p2p connections are hard to achieve. To resolve this issue, Istanbul BFT implements gossip network to overcome this constrain. In a gossip network environment, all validators only need to be weakly connected, which means any two validators are seen connected when either they are directly connected or they are connected with one or more validators in between. Consensus messages will be relayed between validators.

Redundant Byzantine Fault Tolerance (RBFT)

In RBFT, multiple instances of a BFT protocol are executed in parallel. Each instance has a primary replica.

The various primary replicas are all executed on different machines. While all protocol instances order requests, only one instance (called the master instance) effectively executes them. In Hyperledger Hyndi RBFT can be seen as running several instances of Plenum in parallel. Ordered requests from a single instance called the master is used to update the ledger, but the master’s performance in terms of throughput and latency is periodically compared to the average performance of other instances. If the master is found to be degraded, a view change occurs that appoints a different instance to the role of master.

Like PBFT, RBFT needs at least 3f+1 nodes to handle f faulty nodes.

RBFT is intended for open loop systems (such as e.g., Zookeeper or Boxwood asynchronous API), i.e., systems where a client may send multiple requests in parallel without waiting the reception of replies of anterior requests.

Indeed, in a closed loop system, the rate of incoming requests would be conditioned by the rate of the master instance. Said differently, backup instances would never be faster than the master instance. RBFT further implements a fairness mechanism between clients by monitoring the latency of requests, which assures that client requests are fairly processed.

Delegated Byzantine Fault Tolerance (dBFT)

dBFT algorithm ensures security as well as usability. With erroneous nodes in the consensus making no more than (n−1) / 3 , the functionality and stability of the system is guaranteed.

In fact, the total ledger is maintained by bookkeeping nodes while ordinary nodes do not participate in the consensus making. This is to show the entire consensus making procedures.

All consensus nodes are required to maintain a state table to record current consensus status. The data set used for a consensus from its beginning to its end is called a View. If consensus cannot be reached within the current View, a View Change will be required.

dBFT identifies each View with a number v, starting from 0 and it may increase till achieving the consensus and identifies each consensus node with a number, starting from 0, the last node is numbered n − 1.

For each round of consensus making, a node will play speaker of the house while other nodes play congressmen. The speaker’s number p will be determined by the following algorithm: Hypothetically the current block height is h, then 𝑝 = (ℎ − 𝑣) 𝑚𝑜𝑑 𝑛, p’s value range will be 0 ≤ 𝑝 < 𝑛 .

A new block will be generated with each round of consensus, with at least 𝑛 − 𝑓 signatures from bookkeeping nodes. Upon the generation of a block, a new round of consensus making shall begin, resetting v=0.[8]

Set the time intervals of block generation as t, under normal circumstances, the algorithm executes in the following procedures:

A node broadcasts transaction data to the entire network, attached with the sender signature;
All bookkeeping nodes monitors transaction data broadcasting independently and stores the data in its memory respectively;
After the time t, the speaker sends a 𝑃𝑟𝑒𝑝𝑎𝑟𝑒𝑅𝑒𝑞𝑢𝑒𝑠𝑡；
After receiving the proposal, congressmen i send a 𝑃𝑟𝑒𝑝𝑎𝑟𝑒𝑅𝑒𝑠𝑝𝑜𝑛𝑠𝑒 ;
Any node, upon receiving at least 𝑛 — 𝑓 confirmations, reaches a consensus and publishes a full block;
Any node, after receiving the full block, deletes the transaction in question from its memory and begins the next round the consensus;

It is required that, for all the consensus nodes, at least 𝑛 − 𝑓 nodes are in the same original state.

dBFT algorithm provides 𝑓 = (𝑛−1) / 3 fault tolerance to a consensus system that comprises n nodes.[8]

Raft

Raft implements consensus by first electing a distinguished leader, then giving the leader complete responsibility for managing the replicated ledger; the leader is the only node that should mint new blocks.

Raft is a CFT (Crash Fault-Tolerant) consensus because in Raft the leader is assumed to always act correctly (honestly). All the followers blindly replicate the entries proposed by the leader with no questions asked.

Given the leader approach, Raft decomposes the consensus problem into three relatively independent subproblems[7]:

Leader election: a new leader must be chosen when an existing leader fails.
Ledger replication: the leader must accept entries from clients and replicate them across the cluster, forcing the other ledgers to agree with its own
Safety: if any node has updated the ledger in its state machine, then no other server may apply a different entry for the same log index.

During the leader election process, a voting process happens and during such process all nodes assume the role ‘candidate’.. Once majority elects a leader, the node that won the election takes the ‘leader’ role and all other nodes take a ‘follower’ role.

At any given time each node is in one of three states: leader, follower, or candidate.

In normal operation there is exactly one leader and all of the other nodes are followers. Specifically, the leader first replicates a new transaction to the followers without committing it. After the leader receives feedbacks from a majority of followers that have written the entry, the leader notifies the followers that this entry is committed. This process is called Ledger Replication.

Followers are passive: they issue no requests on their own but simply respond to requests from leaders and candidates.

Raft divides time into terms of arbitrary length numbered with consecutive integers. Each term begins with an election, in which one or more candidates attempt to become leader. To start an election process, a follower increments its current term and transitions to candidate state.

If a candidate wins the election, then it serves as leader for the rest of the term. In case an election would result in a split vote, the term will end with no leader; a new term (with a new election) will begin shortly. Raft ensures that there is at most one leader in a given term [3].

To enforce Consistency of the ledger, each node stores a current term number, which increases monotonically over time. Current terms are exchanged whenever nodes communicate; if one server’s current term is smaller than the other’s, then it updates its current term to the larger value.

If a candidate or leader discovers that its term is out of date, it immediately reverts to follower state. If a server receives a request with a stale term number, it rejects the request.

Raft uses randomized election timeouts to ensure that split votes are rare and that they are resolved quickly.

Raft has been designed to guarantee that safety must not depend on timing: the system must not produce incorrect results just because some event happens more quickly or slowly than expected.

However leader election is the aspect where timing is most critical: Raft will be able to elect and maintain a steady leader as long as the system satisfies the following timing requirement:

broadcastTime << electionTimeout << MBTF

Where:

broadcastTime is the average time it takes a server to send RPCs to every server in the net and receive their responses;
electionTimeout is the randomized election timeout i.e the amount of time that a follower needs to wait to become a candidate;
MTBF is the average time between failures for a single node.

It is important to choose properly these parameters to avoid a network split (an event happening if more than half of nodes become out of the current leader’s control).

Increasing election timeout is helpful to lower the network split probability caused by packet loss. It has also being found that that larger network has smaller split probability than the one in small network at the beginning of running time and more focused splitting time [9].

Raft consensus does not mint blocks unless there are pending transactions. This can result in significant storage savings, especially when the transaction load is low, because no empty blocks containing zero transactions are being minted.[10]

Another advantage of using Raft is faster block time compared to IBFT. The leader mints a block within 50ms of receiving the transaction (per the default setting), and flowing a proposed block through the Raft cluster and collecting majority acknowledgements is a very fast process. Under most typical network conditions, average block times are consistently sub-seconds.[10]

QuorumChain Consensus

Quorum is a private/permissioned, blockchain based on the official Go implementation of the Ethereum protocol. Quorum uses a voting based consensus algorithm, and achieves Data Privacy through the introduction of a new “private” transaction identifier.

In Quorum Version 1 a new consensus mechanism has been created, called QuorumChain; implemented in a smart contract which tracks voter and block maker lists, both of which can be maintained through standard transactions, thereby providing further control and clarity over how and by whom the network is managed.

In QuorumChain, a subset of nodes within the network are given the ability to vote on blocks. The voting role allows a node to vote on which block should be the canonical head at a particular height. The most recent block with the most votes will win and is considered the canonical head of the chain.

A block is only considered valid once a given threshold of votes has been received from valid Voters.

Block creation is only allowed by nodes that have been given the Maker role.

A node with this role can create a block and sign it by setting their signature in the ExtraData field of the block. On block import, as part of the block header validation, nodes verify that the block was signed by one of the nodes that have the Maker role by looking up the signer’s address in the list of valid Makers in the voting contract.

In Quorum 2 this algorithm has been removed, we have no extensive feedback on its capabilities, as there has not been wide usage of such technology.

Permissionless/Public DLTs

In permissionless environments any user can join the network both as end user or as node without entry barrier and there are no trust relationships among the nodes and this reduces highly the speed of transaction confirmation. Specific algorithms need to be implemented to handle such ecosystem and the main ones are described in this chapter.

The algorithms described in this chapter are the following:

Proof of Work — PoW
Proof of Elapsed Time — PoET
Proof of Weight — PoWeight
Proof of Stake — PoS
Delegated Proof of Stake — DPoS
Randomized Proof of Stake — RPoS
Leased Proof of Stake — LPoS
Proof of Authority — PoA
Aura

Proof of Work

We’re not be going through PoW protocol process, just at very high level mining process is based on the solution of a complex mathematical problem that makes costly to participate to the network and not advantageous to try to mine fake blocks.

PoW is well known and widespread in several blockchain implementations as in Bitcoin and Ethereum.

The reason anyway why so many other protocols are being developed is due to some key issues now growing with PoW:

Proof of work is an extremely inefficient process because of the sheer amount of power and energy that it requires for running .
People and organizations that can afford faster and more powerful ASICs usually have a better chance of mining than the others.

As a result of this, the main networks as bitcoin are not as decentralized as they are supposed to be.

Proof of Elapsed Time (PoET)

PoET, short for “Proof of Elapsed Time” is a Nakamoto-style consensus algorithm as elects a leader through some form of “lottery”; it was developed by Intel in 2016 as an efficient consensus mechanism primarily for permissioned blockchain networks. PoET is actually the consensus model of choice for Hyperledger Sawtooth’s modular framework.

PoET consensus is an efficient form of proof of work as removes the need for the mining-intensive process replacing it with a randomized timer system for network participants based on Intel’s hyped Software Guard Extensions (SGX).

For the purpose of achieving distributed consensus efficiently, a good lottery function has several characteristics:

Fairness: The function should distribute leader election across the broadest possible population of participants.
Investment: The cost of controlling the leader election process should be proportional to the value gained from it.
Verification: It should be relatively simple for all participants to verify that the leader was legitimately selected.

PoET is designed to achieve these goals using new secure CPU instructions which are becoming widely available in consumer and enterprise processors. PoET uses these features to ensure the safety and randomness of the leader election process without requiring the costly investment of power and specialized hardware inherent in most “proof” algorithms.[11]

In PoET every validator requests a wait time from an enclave (a trusted function). The validator with the shortest wait time for a particular transaction block is elected the leader. One function, say “CreateTimer” creates a timer for a transaction block that is guaranteed to have been created by the enclave. Another function, say “CheckTimer” verifies that the timer was created by the enclave and, if it has expired, creates an attestation that can be used to verify that validator did, in fact, wait the allotted time before claiming the leadership role.

The PoET leader election algorithm meets the criteria for a good lottery algorithm. It randomly distributes leadership election across the entire population of validators with distribution that is similar to what is provided by other lottery algorithms. The probability of election is proportional to the resources contributed (in this case, resources are general purpose processors with a trusted execution environment). An attestation of execution provides information for verifying that the certificate was created within the enclave (and that the validator waited the allotted time). Further, the low cost of participation increases the likelihood that the population of validators will be large, increasing the robustness of the consensus algorithm.[11]

Two mechanisms are put in place to blacklist validators, the first one affects each validator periodically, although infrequently while the second one is an asynchronous revocation check that each validator could perform on other validators at any time.

Proof-of-Weight (PoWeight)

Proof-of-weight consensus mechanisms are based on the Algorand consensus model developed by researchers at the MIT Computer Science & Artificial Intelligence Laboratory. The Algorand protocol aims to facilitate quick transactions by relying on a Byzantine agreement protocol, which also aims to make it capable of scaling to many users.

In a network utilizing proof-of-weight, weighted users play an integral role in the process of achieving consensus. Every user is assigned a certain “weight”, which is relative to a selected value that represents a user’s contribution to the network. In order to prevent double-spending attacks and other foul play on the blockchain, the majority (more than two thirds) of the weighted fraction has to belong to honest users.

Proof of Stake (PoS)

Rather than requiring the prover to perform a certain amount of computational work, a proof of stake system requires the prover to show ownership of a certain amount of money. PoS in practice replaces miners with validators. Different blockchain networks implementat PoS with different technological flavors: in this document we refer to the following blockchain networks PoS implementations:

PeerCoin
Blackcoin
Ethereum (Casper)

The first implementations of PoS were based on the concept of coin-age that was known to Nakamoto at least as early as 2010 and has been used in Bitcoin to help prioritize transactions. Coin age is simply defined as follows: [12]

CoinAge = CurrencyAmount*HoldingPeriod

In Proof-of-stake (PoS) the node which generates a block has to provide a proof that it has access to a certain amount of coins before being accepted by the network. Generating a block involves sending coins to oneself, which proves the ownership. The required amount of coins (also called target) is specified by the network through a difficulty adjustment process similar to Proof of Work that ensures an approximate, constant block time.[13]

In one of the first PoS implementations (PeerCoin), the block generation is based on coin age which is a factor that increases the weight of unspent coins linearly over time; the proof that has to be provided together with a new block and has to satisfy the following condition:

Proof hash < coins-age * target

The proof hash corresponds to the hash of an obfuscation sum that depends on a stake modifier, the unspent output, and the current time.

One of the weaknesses of this system is that it is possible for an attacker to save up enough coin age to become the node with the highest weight on the network. If the attack were to be malicious the attacker could then perform a double-spend. After this is done however, a second double-spend would require the attacker to save up coin age again, as the stake resets when the block was generated (that means that the attacked would either take a lot of time or a lot of coins, and thus money, to make this happen).

Another problem with coin age are greedy honest nodes. These are nodes that have no malicious intent but they keep their coins off the network and only stake every once in a while to get their stake reward. The current system actually encourages abusive behavior of these nodes by keeping their node offline until it accumulates enough coin age to get the reward in a short period of time and then shut down the node again.

There is also a more general concern as the cost of double-spending attack may have been lowered, in some cases where many nodes due to fast transactions would consume rapidly their stake and therefore an attacker may just need to accumulate a non considerable amount of coin age; therefore the utilization of “total consumed coin age” to determine main chain lowers the cost of attack on the entire block chain of history.

Different blockchain implementation try to fix these concerns with different methods: some using a hybrid PoW-PoS model, some taking away the coin-age model (Blackcoin).

Taking as example the future Ethereum Casper protocol (a proposed PoS implementation), the validators will have to lock up a certain amount of their coins as stake. Then when they discover a block to be added to the chain, they will validate it by placing a bet on it. If the block gets appended, then the validators will get a reward proportionate to the proposed bet.

The major issue on Casper is the ‘Nothing at Stake’ issue: in the event of a fork, whether the fork is accidental or malicious attempt to rewrite history the optimal strategy for any miner is to participate (mine) on every chain, so that the miner gets their reward no matter which fork wins. This is mainly because nodes do not lose anything by signing each and every fork as there is no correlated cost (differently from PoW where there would be power and opportunity cost of using power on another branch of the fork).

Thus, assuming a large number of economically interested miners, an attacker may be able to send a transaction in exchange for some digital good (usually another cryptocurrency), receive the good, then start a fork of the blockchain from one block behind the transaction and send the money to themselves instead, and even with 1% of the total stake the attacker’s fork would win because everyone else is mining on both.[14]

To address this, Casper has implemented a process by which they can punish all malicious elements, if a validator acts in a malicious manner and tries to do a “nothing at stake”, they will immediately be penalized and all of their stake is going to get slashed.

This can be solved via two strategies. The first, described in broad terms under the name “Slasher”, penalize validators if they simultaneously create blocks on multiple chains by means of including proof of misbehavior (i.e. two conflicting signed block headers) into the blockchain as a later point in time at which point the malfeasant validator’s deposit is deducted appropriately.

The second strategy is to simply punish validators for creating blocks on the wrong chain. That is, if there are two competing chains, A and B, then if a validator creates a block on B, they get a reward of +R on B, but the block header can be included into A (in Casper this is called a “dunkle”) on A the validator suffers a penalty of -F (possibly F = R).

In general, the main issues in running PoS solutions are:

“Censorship”. The blockchain itself cannot directly tell the difference between “user A tried to send transaction X but it was unfairly censored”, “user A tried to send transaction X but it never got in because the transaction fee was insufficient” and “user A never tried to send transaction X at all”. This would allow censorship of specific transactions.
“Stake grinding”. a class of attack where a validator performs some computation or takes some other step to try to bias the randomness of validator selection in their own favor.
“Liveness denial”: a class of attack where validators stop finalizing blocks.
“Invalid chain finalization”: a class of attack where validators finalize an invalid (or unavailable) block.
“Finality reversion”: a class of attack where validators that already finalized block A then finalize some competing block A’, thereby breaking the blockchain’s finality guarantee
“Weak subjectivity”: it’s the situation where when a set of validators place amounts at stake for a certain period of time (X days), a 51% attack could revert a certain number of transaction right after their Xdays are expired: in this case the stake cannot be used as penalty and validators cannot be punished.

There are several methods that aim to address these attack vectors and are being implemented within Casper’s PoS implementation. Nevertheless attacks against Casper or similar PoS implementations are extremely expensive as such attacks may cost as much, if not more, than the cost of buying enough mining power in a proof of work chain to permanently run a 51% attack.

Delegated Proof-of-Stake (DPoS)

The DPOS algorithm is mainly used within BitShares blockchain. BitShares claims that DPoS allows to generate a new block at fixed rate (block production/confirmation time) with minimal computational requirements.[15]

In DPoS, stakeholders elect what are known as witnesses. Witnesses are responsible and rewarded for generating and adding blocks to the blockchain.

Each account is allowed one vote per share per witness, a process known as approval voting. The top N witnesses by total approval are selected. The number (N) of witnesses is defined such that at least 50% of voting stakeholders believe there is sufficient decentralization. When stakeholders expresses their desired number of witnesses, they must also vote for at least that many witnesses. A stakeholder cannot vote for more decentralization than witnesses for which they actually cast votes.

The voting for witnesses is a continuous process, therefore, witnesses have an incentive to carry out their function to the highest standard or they risk losing their position. A reputation scoring system exists in order to assist stakeholders in better assessing the quality of witnesses.

Each time witnesses produce a block, they are paid for their services. Their pay rate is set by the stakeholders via their elected delegates. If a witness fails to produce a block, then they are not paid, and may be voted out in the future.

Delegates are elected in a manner similar to witnesses. A delegate becomes a co-signer on a special account that has the privilege of proposing changes to the network parameters. This account is known as the genesis account. These parameters include everything from transaction fees, to block sizes, witness pay, and block intervals. After the majority of delegates have approved a proposed change, the stakeholders are granted a 2 week review period during which they may vote out delegates and nullify the proposed changes. Unlike witnesses, delegates are not paid positions..

The genesis account can technically perform any action that any other account can perform, which means it is possible to send funds to the genesis account or specify the genesis account as an escrow agent. The genesis account can also be used to issue new assets.

EOS is another blockchain implementing DPOS, in EOS blocks are produced in the rounds of 21. At the start of every round 21 block witnesses are chosen. Top 20 are automatically chosen while the 21st one is chosen proportional to the number of their votes relative to the other witnesses.

Witnesses are then shuffled around using a pseudorandom number derived from the block time. This is done to ensure that a balance connectivity to all other witnesses is maintained. To ensure that regular block production is maintained and that block time is kept to 3 seconds, witnesses are punished for not participating by being removed from consideration. A witness has to produce at least one block every 24 hours to be in consideration.

The DPOS system is resistant to forks because instead of competing to find blocks, the witnesses will have to co-operate instead. In the event of a fork, the consensus switches automatically to the longest chain.[16]

Randomized Proof-of-Stake (RPoS)

RPoS is a variant of PoS where nodes are selected at random to propose blocks.

Orbs implements RPoS taking the approach of selecting an entire committee — not merely a block leader — to authorize transactions adding a reputation score to decide who is qualified to participate in committees, rather than their total deposit or holdings of tokens. Reputation scores may fluctuate depending on reliability to process transactions in a timely manner, or if a given node violates other codes of conduct.

The block is verified by the randomly selected committee, whose total number of members can be pre-set.

Leased Proof of Stake (LPoS)

LPoS is an modified version of Proof-of-Stake where the concept of mining pools is built-into the consensus mechanism. Waves is a chain with a major LPoS implementation and we use it as reference for this description.

In a regular Proof-of-Stake system, each node that holds a certain amount of cryptocurrency is eligible to add the next block to the blockchain but in the LPoS system, users can lease their balance to full nodes.

With LPoS, the user will have the ability to Lease waves cryptocurrency form the wallet to different contractors which can pay a percentage as a reward. The larger the amount that is leased to a full node, the higher the chances of that full node being selected to produce the next block. If that full node is selected to produce the next block, the leaser will then receive a percentage of the transaction fee that is collected by the full node.

In a Leased Proof-of-Stake environment, users can choose between running a full node or leasing their stake to a full node with receiving rewards. This system allows anyone to participate in the Waves network maintenance.

User can lease his waves coins through leasing on any computer or mobile device that has an internet browser since Waves provides a lite client solution that does not require “Miners”, that are leasing their balance to store the whole Blockchain or to have the wallet running. [17]

Proof-of-Authority (PoA)

Proof of Authority (PoA) is a family of Byzantine fault-tolerant (BFT) consensus algorithms largely used in practice to ensure better performance than traditional Practical Byzantine Fault Tolerance (PBFT) [18].

PoA was proposed in 2017 (the term was coined by Gavin Wood) as a blockchain based on the Ethereum protocol. PoA consensus is based on a modified Proof of Stake model where identity becomes a form of stake rather than actually staking coins or accumulating con-age.

PoA removes a specific concern within the PoS model that although stakes between two parties may be equal, their value to each party may vary significantly depending upon their holdings. For instance, Alice may have 1,000 tokens staked and Bob may also have 1,000 tokens staked, however, Alice has $10 million outside of her stake and Bob only has $10,000 outside of his. Therefore, Bob is much more likely to invested in the success of the network than Alice since his stake represents a substantially larger portion of his overall finances. [19] This naturally applies as well in other consensus models as PoW mining cost brings the same concern.

PoA is mainly targeted to permissioned networks, nevertheless it can be used in hybrid models as it is a reputation based consensus model where reputation can be calculated with several models that do not necessarily involve a gatekeeping process. For this reason is described here in Part3 where we focus on permissionless models (indeed Parity, a client implementation of Ethereum, supports a Proof-of-Authority consensus engine).

In PoA consensus is achieved by referring to a list of authorities (Validators) that are nodes allowed to participate in the consensus, validating both transactions and blocks.

Each authority is identified by a unique id and a majority of them is assumed honest, namely at least N=2 + 1, where N is the number of elected authorities.

PoA operates in a mining (minting) rotation schema (steps or rounds) in order to distribute responsibility. At each round an elected node acts as mining leader and is in charge of proposing new blocks on which distributed consensus is then achieved.

For permissioned DLTs PoA is more secure than PoW as an attacker with unwanted connection or hacked authority can not overwhelm the whole network, less computationally intensive as there’s no mining involved, faster and more predictable as blocks are issued given intervals of time.

Differently from PBFT, PoA requires less message exchanges hence provides better performance.

Clique is a PoA implementation that allows more than one authority to propose blocks with random delays. This permits coping with leaders that could not have sent any block due to either network asynchrony or benign/Byzantine faults.

In Clique resulting forks are anyway resolved by the Ethereum GHOST protocol, hence we have eventual consistency. On the other hand, as the mining frequency of authorities is bounded by 1∕(N/2+1) , a majority of Byzantine authorities is required to take over the blo

Limitation of PoA model is based on its decentralization/distribution model as the number of validators may be not enough to guarantee the level of decentralization required in a true public blockchain implementation. Alo the process of validator entitlement should be carefully defined in a permissionless environment.

A good example may be the POA Network that utilize US public notaries as validators: this is a perfect running environment to avoid several drawbacks of other consensus mechanisms, but a questions raises: does this represent the intrinsic nature of decentralized systems? is this different from the previous existing model? Does it guarantee immutability?

In a real public permissionless ecosystem, validators should be rigorously verified to determine they would act honestly.

A validator may in this case, be verified primarily by:

their public identity (maybe by a SSI model linked to the real identity)
running a process where the validator’s reputation is verified (e.g. by scanning public rating systems etc)
guaranteeing a proper geographic and social distribution

Anyway, whatever process we may choose there is the risk based on the fact that some people/entities do not care about their reputation or the payoff of ruining their reputation is simply more than the cost.

These issues can be mitigated by guaranteeing that the majority of validators is honest:

choosing a large number of validators
running a swift expulsion model for non-reliable validators
taking from PoS model the concept of rewarding honest validators running the service for a long period of time

While PoA runs smoothly in permissioned environments, there is still some work to achieve to make it appropriate for permissionless ones.

Aura

Aura (Authority Round) is one of the Blockchain consensus algorithms available in Parity (one of the main Ethereum clients) [20].

In Aura time is divided into discrete steps of a given duration (constant); the actual step is determined by the simple formula:

StepNumber=UnixTime÷ Duration

At each step a leader, i.e. the only one that can issue a block (one), will be assigned based on the following formula

Leader=StepNumber mod N

Authorities maintain two queues locally, one for transactions Qt and one for pending blocks Qb

At each step the leader propose a new block that is made out of the transactions present in the queue and of a header containing the step number and leader’s signature of the block hash. Leader then broadcasts the block to the authorities (this process is called ‘block proposal’) that can verify it by checking that the signature belongs to the correct primary for the given step.

Any block sent by an authority not being the current leader is rejected. The leader always send a block, even if no transaction is available (in this case the block would be naturally empty) this to reach finality in a timely fashion.

If authorities do not agree on the proposed block a voting is triggered to decide whether the current leader is malicious and then kick it out: this would happen in case majority of votes would go for this option.

Aura critically relies on time synchronization of the validators for successful execution of the protocol [21].

This could bring t Inconsistency of the blockchain’s state or Denial-of-service if clocks of validators are desynchronized due to clock drift or by a specific attack (e.g. leveraging on NTP traffic being unauthenticated by default, thus a network attacker could transmit invalid time to the validators). In this case, different leader validators would be active and therefore compute different blocks a different step index.

Another vector of attack is based on collusion of enough validating nodes. Aura should be able to tolerate up to 50% malicious nodes, therefore

A critical attack vector is base on the possibility of an attacker to block one or more validators to prevent the protocol from validating incoming blocks. Encrypting connections among validators, even if complex, will not completely protect from such kind of attack.

Another possible vector is based on the possibility of a minority of malicious nodes to commit transaction therefore causing no consistency in the system.[18]

References

[1] PBFT vs Proof-of-Authority: Applying the CAP Theorem to Permissioned Blockchain — Stefano De Angelis, Leonardo Aniello, Roberto Baldoni, Federico Lombardi, Andrea Margheri, and Vladimiro Sassone.

[2] C. Cachin, R. Guerraoui, and L. Rodrigues. Introduction to reliable and secure distributed programming. Springer, 2011.

[3] In Search of an Understandable Consensus Algorithm — Diego Ongaro and John Ousterhout — Stanford University

[4] Practical Byzantine Fault Tolerance -Miguel Castro and Barbara Lisk

[5] RBFT: Redundant Byzantine Fault Tolerance — Pierre-Louis Aublin, Sonia Ben Mokhtar, Vivien Que’ma

[6] DBFT: Efficient Leaderless Byzantine Consensus and its Application to Blockchains — Tyler Crain and Vincent Gramoli — Mikel Larrea — Michel Raynal

[7] Istanbul Byzantine Fault Tolerance — GitHub

[8] NEO consensus whitepaper

[9] Performance Analysis of the Raft Consensus Algorithm for Private Blockchains — Dongyan Huang, Xiaoli Ma, Shengli Zhang

[10] Consensus Algorithms: PoA, IBFT or Raft? Jim Zhang

[11] Sawtooth’s Ledger documentation

[12] PPCoin: Peer-to-Peer Crypto-Currency with Proof-of-Stake — Whitepaper

[13] BlackCoin’s Proof-of-Stake Protocol

[14] Ethereum Casper Protocol

[15] The BitShares blockchain whitepaper

[16] EOS whitepaper

[17] Waves LPoS implementation

[18] PBFT vs Proof-of-Authority: Applying the CAP Theorem to Permissioned Blockchain — Stefano De Angelis, Leonardo Aniello, Roberto Baldoni, Federico Lombardi, Andrea Margheri, and Vladimiro Sassone

[19] Blockonomi Jul 2018 — Brian Curran

[20] Aura — Authority Round — Wiki

[21] https://github.com/poanetwork/wiki/wiki/Aura-Consensus-Protocol-Audit

The DLT consensus ecosystem

The DLT consensus ecosystem

Byzantine Fault Tolerance

Consensus in distributed computing

Consistency Availability Partition-tolerance — CAP

Permissioned/Private DLTs

BFT

PBFT

Democratic Byzantine Fault Tolerance (DBFT)

Istanbul BFT (Byzantine Fault Tolerance)

Redundant Byzantine Fault Tolerance (RBFT)

Delegated Byzantine Fault Tolerance (dBFT)

Raft

QuorumChain Consensus

Permissionless/Public DLTs

Proof of Work

Proof of Elapsed Time (PoET)

Proof-of-Weight (PoWeight)

Proof of Stake (PoS)

Delegated Proof-of-Stake (DPoS)

Randomized Proof-of-Stake (RPoS)

Leased Proof of Stake (LPoS)

Proof-of-Authority (PoA)

Aura

References

Written by Pietro Marchionni