Deep Dive Into A New, High Throughput Consensus Algorithm For Permissioned Blockchains

Horia
Official Amulet
Published in
9 min readMay 27, 2018

Blockchain technology has been criticized for low transaction throughput, without which no blockchain implementation can service any mass market product with large numbers of concurrent users. Individuals are accustomed to instantly purchasing a good or a service with cash or credit card; no fully decentralized blockchain implementation currently supports that kind of real time financial transaction.

Notable high-throughput networks such as Ripple and IOTA have centralization: privileged nodes which act as masters to the unprivileged public nodes. Those master nodes present a security vulnerability as they can be hacked, or subpoenaed by state actors.

This blog reviews a new consensus algorithm, Slush, which serves fully permissioned and fully decentralized blockchain implementations. Such a blockchain is a network without any single points of failure (as there is no centralized control) and all nodes are assumed to be faithfully executing the network protocol correctly; otherwise they would not have been granted permission to join and to remain in the network.

Slush is actually the foundation of a family of blockchain consensus algorithms, derived and originally published by an anonymous group calling themselves Team Rocket. Despite its limitation to fully permissioned blockchains, Slush provides the reader with a wealth of insight into decentralized algorithms, and the statistical methods employed to analyze their correctness and their performance. This algorithm will therefore be the subject of the remainder of this blog.

Note, however, that Team Rocket has extended this algorithm in three different ways, to create three other blockchain consensus algorithms — for entirely public, and entirely decentralized blockchains. These three extensions have been empirically shown to attain a throughput of a couple thousand transactions per second, with only a few milliseconds of latency.

What Is Slush? How Does It Work?

Like most blockchain implementations, all nodes store their own copy of the ledger. Unlike most blockchain consensus algorithms, Slush employs an epidemic or gossip style of communication to spread updates throughout the system. Each node in the network faithfully executes the following algorithm to definitively approve or reject any incoming transaction.

The intuition for this consensus algorithm is found in the general framework of recursion. The recursive base case is when a node has already made a definitive decision regarding this transaction. The recursive step is when a node has not made a definitive decision, and it asks a partition or a subset of remaining nodes what their decision was, then takes the majority vote as its own decision.

Implementation Detail
In the flowchart or process chart above, the node which tentatively approves the transaction in absence of a definitive decision only replies with approval upon being queried by other nodes in the blockchain network. It does not reply to the client with this tentative approval. It only replies with tentative approval to other nodes in the blockchain network to prevent an infinite recursion when none of the nodes have any definitive decision on this particular transaction. Once consensus has been reached on the blockchain, the client is certain to have received the correct reply as to whether or not their digitally signed transaction was definitively approved.

Number Of Communication Requests Required To Achieve Consensus

Slush defines consensus the same as all other blockchain consensus algorithms: agreement on the ledger, of which all nodes maintain a local copy. Therefore we are interested in the number of communication requests among nodes required for either all nodes to approve this transaction, or for all nodes to reject this transaction.

Most blockchain consensus algorithms require as many communication requests among nodes as the square of the total N nodes in the network. Slush instead achieves consensus in N log N communication requests among the total N nodes in the network.

Comparison Of This New Consensus Algorithm With Current Algorithms
Consider the following hypothetical blockchain network sizes, and their respective numbers of communication requests among nodes required to reach consensus. Clearly, higher throughput will be readily achieved with this new and much more scalable consensus algorithm.

But how do we know that Slush is guaranteed to achieve consensus? And how do we know that it does so with N log N communication requests among the total N nodes in the network?

It turns out that proving the scalability of the number of communication requests among nodes also proves that the algorithm achieves consensus.

Why Does Slush Converge? How Fast Does It Converge?

This analysis will proceed using a branch of statistics known as stochastic processes. Put plainly, a stochastic process is a sequence of random variables where each variable in the sequence is dependent on the observed values of the previous variables. Think of it like time series analysis on a person performing physical exercise: their heart rate is not only a function of the intensity of the activity, but is also a function of their heart rate in the past (including periods of inactivity).

Within the Slush consensus algorithm, the number of nodes which approve or reject a particular transaction at a particular instant in time t is dependent on the number of approving or rejecting nodes for that transaction at time t — Δt. Where Δt can be an arbitrarily small difference in time for a fully decentralized and fully asynchronous algorithm such as Slush. Let us walk through a toy example to study this consensus algorithm in detail.

Suppose there are only 9 nodes in the network. Then a client submits a digitally signed transaction to an arbitrary node in the network. At any instant in time t, the number of nodes who have approved this transaction and used it to update their own copy of the ledger is anywhere between 0 and 9. And for sufficiently small Δt, only one node has received a complete response for its query — due to heterogeneous node hardware, heterogeneous network topology, non-uniform placements of nodes, etc. Then the number of approving nodes can either remain unchanged, or can change by exactly 1 with every Δt. An intuitive diagram of all possible states for the number of nodes approving this transaction is simply

where each of the gray circles represents state S, whose subscript indicates the number of nodes which have approved this transaction and used it to update their internal ledger. Directional lines indicate a transition from one state to another, with R indicating the probability of remaining in the same state. P indicates the probability of transitioning to a state where one additional node has approved this transaction; equivalently, one less node has rejected this transaction. Q indicates the probability of transitioning to a state where one less node has approved this transaction; equivalently, one additional node has rejected this transaction.

The consensus algorithm completes once either of all 9 of the nodes in the network definitively approve this transaction, or once all 9 nodes in the network definitively reject this transaction. The states at the two end-points of the diagram above are therefore called absorbing states, because the number of approving nodes for this transaction can no longer transition once it has entered either of these two absorbing states.

With sufficiently small Δt such that only one node in this blockchain network has received a complete response for its query, the number of communication requests among nodes can be analyzed using a fundamental tool of stochastic processes known as first-step analysis.

First-step analysis exploits the fact that transitioning from one state to another requires the same expected amount of steps as transitioning from the immediately following state to the desired “other state,” plus 1 to account for the transition that just happened. It is therefore a particular instantiation of the mathematical tool commonly known as proof by induction.

Expected Number Of Communication Requests To Approve Or Reject A Transaction

Using the immediately preceding state diagram, let U denote the expected number of steps to reach either of the two absorbing states; and note that reaching one precludes ever reaching the other. Let the subscripts on U indicate the current state in the diagram from which we are transitioning to either of the two absorbing states. Then the expected number of communication requests to approve or reject a transaction is derived using first-step analysis

which may look daunting but is actually guaranteed to have a solution because there are as many unknowns as there are equations — eight values of U and eight equalities. In general, for a network with N total nodes, there will be N-1 unknowns and N-1 equalities so there will always be a solution for arbitrary sized blockchain networks.

Intuitively, the equations above say that the blockchain network whose consensus is governed by this particular algorithm is guaranteed to reach consensus — because the system of equations has a solution which means the state of the network will eventually be either unanimous approval or unanimous rejection. These equations also say that transitioning from one state to the next costs exactly k communication requests, where k is manually set at the protocol level and its function is detailed in the flowchart diagram of the per node algorithm that must be faithfully executed asynchronously by every node in this network.

When dealing with a stochastic processes, it is always necessary to question whether the eventuality of reaching an absorbing state will take a finite amount of time or steps. It is a subtle distinction, but guaranteeing that the equations have a solution only guarantees that an absorbing state will be reached within a (possibly infinite) amount of time or number of steps. Team Rocket, the authors of Slush and its three extensions, therefore present proofs that this eventuality does indeed occur in finite time and therefore finite number of steps.

What are the exact quantities of the transition probabilities?
In order to understand the proofs of consensus in finite time and finite steps, it is immensely helpful to understand how the transition probabilities are defined. Team Rocket does not explain or derive these transition probabilities. Therefore their proofs can be difficult to reproduce without first stepping back and looking at this core component of all their equations and inequalities.

Within a sufficiently small Δt such that only one node in this blockchain network has received a complete response for its query, this single node can switch its rejection of a digitally signed transaction to an approval with probability P or it can switch its approval of the same digitally signed transaction to a rejection with probability Q, where

and the only other option this single node has is to remain with its current decision of approval or rejection with probability R, where

and where the number of nodes in the network which currently approve this transaction is indicated by the subscript i. This is relevant because the probability that this single node is one which rejects the transaction is (N — i) / N whereas the probability that this single node is one which approves the transaction is i / N for a network with N total nodes. This stems from the a priori assumption that any node in the blockchain network is equally likely to be the one which receives a complete response to its query within sufficiently small Δt.

Slush consensus algorithm specifies precisely in what conditions peers tell the current node to approve this transaction: when more than half of them have themselves approved this transaction. The transition probabilities are therefore equal to

where the probability that querying k out of the total N nodes in the network results in c nodes approving this transaction follows a Hypergeometric distribution because at the instant the query is made, no two nodes are identical, and the total number of nodes which have approved this transaction is i. The probability that more than half of queried peers approve of this transaction is therefore given by

and the probability that half or less of queried peers approve this transaction is its complement: computed by subtracting the above sum from 1.

With this depth of understanding, the reader is now encouraged to study the original paper by Team Rocket.

Originally published at Hacker News on May 27, 2018.

--

--