Breaking down Tangram’s consensus mechanism: Part 1

Distributed consensus algorithm — A walkthrough on BFT & pBFT and more…

Published in

Tangram

8 min readApr 13, 2019

Distributed consensus protocols (algorithms) provide the necessary means for a distributed network or system to agree, verify and validate transactions without the need for a central authority (e.g. Visa, Mastercard and financial institutions).

In Tangram, the network uses a Proof of Stake consensus algorithm, which means it falls under a similar category of consensus protocols used by several other cryptocurrencies. We believe that future cryptocurrencies will need to prove sustainable, with a lower impact on the environment than many current models. A Proof of Stake mechanism offers this advantage among many others.

Generally speaking, a Proof of Stake consensus algorithm works as follows:

Anyone who holds a set amount of the system’s base currency is eligible to become a validator.
A potential validator then issues a special type of transaction that enables them to be considered / chosen / entered as a validator.
The blockchain keeps a record of these validators.
When a transaction or transactions are sent, the consensus algorithm then provides validators the opportunity to participate in validating the transactions.

From an algorithmic standpoint, Tangram uses a Practical Byzantine Fault Tolerance-style (pBFT) Proof of Stake algorithm, whereby, simply put, validators are randomly chosen. However, users will need to offer themselves as validators in order to participate in the consensus arrangement, i.e. to propose or agree on which transactions / block of transactions are correct and should be considered a part of the network. The BFT consensus algorithm will continue to function correctly even if ⅓ of the validators are a) dishonest / malicious / attackers, or b) offline (faulty validators), without deprioritizing safety and liveness (core properties of BFT-style consensus algorithms).

Fundamentally, when network partitions take place, consensus algorithms cannot have consistency and availability (based on the CAP theorem):

Consistency: Every read receives the most recent write or an error.
Availability: Every request receives a (non-error) response — without the guarantee that it contains the most recent write.
Partition tolerance: The system continues to operate despite an arbitrary number of messages being dropped (or delayed) by the network between nodes.

This means that if more than ⅓ of nodes are dishonest, faulty, etc., then the consensus algorithm will prioritise SAFETY (consistency) over liveness (availability). To deliver a more robust setup, an algorithm can be modelled which allows for the highest possible availability while prioritizing consistency. After main-net delivery and network proliferation, we will have enough data and resources to research this and will be putting effort into developing and refining such an algorithm.

Practical Byzantine Fault Tolerance (pBFT)

pBFT was first proposed for use in a distributed system in 1999 by Miguel Castro and Barbara Liskov.

Simplified, their proposed consensus algorithm required the following to take place:

Clients (send and receive replies from a network of nodes) send a request to the leader node.
Leader node (primary) multicasts the request from the client(s) to secondary nodes (used to support the consensus algorithm).
Nodes then execute the request sent by the client and reply to the client.
The client wai(maximum amount of nodes that may be fault)+1from different nodes with the same result, which is the result of the operation requested.

Note: Keep reading to see how this consensus algorithm has been adapted and implemented in Tangram.

Three Phase Protocol in context

The consensus algorithm has three primary phases that are a form of state machine replication. Messages are exchanged between nodes depending on their state, with each node maintaining a state per ‘decision’ which is required to be reached, namely a current view number, and a list of input messages and output messages. We will try to explain these in their simplest form.

The pre-prepare and prepare phases are used to totally order requests sent in the same view, even when the primary node which proposes the ordering of requests is faulty. (A primary node is a leader node that handles final consensus decisions and commits blocks to the DAG, among other actions, e.g. publishing). The prepare and commit phases are used to ensure that requests that are committed are totally ordered across network views, i.e. a one-time picture of the network comprising of blocks, agreements, validity on transactions and termination of others.

Side note: All participating validator nodes agree on the total ordering of transactions. This is guaranteed for all nodes.

State Machine

States

Pre-prepare: Initiates the first block to be propagated across the network and broadcasts to other nodes within a membership list.
Prepare: Initiates validity conditions and contents of the message type.
Commit: Decides the ordering of the block within the DAG and verifies all message types received by nodes.

Transitions

Transitions below assume no timeouts have occurred, otherwise if timeouts do occur, due to network delays or malicious activity, two new states are introduced, view-change and new-view. These are introduced under Timeouts, and coupled with the three message types below when broadcasting and receiving messages and their transitions under normal conditions:

Pre-prepare > Prepare: Node who first proposed a new block broadcasts a pre-prepare to all other nodes within their membership.
Prepare > Commit: When a node receives its corresponding pre-prepare, it begins to broadcast prepare, which includes the contents of the pre-prepare and prepare messages.
Pre-prepare > Prepare > Commit: Nodes await prepare messages, and once received with its set conditions and correct views, the node will begin to broadcast commit.
Prepare > Commit: Nodes await commit messages, and once received, the node can then be committed to the position that the nodes agree upon, and continue to participate in the protocol

All of the message types above which are sent or broadcasted will be included in the nodes output message list.

Side Note: All nodes await the message types (pre-prepare prepare and commit).

Timeouts

In order to ensure liveness, the use of logical timers are integrated, such as, when a timeout is triggered as soon as the state machine for the creator of the block (in this case validator A) is initialized for that round. If no decision is reached then validator A will include a view-change message for a new state machine, and initialize a new timeout round. In such cases the following will occur:

At any “point” when nodes are broadcasting or awaiting message types, a node may emit a view-change and increase its view (+1).
If any nodes include a block with the message type of commit, a P variable is included with all commit message types that have the corresponding block.
Nodes cease accepting any new messages that have past views, and cease sending any new messages for past views.
Nodes await a view-change message and include this in their list of input messages.
Nodes will broadcast with a new-view;
Nodes receiving the new-view message type determine whether new-view pertains to any of the blocks which have the properties of the view-change property;
If this is the case, the node(s) then process a pre-prepare for the block in question and;
The states will transition through the normal operation phases as above.

Further readings on Byzantine Clock Synchronization and Time, Clocks, and the Ordering of Events in a Distributed System:

Byzantine Clock Synchronization

Time, Clocks, and the Ordering of Events in a Distributed System

Validity Conditions

When a transaction occurs and is received by a node in the Tangram network, it needs to be valid in order for it to be stored and reach finality. To be valid, it must meet certain conditions:

The transaction must be fully received and stored by the node;
The signature verification must match that of the creator;
The previous block / hash must be valid and its sequence number lower than the position;
The contents of the transaction must be valid (all attributes) e.g.:

Tangram — Coin attributes

Leader vs leaderless pBFT

Most pBFT-based consensus algorithms follow the design of having a constant leader who is a primary node and who accepts all requests while proposing blocks to the rest of the network from other nodes and / or clients. This means that these networks are synchronous, and may cause further challenges such as network communication errors. In addition, a malicious leader may cause hindrance to the network.

In a leaderless based pBFT system, leaders are executed on a first come first see block basis; i.e when a block gets seen in the network it will start a round (see state transition diagram above) with each node in its membership ring, and this continues with each new block, allowing for asynchronous communication. Any client (in Tangram’s case this would be anyone operating through a light-wallet and submitting transactions) may connect to any node and submit their transactions. They may then observe and determine if their transaction was successful, and if for any reason it is not, they may select a different node and re-transmit the transaction.

Side note: Blocks are added to the memPool in order, whereby the Proof-of-Stake protocol will proceed and finalise.

Node Behaviour

The consensus algorithm is inextricably interwound with coin properties and attributes, and these are used during the consensus process. Therefore properties and attributes of transfers / transactions are considered an instance of consensus in Tangram’s network.

These transactions are delivered as an immutable and deterministic total ordering of transactions. It should be noted that like other Byzantine Fault Tolerant consensus algorithms; if a byzantine node is to either a) delay, or b) never send information (honest or dishonest) to other nodes, eventually the information that is sent from other honest nodes will be delivered throughout the network through re-transmissions. This works as follows:

Honest nodes monitor the chains of other nodes.
If they do not locate their blocks within other nodes’ chains, they may re-transmit them.
If the honest node that is monitoring sees any missing information, they may request it from all other nodes.

In Summary

We can summarize the above by providing five core properties of the consensus:

Leaderless.
Dynamic membership.
Efficient for high numbers of transactions .
Network operations do not depend on the quorum slice or node stake.
Flexible in its design and integration with higher level protocols (i.e. Proof of Stake).

For detailed and technical definitions on the consensus algorithm, please read more here (Blockmania: from Block DAGs to Consensus DRAFT (v0.5, 25 Sept 2018) George Danezis and David Hrycyszyn):