Understanding the Zilliqa Consensus Mechanism | Part-1

Vikram
7 min readFeb 7, 2022

--

After hearing the term Consensus, the first thing that will come to many of our minds is either Bitcoin or Proof of Work. Consensus mechanism or algorithm is a big area and is being studied for decades but thanks to Satoshi Nakamoto and the Bitcoin network which have made it more popular recently.

In this article series, our goal will be to understand the Zilliqa Consensus Mechanism that is used in the Zilliqa blockchain network. We will be dividing this series into two parts. In this first part, we will build some basic understanding concepts of consensus, distributed systems, and some important properties around it. In the second part, we will look into some consensus algorithms, challenges they face, and then will look into the Zilliqa consensus algorithms and how it overcomes these challenges. So let’s start with the first part.

1. What is a Consensus Algorithm?

The term consensus is not some tech jargon but a general language word that means `A general agreement` among a consensus group. Now you will ask, what is this consensus group? It can be a group of people, machines, aliens, organizations, etc that want to come to an agreement on some topic.

The Consensus algorithm is a vast area of study in computer science. In the context of Blockchain/P2P, the consensus algorithms used are called Distributed Consensus Algorithms where the term distributed comes from the distributed systems. A Distributed system is a system of independently running computer nodes that communicates by message passing and have a common goal to achieve an agreement among all the network participants.

2. Why distributed consensus is hard to achieve?

Distributed consensus can be difficult to achieve and some of the reasons for this could be the crashing of the nodes, delay in the network, no coordination among nodes, malfunctioning of nodes, etc. This problem can be further classified based on Network synchrony and Node failure.

Node Failure

  • Crash fault: Faults where the system stops functioning and becomes unresponsive. These are the genuine kinds of faults.
  • Byzantine faults: Faults where the system malfunctions and shows abnormal behavior. These faults are caused by bad actors.

Network synchrony

  • Synchronous: Networks that have strong coordination and guaranteed message delivery within a fixed time delay ∆.
  • Partially Synchronous: Network that has loose coordination and message delivery is guaranteed eventually but within some variable time delay.
  • Asynchronous: Networks that have no coordination and message delivery are not guaranteed.

The consensus algorithm that deals with countering the byzantine faults are called Byzantine Fault Tolerant Consensus Algorithms or BFT Consensus in short.

3. Byzantine Fault Tolerant Consensus

The consensus algorithm that can tolerate at least one byzantine node and yet achieve consensus is called the BFT consensus algorithm. In order to achieve BFT consensus, the following requirements should satisfy:

  • Termination: All non-faulty node decides on some message and should never halt their functioning.
  • Agreement: All non-faulty nodes must eventually decide on one and the same message.
  • Validity: All non-faulty nodes should only accept the valid and correct message.
  • Integrity: All the agreements achieved on any message should only be proposed by some non-faulty node.

In the context of blockchain, the messages above are nothing but blocks and a set of transactions in the blockchain.

4. Blockchains and Consensus Algorithms

As we all know blockchains are immutable data structures that are replicated all across the nodes of distributed p2p network. All the network participants can be considered as a consensus group whose goal is to achieve consensus on the next block or a set of transactions to be added to the blockchain.

Achieving consensus could be difficult in the blockchain network considering they are public and hold an enormous value of billions and trillions of dollars that may attract a lot of bad actors. These bad actors aka dishonest nodes may try to hijack the network, manipulate chain data, block honest participants, try to double spend and perform many such attacks but the goal of the consensus algorithms is to overcome them all with safety and keep expanding our chain.

In the next sections of this article, we will look at a few consensus mechanisms one by one and study them in detail. Let’s start with pBFT.

4.1 Practical Byzantine Fault Tolerance(pBFT)

pBFT is a famous classical consensus algorithm that was proposed by Castro and Liskov in 1999. It is widely used in distributed applications even before the time Bitcoin came into existence. Recently it has seen a lot of applications in the blockchain space as well, Zilliqa being one of them.

In this section, we will look at the general idea behind the pBFT consensus. Note that Zilliqa uses the modified pBFT but the core working is the same and will help us understand better why Zilliqa pBFT is better.

pBFT is a voting-based, multi-round consensus algorithm that achieves consensus in 3 different phases namely pre-prepare, prepare and commit. The working of the pBFT is as follows:

The consensus starts with a client (that is not part of the consensus group) requesting a primary node in the consensus group. The primary node is like a temporary leader and it could be any node in the consensus group. Note, all nodes have equal capabilities.

In the pre-prepare phase, the primary node broadcasts the request to all the other nodes in the group. In the next phase i.e prepare, all the nodes execute the request and broadcast their respective results to all the other nodes. In the commit phase, all the nodes validate the response they receive from other nodes. Each consensus node waits for valid prepare responses from more than 2/3 of the total nodes before it can finally commit the result. Additionally, nodes also look for a valid commit from more than 2/3 of the other nodes.

pBFT can tolerate f byzantine faults, where f is a number strictly less than one-third of the total nodes i.e the following equation must follow

N >= 3f + 1  // N: total nodes in consensus group, f: max byz Faults

pBFT does the consensus work very well within a small group but doesn’t scale well as the network size grows. One of the major reasons for this is the heavy message passing done among the nodes in the group that has a complexity of O(n²). Zilliqa’s pBFT solves this problem very well but more on this later, now let’s move onto the next consensus algorithm i.e Nakamoto Consensus.

4.2 Nakamoto Consensus (Bitcoin PoW)

Nakamoto Consensus or Proof Of Work is no doubt one of the very famous consensus mechanisms of all, if you have heard about bitcoin or mining then definitely you might have heard of PoW. In this section, I assume that you are very well familiar with basic concepts like hash function, digital signatures, merkel tree, etc and won’t dig into it.

When we talk about PoW consensus, what comes to our mind is a difficult math puzzle, heavy computation, and use some cryptographic technique but believe me, the longest chain rule in bitcoin is what represents the network consensus. The consensus in bitcoin doesn’t have instant finality but is probabilistic and is calculated to be around 6+ blocks.

Confused? Np, let’s start again from the basics and try to understand it in more detail.

Bitcoins longest chain represent the main chain

As we know in the bitcoin network all the miners run the same piece of code. Each miner tries to solve a heavy maths puzzle and the one to solve it gets the chance to propose a block to the entire network and earn mining rewards. Now there can be multiple miners that may solve this puzzle simultaneously and can propose their own version of the block to be included in the main chain. The block that propagates faster and to most miners in the network has a high probability to be included in the main chain.

Miner propagating the block to the network

Meanwhile, the other node that receives multiple blocks simultaneously from the network forms a branch/fork at its end and starts mining on any one of them(most preferably the longest one). As more and more blocks get mined, the branches may get longer and the longest one is assumed to be the main chain. The miner starts mining on the longest chain that is its own version of the main chain.

The bitcoin PoW is so well designed that the network eventually agrees on one longest chain without getting into heavy forks. Let’s understand it in more detail. We know that a shorter block time will create more blocks, more blocks will create more branches, and more branches make it difficult to agree on one longest chain. So satoshi smartly kept the average block time around 10 Min, this is not some random number but backed by some good probability calculation. The difficulty in the network is dynamically adjusted every 2016 block so as to keep the block time average of 10 Min. Now what this does is that it reduces the probability that multiple miners will mine blocks simultaneously hence causing fewer forks to happen.

Blockchain and traditional financial network scalability comparison

That was it for Part1, hope you all liked it. In the next part, we see a few more consensus mechanisms and then will into the Zilliqa blockchain consensus mechanism. If you have any feedback, suggestions, or inputs, feel free to write me on Twitter @NakamotoVikram

--

--