Forget about the TPS Competition

Most self-proclaimed TPS figures simply don’t mean anything.

This article is part 2 of the series “The Making of a Consensus Protocol”:

1. Why we love Nakamoto Consensus (NC) and how to increase its throughput
2. Why we don’t choose alternative consensus protocols
3. A detailed look into the design of NC-Max, a variant of Nakamoto Consensus with higher throughput.

Some background before you start:

This article is based on the transcript of Nervos researcher Ren Zhang’s talk at the “Scaling Bitcoin” meetup in San Francisco. A video recap of the talk is also available.

Ren is currently a researcher at Nervos, working mainly on the Nervos consensus protocol. He is also a PhD student of Bart Preneel (a designer of RIPEMD-160, the hash function used in deriving your Bitcoin address from your Bitcoin public key) at the COSIC research group at KU Leuven (the birthplace of AES, the Advanced Encryption Standard used in all of your cell phones). In 2017, after his research uncovered design flaws in the scaling proposal Bitcoin Unlimited, Ren was invited to intern at Blockstream, where he worked with Pieter Wuille and Gregory Maxwell. Recently, Ren’s paper “Lay Down the Common Metrics: Evaluating Proof-of-Work Consensus Protocols’ Security” was accepted at IEEE S&P (Oakland).

Following the article about Why we love Nakamoto Consensus and how to break its throughput limit, people may raise a very common question: why don’t you choose alternative consensus protocols? Most of them claim to have much higher throughput than NC.

Currently, there are three approaches that don’t adopt Nakamoto Consensus: PoS, DAG, and two block types. The three approaches are not mutually exclusive, and several of them can be combined in one consensus protocol. Let’s take a deeper look at their security, functionality, and throughput to see why we didn’t choose them.

Proof of Stake

All PoS protocols introduce some new security ASSUMPTIONS and could trigger new attacks.

For example, Algorand requires that most honest users can send messages that will be received by most other honest users within a known time bound. This means that if you hold Algorand tokens, you ALWAYS have to be online. If you are not online, you are not a valid token holder.

In the case of Ouroboros (used in Cardano), all participants are assumed to have weakly synchronized clocks, and all messages are assumed to be delivered within a bounded period of time. This is a very strong assumption. There are actually many attacks that could bias the clock of your mining gear, of your public nodes, of your cell phones, etc.

The Sleepy protocol also requires that participants are equipped with roughly synchronized clocks.

Violating these security assumptions can lead to catastrophic results in those PoS protocols. PoS also introduces several new attacks that previously didn’t exist in proof-of-work protocols, for example the nothing-at-stake attack, the grinding attack, and the long-range attack.

DAG

All DAGs have a transaction ordering problem.

If you allow blocks to be produced simultaneously, then different miners or public nodes will have inconsistent views of the total transaction order: you might think certain transactions are confirmed, while others see a different set of confirmed transactions.
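To make the ordering problem concrete, here is a minimal sketch of a toy ledger. It is not the logic of any real DAG protocol: the account names, transaction IDs, and the `apply_in_order` helper are all hypothetical. Two nodes receive the same pair of conflicting transactions from parallel blocks, apply them in different orders, and end up confirming different transactions.

```python
# Toy ledger for illustration only; names and rules are assumptions,
# not taken from any specific DAG protocol.
def apply_in_order(balances, transactions):
    """Apply transfers in the given order, skipping any the sender cannot afford."""
    balances = dict(balances)  # work on a copy so each node starts from the same state
    confirmed = []
    for tx_id, sender, receiver, amount in transactions:
        if balances.get(sender, 0) >= amount:
            balances[sender] -= amount
            balances[receiver] = balances.get(receiver, 0) + amount
            confirmed.append(tx_id)
    return confirmed, balances

start = {"alice": 10}
tx_to_bob = ("tx1", "alice", "bob", 10)      # included in one parallel block
tx_to_carol = ("tx2", "alice", "carol", 10)  # included in the other parallel block

# Node A and node B see the same two blocks but order them differently.
confirmed_a, balances_a = apply_in_order(start, [tx_to_bob, tx_to_carol])
confirmed_b, balances_b = apply_in_order(start, [tx_to_carol, tx_to_bob])

print(confirmed_a)  # ['tx1']
print(confirmed_b)  # ['tx2']
```

Until every node agrees on one total order, neither Bob nor Carol can safely treat the payment as final.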

You have two options for solving this problem:

Option 1: settle the transaction order after the blockchain topology is fixed, which means you have to accept a very long confirmation delay.

Option 2: leave the transaction order undecided forever, which means some tokens are stuck in a smart contract or on the blockchain forever and nobody can spend them anymore. This limits smart contract functionality, because some of the function calls may be benign, but the money is locked anyway.

Two block types

What about protocols with two block types? They use key blocks, which are similar to Nakamoto Consensus blocks, to safely confirm transactions, and micro-blocks, which are broadcast between key blocks, to increase throughput.

These protocols usually have a very long confirmation delay, similar to DAG protocols. The Bitcoin-NG paper explicitly says that a user who requires high confidence will not gain better latency with Bitcoin-NG and must wait for several key blocks before accepting a transaction as completed.

The TPS competition

Most self-proclaimed TPS figures simply don’t mean anything.

As you may see in the picture above (source: NKN), all these alternative consensus protocols claim to have very high throughput.

Solana claims to be able to process 710,000 txn/s. NKN says it can run 10,000 nodes at 1M transactions per second. This protocol (as shown in the picture below) claims 1,000,000,000,000 TPS with a final confirmation time of under 1 second… if you want to dream, you’d better dream big :))

However, in the real world, most self-proclaimed TPS figures don’t mean anything, because:

  1. The TPS is simulated under a network environment different from the real one. Some of the simulations use network nodes with 1 Gbit/s of bandwidth, which is clearly not the reality.
  2. They neglect real-world factors. NONE of these simulations consider transaction synchronization. For them, the transactions are always already synchronized in their blocks, and the consensus protocol is only used to order those transactions, which is far from reality! Some simulations also assume direct links between committee members, but in reality every message is broadcast, and broadcasting messages consumes the bandwidth of public nodes.
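A quick back-of-the-envelope check shows how strongly such claims depend on the assumed bandwidth. The sketch below is only an illustration: the 250-byte average transaction size is an assumption chosen for the example, not a figure from the talk, and the results scale inversely with whatever size you pick. It also ignores all protocol overhead, so these are generous upper bounds.

```python
# Illustrative sanity check; the 250-byte transaction size is an assumption.
ASSUMED_TX_SIZE_BYTES = 250

def max_tps(bandwidth_bits_per_s, tx_size_bytes=ASSUMED_TX_SIZE_BYTES):
    """Upper bound on TPS if the whole link carried nothing but transactions."""
    return bandwidth_bits_per_s // (tx_size_bytes * 8)

def required_bits_per_s(tps, tx_size_bytes=ASSUMED_TX_SIZE_BYTES):
    """Bandwidth a node needs just to receive every transaction once."""
    return tps * tx_size_bytes * 8

print(max_tps(1_000_000_000))                  # 1 Gbit/s link  -> 500000
print(max_tps(10_000_000))                     # 10 Mbit/s link -> 5000
print(required_bits_per_s(1_000_000_000_000))  # 2000000000000000 bits/s, i.e. 2 Pbit/s
```

Under this assumption, a node would need a 2 Pbit/s connection merely to download the transactions behind a one-trillion-TPS claim.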

A Simple Model for TPS Comparison

Here is our model for TPS comparison. We believe that a public node’s bandwidth is capped at 100%, and you cannot go beyond that. This bandwidth consists of three parts:

  1. The percentage of the bandwidth used to synchronize transactions that are eventually confirmed. This is the REAL TPS, since a transaction has to be synchronized before it can be confirmed.
  2. The percentage of the bandwidth “wasted” by the consensus protocol.
  3. The underutilized bandwidth.

So the TPS is capped at 100% of the bandwidth. If you want to increase the TPS, there are ONLY two things you can do:

1. lower the consensus protocol’s bandwidth consumption

2. reduce the underutilized bandwidth

At the Layer 1 consensus protocol level, you cannot do more than this, and you cannot go beyond 100 percent.
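The three-part bandwidth model can be written down as a small function. This is a minimal sketch under stated assumptions: the function name `real_tps`, its parameters, and the example numbers (a 10 Mbit/s node and 250-byte transactions) are hypothetical and chosen only for illustration.

```python
def real_tps(bandwidth_bytes_per_s, tx_size_bytes, consensus_fraction, idle_fraction):
    """Real TPS = the share of bandwidth left for synchronizing confirmed transactions.

    consensus_fraction: share of bandwidth consumed by the consensus protocol itself
    idle_fraction:      share of bandwidth left underutilized
    """
    if consensus_fraction < 0 or idle_fraction < 0:
        raise ValueError("fractions must be non-negative")
    if consensus_fraction + idle_fraction > 1:
        raise ValueError("the three parts cannot exceed 100% of the bandwidth")
    useful_fraction = 1.0 - consensus_fraction - idle_fraction
    return useful_fraction * bandwidth_bytes_per_s / tx_size_bytes

# Hypothetical node: 10 Mbit/s = 1,250,000 bytes/s, 250-byte transactions.
baseline = real_tps(1_250_000, 250, consensus_fraction=0.25, idle_fraction=0.25)
less_waste = real_tps(1_250_000, 250, consensus_fraction=0.05, idle_fraction=0.25)
ceiling = real_tps(1_250_000, 250, consensus_fraction=0.0, idle_fraction=0.0)

print(baseline, less_waste, ceiling)
```

Only the two fractions are knobs the protocol designer can turn. The ceiling is fixed by the node's bandwidth and the transaction size.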

Let’s look back at the alternative consensus protocols’ bandwidth consumption. Many of them waste precious bandwidth on committee member communication.

Algorand stores block certificates in order to prove to new users that a block was committed. Each block certificate is 300 KB, independent of the block size. With 1 MB blocks, this means around 25% of the bandwidth is FOREVER wasted on synchronizing those certificates. So I’m really curious why they claim that their throughput is better than NC’s; it doesn’t make any sense.
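The certificate overhead is simple arithmetic, sketched below. The helper name `certificate_overhead` is hypothetical, and treating 1 MB as 1000 KB is an assumption made for round numbers. The only input taken from the text is the 300 KB certificate size.

```python
def certificate_overhead(block_size_kb, certificate_size_kb=300):
    """Share of synchronized bytes spent on the certificate rather than the block."""
    return certificate_size_kb / (block_size_kb + certificate_size_kb)

one_mb = certificate_overhead(1000)    # 300 / 1300
ten_mb = certificate_overhead(10_000)  # 300 / 10300

print(f"{one_mb:.1%}")  # 23.1%
print(f"{ten_mb:.1%}")  # 2.9%
```

With 1 MB blocks, the result is roughly 23%, in line with the rough quarter quoted above. Because the certificate size is fixed, the overhead shrinks only if blocks get larger.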

The second way a consensus protocol wastes bandwidth is through redundant transactions in DAGs and orphaned blocks. As indicated in this paper, if all nodes choose the same subset of transactions for inclusion in their blocks, then any two blocks created in parallel are likely to have many collisions, and throughput will not be high. Currently, all of the DAG protocols turn a blind eye to this situation.

I will share more details on the changes Ren proposes to make to NC in the next article. Please stay tuned for part 3 of “The Making of a Consensus Protocol” series: NC-Max, a variant of Nakamoto Consensus with higher throughput (coming soon!)
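The collision effect can be illustrated with a deterministic toy example. Everything in it is an assumption for illustration: the mempool of 1,000 integer transaction IDs, the block size of 100, and the `unique_share` helper do not model any particular DAG protocol.

```python
def unique_share(blocks):
    """Fraction of block slots that carry a transaction not duplicated elsewhere."""
    total_slots = sum(len(block) for block in blocks)
    distinct = len(set().union(*blocks))
    return distinct / total_slots

mempool = list(range(1000))  # hypothetical tx IDs, already sorted by descending fee
block_size = 100

# Both miners greedily pick the highest-fee transactions, so the blocks are identical.
greedy_blocks = [set(mempool[:block_size]), set(mempool[:block_size])]

# Best case for comparison: the two miners happen to pick disjoint transactions.
disjoint_blocks = [set(mempool[:block_size]), set(mempool[block_size:2 * block_size])]

print(unique_share(greedy_blocks))    # 0.5
print(unique_share(disjoint_blocks))  # 1.0
```

In the greedy case, two parallel blocks cost twice the bandwidth but confirm no more transactions than a single block would.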

Part 1: Why we love Nakamoto Consensus (NC) and how to increase its throughput