Security and Scalability in Committee-Based Blockchain Sharding

John Adler
Dec 26, 2019


This post aims to analyze the security and scalability tug-of-war of blockchain sharding in an approachable way. The necessity of committee-based sharding is discussed, along with potential pitfalls and tradeoffs in the sharding design space.

The Blockchain Scalability Trilemma

The scalability trilemma claims that we can’t create a blockchain that simultaneously has 1) a low cost to run a full node, 2) a high transaction throughput, and 3) security. It’s easy to see that naively increasing the block size makes it more expensive to run a full node. The converse is also true: a blockchain can support a higher transaction throughput while remaining secure if we simply raise the target full node requirements.

A multitude of blockchains have been launched or are being developed on the premise of some new consensus protocol (many spawned from academics who don’t understand the scalability trilemma, giving these chains the colloquial classification of “professor coins”). These new consensus protocols claim to support many more transactions per second than Nakamoto Consensus, but not once have I seen these protocols compared to Nakamoto Consensus on the same hardware, network, and adversarial power. Invariably, they all benchmark their consensus protocols on powerful cloud servers with huge network capacities and find that they support much more than the paltry 3–7 TPS offered by Bitcoin. In other words, these chains focus on vertical scalability rather than the horizontal scalability of sharding.

The takeaway of the above is that consensus is mostly orthogonal to scalability (i.e. transaction throughput). The confusion with professor coins is probably due to an unfortunate naming collision: in distributed consensus protocols, scalability can mean latency or number of participants, rather than throughput as in the context of blockchains. So long as every full node in the network must validate every transaction, the bottleneck is execution of transactions. Different consensus protocols may offer other nice features, such as lower latency to first confirmation and faster finality, but they don’t impact scalability. If we can’t solve the scalability trilemma with a shiny new consensus protocol…how can we?

An addendum to this is that the number of consensus nodes, commonly referred to as “decentralization,” does not on its own affect scalability. Point 1 of the scalability trilemma, commonly but incorrectly referred to as “decentralization,” relates to the cost to run a full node, not the number of consensus nodes. As an example, EOS is commonly said to have high scalability because it is centralized around 21 validators, but this is incorrect: it is scalable because EOS full nodes have extremely high hardware and networking demands. A consensus protocol supporting a large number of active participants is certainly desirable, but this does not directly affect its scalability (if we ignore consensus protocol overhead, more on that later).

Sharding to the Rescue: Solving the Scalability Trilemma

Blockchain sharding seems to be a promising avenue for solving the scalability trilemma. In sharding, the execution of transactions is not fully replicated across all nodes. This in theory can provide a constant-factor increase in scalability in the number of shards. In theory, of course, because there are a number of caveats! The analysis that follows is mostly targeted at Eth 2.0 due to my familiarity with its lexicon, but should generally apply to all sharded blockchains.

How Sharding Provides Scalability

In this section I’ll provide an intuitive overview of the required features of a secure and scalable sharded blockchain that are formally proven in the paper “Divide and Scale: Formalization of Distributed Ledger Sharding Protocols.” The ensuing Twitter discussion between Buterin and the author of the paper is recommended reading.

I said above that sharding involves not fully replicating transaction execution. But how exactly is this accomplished? If we had every node validating every shard chain, then that wouldn’t be sharding: it would have the same scalability profile as a blockchain with big blocks the size of all the shards put together. If we instead allowed every node to select which shard it is responsible for validating at every block, then a single shard could easily be corrupted by even a weak adversary. This would allow a state safety violation (such as the invalid printing of coins) to happen on one shard, then later spread to all other shards.

The solution is that validators must be shuffled, or rotated, into committees, each committee comprising a subset of the total validators. This shuffling, and the responsibilities given to each validator, must be known by the system so as to be able to assign blame and levy penalties in the event of provable misbehavior (which unfortunately precludes the use of VRFs). Leaving implementation details aside, validators in a single committee produce blocks on a single shard and attest to their validity for some period of time before being shuffled to a new committee on a (potentially) different shard.
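As a toy illustration of this shuffling (names and parameters here are hypothetical; Eth 2.0’s actual shuffle is a verifiable swap-or-not construction), committees can be derived deterministically from a public epoch seed, so every node can recompute who is responsible for which shard and blame can be assigned for misbehavior:

```python
import hashlib
import random

def assign_committees(validators, num_shards, epoch_seed):
    """Deterministically shuffle validators into per-shard committees.

    A sketch only: seeding a PRNG from a public, unbiasable epoch seed
    makes the assignment reproducible by all nodes, which is the
    property the system needs in order to assign blame and levy
    penalties for provable misbehavior.
    """
    rng = random.Random(hashlib.sha256(epoch_seed).digest())
    shuffled = list(validators)
    rng.shuffle(shuffled)
    # Split the shuffled list into one committee per shard.
    size = len(shuffled) // num_shards
    return [shuffled[i * size:(i + 1) * size] for i in range(num_shards)]

committees = assign_committees(list(range(128)), 4, b"epoch-42")
assert len(committees) == 4 and all(len(c) == 32 for c in committees)
```

Note that because the assignment is a deterministic function of a public seed, it is also public knowledge — a fact that becomes important in the networking section below.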

Potential pitfall 1: if validators needed to catch up to the tip of the shard they’re assigned to by downloading and executing all shard blocks since the last time they were assigned to that shard, then sharding would provide no scalability and would essentially be big blocks. This is solved with two features. First, the stateless client concept is used, where only a state root is needed to execute transactions, each of which provides the necessary witnesses into the state database. This avoids storing the full state, but ensuring the state root committed to at the tip of the shard chain is correct would still require processing all blocks. Second, an any-trust assumption (i.e. there exists at least one honest party) is made on the validators previously assigned to the shard. So long as one of them is honest, they can produce a fraud proof that the committed state root is invalid.
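A minimal sketch of the fraud-proof idea, under heavy simplification: a toy hash-of-state commitment stands in for a real Merkle state root, and the witness is simplified to the entire pre-state (in practice the witness would be a Merkle proof covering only the accounts the transaction touches):

```python
import hashlib

def state_root(state):
    # Toy commitment: hash of the sorted account/balance pairs.
    items = ",".join(f"{k}:{v}" for k, v in sorted(state.items()))
    return hashlib.sha256(items.encode()).hexdigest()

def apply_tx(state, tx):
    # tx = (sender, recipient, amount); returns the new state.
    sender, recipient, amount = tx
    new = dict(state)
    assert new.get(sender, 0) >= amount, "insufficient balance"
    new[sender] = new.get(sender, 0) - amount
    new[recipient] = new.get(recipient, 0) + amount
    return new

def is_fraudulent(witness_state, tx, claimed_post_root):
    # A stateless verifier re-executes the tx against the witness and
    # compares against the claimed post-state root. True => fraud proof.
    return state_root(apply_tx(witness_state, tx)) != claimed_post_root

pre = {"alice": 10, "bob": 0}
tx = ("alice", "bob", 3)
honest_root = state_root(apply_tx(pre, tx))
assert not is_fraudulent(pre, tx, honest_root)
assert is_fraudulent(pre, tx, "bogus-root")  # invalid claim is caught
```

The point is that the single honest validator needs only the witnesses, not the full shard history, to demonstrate that a committed root is wrong.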

Potential pitfall 2: validators can’t create a fraud proof on a shard block if that shard block is withheld. Since a majority of a shard’s committee can sign off on its validity optimistically, a colluding majority could create an invalid block and withhold its data. The honest validator(s) would then have to ask every other validator globally to download the shard block. Due to speaker-listener fault equivalence, validators can’t be punished for raising a false alarm on data withholding, so this would mean all validators would have to download all shard blocks all the time in adversarial conditions — again, this is essentially big blocks. Alternatively, the system could be made scalable at the cost of security by using an honest majority assumption for all shard committees, but that assumption is unrealistic. The solution to this is data availability proofs, and indeed data availability proofs are the fundamental mathematical primitive without which sharded blockchains could not be simultaneously secure and scalable.
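To get a feel for why data availability proofs work, here is a toy sampling simulation (parameters are illustrative). Real schemes erasure-code the block so that it is recoverable unless more than half of the extended data is withheld; an attacker must therefore hide a large fraction of chunks, and each random sample then catches the attack with high probability:

```python
import random

def sample_availability(chunks_available, num_chunks, samples, rng):
    """Light-client sampling: request `samples` random chunk indices
    and report unavailable if any request goes unanswered."""
    for _ in range(samples):
        idx = rng.randrange(num_chunks)
        if idx not in chunks_available:
            return False  # withholding detected
    return True

rng = random.Random(0)
num_chunks = 256
# Thanks to erasure coding, the adversary must withhold >50% of the
# extended data to make the block unrecoverable.
withheld = set(range(num_chunks // 2))
available = set(range(num_chunks)) - withheld
detections = sum(
    not sample_availability(available, num_chunks, 20, rng)
    for _ in range(1000)
)
# Each client misses the attack with probability (1/2)**20, so nearly
# all 1000 simulated clients detect it.
assert detections > 990
```

Crucially, each client downloads only a handful of chunks, so detection scales without every node fetching every shard block.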

With these potential pitfalls solved, it indeed seems as though “all the research breakthroughs we need for a full implementation of eth2” are behind us and only implementation details need to be ironed out for the deployment of a scalable and secure sharded blockchain that allows full nodes to be run on Raspberry Pis. This is wrong.

Networking in Sharding: Between Scylla and Charybdis

There’s one caveat missing from the list above that, as it turns out, is rather critical. While the general scheme presented does shard the creation and validation of blocks at the consensus level, it’s missing something: what goes inside those blocks? The answer is, of course, transactions. Transactions must find their way from users to shard block producers. Here’s the kicker: if all nodes in the network needed to download all transactions (which is the case if transactions are simply broadcast to all nodes, as happens in normal blockchain networks), then sharding would provide no scalability in data availability throughput.

So, do we have a way of sending transactions through a distributed network without having every node download every transaction? Kind of! We can use a gossip protocol such as gossipsub, where every node maintains a local list of the topics (e.g. a shard ID could be a topic) that its direct peers are listening to. With this, transactions can be sent (somewhat) reliably across the network, with only nodes interested in those transactions having to download and share them. Problem solved? Not quite, because this opens up the system to exploitation.
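A toy model of topic-based gossip (class and method names are illustrative, not gossipsub’s actual API) shows the key property: a transaction published on a shard’s topic propagates only to subscribers of that topic, so nodes never download traffic for shards they don’t care about:

```python
class GossipNode:
    """Sketch of topic-based pub/sub in the spirit of gossipsub: a
    node only accepts and forwards messages for topics it subscribes
    to, and deduplicates messages it has already seen."""
    def __init__(self, name, topics):
        self.name = name
        self.topics = set(topics)   # e.g. {"shard-0"}
        self.peers = []
        self.seen = set()

    def publish(self, topic, msg):
        if (topic, msg) in self.seen:
            return
        self.seen.add((topic, msg))
        # Forward only to peers subscribed to this topic.
        for peer in self.peers:
            if topic in peer.topics:
                peer.publish(topic, msg)

# Four nodes, alternating between two shard topics, fully connected.
nodes = [GossipNode(f"n{i}", {f"shard-{i % 2}"}) for i in range(4)]
for n in nodes:
    n.peers = [p for p in nodes if p is not n]

nodes[0].publish("shard-0", "tx1")
received = {n.name for n in nodes if ("shard-0", "tx1") in n.seen}
assert received == {"n0", "n2"}  # only shard-0 subscribers saw it
```

Notice the flip side of this selectivity: which topics a node subscribes to is visible to its peers, which is exactly what the attack below exploits.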

The attack is as follows: if an attacker can link each validator’s ID to the IP addresses of their nodes, they can trivially violate the any-trust assumption by simply DoSing the validators that don’t collude with (or refuse to be bribed into) an attack on a shard with a majority-dishonest committee. While the Eth 2.0 protocol itself does not require validator IDs to be linked to IP addresses, the heterogeneous topology of the gossip network has no privacy, so it’s trivial for an attacker to spread nodes around the network and perform set intersections on traffic to link them. A homogeneous network topology would be secure against this attack but, as mentioned above, would provide no scalability, as all nodes would download all transactions.
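A sketch of why set intersection is so effective (all IPs and assignments here are hypothetical): since the committee schedule is public, an attacker who observes which IPs participate in each shard topic can intersect observations across epochs to pin a validator ID to an IP:

```python
# Attacker-run observer peers record which IPs were active in each
# shard topic, per epoch (hypothetical data).
observations = {
    "epoch-1": {"shard-3": {"1.2.3.4", "5.6.7.8", "9.9.9.9"}},
    "epoch-2": {"shard-7": {"1.2.3.4", "7.7.7.7"}},
}

# The public shuffle says (hypothetically) that validator 42 sat on
# shard 3 in epoch 1 and shard 7 in epoch 2. Intersect the IPs seen
# in both places:
candidates = (observations["epoch-1"]["shard-3"]
              & observations["epoch-2"]["shard-7"])
assert candidates == {"1.2.3.4"}  # validator 42 linked to an IP
```

Each additional epoch of observation shrinks the candidate set further, so even a patient, weak attacker eventually links most validators, at which point targeted DoS against the honest minority of a committee becomes possible.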

Work on network-level privacy for blockchains that still allows for scalability is in its infancy at best. Given the Ethereum Foundation’s previous stance that network-level privacy is not a priority for Eth 2.0, I don’t expect this issue to be tackled earnestly by the researchers any time soon. This is just one of many open research questions, and without a positive answer to it, secure and scalable sharded blockchains will never be feasible.

The root cause of this exploit is that Eth 2.0 (and other committee-based sharded blockchains) moves complexity and security away from the consensus layer and onto the p2p network layer. As a result, any security proofs (of which none have been published as of this writing) of the system would be meaningless if the underlying assumptions around the capabilities of the p2p network are unrealistic. Adversarial behavior at the consensus layer can be penalized (e.g. with slashing, or burning of coins), but not at the network layer. Moving the security of your blockchain to the latter is sure to make it brittle in the face of even a weakly adaptive adversary — most definitely not capable of surviving World War 3.

Consensus Overhead

The previous section covers transaction propagation, but there’s another set of data that needs to be passed around the network: attestations. Eth 2.0 seems to support an absurd number of validators — much more than the couple hundred that PBFT-based protocols like Tendermint are limited to due to worst-case quadratic communication overhead. How does it handle this miraculous feat?

It accomplishes this by, again, shifting complexity away from the consensus layer and onto the network layer. The overhead of aggregating signatures naively is intractable, and practical schemes such as Handel rely on explicit linking of validator IDs to IP addresses.

In short, the size of the consensus set can be much larger than in traditional PBFT protocols because attestations (signatures) are aggregated at the p2p layer. This is a process that has high communication complexity, and a protocol that can perform aggregation for the number of validators targeted by Eth 2.0 within the short blocktime constraint in adversarial network conditions has yet to be proposed.
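For intuition on why aggregation at the p2p layer enables large consensus sets, here is a toy tree-style aggregation (signatures are modeled as sets of validator IDs; real protocols such as Handel aggregate BLS signatures): a full aggregate over n validators forms in roughly log2(n) pairwise merge rounds, instead of every validator sending its signature to every other:

```python
import math

def aggregate_rounds(signatures):
    """Pairwise tree aggregation: each round, neighbouring partial
    aggregates merge, so n signatures combine in ceil(log2(n))
    rounds. A sketch of the communication pattern only."""
    rounds = 0
    level = [{s} for s in signatures]  # each node starts with its own
    while len(level) > 1:
        level = [level[i] | (level[i + 1] if i + 1 < len(level) else set())
                 for i in range(0, len(level), 2)]
        rounds += 1
    return level[0], rounds

agg, rounds = aggregate_rounds(list(range(1024)))
assert agg == set(range(1024))
assert rounds == math.ceil(math.log2(1024))  # 10 rounds for 1024 validators
```

The catch, as the text notes, is that performing these rounds within a short blocktime under adversarial network conditions — and routing partial aggregates to the right peers without publicly linking validator IDs to IP addresses — is precisely the unsolved part.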

Conclusions

Committee-based sharding requires execution to happen statelessly (which itself has not yet been shown to be viable) and fraud + data availability proofs. However, when considering transaction propagation, we see that scalability can only be achieved by pushing the security of the scheme to the p2p networking layer, which is much more brittle against adversaries. Whether we can surmount this hurdle remains an open research problem.

Thanks to Mikerah Quintyne-Collins and James Prestwich for comments.
