Sharding Is Also NP-Complete

Published in
4 min readApr 3, 2022

--

This is a quick follow-up to a post on blockchain scalability based on a few discussions where people asked about sharding. Sharding is also NP-Complete. So here’s what we are going to do.

1. Define a general type of sharding that covers any such scheme
2. Show it is NP-Complete
3. Discuss what that means

The Sharding Problem

The general “sharding problem” is how to ensure both sides of a transaction are on the same shard. That’s because transactions settle one blockchain at a time. Proving on a different chain that you did some transaction is fine — but you still need to settle it. And per-chain settlement bandwidth is limited by the previous post.

We have s shards, each of which can settle r trades per unit time. Each user has a starting balance b_i. And we have a list of t trades (party_a, party_b, amount) where party_a pays amount tokens to party_b. A sharding is an assignment of trades and starting balances to shards. Users can partition their b_i across shards, but the each balance must be non-negative and the totals correct.

Earlier I said this would be a “general type of sharding” so let’s be clear about that. There are a bunch of shards. We make no assumptions about how the shards are connected or any use of proofs or anything like that. Our sharding coordinator can instantly move user’s balances across shards. It can also instantly assign trades to shards. The only thing that takes any time for this coordinator is deciding which balances and trades go where at the start of a block. Once it makes that decision the system runs. This should cover anything properly called a “sharding.”

The Reduction

Finding an efficient sharding is NP-Complete. How do we know that? We can use the same reduction as last time. There are 4 transactions between the variables at the top and the gather wallet. Take an instance of 3-SAT. Reduce it. Set up your system with r=4 and 3 shards per clause. Shard it. Run the result for 1 block. Check the balance, across all shards, in gather. If the money got there the sharding was efficient and the problem is satisfiable.

Yes this version is still solving some variant of the trade ordering problem. But look at the graph of the reduction. Our sharding coordinator has a complete dependency graph for all trades and the ability to complete cross-shard setup instantly. Rather than posing an unrealistically difficult version of the sharding problem we are making it as easy as possible.

Ok maybe you want to see something totally independent. Assigning those trades, with a dependency graph, looks a lot like a classic job scheduling problem. If we look on Karp’s list we find job scheduling. And then we find several famous papers for what are pretty obviously the same problem. And don’t forget the somewhat-similar result we linked last time. Reducing any of the myriad variants of job scheduling that is known NP-Complete to efficient sharding is left as an exercise for the reader.

Sharding Is Not Useless

Just because sharding is hard to do in the worst-case does not mean it is useless in practice. The point of this post is not that sharding is useless. Sharding certainly helps sometimes. It might even help “on average.” But this is a hard problem. This leaves us with two choices:

1. Scalable solutions which are prone to accidents
2. Truly reliable scalability but P=NP etc

What do I mean by accidents? Systems that fall over when they are overloaded. Whether that is exploding block times or proper crashing or whatever else is a software engineering question rather than a math one. But something bad. Mitigation is a requirement if you want a robust system because you can’t engineer around this challenge and still have the cryptography.

My name is XYZ-Chain and I have a scaling problem

One of my favorite observations about folk psychology is that acceptance is the first of the 12 steps and the last of the 5 steps. You can’t get better, or move on, until you admit you have a problem.

There are many defenses available. Fee auctions are an obvious one. Forced throttling that you acknowledge upfront is another. This is the actual challenge of building scalable blockchains: finding reasonable mitigation strategies in the face of hard limits on throughput.

Scalable systems with always-super-low fees are either a) prone to liveness problems or b) proofs that P=NP. And this has nothing to do with CAP or FLP or PACELC. It is still true with a perfectly reliable, synchronous network.

Adding More Features

You cannot get around this problem by adding some new feature. This setup does not require any proof system, zero-knowledge or otherwise. It does not assume any form of roll-up. It does not journal anything. There is no proof-of-anything.

There are a bunch of shards and some nearly-omnipotent coordinator with a single limitation: the computer used for “air traffic control” is a deterministic Turing machine. Trades have a no-negative-balance requirement. And we insist our job-dispatch process takes time at most polynomial in the number of trades and shards. That’s the entire setup.

Lastly, note that assuming the users organize themselves optimally is not a solution. A threat model where exponentially many anonymous users coordinate to solve NP-Complete problems is already insecure. It’s a thing, sort of, but not what you think.