Understanding Harmony’s Cuckoo Rule for Resharding
Harmony is a fully sharded chain. It shards transactions validation and network communication but also blockchain state. The sharded nature of our chain allows us to be scalable, parallel processing transactions across all our shards.
Though sharding for scalability has its benefits it introduces security challenges for a decentralized network. In a sharded decentralized network, nodes come, stay and may go as they please. This openness may be used by an adversary to formulate join-leave attacks: The adversary operates a large number of nodes, and makes each node systematically leave and re-join the network until the network assigns the node to the specific shard that the adversary wants to take over. Also, if shards remain static, adversary has greater chance of corrupting the shard over time.
Thus it is crucial to change node–shard membership over time. To this end, the Harmony network binds node–shard membership to a regular period of time called an epoch, at the end of which the network changes the voting share configuration of its shards.
However, as a fully-sharded chain, where every shard maintains its own state, such a reshuffle could incur a tremendous communication overhead. To illustrate this, suppose that, at the end of an epoch, every node is reassigned to a random shard for the next epoch, and all shards share the same probability to be chosen. If there are 100 shards, the chance of a node staying in the same shard is only 1%, so 99% of the nodes in the network are expected to move to another shard and download the shard state from the 1% of the nodes staying in the same shard.
Moreover, would such a reshuffle maintain equal shard sizes?
To counter these problems, Harmony employs the bounded cuckoo rule to do this efficiently. The name comes from the parasitic behavior of some species of cuckoo, which lays eggs in another bird’s nest, evicting an existing egg from the nest in order to make room for its own egg. Before we get into it, let’s quickly look at the evolution of cuckoo rule, first proposed by Awerbuch and Schneider
Awerbuch and Schneider, were then looking at building scalable and robust distributed hash table wanted to handle the presence of adversarial nodes.
- For a new node that wants to join the network, assign a random real number, say between 0 and 1 to a new node.
- If you imagine, numbers between 0 and 1 divided into n equal length spaces, then node lies in a unique space. The “n” stands for number of shards. Say it lies in the “s” th space from 0. Then node would belong to the “s” shard.
- Now, consider previous nodes that lie in the s shard. Assign each of them new random numbers. The random numbers fall in other unique spaces from 0 to 1 and get assigned corresponding new shards. (The nodes have been cuckoo-d!)
- The new intervals that the nodes fall into would correspond to their new shards.
Simply put, cuckoo assign a random number to a new node that wants to join the network, the position of that random number indicates the shard number for the node and then cuckoo (move) nodes that are “close” to that random number (“existing eggs”), to new random numbers which give them new shards.
Thus new nodes can join the network and cause minimum reconfiguration — change only a few nodes that are in the same space as the new node.
Awerbuch and Schneider proved (under some constraints), the resulting shards will be balanced and would each shard contain less than 1/3rd faulty nodes. This type of resharding is also robust to join-leave attack.
(There is a bit of mathematics and technical nomenclature we are avoiding here, but this simplification is good to give us the right intuition.)
Seems simple, secure and easy to implement!
However, a few years later, Sen and Freedman found an adversarial strategy that is able to thwart the cuckoo rule — the shards no longer remained secure.
They found that cuckoo rule may fail because of “bad luck”: adversarial nodes may repeatedly join to the same shard with non-negligible probability, thus increasing the proportion of faulty/adversarial nodes in a shard beyond a fraction for it to be secure.
They proposed the following adversarial strategy to make it happen: at the beginning of each round, the adversary would sort all shards by increasing fraction of faulty nodes, and should have a faulty node belonging to the shard with the lowest fraction attempt to rejoin the system. This is to keep the adversary’s existing advantage (corruption ratio). For example, let’s say there are 3 shards with equal size, and an adversary controls 5 corrupt nodes — 2 in the first and second shards, and 1 in the last. If the adversary could afford to let one of them leave/re-join, the most sensible thing would be to move the lone node in the last shard, rather than move a node out of the first or second shard where there are already two corrupt nodes. With the cuckoo rule it has non-negligible probability of succeeding.
Sen and Freedman’s approach was to partially modify the cuckoo rule by 1) ensuring the number of nodes cuckoo-d during a join deterministically matches the expected amount, and 2) allowing shards to reject join attempts if they have not received sufficient new nodes since the last join. Using this commensal cuckoo, their fault tolerance for fraction of faulty nodes is 35x larger than that of the cuckoo rule.
Commensal Cuckoo Rule
- Just like the cuckoo rule above a new node attempts to join a shard chosen randomly.
- Its joining attempt is rejected if that shard has not received sufficient number of cuckoo-ed nodes. If rejected, the node tries again.
- Once the node joins a shard, cuckoo k nodes chosen randomly from that shard, instead of all nodes in that shard earlier.
The Bounded Cuckoo rule that Harmony builds on is even simpler. Note that with Harmony we deal in voting shares and not nodes. The voting share for a node is proportional to the tokens they have staked. Hence cuckoo rule for Harmony is concerned with dividing the voting shares among all the shards.
Here is how we do it:
Bounded Cuckoo Rule
- The nodes who withdrew their stake before the beginning of the epoch are evicted from the network, while those who keep their stakes stay.
- The new nodes who staked during this epoch get new voting shares.
- These voting shares will be randomly assigned to the shards that have more than the median of the total voting shares.
- Next, a constant number of voting shares from all shards will be randomly re-distributed to the other half of the shards who have less than the median of total voting shares.
Our scheme above (based on work by Zamani and Mohavedi), moves only a constant fraction of shares between shards while provably guaranteeing security under reasonable constraints.
We are excited to tell you that the first version of our resharding is already ready — we are able to now shard new nodes joining into 4 shards (1 beaconchain + 3 regular shards). Take a look at our implementation on GitHub , and try it by downloading our wallet and playing with it on our cello testnet! Let us know if you have questions or want to know more on our forum!
Thanks to Eugene Kim for suggestions.