First lessons on double-signing in a sharded architecture — Testnet v1016

Lucian Mincu
Published in Elrond Network
Sep 19, 2019


Elrond’s launch is approaching fast. Even though the mainnet isn’t live yet, you can start developing smart contracts today and deploy them on Elrond very soon. Elrond aims to be the best on-ramp for building dApps ready for the broadband era of the internet.

On our journey toward creating the most advanced state-sharded proof-of-stake architecture, designed to scale significantly beyond previous blockchain iterations, today we stop briefly to discuss a simple double-sign attack that happened on our latest testnet release.

Let’s jump straight into the problem:

The Nothing at Stake problem

With generating a vote being free, and no intensive computer work necessary, what’s to stop a nefarious staker from casting a double vote? One for the block upon which the chain continues to build, and one on the orphaned or “forked” chain as well? In fact, since votes are free, why not generate many votes every block, to potentially multiply block rewards, effectively double spending on the network? …

Oh wait, something like this happened in one of the first versions of Cosmos GoS, but given that they had only the Cosmos-Hub back then and no other zone was interconnected, they resolved it pretty fast. The problem would have been more difficult to solve at a different phase of the network's development.

Here’s what happened in our case with a multi-sharded architecture:

Shortly after deploying testnet v1016, our blockchain faced two “new” problems, which led to a temporary halt in shard 4 and a complete desynchronization between the shards and the metachain.

The first problem occurred at round 520 and nonce 451, in shard 4, when the first official double-sign attack managed to halt the entire shard.

In this round, the same proposer actually produced two blocks from two different instances. From the validators' point of view, both blocks were valid, with the only difference being the aggregate signature stored in each of them.
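To make the condition concrete, here is a minimal sketch of how such a conflict could be flagged: two headers for the same shard and round, from the same proposer, with different hashes. It is an illustration only, not code from the Elrond node; the `Header` fields, the proposer names, the hash strings, and `find_double_signs` are all hypothetical.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Header:
    # Hypothetical, simplified header; real headers carry many more fields.
    shard: int
    round: int
    nonce: int
    proposer: str
    block_hash: str


def find_double_signs(headers):
    """Return groups of conflicting headers: same shard, round and proposer, different hashes."""
    seen = defaultdict(set)
    for h in headers:
        seen[(h.shard, h.round, h.proposer)].add(h)
    conflicts = []
    for group in seen.values():
        if len({h.block_hash for h in group}) > 1:
            conflicts.append(sorted(group, key=lambda h: h.block_hash))
    return conflicts


headers = [
    Header(4, 520, 451, "proposer-A", "hash-1"),
    Header(4, 520, 451, "proposer-A", "hash-2"),  # same slot, different aggregate signature
    Header(4, 519, 450, "proposer-B", "hash-0"),  # unrelated, honest block
]
conflicts = find_double_signs(headers)
```

With the sample data, `conflicts` holds a single group containing the two round-520 headers, which is the evidence a node would need to treat the proposer as having double-signed.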

Given that they had different hashes, some validators accepted one of them, while others accepted the other. This split the shard into two groups of validators, and from this point on no blocks were produced, as the shard could not reach the majority that the consensus mechanism needs to produce the next block.

The problem was solved with a hot-fix, after which shard 4 finally resumed normal operation.
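The arithmetic behind the stall can be sketched as follows. The post does not state the exact threshold or the size of the consensus group, so both are assumptions here: a quorum of more than two thirds of the group, and a hypothetical group of 63 validators.

```python
def quorum(group_size):
    # Assumed threshold: strictly more than two thirds of the consensus group.
    return (2 * group_size) // 3 + 1


def can_progress(votes_per_block, group_size):
    """True if any single block hash has collected enough signatures."""
    needed = quorum(group_size)
    return any(votes >= needed for votes in votes_per_block.values())


group_size = 63  # hypothetical consensus group size
split = {"hash-1": 32, "hash-2": 31}  # validators divided between the two blocks
united = {"hash-1": 63}               # everyone agrees on one block

split_ok = can_progress(split, group_size)
united_ok = can_progress(united, group_size)
```

Under these assumptions, neither half of a split group reaches the quorum on its own, so no next block can be signed until the validators converge on the same parent again.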

The second problem appeared in the metachain at round 13903, when a proposer broadcast a new block with nonce 12352. Because of latency, this block was not propagated within the normal timeframe, so in the next round a new proposer created and broadcast another block with nonce 12352, this time with round 13904. In round 13905 a new consensus group was formed. Because the second block, with nonce 12352 and round 13904, was the one its members had received, they built on it, and the proposer broadcast the new block with nonce 12353 in this round. Shortly after that broadcast, at the beginning of round 13906, the first block, created with nonce 12352 and round 13903, finally appeared.

Because K finality is set to 1, all the metachain nodes rolled back to this block and started building again from this point. The main problem was that all the shards had already received the first broadcast meta block with nonce 12353. As a result, they had set the meta block with nonce 12352 created in round 13904 as final and included it in their next shard block. Since the metachain accepted and built on the other block, created in round 13903, to which all its nodes had rolled back, synchronization between the shards and the metachain stopped.
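The divergence can be illustrated with a small sketch. It assumes that K finality of 1 means a block counts as final once one block has been built on top of it, which matches the narrative but is not confirmed in detail by the post. The block labels (`nonce@round`), the parent links, and `final_blocks` are hypothetical.

```python
K_FINALITY = 1  # value stated in the post


def final_blocks(tip, parents, k=K_FINALITY):
    """Walk back from the tip; every block with at least k blocks built on it is final."""
    path = []
    cur = tip
    while cur is not None:
        path.append(cur)
        cur = parents.get(cur)
    return path[k:]  # the k newest blocks are not final yet


# Labels are "nonce@round"; parent links are reconstructed from the narrative.
parents = {
    "12351": None,
    "12352@13903": "12351",        # delayed block
    "12352@13904": "12351",        # replacement block from the next round
    "12353@13905": "12352@13904",  # built on the replacement
}

shard_view = final_blocks("12353@13905", parents)  # what the shards saw
meta_view = final_blocks("12352@13903", parents)   # metachain after the rollback
```

In this model the shards consider `12352@13904` final, while the metachain, whose tip is now `12352@13903`, does not have that block on its chain at all. The two views share only the block with nonce 12351, which is the desynchronization described above.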

The fix for this problem is almost done and the patch will already be under internal testing by the time this is posted.

Moving on, we will open registration again for a new testnet version (1017).

Nodes that are already running on the network are automatically registered for this new version.

In preparation for the new testnet release, we’ll be keeping the discussion open on the Elrond validators Riot channel: #elrond:matrix.org

See you on the other side.
