The root cause of the Kava mainnet launch failure was the team’s decision to publish an updated genesis file roughly 6 hours before mainnet launch. This did not give all validators enough time to become aware of the update and restart their nodes, which greatly increased the chance that some validators who had already started nodes with the previous genesis file would join the network at launch. As it happened, once 70% of voting power came online, the first block was signed by that quorum of validators. However, that quorum included validators running incompatible genesis files, which resulted in a deadlock on block 2: neither group on its own had enough voting power to produce another block.
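The deadlock arithmetic can be sketched as follows. Tendermint commits a block only when validators holding more than two-thirds of total voting power sign it. The 40/30 split below is a hypothetical illustration, not the actual distribution at launch, and `can_commit` is an illustrative helper rather than a tendermint API.

```python
from fractions import Fraction

# Tendermint requires strictly MORE than 2/3 of total voting power to commit.
COMMIT_THRESHOLD = Fraction(2, 3)


def can_commit(group_power: int, total_power: int) -> bool:
    """Return True if a group of validators can commit a block by itself."""
    return Fraction(group_power, total_power) > COMMIT_THRESHOLD


# Hypothetical numbers: 70 of 100 units of voting power are online,
# split between validators on the old and the updated genesis file.
TOTAL_POWER = 100
old_genesis_power = 30  # assumed for illustration
new_genesis_power = 40  # assumed for illustration

online_power = old_genesis_power + new_genesis_power

print("all online validators:", can_commit(online_power, TOTAL_POWER))
print("old genesis group:    ", can_commit(old_genesis_power, TOTAL_POWER))
print("new genesis group:    ", can_commit(new_genesis_power, TOTAL_POWER))
```

Under these assumed numbers the combined online set clears the threshold, but neither group does once the two can no longer agree, so the chain stalls until more voting power joins one side.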
Going forward, the Kava team commits to postponing network launches and upgrades whenever a genesis file update is needed less than 24 hours before the scheduled launch. We selected 24 hours because it ensures that validators around the world will all have at least one period of business hours to see the update and respond. Further, the sharing of seed and peer nodes has usually not begun 24 hours prior to launch, so it is unlikely that most node operators will have already started their nodes.
Contributing factors to the launch failure were:
- A relatively small number of validators were online and able to sign the first block at the time of genesis
- A network launch using tendermint had never resulted in a block 2 deadlock
If a larger percentage of validators had been online, the group using the updated genesis file would have made progress immediately. Since a deadlock occurred, and validators had not experienced these conditions at a network launch previously, there were additional questions about whether a previously unknown bug was causing the issue. After testing many scenarios, we are confident that tendermint worked as expected under these conditions. One potential mitigation to make launches go more smoothly is the addition of a genesis-hash flag to the application process, as suggested by B-Harvest. While validators still need to coordinate on what the canonical genesis hash should be, we support this update to tendermint because it speeds up the debugging process and makes genesis validation an explicit part of starting the node, rather than a separate operation.
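Until such a flag exists, operators can run an equivalent check by hand before starting a node. The following is a minimal sketch of that idea, not the proposed tendermint implementation: it assumes the canonical hash is the SHA-256 of the raw genesis file bytes, published by the team as a hex string. The function names and the file path in the usage note are hypothetical.

```python
import hashlib
from pathlib import Path


def genesis_sha256(path: str) -> str:
    """Return the hex SHA-256 digest of the raw bytes of a genesis file."""
    return hashlib.sha256(Path(path).read_bytes()).hexdigest()


def verify_genesis(path: str, expected_hex: str) -> None:
    """Raise ValueError if the local genesis file does not match the
    canonical hash agreed on by validators (assumed to be hex-encoded)."""
    actual = genesis_sha256(path)
    expected = expected_hex.strip().lower()
    if actual != expected:
        raise ValueError(
            f"genesis hash mismatch: expected {expected}, got {actual}"
        )
```

A launch script might call `verify_genesis("genesis.json", published_hash)` and refuse to start the node on failure. Because this sketch hashes raw bytes, any reformatting of the file, even whitespace, changes the digest, so validators would need to distribute the file byte-for-byte.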