Breaking Down the Cosmos Game of Stakes
A decentralized proof-of-stake BFT network under adversarial conditions
- Game of Stakes (GoS) is a fully distributed, incentivized testnet designed to expose Tendermint software to the kinds of issues it could face on mainnet.
- The chain halted four times, and a variety of issues were identified and fixed, including critical bugs in Tendermint Core and the Cosmos SDK.
- A validator cartel formed (or, more cynically, was planted) and was forked out: Certus One coordinated an off-chain community fork that removed the cartel from version 5 of the GoS network.
- Cosmos mainnet is expected to launch in late February / early March.
- Staked will support Cosmos at mainnet launch and is available for delegation.
Cosmos is planning to launch the first public proof-of-stake (PoS) blockchain based on a Byzantine fault tolerant (BFT) consensus mechanism (Tendermint). The Cosmos Game of Stakes (GoS) is the first implementation of a decentralized BFT network with ~195 globally distributed validators, incentivized adversarial conditions and slashing penalties.
State of Play
Rewards (block rewards and transaction fees) in Cosmos are distributed automatically at the protocol level. However, validators cannot automatically re-bond these rewards; they must submit separate transactions to withdraw them and then re-delegate them. Because of this manual re-bonding requirement and the incentive dynamics of GoS, the validators who were first to build a withdrawal and re-delegation process established an early lead in accumulating stake and voting power. Scripts can automate re-bonding in GoS, but they introduce potential security risks for mainnet operation. According to the Cosmos team, the new F1 fee distribution proposal introduces a variety of fee distribution improvements for mainnet operation.
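To see why re-bonding early mattered, here is a simplified compounding model. It is a sketch under loud assumptions: a fixed, hypothetical reward pool is paid each period strictly pro rata to bonded stake, with no commission, proposer bonus, inflation schedule or fees, so it is not how Cosmos actually computes rewards. The function name and parameters are invented for illustration.

```python
from typing import List, Set


def simulate_rebonding(initial_stake: List[float], rebonders: Set[int],
                       reward_per_period: float, periods: int) -> List[float]:
    """Return each validator's share of bonded stake (voting power) at the end.

    Hypothetical model: a fixed reward pool is paid every period in
    proportion to bonded stake. Validators in `rebonders` withdraw and
    re-delegate their rewards every period; everyone else leaves rewards
    unclaimed, which adds no voting power.
    """
    bonded = list(initial_stake)
    for _ in range(periods):
        total = sum(bonded)
        # Compute all rewards from this period's stake before updating anything.
        rewards = [reward_per_period * b / total for b in bonded]
        for i, reward in enumerate(rewards):
            if i in rebonders:
                bonded[i] += reward  # withdraw + re-delegate: voting power grows
    total = sum(bonded)
    return [b / total for b in bonded]


if __name__ == "__main__":
    # Two validators start with equal stake; only validator 0 re-bonds.
    print(simulate_rebonding([100.0, 100.0], {0}, reward_per_period=10.0, periods=50))
```

Under these assumptions the re-bonding validator ends with more than half of the voting power even though both started equal, and its share keeps rising with every additional period.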
As of late January 2019, the GoS network had reached a state of limited functionality. The prevailing strategy was for validators to enter a ‘bunker mode’ in which, when proposing a block, they censored every transaction other than their own withdrawal and reward delegation transactions, so each block contained only the proposing validator's own reward transactions. Bunker mode uses the min_fees function as a defensive fee firewall that prevents other validators from waging mempool attacks and knocking proposers offline. However, even with bunker mode and effective fee firewalls, validators struggled to execute reward transactions because mempool spam attacks delayed proposers' block creation.

The game had reached a point where there was no incentive to accept transactions from other network participants. As a result, it has been difficult to measure validator and network performance under the operating conditions and transaction loads expected on mainnet. Optimistically, these kinds of transaction-blocking attacks are not expected to be viable on the actual Cosmos mainnet.
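Bunker mode combines two filters: a fee floor applied when transactions enter the mempool, and a proposer-side rule that includes only the proposer's own reward transactions. The sketch below models both. It is a hypothetical illustration: the `Tx` type, the transaction kinds and the function names are invented for this example, are not Cosmos SDK or Tendermint APIs, and do not reproduce how min_fees is actually implemented.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Tx:
    sender: str
    kind: str  # hypothetical labels, e.g. "withdraw", "delegate", "send"
    fee: int   # fee offered, in arbitrary units


def admit_to_mempool(tx: Tx, min_fee: int) -> bool:
    """Fee firewall: drop any transaction whose fee is below the node's floor."""
    return tx.fee >= min_fee


def build_bunker_block(mempool: List[Tx], own_address: str, max_txs: int) -> List[Tx]:
    """Bunker mode: include only the proposer's own reward transactions."""
    allowed_kinds = {"withdraw", "delegate"}
    selected = [tx for tx in mempool
                if tx.sender == own_address and tx.kind in allowed_kinds]
    return selected[:max_txs]


if __name__ == "__main__":
    pool = [Tx("me", "withdraw", 1000), Tx("spammer", "send", 1),
            Tx("other", "delegate", 1000), Tx("me", "delegate", 1000)]
    admitted = [tx for tx in pool if admit_to_mempool(tx, min_fee=500)]
    print(build_bunker_block(admitted, own_address="me", max_txs=10))
```

The low-fee spam never reaches the mempool, and the other validator's delegation is admitted but still censored at proposal time, which is why blocks in this phase carried almost nothing but the proposer's own transactions.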
Before the planned launch of GoS v4 in late January, GoS validator Certus One analyzed the original genesis transactions used to launch GoS v1 in December and determined that the initial validator registration process had been successfully Sybil attacked. Certus One traced 73 of the 180 genesis transactions, representing 40% of the initial validator set and 53% of the voting power as of GoS v3, back to a single organization. In a Byzantine fault tolerant PoS network, controlling 33%+ of the voting power is the threshold for chain-halting censorship attacks. Shortly after Certus One published its findings, GoS validator Bitfish publicly confirmed that it had gamed the KYC process by having friends and family register accounts. According to Bitfish, its goal was to control 25% of the validator slots; exceeding 33% control was an inadvertent mistake.

Because a cartel with 33%+ of the voting power can veto on-chain governance proposals, the validator community coordinated an off-chain fork instead, removing the cartel by submitting a pull request with a new genesis file for GoS v4. The final version of GoS is expected to launch on February 11th and run for one week before the GoS winners are determined.
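The one-third threshold follows from how Tendermint commits blocks: a block needs precommits from more than 2/3 of the total voting power, so any coalition holding 1/3 or more can withhold its votes and stall the chain. The sketch below checks the cartel's figures against the consensus thresholds only; the function name is hypothetical, and on-chain governance veto rules are separate and not modeled here.

```python
from fractions import Fraction


def bft_capabilities(coalition_power: int, total_power: int) -> dict:
    """Classify a coalition's power under Tendermint-style commit thresholds.

    A block commits only with precommits from more than 2/3 of the total
    voting power. If the coalition withholds its votes, everyone else can
    still commit only while their combined power stays above 2/3.
    """
    share = Fraction(coalition_power, total_power)
    others_can_commit = (1 - share) > Fraction(2, 3)
    return {
        "share": share,
        "can_halt": not others_can_commit,
        "can_commit_alone": share > Fraction(2, 3),
    }


if __name__ == "__main__":
    cartel = bft_capabilities(53, 100)  # 53% of voting power as of GoS v3
    print(cartel["can_halt"], cartel["can_commit_alone"])  # True False
    print(round(float(Fraction(73, 180)), 3))  # share of genesis transactions: 0.406
```

Exact fractions avoid floating-point edge cases right at the 1/3 and 2/3 boundaries. With 53% of the voting power, the cartel could halt the chain but could not finalize blocks on its own.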
Consensus Failures / Chain Halts
- Pre-GoS Launch Liveness Failure (GitHub): On 12/11/18, the Genki 3000 testnet experienced a liveness failure after a validator running an incorrect version of the Cosmos SDK proposed an invalid block. The invalid block caused all the validators to freeze and prevented the next round from starting.
- Fee Distribution Logic + Block Size Limit (GitHub): On 12/19/18, at block height 11443, the GoS chain halted due to a consensus failure caused by a bug in the unbonding and fee distribution logic of the Cosmos SDK. The chain was also suffering performance issues from the block size limit: the initial 50 kB limit was being consumed by the signatures of 199 validators, making it difficult for validators to claim and delegate rewards. The network was upgraded with a larger block size and relaunched.
- Double Sign / Export Logic (GitHub): On 12/21/18, the GoS v2.0 chain halted due to a state inconsistency: a validator double signed a block after the export logic had deleted the slashing periods. At genesis, the chain is expected to create a slashing period for each validator. This did not happen, so the chain halted when the double sign was committed and no slashing periods existed. GoS was then suspended for the holidays and restarted on 01/03/19.
- Pre-Vote Failure: On 01/04/19, the GoS v3.0 chain halted just before reaching block 10,000 after a subset of validators failed to pre-vote and a large percentage of the validators were knocked offline.
- Token Printing Bug (GitHub).
- Gas Bug: A bug in the Cosmos SDK caused transactions to consume excessive amounts of gas and fail. It limited the network to ~7 withdrawal + delegation transactions per block and required a software upgrade to fix.
- Simulation Code: On 01/22/19, the Cosmos team discovered a bug in the simulation code; once it was fixed, the simulation uncovered a variety of previously undetected errors in the software. The simulation had not been functioning correctly for the previous 18 days, so new testnet releases were suspended until the bugs were resolved.
- Vesting Errors: A bug in the reward vesting logic caused transactions with fees to fail and required a consensus-breaking update to fix.
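The block size item above notes that 199 validators' signatures consumed most of the initial 50 kB limit. A Tendermint block carries the previous block's commit (up to one precommit signature per validator), so the space left for transactions shrinks as the validator set grows. The arithmetic below is a rough sketch: it assumes "50 kB" means 50,000 bytes and treats the per-vote size as a hypothetical input rather than a measured value; both function names are invented for illustration.

```python
def break_even_vote_size(block_limit_bytes: int, validators: int) -> float:
    """Average bytes per validator vote at which the commit alone fills the block."""
    return block_limit_bytes / validators


def tx_space_remaining(block_limit_bytes: int, validators: int,
                       bytes_per_vote: int) -> int:
    """Bytes left for transactions after the previous block's commit votes.

    bytes_per_vote is a hypothetical input; the real size of a vote depends on
    the Tendermint version and its encoding.
    """
    return max(0, block_limit_bytes - validators * bytes_per_vote)


if __name__ == "__main__":
    limit = 50_000  # assumes "50 kB" means 50,000 bytes
    print(round(break_even_vote_size(limit, 199), 1))  # 251.3
    for vote_size in (150, 200, 250):
        print(vote_size, tx_space_remaining(limit, 199, vote_size))
```

If each vote took roughly 250 bytes, the commit alone would nearly fill a 50 kB block, which is consistent with validators struggling to fit reward transactions before the limit was raised.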
Staked operates highly available, highly secure, institutional-grade staking infrastructure for leading proof-of-stake (PoS) protocols. The Staked infrastructure is deployed in a multi-tier signing and listening cloud configuration that combines geographic diversity and redundancy across on-premises data centers and cloud providers. We use Google’s Kubernetes container orchestration to achieve near-infinite scale, self-healing and hardware decentralization.