A story of four Cosmos testnets
Following the untimely demise of gaia-9002, the Cosmos validator community was left with no public testnet right before the Game of Stakes launch on Tuesday. Consensus was quickly established that this was unfortunate, and that a new testnet was needed, preferably launched by the community itself. Zaki Manian, Tendermint’s resident wizard and master of testnets, quickly confirmed his support in the Validator group on Riot.
Decentralized, after all
The Game of Stakes will launch in a decentralized fashion, meaning that all participants will submit a so-called genesis transaction (gentx), delegating stake from their accounts to their validators. These genesis transactions are included in the genesis file and determine the initial validator set, allowing the network to reach consensus from the very first block.
A centralized launch would instead see a small number of Genesis validators, run by Tendermint and holding a majority stake, jump-start the network. Everyone else would then delegate tokens to their own validators once the network was up and running, and the Genesis validators would unbond as soon as the remaining validators were able to reach consensus on their own. However, this approach was tested in gaia-9002 and turned out to be impractical: fee distribution is asynchronous, so the Genesis validators’ (massive) rewards were paid out in their absence.
Our auto-redelegation tooling turned out to be superior to everyone else’s, and in an amusing turn of events, we ended up collecting most of the rewards, netting us a few peta-atoms and a 2/3 majority.
Now, we had a lot of fun doing censorship attacks and eventually halting the network due to a Tendermint deadlock (Tendermint wasn’t exactly designed for a single node holding a majority), but none of this would fare well in GoS.
When it became clear that the fix would require a significant refactoring of the fee distribution logic, Zaki concluded that the Game of Stakes would launch in a decentralized fashion, without Genesis validators.
Decentralized launches are much harder than centralized ones. They require a lot of discipline, first in getting the gentx procedure right (it’s easy to generate invalid transactions), and then in getting validators with more than 66% of the voting power to actually show up at launch time. In the past, decentralized launches repeatedly failed to reach consensus initially, or lost it later on. Participants ended up delegating their tokens to a small set of core validators able to maintain high uptime, which was not going to be an option in GoS.
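Tendermint needs strictly more than two thirds of the voting power to make progress, and when everyone holds an equal stake, as in GoS, that requirement becomes a simple head count. A minimal sketch of the arithmetic, assuming equal stake per validator; the function names are our own and not part of any Cosmos tooling.

```python
def has_supermajority(online_power: int, total_power: int) -> bool:
    """True if online_power is strictly more than 2/3 of total_power.

    Exactly two thirds is not enough; integer arithmetic avoids
    floating-point rounding right at the boundary.
    """
    if total_power <= 0:
        raise ValueError("total_power must be positive")
    return 3 * online_power > 2 * total_power


def min_validators_needed(n: int) -> int:
    """Smallest number of equal-stake validators out of n that exceeds 2/3."""
    if n <= 0:
        raise ValueError("n must be positive")
    # Smallest k with 3k > 2n, i.e. floor(2n / 3) + 1.
    return (2 * n) // 3 + 1
```

If all 28 validators in the genki-1000 genesis file had held equal stake, `min_validators_needed(28)` would return 19, so 18 validators (about 64%) would not have been enough.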
This will make the GoS launch even more challenging and exciting — not only will it be the largest BFT network launch in history, but it’s also going to be decentralized, with everyone holding an equal stake!
It was clear that any community testnet in the lead-up to GoS should be as close to the real thing as possible: it should use the GoS network parameters and be launched in a decentralized manner.
Two proposals emerged. The first was an unofficial ad-hoc testnet called genki-1000, driven by Kwun Yeung of Forbole and launched by a fearless subset of the validator community willing to sacrifice both their weekend and a proper amount of sleep. The second was our proposal for a slightly more orderly launch of an official gaia-9003 testnet on a weekday, including the whole community.
With the impending GoS launch on Tuesday, it was clear that an immediate solution was needed in addition to a proper gaia testnet, so off we went, getting genki-1000 off the ground!
Forbole published a GitHub repository, stating the network’s objectives and deadlines, and started collecting genesis transactions via GitHub PRs.
Once we woke up, the process was migrated to our Genesis Collector tool, which we built in response to earlier difficulties with invalid genesis transactions. It guides validators through the process of generating a genesis transaction, checks it for validity, and stores the result in a database. The collected transactions can then easily be exported as a final genesis.json file, which is published and downloaded by validators.
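We don’t go into the Genesis Collector’s internals here, so the following is only a minimal sketch of the validate, store, and export flow described above. Everything in it is hypothetical: the `GenTx` record, its three fields, the validity checks, and the shape of the exported JSON are made up for illustration and do not match the real gentx or genesis.json formats.

```python
import json
import sqlite3
from dataclasses import dataclass


# Hypothetical, heavily simplified gentx record. Real gentxs are signed
# cosmos-sdk transactions; these three fields are for illustration only.
@dataclass
class GenTx:
    moniker: str
    validator_pubkey: str
    self_delegation: int


def init_db() -> sqlite3.Connection:
    """In-memory store; the primary key rejects duplicate validator keys."""
    db = sqlite3.connect(":memory:")
    db.execute(
        "CREATE TABLE gentxs (moniker TEXT NOT NULL, "
        "pubkey TEXT PRIMARY KEY, self_delegation INTEGER NOT NULL)"
    )
    return db


def validate(tx: GenTx) -> list:
    """Return a list of problems; an empty list means the gentx passed."""
    problems = []
    if not tx.moniker.strip():
        problems.append("empty moniker")
    if not tx.validator_pubkey:
        problems.append("missing validator pubkey")
    if tx.self_delegation <= 0:
        problems.append("self-delegation must be positive")
    return problems


def submit(db: sqlite3.Connection, tx: GenTx) -> list:
    """Validate a gentx and store it only if it passes every check."""
    problems = validate(tx)
    if problems:
        return problems
    try:
        with db:  # commits on success, rolls back on error
            db.execute(
                "INSERT INTO gentxs VALUES (?, ?, ?)",
                (tx.moniker, tx.validator_pubkey, tx.self_delegation),
            )
    except sqlite3.IntegrityError:
        return ["duplicate validator pubkey"]
    return []


def export_genesis(db: sqlite3.Connection, chain_id: str) -> str:
    """Dump all accepted gentxs into a (simplified) genesis document."""
    rows = db.execute(
        "SELECT moniker, pubkey, self_delegation FROM gentxs ORDER BY moniker"
    ).fetchall()
    genesis = {
        "chain_id": chain_id,
        "gentxs": [
            {"moniker": m, "pubkey": p, "self_delegation": s} for m, p, s in rows
        ],
    }
    return json.dumps(genesis, indent=2)
```

Rejecting bad submissions at the door, rather than when the genesis file is assembled, is what keeps a single invalid transaction from stalling everyone else.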
Genesis transaction collection started at about 2018-12-07 04:00 UTC, moved to our collector at 10:00 UTC, and concluded at 15:00 UTC. A total of 28 validators were included in the genesis.json file.
The community managed to get to 50% consensus very quickly.
However, it plateaued there, waiting for the remaining 16% to join. At ~01:00 UTC, we finally managed to get past 66%, but while everyone prevoted, we never made it past the precommit stage! As it turned out, three validators kept voting nil, preventing the network from committing a block.
The community quickly figured out that these three validators had unreasonably low consensus timeouts of a few nanoseconds, the result of a configuration format change a few cosmos-sdk releases earlier. Their timeouts expired before they could see the other validators’ prevotes. The misconfigured timeouts were quickly fixed, and the network took off, or so we thought…
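We don’t know the exact values those three validators had, so the minimal sketch below only illustrates the general failure mode. It assumes that the newer config format expects duration strings with an explicit unit, while a bare number with no unit ends up interpreted in the base unit of Go’s `time.Duration`, the nanosecond. `parse_timeout_ns` is a hypothetical Python stand-in, not Tendermint’s actual config loader, and it only handles single-unit values.

```python
import re

# Nanoseconds per unit suffix, modelled on Go duration strings
# such as "3s" or "500ms" (single-unit values only).
_NS_PER_UNIT = {
    "ns": 1,
    "us": 1_000,
    "µs": 1_000,
    "ms": 1_000_000,
    "s": 1_000_000_000,
    "m": 60 * 1_000_000_000,
    "h": 3600 * 1_000_000_000,
}


def parse_timeout_ns(value) -> int:
    """Convert a config timeout to nanoseconds.

    A string like "3s" carries an explicit unit. A bare integer carries
    none, so, as with a Go time.Duration, it is read as nanoseconds.
    """
    if isinstance(value, int):
        return value  # no unit given: silently treated as nanoseconds
    match = re.fullmatch(r"(\d+)(ns|us|µs|ms|s|m|h)", value.strip())
    if match is None:
        raise ValueError(f"unrecognised timeout: {value!r}")
    amount, unit = match.groups()
    return int(amount) * _NS_PER_UNIT[unit]
```

A timeout meant as five seconds but written as a bare `5` comes out a billion times too short, which is exactly the kind of value that makes a validator give up on a round before anyone else’s votes can arrive.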
At this point, the network had reached round ~240, repeatedly voting on the same block and failing to establish a polka (more than 2/3 of prevotes for the same block) round after round. Each validator that rejoins the network after a restart has to step through all of these rounds one by one, at the original speed. With this many rounds, it would take a long time for everyone to catch up, so genki-1000 was effectively dead. This turned out to be a known bug.
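To get a feel for why ~240 rounds is so bad, here is a back-of-the-envelope sketch. Tendermint’s per-step consensus timeouts grow by a fixed delta every round; the base values and deltas below are assumptions in the spirit of Tendermint’s defaults, not the values genki-1000 actually used, and the model is a worst case in which every step waits out its full timeout.

```python
# Assumed timeouts in seconds: (base, per-round delta) for each step.
# Illustrative values only, not genki-1000's actual configuration.
TIMEOUT_PROPOSE, TIMEOUT_PROPOSE_DELTA = 3.0, 0.5
TIMEOUT_PREVOTE, TIMEOUT_PREVOTE_DELTA = 1.0, 0.5
TIMEOUT_PRECOMMIT, TIMEOUT_PRECOMMIT_DELTA = 1.0, 0.5


def round_timeout(r: int) -> float:
    """Worst-case seconds spent in round r if every step times out."""
    return (
        (TIMEOUT_PROPOSE + r * TIMEOUT_PROPOSE_DELTA)
        + (TIMEOUT_PREVOTE + r * TIMEOUT_PREVOTE_DELTA)
        + (TIMEOUT_PRECOMMIT + r * TIMEOUT_PRECOMMIT_DELTA)
    )


def catch_up_seconds(rounds: int) -> float:
    """Worst-case time to step through rounds 0..rounds-1 one by one."""
    return sum(round_timeout(r) for r in range(rounds))
```

With these assumed values, `catch_up_seconds(240)` comes to 44,220 seconds, a little over 12 hours. Because each round’s timeout grows linearly, the total grows roughly quadratically with the number of rounds, so a stuck network gets harder to rejoin the longer it stays stuck.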
With both Forbole and us asleep, the night shift then went ahead and launched jenki-1000 at ~13:00 UTC with about six validators, led by Adrian Brink of Cryptium Labs. It gained consensus at ~14:30 UTC, but died of a consensus failure a few blocks in, caused by the same bug that killed gaia-9002, which was fixed in v0.28.0-rc1.
Still thirsting for a running testnet, the community then waited for v0.28.0-rc1 to be released, and proceeded to launch genki-2000, coordinated by us.
To speed things up, we shortened the genesis transaction collection window to 30 minutes, with everyone staying online for the launch.
Registration was closed at 17:30 UTC and genesis published at 17:35 UTC. Consensus was established at 17:50 UTC, in what was probably the fastest testnet launch to date, despite GitHub’s cache serving an outdated genesis.json file (renaming it helped).
Forbole’s Big Dipper block explorer was up and running a few minutes in, and everyone was glued to their screens watching as the block height grew. At 18:40 UTC, 100% consensus was reached!
It also appears that the new testnet uncovered at least one new bug, as reported by Hendrik:
"In genki-2000 running cosmos-sdk v0.28.0-rc1 validators have been randomly unbonded for a single block and readded in…" (github.com)
On to gaia-9003
With genki-2000 up and running, we’re preparing for a community-run gaia-9003. By now, the process is well established, thanks to an amazing and helpful community of Cosmos validators.
The exact timing will depend on the GoS launch, which in turn depends on whether any further GoS blockers turn up. As usual, planning and discussions related to gaia-9003 will take place in the Cosmos forums.
Meanwhile, any validator who wasn’t present at genki-2000 genesis can create an account and ask for tokens in the Validator Group on Riot. GoS genesis accounts are valid on genki-2000, too. Instructions:
genki-2000 testnet: certusone/genki-2000 (github.com)
Lessons learned
- Both well-organized official testnets and impromptu community efforts have their place, especially with the GoS deadline quickly approaching. However, everyone should keep in mind that people need to sleep, live in different time zones, and don’t want to follow the Riot chat 24/7 or scroll through a backlog of 1000+ lines when they wake up just to avoid being left out. Communicating intent is key, as is making it possible to join later.
- A medium-sized testnet launch is only viable if most genesis participants are also present at launch. This implies a very short window of time between genesis transaction submission and launch.
- It’s crucial that a fixed launch time is announced before genesis transactions are submitted. Every validator who submits a transaction should confirm that they will be able to attend the launch.
- A decentralized start is likely to take at least 12–24 hours due to time zone differences, especially if genesis transactions were collected over a longer period. This is entirely expected and not a bad thing; patience is key.
- Time zones mean that launch timing is important. A quick analysis of the chat history of the Validators channel on Riot suggests that the best time to launch a network is either 13:00–16:00 UTC or 00:00–03:00 UTC.
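We haven’t described how that chat analysis was done; one plausible approach is a histogram of messages per UTC hour, followed by a search for the busiest three-hour window. A minimal sketch, assuming the chat export has already been reduced to a list of Unix timestamps (the export format and function names are hypothetical):

```python
from collections import Counter
from datetime import datetime, timezone


def messages_per_utc_hour(timestamps):
    """Count messages per UTC hour of day (0-23) from Unix timestamps."""
    counts = Counter(
        datetime.fromtimestamp(ts, tz=timezone.utc).hour for ts in timestamps
    )
    return [counts[hour] for hour in range(24)]


def busiest_window(hourly, width=3):
    """Start hour of the `width`-hour window with the most messages.

    Windows wrap around midnight, so 23:00-02:00 is a valid candidate.
    Ties go to the earliest start hour.
    """
    def total(start):
        return sum(hourly[(start + i) % 24] for i in range(width))

    return max(range(24), key=total)
```

Bucketing in UTC rather than local time matches how launch times are announced: one reference time zone that everyone converts from.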