A recap of the Validators’ Emergency Upgrade Retro

Cosmos Hub
The Interchain Foundation
Jun 22, 2023

This post is a follow-up to the first validator retro call, held on May 17th, 2023. The call was a retrospective on the emergency upgrade of the Cosmos Hub network to Gaia v9.1.0, performed on May 8th, 2023. Its goal was to analyze the operational processes involved in order to improve them, and thus improve the validator experience on the network.

The improvements resulting from the call span documentation, technical changes, and process. It is our hope that we will continue to collectively build on these improvements and tighten feedback loops across the network, ensuring a secure and efficiently run Cosmos Hub.

Hub’s Validators are the best!

Before we begin, we want to add a quick note of gratitude for the Cosmos Hub validator set — the nodes responsible for maintaining the liveness of the Cosmos Hub blockchain.

With the launch of the ATOM Economic Zone following the v9 upgrade, operations on the network have become more complex — and the validator set has met these challenges head on and exceeded all expectations to support the network and its varied needs: from participating in the governance process to running consumer chain infrastructure.

The most recent chain upgrade, v10 on June 21st, was successfully completed in no more than 5 minutes, making it the fastest upgrade in the history of the network. For context, the v9.1.0 upgrade took roughly 6 minutes, the best time until then.

We want to take a moment to thank all of the validators and contributors involved in this process. It is because of you that the ATOM Economic Zone is blossoming and that the networks within it are stable, censorship-resistant, and impactful.

Context about the v9.1.0 upgrade

The Cosmos Hub protocol relies on its validators to operate the network. In an emergency, coordination challenges can arise because of the decentralized contributor base and the globally distributed infrastructure. On May 4th, the decision was made to schedule an emergency upgrade, and on May 8th, the upgrade was conducted.

The emergency upgrade was required to address an issue that prevented some validators from making key assignment transactions. This edge case involved Amino-format keys. Amino is a data format that has been largely removed from Cosmos but is still used by Ledger devices and multisigs. The fix involved a small change to the format of key assignment transactions. Had the issue been left unresolved, a significant number of validators would have been slashed for seemingly not running consumer chain nodes. [Please note that Cosmos SDK v0.50 (Eden) introduces SIGN_MODE_TEXTUAL, an alternative that will replace Amino signing in the future.]

For more context about the emergency upgrade, please see this blog.

On May 17th, we gathered a group of validators together for a public retrospective on the prior week’s events. You can watch the full recording of that call here.

Key Takeaways

Feedback from the retro broadly fell into three categories. Examples for each are listed below:

Documentation

  • Update key assignment documentation for more visibility into timelines
  • Improve upgrade flow clarity
  • Provide better onboarding to key communication channels

Technical

  • Explore the possibility of having expedited proposals

Process

  • Restructure the mailing list to use two email addresses: a regular address for non-emergencies and a separate one for emergencies
  • Aim for a 48-hour lead time when sharing the binary, so that validators have an opportunity to review it
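
As a rough illustration of the 48-hour lead-time target above, the following Python sketch computes the latest time a binary could be shared ahead of a scheduled upgrade. The function names and the example upgrade time are hypothetical and not part of any Cosmos Hub tooling. Only the 48-hour figure comes from the retro feedback.

```python
from datetime import datetime, timedelta, timezone

# Target review window from the retro feedback (48 hours).
REVIEW_LEAD_TIME = timedelta(hours=48)


def latest_binary_share_time(upgrade_time, lead_time=REVIEW_LEAD_TIME):
    """Latest moment the binary can be shared while leaving the full lead time.

    Hypothetical helper for illustration only.
    """
    return upgrade_time - lead_time


def has_full_review_window(shared_at, upgrade_time, lead_time=REVIEW_LEAD_TIME):
    """True if validators get at least `lead_time` between share and upgrade."""
    return upgrade_time - shared_at >= lead_time


# Made-up upgrade time, chosen only to show the arithmetic.
upgrade = datetime(2030, 1, 10, 12, 0, tzinfo=timezone.utc)
deadline = latest_binary_share_time(upgrade)
print(deadline.isoformat())  # 2030-01-08T12:00:00+00:00
```

Working in UTC keeps the deadline unambiguous for a globally distributed validator set.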

Improvements

A group of contributors has come together to improve each of the above areas.

Regarding documentation: Informal Systems is currently leading a round of validator interviews and is sourcing improvements for the docs. Additionally, the Informal Systems team will coordinate with other key stakeholders to provide more clarity into the emergency upgrade process.

Regarding the technical improvements: contributors are researching improvements to the x/gov module that would enable not only more expressive voting but also the technical capacity for expedited upgrades.

Regarding the process improvements: when possible, upgrades will include a lead time to review the source code. Additionally, if you’re a validator and haven’t already done so, please fill out this form with your email address and emergency contact info.

As we continue into 2023, we’ll also hold additional feedback sessions for Hub stakeholders and validators — keep an eye out for them — your participation is greatly appreciated.

Conclusion

Emergency upgrades are never ideal for large decentralized networks. This will not be the last emergency upgrade for the Cosmos Hub, but we’re optimistic that future upgrades will be more straightforward and boring — and we look forward to working together to ensure that is the case.

About the Authors

Abra Tusz works on governance strategy at the Interchain Foundation.

Denise is a software engineer at Hypha Co-op who currently works on the Cosmos Hub Testnets Program.

Milan Mulji is Technical Relations Lead for the Cosmos Hub at Informal Systems.

Lexa Michaelides runs validator relations and testnet coordination at Hypha Co-op.
