Upgrading Cosmos SDK’s Release Process

A Behind-the-Scenes Account of the Stargate and v0.43 Release Process

Cory Levinson
Regen Network
Oct 7, 2021

It’s been almost a year and a half since the Regen Network engineering team stepped in as lead maintainers of the Cosmos SDK! With two large feature releases behind us, we thought it was about time to give some updates from behind the scenes and share how we’ve upgraded our software development and release processes over the last 18 months for the SDK.

After all, shouldn’t open-source development go hand-in-hand with open-source processes?

A Bit of History — Stargate Release

It was October 2020 when we cut our first release candidate of Stargate (v0.40.0-rc0). After nearly a year of working on the protocol buffer migration and coordinating with the IBC development team, our teams were ready to begin the process of testing and validating that v0.40 was ready for production.

The IBC module had undergone a professional audit by the Informal team, and after thorough rounds of both architecture and implementation review by our team and other core SDK contributors, we were prepared to publish a release candidate (RC) and begin a more rigorous process for testing. It took 3 months and 8 release candidates of heavy testing from the SDK development teams and the Cosmos Hub validator community before it was actually ready for a final published release.

While the SDK’s release of Stargate, and the subsequent upgrade of the Cosmos Hub, was a great success for the performance improvements and general stability it brought to Cosmos chains, we did have a handful of bug fixes and security vulnerabilities over the following weeks and months that required API- or consensus-breaking changes to the SDK. This is something we definitely wished we could have avoided.

More Features, More Audits!

As we moved into spring 2021, our engineering team was heads down, wrapping up the implementation of the long-awaited feegrant and authz modules. These two features had been designed prior to Stargate, but due to bandwidth constraints, we decided to push their implementation back until after we released v0.40. This way, we could build these two new modules directly on top of the SDK’s more modern protobuf-based architecture.

As we approached feature completeness for our next big release of the SDK, we decided to up-level our release process with the goal of reducing the number of patch and security releases needed later down the road. We knew that a more rigorous, formalized, and front-loaded internal audit process might drag out the time-to-RC. The hope, however, was that it would ultimately raise the quality of the final release significantly and reduce the time spent in the RC phase.

We did this by introducing two new audit processes:

  • A formalized ‘Module Readiness Checklist’, which each new module of the SDK must go through before being tagged in a published release
  • A ‘Release QA Checklist’ to ensure thorough testing and quality of all changes made since the last major release (e.g. Stargate in the case of the v0.43 release)

Module Readiness Checklists

For module readiness checklists, we break down the audit into several distinct components, and assign individual team members to each:

  • API Audit — Ensuring Msg & Query methods and types are well named, organized, and well documented
  • State Machine Audit — Reading through all state machine code, ensuring that implementation matches specification, checking for state machine edge cases, and assessing potential security risks or spam attack vectors
  • Completeness Audit — Ensuring genesis import & export, query services, CLI methods, and migrations are fully implemented to completion

Each of these audits results in a write-up and a subsequent set of follow-up PRs to address any needed changes. The readiness checklists themselves are tracked as GitHub issues (we even have an issue template), and PRs are tracked and referenced against this issue.
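To make the shape of a readiness checklist concrete, here is a minimal sketch of how one could be modeled. It is purely illustrative: the class names, fields, and the rule that a module is "ready" only when every audit has a write-up and no open follow-up PRs are our assumptions for this example, not the actual issue template.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Audit:
    """One audit component (hypothetical model, not the real issue template)."""
    name: str                 # e.g. "API Audit"
    assignee: str             # team member responsible for this audit
    writeup_done: bool = False
    open_followup_prs: List[str] = field(default_factory=list)

    def complete(self) -> bool:
        # Assumption: an audit is complete once its write-up exists
        # and every follow-up PR has been merged or closed.
        return self.writeup_done and not self.open_followup_prs


@dataclass
class ModuleReadinessChecklist:
    module: str
    audits: List[Audit]

    def ready(self) -> bool:
        # The module is ready only when all audits are complete.
        return all(a.complete() for a in self.audits)


checklist = ModuleReadinessChecklist(
    module="feegrant",
    audits=[
        Audit("API Audit", "alice", writeup_done=True),
        Audit("State Machine Audit", "bob", writeup_done=True,
              open_followup_prs=["placeholder-pr"]),
        Audit("Completeness Audit", "carol", writeup_done=True),
    ],
)
print(checklist.ready())  # False: one follow-up PR is still open
```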

Release Process & QA

While a good amount of feature development in the SDK happens within the context of SDK modules, it’s not all-encompassing! Over the past few months, we’ve spun up working groups and epics for the storage layer, transaction improvements, and simplified app wiring. However, none of these fit this new module auditing model for “module readiness checklists”.

For this reason, we’ve created a parallel structured process for more general Release QA. This process outlines the major events in a release process and sets explicit checklists that need to be completed along the way.

Feature Completeness

Upon feature completeness for a given release, we kick off module readiness checklists for each new module that’s been developed. We do a similar existing module audit for any changes to the existing SDK modules since the last major release (see our v0.43 existing code changes audit as an example).

Tagging the Beta

Once the audits above have been completed, we’re ready to tag a beta1 off our master branch. Between any beta and RC releases, we freeze the master branch from any kind of state-breaking changes. Client-breaking changes and small API-breaking changes can still be merged into master.

Off of this beta tag, we kick off a process of heavy manual testing and simulations. This includes running cloud simulations, testing all features and fixes in the changelog, and testing any migrations of existing state to new state through on-chain upgrades of testnets. We also try to get a few projects in the ecosystem to test out an upgrade to the beta on their blockchain with their dev team or validators.

One key note here: multiple beta releases can be tagged (always off master) if needed. The kind of iteration we went through across several successive RC releases for Stargate should now happen in this beta phase.
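The freeze rule for master during the beta phase can be summarized as a small decision function. This is only a sketch of the policy as described above; the `Change` categories and the `can_merge_to_master` name are made up for illustration and do not correspond to any tooling in the SDK repository.

```python
from enum import Enum


class Change(Enum):
    """Hypothetical labels for the kinds of change discussed above."""
    STATE_BREAKING = "state-breaking"
    CLIENT_BREAKING = "client-breaking"
    SMALL_API_BREAKING = "small API-breaking"
    NON_BREAKING = "non-breaking"


def can_merge_to_master(change: Change, frozen: bool) -> bool:
    """Return True if this kind of change may be merged into master.

    `frozen` is True while master is frozen, i.e. between a beta tag
    and the RC phase. During the freeze only state-breaking changes
    are held back.
    """
    if not frozen:
        return True
    return change is not Change.STATE_BREAKING


for change in Change:
    print(change.value, can_merge_to_master(change, frozen=True))
```

In practice a rule like this is enforced by reviewers rather than by code, but writing it down makes the one blocked category easy to see.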

Tagging an RC and Final Release

Once there’s sufficient confidence in the beta release, we branch off of master into a release branch (e.g. release/v0.43.x) and tag a release candidate off the HEAD of that branch. Updates from here should be focused on bug fixes and final polishes. PRs are merged into master and must also be backported into the release branch if expected to go into the release. Client-breaking changes and small API-breaking changes can still be merged into the release if deemed necessary.
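As a small illustration of the naming involved, the sketch below derives a release branch name and RC tag from a version string. The branch pattern follows the release/v0.43.x example above and the RC pattern follows v0.40.0-rc0 from the Stargate release; the helper names themselves are hypothetical, and the full form of a beta tag is not spelled out in this post, so it is left out.

```python
import re

# Matches versions of the form vMAJOR.MINOR.PATCH, e.g. "v0.43.0".
_VERSION = re.compile(r"v(\d+)\.(\d+)\.(\d+)")


def release_branch(version: str) -> str:
    """Map a version such as 'v0.43.0' to its release branch name."""
    match = _VERSION.fullmatch(version)
    if match is None:
        raise ValueError(f"unexpected version format: {version!r}")
    major, minor, _patch = match.groups()
    return f"release/v{major}.{minor}.x"


def rc_tag(version: str, n: int) -> str:
    """Build the tag for the n-th release candidate (numbered from 0)."""
    if _VERSION.fullmatch(version) is None:
        raise ValueError(f"unexpected version format: {version!r}")
    return f"{version}-rc{n}"


print(release_branch("v0.43.0"))  # release/v0.43.x
print(rc_tag("v0.40.0", 0))       # v0.40.0-rc0
```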

As a final step, we audit the changelog fully against the commit log, ensuring all breaking changes, bug fixes, and improvements are properly documented. We recommend running a final round of devnets & testnets for manual testing, focusing primarily on any changes or bug fixes introduced since the last beta tag.
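One way to support the changelog audit is to cross-check PR references mechanically before reading through by hand. The sketch below assumes that commit subjects carry a "(#NNN)" PR reference and that changelog entries mention the same "#NNN"; both conventions, the function name, and the sample data are assumptions for illustration, not a description of our actual tooling.

```python
import re
from typing import Iterable, List

PR_REF = re.compile(r"#(\d+)")


def missing_from_changelog(commit_subjects: Iterable[str],
                           changelog_text: str) -> List[str]:
    """Return commit subjects whose PR numbers never appear in the changelog.

    Commits without any PR reference are skipped here; they would
    need a manual look.
    """
    documented = set(PR_REF.findall(changelog_text))
    missing = []
    for subject in commit_subjects:
        refs = PR_REF.findall(subject)
        if refs and not any(ref in documented for ref in refs):
            missing.append(subject)
    return missing


# Placeholder data, not real SDK commits or PR numbers.
commits = [
    "feat: add grant pruning (#101)",
    "fix: handle empty allowance (#102)",
    "docs: tidy README",
]
changelog = "* (#101) Add grant pruning\n"

print(missing_from_changelog(commits, changelog))
```

A check like this only finds entries that are absent; whether each entry is filed under the right heading (breaking change, bug fix, improvement) still needs a human review.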

Multiple RC releases can be tagged if necessary. When sufficient confidence is reached in the stability of a release candidate, we then consolidate the changelogs from the RC & beta tags, synthesize high-level notes into a RELEASE_NOTES.md file, and publish the release!

So did it work?

For the v0.43 release, the introduction of this process dropped the number of RCs by 50% (from 8 RCs for Stargate to 4 RCs for v0.43), a pretty big improvement!

Another metric we’ve tracked is the number of SDK releases with security or other critical bug fixes we’ve had to make after a given feature release. In the 2 months following the release of v0.43, we’ve had to make two security or critical patch releases, one of which was state-machine breaking. This is a huge improvement over Stargate, where in the 2 months after its release we had 8 patch releases, 6 of which contained security or critical bug fixes, and 2 of which were state-machine or consensus breaking.
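For readers who want to check the arithmetic, the comparison works out as follows. The counts are the ones quoted above; the helper name is ours.

```python
def percent_reduction(before: int, after: int) -> float:
    """Percentage drop from `before` to `after`."""
    return 100.0 * (before - after) / before


# Stargate vs. v0.43, using the figures quoted in this post.
rc_drop = percent_reduction(8, 4)        # release candidates
critical_drop = percent_reduction(6, 2)  # security/critical patch releases
breaking_drop = percent_reduction(2, 1)  # state-machine/consensus-breaking patches

print(round(rc_drop, 1))        # 50.0
print(round(critical_drop, 1))  # 66.7
print(round(breaking_drop, 1))  # 50.0
```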

Looking to the future, we’re excited to continue learning and iterating on this process, both for the SDK and within the context of our own blockchain, Regen Ledger. In fact, we’ve just entered the RC phase of the release process for Regen Ledger v2.0 and are looking forward to a smooth and stable on-chain upgrade of our own Regen Network soon!

P.S. We’re Hiring!

Interested in joining our ranks to work on one of the largest proof-of-stake frameworks in the blockchain ecosystem? Or maybe you’ve been working on blockchain projects for a while and are ready to turn your attention to a project that’s directly trying to tackle issues around climate change?

We’ve got several roles open internally at Regen Network for both Cosmos SDK and Regen Ledger and are always looking for great developers to join our team. See our available roles here!
