Runtime Upgrades — An Experience Report
Runtime upgrades are an essential feature of Substrate; perhaps even the most important feature. But at the same time, they are a feature that is easy to avoid using if your project hasn’t launched the main network or a persistent test network yet, as it is tempting to simply reset the network and start with block #0 after each update or release. However, it is essential to gain experience with runtime upgrades. After all, it’s the only way to introduce features to a live and persistent Substrate-based blockchain!
At KILT, we published a new release a few weeks ago and at the same time we did the first runtime upgrade of our test network (Mashnet). In this article we want to share our experience and solutions to problems we encountered during our runtime upgrade.
The first thing we noticed was that it’s easier to do several smaller runtime upgrades than one big one, for two reasons: a) it is easier to debug when only a small part of the code changes, and b) Substrate removes some storage migrations after a while, so a release that is far ahead of yours may no longer contain the migration code for the versions in between. We therefore upgraded in several smaller version increments rather than jumping to the newest version with one upgrade. Each of those smaller upgrades had its own challenges, which we address individually here.
1. Introducing the Session Pallet
The first upgrade we did was adding the session pallet to our runtime. Like most projects, we started from the Substrate node template. In the template, the authorities for Aura and GRANDPA are hardcoded into the genesis block and cannot be changed. In addition, the session keys for Aura and GRANDPA are not associated with blockchain accounts, so there is no account behind a block author that could, for example, hold tokens. The session pallet closes this gap: each account can set the session keys that should be used for the next sessions, which makes it possible to associate a block with the account of its author. This is the very foundation of Nominated Proof of Stake (NPoS) because it connects the validator node to a blockchain account, which is necessary to pay out rewards. The session pallet also manages who is in the validator set for a specific session.
Adding a pallet is straightforward: simply add it to the construct_runtime! macro. At least, that’s what we thought! The session pallet requires a genesis configuration, and a genesis configuration is only applied when block #0 is built. Since we added the pallet to a chain that was already running, this configuration was never executed, and the session pallet did not work.
To fix this, we had to execute the genesis configuration during the runtime upgrade, which is, again, easier said than done. The storage of the session pallet is private and can only be accessed from within the pallet itself, so we saw no way around forking it. We opted for a quick-and-dirty copy-and-paste solution: we copied the session pallet into our project and made its storage public. Afterwards we could copy the genesis configuration code and execute it during the runtime upgrade. Later, we removed our local copy of the pallet and replaced it with the official version. We just needed to make sure that both versions use the same pallet index.
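The idea of running the genesis logic during an upgrade can be sketched with a toy model in plain Rust. This is not the code we deployed and it does not use any Substrate API: the Storage map, the key name and both functions are assumptions made up for this illustration. It only shows the control flow, namely that the upgrade hook writes what the genesis configuration would have written, and only if that storage is still empty.

```rust
use std::collections::BTreeMap;

/// Toy stand-in for the chain state: raw key -> raw value (illustrative only).
type Storage = BTreeMap<Vec<u8>, Vec<u8>>;

/// Hypothetical key under which the toy "session pallet" keeps its validators.
const VALIDATORS_KEY: &[u8] = b"Session/Validators";

/// What the genesis configuration would have written when block #0 was built.
fn build_genesis(storage: &mut Storage, validators: &[&str]) {
    storage.insert(VALIDATORS_KEY.to_vec(), validators.join(",").into_bytes());
}

/// Runs in the first block that is executed with the new runtime.
/// Returns true if it had to initialise the pallet storage.
fn on_runtime_upgrade(storage: &mut Storage, current_authorities: &[&str]) -> bool {
    if storage.contains_key(VALIDATORS_KEY) {
        // Already initialised: either the pallet existed at genesis
        // or this upgrade logic has run before.
        return false;
    }
    // The pallet was added after genesis, so replay the genesis logic now,
    // seeded with the authorities that are currently producing blocks.
    build_genesis(storage, current_authorities);
    true
}
```

Calling on_runtime_upgrade on a state without the validators key initialises it once; a second call leaves the state untouched, which keeps the upgrade safe to run more than once.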
DISCLAIMER: Don’t look too closely at the code! We used the current authorities to collect the session keys, which is not recommended. In a main network the runtime upgrade would probably introduce new session keys. The main problem with our approach is that the account and session keys have the same seed, which is also not recommended.
2. Updating Substrate
Our next struggle was updating the Substrate dependencies. First, we wanted a list of exactly what had changed between the Substrate version we were using and the version we wanted to update to. Since we were not on a released version like 2.0.0 but on a specific commit, such a list was not easy to get. We were mostly interested in how the storage had changed because, as we found out, you only need to execute upgrade logic if the storage changes.
Ultimately, we found a list of storage migrations done in Substrate which helped us a lot.
After we updated the Substrate dependencies, we noticed that this was actually the easiest migration, since Substrate executed its storage migrations automatically. But be aware that migrations might get removed after a while. If you jump too far ahead with your Substrate version, you might not be as lucky.
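Why a big version jump can fail is easiest to see in a toy model. The sketch below is plain Rust and deliberately not Substrate code: the Migration struct, the two example releases and the version numbers are all assumptions for illustration. A release ships a set of migration steps, and an upgrade only succeeds if every step between the stored version and the target version is still shipped.

```rust
/// One migration step that upgrades storage from version `from` to `from + 1`.
struct Migration {
    from: u32,
    apply: fn(&mut Vec<String>),
}

// Two hypothetical steps; they only record that they ran.
fn v1_to_v2(log: &mut Vec<String>) { log.push("v1 -> v2".to_string()); }
fn v2_to_v3(log: &mut Vec<String>) { log.push("v2 -> v3".to_string()); }

/// An older release that still ships both steps.
fn older_release() -> Vec<Migration> {
    vec![
        Migration { from: 1, apply: v1_to_v2 },
        Migration { from: 2, apply: v2_to_v3 },
    ]
}

/// A newer release from which the v1 -> v2 step has been removed.
fn newer_release() -> Vec<Migration> {
    vec![Migration { from: 2, apply: v2_to_v3 }]
}

/// Applies the shipped steps in order until `target` is reached.
/// Fails as soon as a required step is missing from the release.
fn migrate(
    stored: u32,
    target: u32,
    shipped: &[Migration],
    log: &mut Vec<String>,
) -> Result<u32, String> {
    let mut version = stored;
    while version < target {
        let step = shipped
            .iter()
            .find(|m| m.from == version)
            .ok_or_else(|| format!("no migration from v{} in this release", version))?;
        (step.apply)(log);
        version += 1;
    }
    Ok(version)
}
```

In this model, a chain at storage version 1 can reach version 3 with the older release, but not with the newer one. Going through the older release first is exactly the "multiple smaller upgrades" approach.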
3. Removing Pallets
Sometimes you need to remove a pallet. This sounds like an easy task, and it’s tempting to just remove the module from the construct_runtime! macro. But there are several traps to fall into! First of all, the order and index of modules in construct_runtime! are important. If you remove a pallet, all the following pallets get a different index, and that index is part of how their calls and events are encoded. So it’s a good idea to give all your pallets a fixed index. The second trap you could walk into is forgetting to clean up the storage that the pallet used. Removing all the storage of a pallet can be done in multiple ways. We did it using kill_prefix(&twox_128(b"Module_prefix")).
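Both traps can be illustrated with a small toy model in plain Rust. None of this is Substrate code: the pallet names, the plain-text storage keys and the local kill_prefix helper are assumptions for this sketch (on a real chain the prefix is the hashed module name, as in the call above). The first two functions contrast position-based and fixed indices; the third removes every storage entry below a prefix.

```rust
use std::collections::BTreeMap;

/// Toy stand-in for the chain state: raw key -> raw value (illustrative only).
type Storage = BTreeMap<Vec<u8>, Vec<u8>>;

/// Implicit indices: a pallet's index is simply its position in the list.
fn implicit_index(pallets: &[&str], name: &str) -> Option<usize> {
    pallets.iter().position(|p| *p == name)
}

/// Fixed indices: every pallet carries its own index, independent of order.
fn explicit_index(pallets: &[(&str, u8)], name: &str) -> Option<u8> {
    pallets.iter().find(|(p, _)| *p == name).map(|(_, i)| *i)
}

/// Removes every entry whose key starts with `prefix`.
/// Returns how many entries were removed.
fn kill_prefix(storage: &mut Storage, prefix: &[u8]) -> usize {
    let before = storage.len();
    storage.retain(|key, _| !key.starts_with(prefix));
    before - storage.len()
}
```

With implicit indices, removing a hypothetical "Legacy" pallet from ["System", "Legacy", "Balances"] moves "Balances" from index 2 to index 1. With fixed indices, "Balances" keeps its index and the removed slot simply stays unused. Calling kill_prefix with the removed pallet's prefix then clears its leftover entries without touching the other pallets.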
Troubleshooting and Miscellaneous
You might stumble across a number of different problems while writing your runtime migrations. Here are two things we wish we had remembered before we tested ours.
Polkadot JS Types
Remember that the custom types in the Polkadot apps are most likely outdated after you execute your runtime upgrades. Make sure to adjust not only your own updated types, but also those changed by Substrate.
Update All of the Node
After our runtime upgrades went through, we had problems submitting transactions to our node. At first we thought that we had made a mistake during the upgrade, but once we started the node with the new executable, those issues were gone. The problem was that we needed to update not only the runtime Wasm, but also the rest of the client.