Blockchain Tales: How Our Testnet Allowed Us to Catch a Critical Bug
On Tuesday, January 18, 2022, we manually tested a Substrate runtime upgrade on our test network (release 2.0.14, which had been pending for a few weeks) before deploying it on the main network. After deploying the upgrade, we noticed that the storage of some of our pallets seemed to have been erased. More precisely, the membership and collective pallets were affected, and these are the pallets we typically use to control our upgrade procedure. This meant that we were no longer able to upgrade the test network to fix the bug on the fly and get it back in good shape.
We immediately halted the deployment of the runtime upgrade and started looking for the root cause of the issue. We were able to determine that certain storage migrations were missing from our runtime. Release 2.0.14 was designed to upgrade our chain from the outdated Substrate version 3 to a specific version used by Polkadot (referred to as “branch 0.9.12”), and the new code needed those migrations to keep reading the data the old runtime had written.
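To illustrate how a missing storage migration can make pallet data look erased, here is a minimal sketch in plain Rust. It does not use Substrate's APIs and is not the code from our runtime: the map-based storage, the prefix names "OldMembership" and "NewMembership", and the helper functions are all hypothetical. It assumes, for illustration only, that the new runtime looks for a pallet's data under a different key prefix than the old one did.

```rust
use std::collections::BTreeMap;

/// Toy key-value store standing in for on-chain storage (an assumption for this sketch).
type Storage = BTreeMap<String, Vec<u8>>;

/// Builds the full key for an item, namespaced by a pallet prefix.
fn storage_key(pallet_prefix: &str, item: &str) -> String {
    format!("{}/{}", pallet_prefix, item)
}

/// Reads an item under the prefix the running runtime expects.
fn read<'a>(storage: &'a Storage, pallet_prefix: &str, item: &str) -> Option<&'a Vec<u8>> {
    storage.get(&storage_key(pallet_prefix, item))
}

/// Moves every entry under `old_prefix` to `new_prefix` and returns how many were moved.
fn migrate_prefix(storage: &mut Storage, old_prefix: &str, new_prefix: &str) -> usize {
    let old = format!("{}/", old_prefix);
    // Collect the keys first so the map is not mutated while it is being iterated.
    let keys: Vec<String> = storage
        .keys()
        .filter(|k| k.starts_with(&old))
        .cloned()
        .collect();
    for key in &keys {
        if let Some(value) = storage.remove(key) {
            let item = &key[old.len()..];
            storage.insert(storage_key(new_prefix, item), value);
        }
    }
    keys.len()
}

fn main() {
    let mut storage = Storage::new();
    // Data written by the old runtime (hypothetical prefix and contents).
    storage.insert(storage_key("OldMembership", "Members"), vec![1, 2, 3]);

    // Without a migration, the new runtime looks under the new prefix and finds nothing,
    // even though the bytes are still present under the old prefix.
    assert!(read(&storage, "NewMembership", "Members").is_none());

    // With the migration, the data is moved and becomes visible again.
    let moved = migrate_prefix(&mut storage, "OldMembership", "NewMembership");
    assert_eq!(moved, 1);
    assert_eq!(read(&storage, "NewMembership", "Members"), Some(&vec![1, 2, 3]));
    println!("migrated {} entries", moved);
}
```

In this toy model the data is never deleted. It is simply unreachable until the migration runs, which is why a pallet can appear empty right after an upgrade.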
To best respond to this incident, we split our team into two groups. One group created a pull request to integrate the missing migrations, and the other set up a new test network, as our inability to upgrade the existing one had essentially rendered it unusable.
At 8 p.m. PT that same day (4 a.m. GMT on January 19), we merged a pull request to create a new test network, purposely choosing to recreate it on version 2.0.13 so that we could still test the fixes we would be merging soon thereafter. Our DevOps team then started deploying the new nodes with the updated chain specification and switched our public nodes, which are used by interfaces such as Polkadot JS, over to the new network.
At 11 a.m. PT on Friday, January 21 (7 p.m. GMT), we merged a final pull request that implemented the required fixes. After careful local testing, we have begun deploying to our test network before we do so on the main network.
It is important to note that this incident affected only our test network and had absolutely no impact on our main network. Every individual user’s funds are perfectly safe and the Nodle Network as a whole was not impacted. However, this incident clearly demonstrates the usefulness of having such a test network. It also highlights how important it is that we respond quickly and efficiently to breaking changes within the upstream Parity Substrate repository, such as the addition of new manual runtime upgrades.
Nodle is a decentralized IoT (Internet of Things) network on Polkadot providing secure, low-cost connectivity and data liquidity to connect billions of IoT devices worldwide. The Nodle network is powered by millions of Bluetooth-enabled smartphones that earn Nodle Cash (NODL). Nodle’s powerful IoT stack allows multiple uses including connecting and securing physical assets, tracking lost or valuable items, capturing sensor data, and authenticating security certificates. Nodle provides insights for consumer electronics manufacturers, enterprises, smart cities, the finance industry, and more. Since its creation in 2017, Nodle has become one of the world’s largest wireless networks by number of base stations. Join #TheCitizenNetwork by downloading the Nodle Cash app for iOS or Android.