What happened with block 2731040 and how we fixed it
On May 1st, 2017, the Lisk network encountered a critical bug at block height 2731040, round 27040. Because of the severity of the bug, and to properly explain the reasoning behind the two Lisk Core patch releases 0.8.1 and 0.8.2, we are providing a summary of the events and of the measures we took to fix it.
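Each Lisk round consists of 101 blocks, one forged by each active delegate, so the round a block belongs to can be derived from its height. The sketch below assumes rounds are numbered from 1 and uses hypothetical helper names (`calc_round`, `is_round_end`); it is an illustration, not Lisk Core's code. It shows that block 2731040 is exactly 101 × 27040, the final block of round 27040, which is the block at which round rewards are applied.

```python
ACTIVE_DELEGATES = 101  # one block per active delegate per round


def calc_round(height: int) -> int:
    """Round number for a block height, with rounds numbered from 1."""
    # Ceiling division in integer arithmetic: heights 1..101 -> round 1, and so on.
    return (height + ACTIVE_DELEGATES - 1) // ACTIVE_DELEGATES


def is_round_end(height: int) -> bool:
    """True for the last block of a round, where rewards and fees are applied."""
    return height % ACTIVE_DELEGATES == 0


print(calc_round(2731040))    # 27040
print(is_round_end(2731040))  # True
```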
May 1st — 13:00 CEST
A discrepancy was identified between the Memory Tables and the blockchain-calculated balances of 101 delegate accounts. Because of a bug in round processing, these accounts had been credited with an extra round's worth of fees and block rewards.
For readers unfamiliar with rounds: a round is a forging cycle of the 101 active delegates. At the end of a round, the forging rewards and fees are applied to the delegates who participated in it. These rewards are written to the Memory Tables, the ongoing ledger of account balances and other account data held in the database. In essence, this is how the bug played out:
What happened:
1. Block applied → Round calculated and rewards allocated.
2. Forked block detected → Round failed to calculate and apply rollback → Block removed.
3. Correct block received → Round calculated and rewards reallocated.
End result: Round 27040 was applied twice to the Memory Tables.
What should have happened:
1. Block applied → Round calculated and rewards allocated.
2. Forked block detected → Round recalculated → Rewards reverted → Block removed.
3. Correct block received → Round calculated and rewards reallocated.
End result: Round 27040 was applied once to the Memory Tables.
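The two sequences differ only in whether step 2 actually reverts the round. The following is a minimal sketch under simplifying assumptions: the Memory Tables are modelled as a map from delegate to an integer balance in beddows (assumed 1 LSK = 10^8 beddows), and the forked block and the correct block are assumed to credit identical round rewards. `MemoryTables` and `replace_forked_block` are hypothetical names, not Lisk Core code.

```python
from collections import defaultdict

# Simplified, hypothetical model of the Memory Tables: delegate -> balance in beddows.


class MemoryTables:
    def __init__(self) -> None:
        self.balances = defaultdict(int)

    def apply_round(self, rewards: dict[str, int]) -> None:
        """Credit each delegate's round rewards and fees."""
        for delegate, amount in rewards.items():
            self.balances[delegate] += amount

    def rollback_round(self, rewards: dict[str, int]) -> None:
        """Revert exactly what apply_round credited."""
        for delegate, amount in rewards.items():
            self.balances[delegate] -= amount


def replace_forked_block(tables: MemoryTables, round_rewards: dict[str, int],
                         rollback_succeeds: bool) -> MemoryTables:
    """Replay the three steps for a block that closes a round."""
    # 1. Block applied: round calculated and rewards allocated.
    tables.apply_round(round_rewards)
    # 2. Fork detected: the block is removed and the round should be reverted.
    if rollback_succeeds:
        tables.rollback_round(round_rewards)
    # When the rollback fails, the first allocation silently stays in place.
    # 3. Correct block received: round calculated and rewards allocated again.
    tables.apply_round(round_rewards)
    return tables


rewards = {"delegate_1": 5 * 10**8}  # hypothetical: one delegate, 5 LSK
buggy = replace_forked_block(MemoryTables(), rewards, rollback_succeeds=False)
fixed = replace_forked_block(MemoryTables(), rewards, rollback_succeeds=True)
print(buggy.balances["delegate_1"] // 10**8)  # 10 (round applied twice)
print(fixed.balances["delegate_1"] // 10**8)  # 5  (round applied once)
```

Keeping apply and rollback as exact mirrors of each other is the property the fix has to restore: any amount credited in step 1 must be debited in step 2.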
The bug affected 70% of the network, meaning that 70% of the nodes received the first block and were then sent the second block, which declared the first a fork. The majority of forging delegates also received this block and therefore all responded to the bug in the same way, applying the round twice.
As a result, the remaining 30% of the network held different balances from the rest of the network. Because the bug lay in the calculation of round rollbacks, there was no way to rebuild the blockchain. The network had reached a point where vote calculations were being derived from the erroneous data.
At this point, the decision was made to record the double application of this round as a permanent exception in the blockchain, as there was no other way to work around the issue in a live environment. The end result was that 505 LSK was created by the duplicated application of rewards, and 2.7 LSK was split between all active delegates.
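The 507.7 LSK later burned is the sum of these two figures. A minimal sketch of the arithmetic, assuming a block reward of 5 LSK per block at this height (so 101 × 5 LSK = 505 LSK), assuming the 2.7 LSK is the round's duplicated fees, and holding amounts as integers in beddows (10^8 per LSK) to avoid floating-point error. `ROUND_EXCEPTIONS` is a hypothetical stand-in for the exception mechanism, not the structure Lisk Core uses.

```python
BEDDOWS_PER_LSK = 10**8  # assumed smallest unit: 1 LSK = 10^8 beddows
ACTIVE_DELEGATES = 101

# Assumption: the block reward at height 2731040 was 5 LSK per block.
BLOCK_REWARD = 5 * BEDDOWS_PER_LSK
# Fee figure for round 27040 as stated in the text (2.7 LSK), assumed to be fees.
ROUND_27040_FEES = 270_000_000

# Hypothetical exception table: round number -> times the round is applied.
ROUND_EXCEPTIONS = {27040: 2}


def round_payout(round_number: int, round_fees: int) -> tuple[int, int]:
    """Return (block rewards, fees) credited for a round, honouring exceptions."""
    factor = ROUND_EXCEPTIONS.get(round_number, 1)
    return ACTIVE_DELEGATES * BLOCK_REWARD * factor, round_fees * factor


def excess_supply(round_number: int, round_fees: int) -> int:
    """Beddows credited beyond a single, normal application of the round."""
    rewards, fees = round_payout(round_number, round_fees)
    normal = ACTIVE_DELEGATES * BLOCK_REWARD + round_fees
    return rewards + fees - normal


extra = excess_supply(27040, ROUND_27040_FEES)
print(extra / BEDDOWS_PER_LSK)  # 507.7
```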
May 2nd — 04:00 CEST
The Core Team had been engaged since the start of the event, working intensively on a solution to the issue. The 0.8.1 patch release was ready after 15 hours of troubleshooting and testing; this time was needed to ensure that the exception process for rounds was functional and properly tested.
May 2nd — 11:00 CEST
The root cause of the issue was identified by Isabella Dell, our System Architect, and work began immediately to fix the failing code. Below is a screenshot of the bug being reproduced on an internal development network.
The image clearly shows the differences between the systems on the network.
A working fix was found, but it required extensive testing to ensure that no other behavior had changed. Work was paused at 1:30am CEST on May 3rd to give the developers some time to rest.
May 3rd — 17:00 CEST
After minimal rest overnight, the Core Team resumed work on the fix for the round issue. By the end of the day, a fully tested release had been produced with no visible bugs. The team then ran a long overnight stress test on our Devnet to check for any scenarios the code might have missed.
May 4th — 11:00 CEST
Confident that the bug was fixed, we released 0.8.2a to the Testnet. To reproduce the bug thoroughly, all of the genesis delegates were voted in. The fix was 99% complete, but a fault remained in the rollback of the fee remainder. Debugging it took the rest of the day: 7 hours later, a second release candidate, 0.8.2b, was packaged and released. This build passed all tests. The genesis delegates on Testnet were then removed from the top 101, and the network was handed back to the delegates who had been securing it.
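When a round's fees do not divide evenly by 101, the leftover beddows must be credited to someone, and a correct rollback must debit that leftover as well. A minimal sketch, assuming the remainder is credited to the round's last delegate and that each delegate appears once per round; `rollback_even_split_only` is a hypothetical illustration of how a remainder can be missed, not the actual defect found in 0.8.2a.

```python
def split_fees(total_fees: int, delegates: list[str]) -> dict[str, int]:
    """Divide a round's fees (in beddows) evenly between its delegates.

    Assumption for this sketch: the indivisible remainder is credited to the
    last delegate of the round.
    """
    per_delegate, remainder = divmod(total_fees, len(delegates))
    shares = {delegate: per_delegate for delegate in delegates}
    shares[delegates[-1]] += remainder
    return shares


def apply_shares(balances: dict[str, int], shares: dict[str, int]) -> None:
    """Credit each delegate's share, including the remainder."""
    for delegate, amount in shares.items():
        balances[delegate] = balances.get(delegate, 0) + amount


def rollback_shares(balances: dict[str, int], shares: dict[str, int]) -> None:
    """Correct rollback: debit exactly what apply_shares credited."""
    for delegate, amount in shares.items():
        balances[delegate] -= amount


def rollback_even_split_only(balances: dict[str, int], total_fees: int,
                             delegates: list[str]) -> None:
    """Hypothetical faulty rollback: reverts the even split but not the remainder."""
    per_delegate = total_fees // len(delegates)
    for delegate in delegates:
        balances[delegate] -= per_delegate


delegates = [f"delegate_{i}" for i in range(1, 102)]
fees = 270_000_000  # 2.7 LSK in beddows, the round 27040 figure from the text
shares = split_fees(fees, delegates)

faulty: dict[str, int] = {}
apply_shares(faulty, shares)
rollback_even_split_only(faulty, fees, delegates)
print(faulty["delegate_101"])  # 33 beddows left behind

correct: dict[str, int] = {}
apply_shares(correct, shares)
rollback_shares(correct, shares)
print(all(balance == 0 for balance in correct.values()))  # True
```

Rolling back from the recorded `shares` rather than recomputing the split is the design choice that keeps the remainder from being forgotten.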
May 5th — 15:50 CEST
After four long days, the final release candidate with a full set of tests for the revised logic was packaged and tested internally. With all of the checks in place and a validation performed against the Mainnet, we released 0.8.2 to the Mainnet. This build included refactored round logic, improved testing for rounds and the exception processing logic from 0.8.1.
Conclusion
As a result of this bug, the Lisk Foundation has burned the 507.7 LSK that was generated, using its own funds. The ID of the burn transaction is 15957909420207830355. This transaction restores the total supply to its correct value.
The Lisk Core patch release 0.8.2 was successfully deployed and, to the very best of our abilities, ensures that this bug never occurs on the Lisk network again.
A very big thank you to Isabella Dell (System Architect) and Mariusz Serek (Core Developer) for providing such an in-depth analysis of the problem, and for working extremely hard to resolve it as quickly as possible.
About the Author
Oliver Beddows is Founder and Vice President of the Lisk Foundation. With over 15 years of experience in development, he oversees all projects involving Lisk as CTO. He is a husband, father, amateur time-trialist, and open-source advocate, and works tirelessly to build a better future for Lisk every day.
Contact Details
Twitter: @Karmacrypto
Email: oliver@lisk.io