DiffBomb Day Postmortem

Alephium
Dec 21, 2022

TL;DR

In the afternoon (CET) of December 8th, the Alephium community reported that the hashrate displayed to network participants had started increasing significantly.

This apparent increase in hashrate was caused by a very steep increase in difficulty, which seriously slowed down block production.

The root cause was soon identified, a patch was issued and deployed by full-node runners, including mining pools and others across the community, and the whole chain was back to stable operation in less than 36 hours.

What happened?

First symptoms: At 3:27PM on December 8th, MontaiL was the first to signal on Alephium’s Discord that there seemed to be an anomaly with the network hashrate. Over the next few hours, the community observed a very significant linear increase in hashrate, whether through mining pool dashboards or other services.

Cascading effects: The hashrate that is observable from the full node is an estimate computed from block difficulties over a period of time. During the DiffBomb day, what people could see was in fact a trompe-l’oeil caused by the increase in difficulty.
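To illustrate why a difficulty-based estimate can mislead, here is a minimal sketch. It assumes a naive estimator that divides the average difficulty by a fixed expected block interval; whether the full node computes its estimate exactly this way is an assumption, and the function names and numbers are hypothetical.

```python
def estimate_hashrate(difficulties, expected_block_seconds):
    """Naive estimate: average difficulty (expected hashes per block)
    divided by the block interval the protocol is aiming for."""
    if not difficulties or expected_block_seconds <= 0:
        raise ValueError("need at least one difficulty and a positive interval")
    average_difficulty = sum(difficulties) / len(difficulties)
    return average_difficulty / expected_block_seconds


def observed_hashrate(difficulties, elapsed_seconds):
    """Work actually performed: total expected hashes over real elapsed time."""
    if elapsed_seconds <= 0:
        raise ValueError("elapsed time must be positive")
    return sum(difficulties) / elapsed_seconds


# Hypothetical numbers: difficulty quadruples while miners stay the same,
# so each block takes four times longer than the expected 10 seconds.
before = estimate_hashrate([100] * 10, expected_block_seconds=10)  # 10.0
after = estimate_hashrate([400] * 10, expected_block_seconds=10)   # 40.0
real = observed_hashrate([400] * 10, elapsed_seconds=400)          # 10.0
```

In this toy case the naive estimate quadruples while the work actually performed per second is unchanged, which is the kind of optical illusion described above.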

By the end of the day, this resulted in a significant delay between blocks, as shown in a screenshot of the explorer taken at 00:20 CET.

Identifying the cause: By that time, the team was entertaining several hypotheses that could potentially explain the observed effects (ASIC testing, a DoS attack, others…).

During the investigations, the team made sure to keep the community informed of its progress (1, 2, 3).

A bit before 4AM CET, Cheng Wang shared in the Discord that the issue had been identified and a remediation plan drafted.

The difficulty bomb was triggered on December 8th, 2022, exactly 1 year and 1 month after mainnet launch (which happened at 3:54PM CET on November 8th, 2021).

The Difficulty Bomb is a mechanism designed to ensure coordination on a protocol upgrade at least once every 13 months in the case of Alephium. It was configured to be automatically pushed back every time an upgrade happens. The Leman Upgrade, coming in early 2023, became more ambitious than anticipated and therefore was not completed within the adjustment window.
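As a rough illustration of the mechanism, the sketch below computes the trigger date from the launch date and applies a penalty that grows once that date has passed. The 13-month window and the launch time come from this post; the linear growth, the one-hour step, and the function names are assumptions made for the example, not the actual formula in the full node.

```python
from datetime import datetime, timedelta


def add_months(ts, months):
    """Shift a datetime by whole calendar months (day-of-month overflow,
    e.g. the 31st into a 30-day month, is not handled in this sketch)."""
    total = ts.year * 12 + (ts.month - 1) + months
    return ts.replace(year=total // 12, month=total % 12 + 1)


def bombed_difficulty(base_difficulty, now, bomb_time, step_seconds=3600):
    """Return the difficulty with a hypothetical bomb penalty applied.

    Before bomb_time the base difficulty is untouched; afterwards it grows
    by one multiple of the base per elapsed step (assumed linear growth)."""
    if now < bomb_time:
        return base_difficulty
    steps = int((now - bomb_time).total_seconds() // step_seconds)
    return base_difficulty * (1 + steps)


# Mainnet launch in CET wall-clock time, as stated in this post.
launch = datetime(2021, 11, 8, 15, 54)
bomb_time = add_months(launch, 13)  # datetime(2022, 12, 8, 15, 54)

# Three hours after the trigger, the hypothetical penalty is 4x the base.
three_hours_later = bombed_difficulty(1000, bomb_time + timedelta(hours=3), bomb_time)
```

Pushing the bomb back at each upgrade would then amount to recomputing `bomb_time` from the upgrade date instead of the launch date.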

The DiffBomb was implemented in the early days of the Alephium network. It was not documented or communicated appropriately. Due to a recent hardware upgrade (and Murphy’s law), some of us didn’t have the usual full node running at home, hindering key team members’ ability to run comprehensive analytics. Regrettably, in the middle of this stressful situation, it all resulted in a delay in identifying the DiffBomb as the root cause of the observed behavior.

The remediation

As soon as the issue was identified, the Alephium dev team started to work on a patch to upgrade the full node code with two objectives:

  • Shifting back to the difficulty that was prevalent before the DiffBomb triggering,
  • Removing the DiffBomb.

At 10:27AM CET, a publication informed the community of what was coming, and at 2:40PM CET a detailed next-steps roadmap was released.

After intensive testing and code review, the patch (including an activation timestamp set for 8PM CET) was released and announced a bit after 5PM CET.
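The idea of gating the fix behind an activation timestamp can be sketched as follows. This is only an illustration under assumed names and values: every node that has upgraded switches rules at the same block timestamp, restoring the pre-bomb difficulty and disabling the bomb from then on. The actual patch logic is not described in this post.

```python
from typing import NamedTuple


class DifficultyRules(NamedTuple):
    bomb_enabled: bool
    difficulty: int


def rules_for_block(block_ts, activation_ts, bombed_difficulty, pre_bomb_difficulty):
    """Pick the rule set that applies to a block with the given timestamp.

    Blocks at or after activation_ts use the pre-bomb difficulty with the
    bomb disabled; earlier blocks keep the old (bombed) behavior."""
    if block_ts >= activation_ts:
        return DifficultyRules(bomb_enabled=False, difficulty=pre_bomb_difficulty)
    return DifficultyRules(bomb_enabled=True, difficulty=bombed_difficulty)


# Hypothetical timestamps (seconds) and difficulties for illustration only.
ACTIVATION_TS = 1_000_000
just_before = rules_for_block(ACTIVATION_TS - 1, ACTIVATION_TS, 4000, 1000)
at_activation = rules_for_block(ACTIVATION_TS, ACTIVATION_TS, 4000, 1000)
```

Releasing the patch a few hours ahead of the activation time gives node operators a window to upgrade before the rule change takes effect.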

At 7PM, most mining pools and services had updated their full nodes, and the Alephium team released an Excel sheet to help the community stay informed of the state of the upgrade.

At 8PM CET, activation had gone smoothly, the network had upgraded successfully, difficulty was back to normal, hashrate had normalized, and block production was going as expected.

Lessons learned

After a year of a fairly smooth ride, Alephium met its first big test with the DiffBomb Day experience. A few lessons were learned in the process:

Run more full nodes: Ensure that more people in the dev team have a full node running at all times. This will decrease reaction time and improve immediate data analysis.

Expose the difficulty metric in the full node: To bring more transparency and ease future analysis, we want to make it easier to see the difficulty at any point in time, directly from the full node.

Document more: It was always the plan to document the technology and code more, and the team has started a significant effort in that regard. The Alephium team will up the ante here and provide more documentation.

Communicate more: The constant communication has been useful to minimize disruption to all parties involved. The community support, questioning, presence and stimulating challenges have been humbling and amazingly helpful.

In conclusion

The DiffBomb day has been Alephium’s most serious test so far. With the combined efforts of the entire team and the decisive support of the community, we were able to quickly and smoothly resolve the issue.

The community has been a tremendous help and stimulation the whole time. Once the patch was ready, it was deployed quickly by full-node runners, private individuals, miners, pools, and service providers.

Alephium couldn’t be more grateful to have such a challenging, informed, fun and passionate community.

This pushes us to do better, build better, always.
