A Post Mortem Report: The Constantinople Ethereum Hard Fork Postponement

Motivation

The purpose of this document is to first cover the process Ethereum and its community went through during the hard fork postponement. Secondly, it is to address what could have been done to prevent it and to create actionable items and processes for the future.

Post-mortems should be written after every hard fork, regardless of whether it involved emergency issues. A general document that describes the chain of events during the process is extremely valuable, and this document is meant to set an example. It walks through and lists the associated events in order to understand what combination of events created the scenario (successful or not) and how we can make it go more smoothly next time. Overall, this document will help shape the future structure of hard fork release coordination and retrospective reports, and help Ethereum and its community prepare for potential future emergencies.

Timeline

  1. 2017–02–13: EIP 145 was created.
  2. 2018–04–20: EIP 1014 was created.
  3. 2018–05–02: EIP 1052 was created.
  4. 2018–07–19: EIP 1234 was created.
  5. 2018–08–01: EIP 1283 was created.
  6. July 2018 to October 2018: EIPs implemented across all clients.
  7. 2018–10–19: Hard fork delayed due to consensus issue on Ropsten after a test run of the fork.
  8. 2018–10–19: New date for Constantinople proposed: January 16, 2019.
  9. 2019–01–15: 3:09 am PT: ChainSecurity responsibly discloses a potential vulnerability via the Ethereum Foundation’s bug bounty program.
  10. 2019–01–15: 7:26 am PT: Christian Reitwiessner raised the issue internally on the bug bounty-list.
  11. 2019–01–15: 7:32 am PT: Martin Holst Swende posted to the larger group at the Security Gitter channel.
  12. 2019–01–15: 8:04 am PT: Martin Holst Swende posted to the EthSecurity Telegram group and gave the go-ahead to publish the article.
  13. 2019–01–15: 8:04 am PT: Original article by ChainSecurity was published.
  14. 2019–01–15: 8:52 am PT: Martin Holst Swende posts on AllCoreDevs Gitter channel: “Please read: https://medium.com/chainsecurity/constantinople-enables-new-reentrancy-attack-ace4088297d9 @/all We need a quick decision on potential consequences and how to move forward. We have about 37 hours left until the fork happens”
  15. 2019–01–15: 8:52 am PT — 10:15 am PT: Discussion occurs across various channels regarding potential risks, on-chain analysis, and what steps need to be taken.
  16. 2019–01–15: 10:15 am PT — 12:40 pm PT: Discussion via Zoom audio call with key stakeholders. Discussion continued on Gitter and other channels such as Telegram.
  17. 2019–01–15: 12:08 pm PT: Decision made to delay Constantinople upgrade.
  18. 2019–01–15: 1:15 pm PT: Public blog post released across various channels and social media.
  19. 2019–01–15–2019–01–18: Research and discussions surrounding EIP-1283 and analysis of on-chain contracts that could be vulnerable continue.
  20. 2019–01–18: Decision made not to include EIP-1283, and a new date set for Constantinople: February 27, 2019.

Summary

Discovery of problem

ChainSecurity discovered that one of the five Ethereum Improvement Proposals (EIPs) to be implemented during the Constantinople network upgrade, EIP-1283, contained a critical bug that enabled reentrancy attacks when using address.transfer(…) or address.send(…). Previously these functions were considered reentrancy-safe, but this would no longer be true once the network upgraded.
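
To make the mechanism concrete, here is a simplified Python sketch of the gas arithmetic involved. It is not client code. It assumes the commonly cited figures: a 2,300 gas stipend forwarded by transfer and send, pre-Constantinople SSTORE costs of 20,000 and 5,000 gas, and the net gas metering rules of EIP-1283, under which a write to a slot already modified in the same transaction (a "dirty" slot) costs 200 gas. The function names are illustrative, and refunds and the cost of other opcodes are ignored.

```python
# Simplified model of SSTORE pricing; refunds and other opcode costs are ignored.
CALL_STIPEND = 2300  # gas forwarded by address.transfer(...) / address.send(...)


def sstore_cost_legacy(current: int, new: int) -> int:
    """Pre-Constantinople rule: 20,000 gas for zero -> non-zero, else 5,000."""
    return 20_000 if current == 0 and new != 0 else 5_000


def sstore_cost_eip1283(original: int, current: int, new: int) -> int:
    """EIP-1283 net gas metering (gas cost only, refunds omitted).

    original: slot value at the start of the transaction
    current:  slot value right now
    new:      value being written
    """
    if current == new:
        return 200  # no-op write
    if original == current:
        # Clean slot: first change in this transaction.
        return 20_000 if original == 0 else 5_000
    return 200  # dirty slot: already changed earlier in this transaction


def fits_in_stipend(cost: int) -> bool:
    """True if a single storage write of this cost could be paid from the stipend."""
    return cost <= CALL_STIPEND


legacy = sstore_cost_legacy(current=1, new=2)
net = sstore_cost_eip1283(original=0, current=1, new=2)
print(f"legacy write: {legacy} gas, fits in stipend: {fits_in_stipend(legacy)}")
print(f"EIP-1283 dirty-slot write: {net} gas, fits in stipend: {fits_in_stipend(net)}")
```

Under these assumptions the legacy write cannot be paid from the stipend while the dirty-slot write can, which is why a recipient's fallback code could change state during a transfer or send after the upgrade.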

Postponing Constantinople

After much discussion and analysis from some of the key stakeholders in Ethereum, the community determined that the best course of action was to delay the planned Constantinople fork that would have occurred at block 7,080,000 on January 16, 2019.

The Decision-Making Process for the Postponement on January 15th

  1. Understanding and conveying the threat to the community
  2. Identifying the size of the threat
  3. Major factors involved in the decision of postponing the fork
  4. Decision time-frame window for aborting network upgrades
  5. Security Testing analysis/scanning
  6. Proactive and Retroactive Actions for the possible postponement
  7. Getting every stakeholder notified in time
  8. The Decision
  9. Communicating the decisions to everyone

Discussions & Decisions Made After the Postponement

Ethereum stakeholders discussed the options proposed on ETH Magicians and proposals during the All Core Devs Meeting 53 on January 18th at 14:00 UTC.

Decisions made during the AllCoreDevs Meeting:

  • The first hard fork will be the original “Constantinople” which will include all planned EIPs.
  • The second fork will be to disable EIP-1283. This was decided because the full Constantinople upgrade including EIP-1283 is running on testnets. This way testnets can just do the second fork and then continue to operate.
  • Both forks will be triggered on the same block on the Ethereum mainnet (block 7.28 million) which should occur on February 27th.
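
The effect of scheduling both forks on the same block can be sketched as follows. This is an illustrative Python model, not how any client implements fork activation; the class and function names are hypothetical, the EIP numbers come from the timeline in this report, and the testnet block numbers are made up for the example.

```python
from dataclasses import dataclass

# EIPs listed in the timeline of this report.
CONSTANTINOPLE_EIPS = frozenset({145, 1014, 1052, 1234, 1283})
DISABLED_BY_SECOND_FORK = frozenset({1283})


@dataclass(frozen=True)
class ForkSchedule:
    constantinople_block: int  # first fork: enables all planned EIPs
    fixup_block: int           # second fork: disables EIP-1283


def active_eips(schedule: ForkSchedule, block_number: int) -> frozenset:
    """Return the Constantinople EIPs in force at block_number."""
    if block_number < schedule.constantinople_block:
        return frozenset()
    if block_number >= schedule.fixup_block:
        return CONSTANTINOPLE_EIPS - DISABLED_BY_SECOND_FORK
    return CONSTANTINOPLE_EIPS


# Mainnet: both forks on the same block, so EIP-1283 is never active.
mainnet = ForkSchedule(constantinople_block=7_280_000, fixup_block=7_280_000)
# Hypothetical testnet that already forked; these block numbers are invented.
testnet = ForkSchedule(constantinople_block=1_000, fixup_block=2_000)

print("mainnet at fork block:", sorted(active_eips(mainnet, 7_280_000)))
print("testnet between forks:", sorted(active_eips(testnet, 1_500)))
print("testnet after second fork:", sorted(active_eips(testnet, 2_000)))
```

In this model a network that already activated the original Constantinople only needs to schedule the second fork, while mainnet never runs a block with EIP-1283 enabled.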

Feedback / Action Items to be put in place (Not all are listed below)

  • Assemble a team of technical writers who can be made accountable for pushing out a write-up explaining the potential threat to the Ethereum community.
  • Create a method for all clients and node operators to manually delay the fork via command line so new releases aren’t necessary.
  • Create an emergency comms team that spans multiple time zones and is responsible for maintaining the current list of contacts, so we can reach exchanges, mining pools, and infrastructure as quickly as possible.
  • Create and maintain a contact list for emergency security analysis.
  • Create a rubric for risk evaluation that will help facilitate decisions.
  • Incentivize more developer tools across the ecosystem.
  • Create a more formalized process for reviewing and analyzing EIPs.
  • Create a plan for monitoring the health and backing of various forks.

Lessons Learned

Having a clear document with details of the issue was helpful.

  • In this case, the extensive write-up by ChainSecurity made it very easy for people to understand the potential threat and catch up quickly as they came online. We may not have such a write-up in future cases.
  • ACTION ITEM: To prepare for future cases, we should take a proactive approach and assemble a team of technical writers (much like the emergency comms group from this case) who are responsible for pushing out a write-up explaining the potential threat to the Ethereum community.

Identifying the scope of the threat is sometimes impossible due to time constraints.

  • In this case, much of the early discussion surrounded determining if this is the only threat or if it opens up other potential vulnerabilities and whether the size of the threat can be fully verified given time constraints.
  • Determining the size of the threat allows for discussions around solutions or mitigation strategies.
  • If the scope cannot be determined due to time or other constraints, it makes further discussion hard and postponement the only viable option.

Having to prepare new releases takes time and effort from multiple parties.

  • One issue we encountered was that Geth and Parity weren’t able to get the releases out by the time of the public announcement. This led to some confusion and broken links for approximately two hours after the blog was posted.
  • ACTION ITEM: Ensure all clients have a way for node operators to manually delay the fork via command line so new releases aren’t necessary.
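
As a rough illustration of this action item, the sketch below shows how a client could let an operator override a built-in fork block from the command line. It is a minimal Python example using argparse; the flag name --override-fork-block and the program name are hypothetical and do not correspond to an option of any particular client.

```python
import argparse

DEFAULT_FORK_BLOCK = 7_080_000  # originally planned Constantinople block


def effective_fork_block(argv):
    """Return the fork block to use, honoring a hypothetical override flag."""
    parser = argparse.ArgumentParser(prog="example-client")
    parser.add_argument(
        "--override-fork-block",  # hypothetical flag name, not a real client option
        type=int,
        default=None,
        help="activate the fork at this block instead of the built-in one",
    )
    args = parser.parse_args(argv)
    if args.override_fork_block is None:
        return DEFAULT_FORK_BLOCK
    if args.override_fork_block < 0:
        parser.error("fork block must be non-negative")
    return args.override_fork_block


# Without the flag the built-in block is used; with it, the operator's value wins.
print(effective_fork_block([]))
print(effective_fork_block(["--override-fork-block", "7280000"]))
```

With an option like this, a postponement could be communicated as a single flag to pass at startup, rather than requiring every operator to download and install a new release.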

There is no formalized process for notifying the public.

  • We could have gotten public-facing statements out sooner (although lower-quality statements could have increased the risk of fear, confusion, panic, and scams).
  • Coindesk was the first to release an article, which resulted in some fear and FUD. It could have been much worse.
  • Conversations occurred surrounding the potential release of a public statement that said, “We are investigating a potential security issue, keep watch for updates.” Ultimately it was decided not to release a statement in this form due to the potential fear and panic it could have caused. More information is always better. In future cases, there may be more time and a statement of this sort may be necessary.
  • Incorrect communication or lack of timely communication can result in confusion, FUD, possible contention and loss of funds/confidence for the uninformed.
  • This can increase the possibility of consensus issues resulting in double-spend attacks on smaller exchanges (most likely to affect exchanges already more at-risk due to lack of security).
  • ACTION ITEM: Prepare a template for a statement where information is limited but the public needs to be aware of ongoing investigations and discussions.
  • ACTION ITEM: Create an emergency comms team. Everyone on this “team” needs to explicitly agree to it. That team should be made up of people who are reliable and across multiple time zones, and they are responsible for maintaining contact with the list of contacts. They should not be limited to Ethereum Foundation people.

There is no formalized process for notifying exchanges, miners, and node operators.

  • Time was spent finding and collecting contacts across the ecosystem to notify.
  • The mass notification could only be done once there was an “official” article with details on what occurred and what people should do.
  • This time, we were lucky to have a team of people ready to reach out to exchanges and miners at any time. We made contact with pools representing 70–75% of the current hash rate (Sparkpool, Ethermine, DwarfPool, Nanopool, F2Pool) in time. Additionally, the group created a last-minute stakeholder communication Google Sheet that contains the contact information for a majority of major exchanges, mining pools, etc.
  • ACTION ITEM: A Comms team will need to maintain the current list of contacts, in order to be confident that we can reach exchanges, mining pools, infrastructure, etc. (e.g. https://github.com/trailofbits/blockchain-security-contacts).

Having multiple parties contributing to discussions in a matter of hours (core devs, security researchers, comms people, infrastructure people, etc.) was key.

  • The quick communication was mostly facilitated by public chats across Skype, Telegram, and Gitter and spread via word-of-mouth from there.
  • ACTION ITEM: We should collectively encourage the use of these channels for key players in the ecosystem so they can stay up to date as discussions are being had.

Having a “source of truth” is necessary for situations like these.

  1. The EF blog has historically been used as the main source of truth for news outlets. In the future, the AllCoreDevs can get communications out through Ethereum Cat Herders (ECH), and they can act as a key official voice. Otherwise, the client teams are also official sources of information, and as such the EF blog is a way for the Geth and Trinity teams to get info out. Other stakeholders should be encouraged to broadcast on their own outlets afterward.
  2. The blog post was clear and contained most of the necessary information. It did not cause widespread panic.
  3. Certain decisions (last minute, emergency, etc.) will benefit from coordinated outreach and communication from multiple sources (e.g. the Constantinople delay). This helps stakeholders tell the difference between real news and FUD. The outreach had a unified message that was drafted together and then posted individually by the parties relaying it. This gave a unified, distributed appearance of coordination and agreement.

There is no formalized risk analysis/security analysis process in place to help with decision making.

  1. ACTION ITEM: Create and maintain a contact list for emergency security analysis. It should include key security auditors and researchers across the ecosystem that can assist and help verify potential vulnerabilities when these situations occur.
  2. ACTION ITEM: Discuss an associated compensation plan for this work (retainer vs. one-time funding).
  3. ACTION ITEM: Create a rubric for risk evaluation to help facilitate decisions. This rubric would include:
  • The time-frame to make a decision (strict time to follow)
  • The time estimate for getting a high-quality exploitability analysis, including the maximum length of time we can afford to spend on an emergency security analysis. We must also consider whether the time spent on analysis would change the overall decision to postpone.
  • Precedence determination
  • The scope of the threat, with a confidence interval threshold of acceptability and the time-frame to achieve the desired confidence interval if it needs to be met
  • Mitigation options, with associated time-frames to achieve success, pros/cons, long-term effects, and the coordination required to achieve success

4. ACTION ITEM: Create a plan for monitoring the health and backing of various forks.
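
One way the rubric in item 3 could be turned into something checkable is sketched below. This is an illustrative Python model only. The field names, the recommend function, and the ordering of its rules are assumptions made for the example; they are not a process that has been agreed on. The rules follow points made in this report: if the scope cannot be determined in the time available, postponement is the only viable option, and moving forward requires meeting the confidence threshold.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class MitigationOption:
    name: str
    hours_to_deliver: float  # estimated time to implement and coordinate


@dataclass
class RiskAssessment:
    hours_until_deadline: float              # strict time-frame for the decision
    hours_needed_for_analysis: float         # time for a high-quality analysis
    confidence_no_exposure: Optional[float]  # None if the scope is unknown
    required_confidence: float               # threshold of acceptability


def recommend(assessment: RiskAssessment, options: List[MitigationOption]) -> str:
    """Return 'postpone', 'proceed', or 'mitigate: <name>' (illustrative rules)."""
    # Analysis cannot finish before the deadline: postpone.
    if assessment.hours_needed_for_analysis > assessment.hours_until_deadline:
        return "postpone"
    # Scope could not be determined: postpone.
    if assessment.confidence_no_exposure is None:
        return "postpone"
    # Confidence threshold met: proceed as planned.
    if assessment.confidence_no_exposure >= assessment.required_confidence:
        return "proceed"
    # Otherwise pick the fastest mitigation that fits in the remaining time.
    remaining = assessment.hours_until_deadline - assessment.hours_needed_for_analysis
    feasible = [o for o in options if o.hours_to_deliver <= remaining]
    if feasible:
        return "mitigate: " + min(feasible, key=lambda o: o.hours_to_deliver).name
    return "postpone"


# Example: scope unknown with about seven hours to decide.
unknown_scope = RiskAssessment(7, 6, None, 1.0)
print(recommend(unknown_scope, []))
```

A written-down rule set like this would not replace discussion, but it would make explicit which inputs the decision depends on and who needs to supply them.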

The vulnerability could have been discovered sooner.

(Reference to the discussion: https://github.com/ethereum-cat-herders/hard-fork-checklist/issues/1)

  1. ChainSecurity uses Truffle/Ganache for security analysis; however, the first Constantinople-ready Ganache version came out only six days before the hard fork.
  2. ACTION ITEM: Incentivize more developer tools across the ecosystem.
  3. ACTION ITEM: Take developer tool readiness into account when planning a hard fork date, and make a release of the 2–3 most used tools (not just the VM as a base layer) a precondition for settling on a date.
  4. There are reports that other security researchers in the ecosystem suspected this vulnerability but did not prepare an official bug bounty or report it because they assumed it was already known.
  5. ACTION ITEM: Create or update existing documentation on how to disclose vulnerabilities or suspected vulnerabilities. Include extensive resources for reporting or discussing potential issues, not just official bug bounties. These should include key bug bounties across the ecosystem (not just the official EF bug bounty program), relevant chat rooms, encouragement to comment on EIPs, and pointers to GitHub issues, security, bugs, PGP, support email, forums, security Telegram channels, etc.
  6. ACTION ITEM: Create a more formalized process for reviewing and analyzing the potential effects of EIPs across the entire stack (EVM vs. existing developer assumptions vs. smart contracts).
  7. ACTION ITEM: Be much more explicit about writing down invariants (properties guaranteed by the protocol) that we rely on so we can check against them when changing things.
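
As an illustration of item 7, invariants can be recorded as named predicates and re-checked whenever protocol parameters change. The Python sketch below is hypothetical: the parameter names are invented, and the values are the commonly cited 2,300 gas stipend, the 5,000 gas minimum storage write before Constantinople, and the 200 gas minimum under EIP-1283.

```python
from typing import Callable, Dict, List

# A gas schedule is modeled as a mapping of (hypothetical) parameter names to values.
GasSchedule = Dict[str, int]

# Each invariant pairs a plain-language description with a predicate.
INVARIANTS: Dict[str, Callable[[GasSchedule], bool]] = {
    "a call forwarding only the stipend cannot write to storage":
        lambda s: s["min_sstore_cost"] > s["call_stipend"],
}


def violated_invariants(schedule: GasSchedule) -> List[str]:
    """Return the descriptions of all invariants the schedule breaks."""
    return [name for name, holds in INVARIANTS.items() if not holds(schedule)]


before_upgrade = {"call_stipend": 2300, "min_sstore_cost": 5000}
with_eip1283 = {"call_stipend": 2300, "min_sstore_cost": 200}

print("before upgrade:", violated_invariants(before_upgrade))
print("with EIP-1283: ", violated_invariants(with_eip1283))
```

The value of such a list is less in the code than in the act of writing the assumption down, so that a proposed change can be compared against it during review.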

Constantinople Decision Process Overview

Fully understand and convey the threat to the community

  1. In this case, the extensive write-up by ChainSecurity made it very easy for people to understand the potential threat and catch up quickly as they came online. We may not have such a write-up in future cases.
  2. For future cases, we should take a proactive approach and assemble a team of technical writers (much like the emergency comms group from this case) who are responsible for pushing out an extensive write-up explaining the potential threat to the Ethereum community.

Identify the size of the threat

  1. Determine if this is the only threat or if it opens up other potential vulnerabilities.
  2. Determine whether the size of the threat can be fully verified given time constraints.
  3. After determining the size of the threat, discussion opens for possible solutions or mitigation strategies.

Major factors involved in the decision of postponing the fork

  1. First of all, can the issue be fixed or patched?
  • Hold a discussion surrounding the possibility of removing the change and moving forward or doing a complete postponement.
  • Run further security testing analysis (EthSecurity, ChainSecurity, TrailOfBits)
  • Once options are collected, it is important to calculate the timeframes required for each choice.

2. What is the potential effect/risk of each choice?

  • Risk of delaying fork: Some party could take advantage of the confusion to scam people or carry out a double-spend.
  • Risk of not delaying fork: If we verify that there are no notable risks to any deployed contract, we could take note of this needed change and move forward with the fork. We must be 100% certain that there are no vulnerable contracts (see “Identify the size of the threat” above) if we were to move forward with the fork.

3. Emergency Call with Ethereum Developers/Stakeholders

  • Ideas/discussion/problem solving via the AllCoreDevs Gitter chat
  • Moved to a Zoom call when chat became too inefficient

4. Discussion of trading security for efficiency with regards to delaying the fork.

5. Community sentiment — not a technical factor but one that was considered when voting for the final decision.

Decision window for aborting network upgrades (How much time we have to make this decision)

  1. Set a deadline to change the hard fork ~30 hrs before the estimated time.
  • The idea here was that if we set it to ~30 hrs, that gives us roughly 6 hrs to confirm (to a high probability) that no major contracts are affected.
  • There was discussion that, in the future, we could have a strict time-frame for this that must be followed.
  • Additionally, there was talk of a proposal for a smart contract-based abort switch. This would be up to client teams to decide. It was discussed on the AllCoreDevs channel and is now being discussed on Eth Magicians (see reference #10).

2. The time-frame must include a period of time allocated to doing in-depth security analysis (at least account for a few hours).

  • In the future process document, we will include a time estimate for getting a high-quality exploitability analysis. This will include the maximum length of time we can afford to spend on an emergency security analysis.

3. During the analysis period, the comms team can start notifying exchanges and miners about the possibility of a postponement and let them know they should stand by for updates.

  • What would be the ideal amount of time for getting the word out about reverting the upgrade or upgrading again before the fork?
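
The time budget described in this section can be worked through with the figures given in this report. The Python sketch below takes the 8:52 am PT alert, the "about 37 hours" until the fork, and the ~30 hour abort deadline as inputs; the estimated fork time is derived from those figures rather than taken from chain data, and the variable names are illustrative.

```python
from datetime import datetime, timedelta

# Figures taken from this report (times in PT, treated as naive datetimes).
alert_posted = datetime(2019, 1, 15, 8, 52)   # AllCoreDevs Gitter message
time_until_fork = timedelta(hours=37)         # "about 37 hours left"
abort_lead_time = timedelta(hours=30)         # deadline ~30 hrs before the fork

fork_eta = alert_posted + time_until_fork
decision_deadline = fork_eta - abort_lead_time
analysis_window = decision_deadline - alert_posted


def within_window(decision_time: datetime) -> bool:
    """True if a decision made at decision_time meets the deadline."""
    return decision_time <= decision_deadline


decision_made = datetime(2019, 1, 15, 12, 8)  # "Decision made to delay"
print("estimated fork time:", fork_eta)
print("decision deadline:  ", decision_deadline)
print("analysis window:    ", analysis_window)
print("decision in time:   ", within_window(decision_made))
```

With these inputs the window for analysis comes out at about seven hours, in line with the rough six-hour figure above, and the decision at 12:08 pm PT falls inside it.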

Security Testing analysis/scanning (When in an emergency)

  1. Determine the number of people/teams we can have scanning for vulnerable contracts and make sure that they provide some confidence intervals for what they will be able to detect and what is currently deployed.

Proactive and Retroactive Actions for the potential postponement

  1. What would be the next step if this postponement results in an unintentional fork?
  • Social governance: the fork would die out soon. However, we can’t rely completely on social governance. What else can we do?
  • Prep a release or rely on people being able to pass a flag.

2. Historical note: We pushed new releases last summer shortly before the actual fork (This worked well for Byzantium, but is not an ideal practice)

Do we feel confident that we can get every stakeholder notified of the change?

  1. This includes exchanges, pools, infra providers, etc. We need the comms team to spread the news widely to stakeholders so they can all get on board with upgrades ~24 hours prior to the fork.

2. Points that were considered

  • Incorrect communication or lack of timely communication can result in confusion, FUD, possible contention and loss of funds/confidence for the uninformed.
  • This can increase the possibility of consensus issues resulting in double-spend attacks on smaller exchanges (most likely to affect exchanges already more at-risk due to lack of security).

The Decision

  1. This is where the stakeholders present decide, or rather where opposing parties are asked to “speak up” on the final decision.
  2. This occurred during the Emergency Call with present Ethereum Stakeholders.
  3. Questions and thoughts that were considered:
  • If this issue had been known one month ago, would we have chosen to fix it or leave it? This is important because a large part of the decision was the lack of information available to make a sound decision. The associated risk was unknown.
  • If we decide to postpone, we need to make sure the next postponement is much less of a hassle (with command line options).
  • It’s important that both Geth/Parity can confirm they can get new releases out very soon.
  • Until recently a large number of nodes hadn’t updated to Constantinople, so there’s a risk that not enough will update again to the non-fork version. The node count may need to be structured to be more representative, if this is even possible. The Ethereum Cat Herders can start by publishing an estimate in various categories: miners, JSON-RPC gateways, blockchain explorers, and key dApp/dex/client teams that are running nodes.
  • Either remove the EIP (most likely) or fix it (less likely)
  • Reschedule the fork.
  • Adding in a second fork

How to best communicate the decisions to everyone (channels: blogs, social media, yelling, etc.)

  1. This time, we were lucky to have a team of people ready to reach out to exchanges and miners at any time. We made contact with pools representing 70–75% of the current hash rate (Sparkpool, Ethermine, DwarfPool, Nanopool, F2Pool) in time. Additionally, the group created a last-minute stakeholder communication Google Sheet that contains the contact information for a majority of major exchanges, mining pools, etc.
  2. The community needs to be told how to use the release client that pulls all of the EIPs (or disables them), including the vulnerable EIP.
  3. The main source of decisions should be from official EF outlets. Other stakeholders are welcome to broadcast on their own outlets afterward.
  • Certain decisions (last minute, emergency, etc.) will benefit from coordinated outreach and communication from multiple sources (e.g. the Constantinople delay). This helps stakeholders tell the difference between real news and FUD. The outreach had a unified message that was drafted together and then posted individually by the parties relaying it. This gave a unified, distributed appearance of coordination and agreement.

Further Topics After The Postponement Decision

The decision to remove the EIP or fix it

  • Discussed the options proposed on Ethereum Magicians and proposals on the AllCoreDevs call
  • For more details reference Jan 18th’s AllCoreDevs call

Scheduling the fork(s) (Strategy)

  1. The decision was made on the All Core Devs call to move forward with Peter’s idea to have two hard forks:
  • The first hard fork will be the original “Constantinople”, which includes all planned EIPs, including the offending EIP-1283.
  • The second fork will “undo” EIP-1283, giving test networks and private networks that already forked a chance to downgrade.

2. Both forks will be triggered on the same block on the Ethereum mainnet (block 7.28 million) which should occur on February 27th.

3. Individual testnets and private networks that already implemented the original Constantinople will trigger the two forks on different blocks at their discretion.

Post Mortem Contributors: Charles St.Louis, Corey Petty, Hudson Jameson, James Pitts, Jordan Spence, Michael Hahn, Martin Swende, Taylor Monahan.

References

  1. Notes from the Ethereum Core Devs Meeting #53
  2. https://ethereum-magicians.org/t/immutables-invariants-and-upgradability/2440
  3. https://ethereum-magicians.org/t/remediations-for-eip-1283-reentrancy-bug/2434/
  4. https://www.parity.io/a-postmortem-on-the-parity-multi-sig-library-self-destruct/
  5. https://medium.com/@TokenHash/post-mortem-meeting-ethereum-classic-etc-january-2019-reorg-attacks-fef6290348ea?ref=tokendaily
  6. https://github.com/bitcoin/bips/blob/master/bip-0050.mediawiki
  7. http://www.kingoftheether.com/postmortem.html
  8. https://www.scribd.com/doc/309591980/ShapeShift-Postmortem
  9. https://news.bitcoin.com/looting-fox-sabotage-shapeshift/
  10. https://ethereum-magicians.org/t/abort-switch-for-clients-in-order-to-withdraw-a-planned-upgrade-to-mainnet/2480