Post-mortem: ETH Router Exploits 1 & 2, and premature Return To Trading Incident
The ETH Router Exploit 1 & 2, Premature Trading, fixes and network response, as well as the 5 Pronged Response.
THORChain suffered two back-to-back exploits on its ETH Router. The first took all the ETH from the system via an attack contract that sat in front of the Router, and the second took all the economically significant ERC20s via an attack contract that sat behind the Router.
In both cases the exploits tricked the Bifrost into reporting the receipt of assets it had not received. The root cause was a Bifrost interface that did not fully account for the degrees of manipulation that can occur in smart contract events.
No other chains or assets were affected.
The THORChain team and community have kicked off a 5-Pronged Plan to address, fix and recover. The plan is detailed below.
The THORChain treasury will cover all losses to LPs. Nodes are not affected.
Exploit 1 — ETH
The attacker deployed a contract that sat in front of the Router and was able to call the deposit() function of the Router. The ability for the Router to be wrapped was recently made available to support ecosystem development. The full scope of this was not assessed thoroughly at the time.
The attack contract simply diverted the msg.value back to the attacker, calling into the Router with a value of 0. The Bifrost read the msg.value of the transaction instead of the emitted deposit event. Reading msg.value is necessary to support Router upgrades, but it should not have been used for deposit events.
The fix was to enforce that for a deposit action, only the deposit event is read.
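The difference between the two readings can be sketched as follows. This is an illustration only, not the Bifrost's actual code: the `Tx` and `DepositEvent` types, the function names, the addresses, and the 200 ETH figure are all hypothetical.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class DepositEvent:
    amount: int  # amount reported in the deposit event, in wei
    memo: str


@dataclass
class Tx:
    sender: str
    to: str     # top-level recipient of the transaction
    value: int  # top-level msg.value, in wei
    events: List[DepositEvent] = field(default_factory=list)


def observed_amount_vulnerable(tx: Tx) -> int:
    """Credits the top-level msg.value, even if a wrapper kept the funds."""
    return tx.value


def observed_amount_fixed(tx: Tx) -> int:
    """Credits only the amounts reported in the deposit events."""
    return sum(e.amount for e in tx.events)


# Hypothetical attack: 200 ETH is sent to a wrapper contract, which returns
# it to the attacker and calls deposit() on the Router with a value of 0.
attack = Tx(
    sender="0xAttacker",
    to="0xWrapper",
    value=200 * 10**18,
    events=[DepositEvent(amount=0, memo="example-memo")],
)

print(observed_amount_vulnerable(attack))  # the full 200 ETH, in wei
print(observed_amount_fixed(attack))       # 0
```

Under these assumptions the first reading credits the attacker with ETH the Router never received, while the event-based reading credits nothing.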
Attacker Wallet: 0x3a196410a0f5facd08fd7880a4b8551cd085c031
Contract Address: 0x4a33862042d004d3fc45e284e1aafa05b48e3c9c
Tornado Address: 0x4b713980d60b4994e0aa298a66805ec0d35ebc5a
A full write-up is available here:
Impact — $8m
The attacker deposited fake ETH into the contract many times, swapping to other ERC20s, artificially raising their prices and paying large amounts of fees. They then finally were able to siphon out the ETH by forcing a refund (using a deliberately bad memo).
In all they siphoned ~4200 ETH from the system, and caused a huge spike in arbitrage volumes.
Premature Return to Trading
The network was rapidly halted by nodes to limit the impact. Around 700 ETH was retained in pending outbounds. A subsequent update was then released to purge these outbounds and save the 700 ETH. The system thought it had 13,000 ETH, but it only had 700 ETH, so the update also contained a store migration to correct the balance. This store migration would cause the price of ETH in the system to go from $350 (13k ETH) to around $7000 (700 ETH), to reflect the actual pool balances.
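The price jump can be reproduced with a rough calculation. The sketch below assumes that the pool price is simply the value of the other side of the pool divided by the ETH depth, and that the other side is left unchanged by the migration; the function and variable names are illustrative.

```python
def implied_price(counter_value_usd: float, eth_depth: float) -> float:
    """Pool price of ETH when the other side of the pool is worth counter_value_usd."""
    return counter_value_usd / eth_depth


recorded_eth = 13_000   # what the system thought it held
actual_eth = 700        # what it actually held
price_before = 350.0    # pool price implied by the recorded depth

# Value of the other side of the pool, assumed unchanged by the migration.
counter_value = recorded_eth * price_before

price_after = implied_price(counter_value, actual_eth)
print(price_after)  # 6500.0
```

Under that assumption the corrected depth implies a price of $6,500 per ETH, consistent with the "around $7000" figure above.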
The plan was to complete the update and allow arbitrage to sell the ETH down from $7k to $2k (actual price), so the brief to the admins was to enable trading after the upgrade.
The upgrade process was not adequately war-gamed. The upgrade instructions should have been to restart thord, update, then immediately shut the Bifrost service down. This is because there is a narrow window of time, from 67% updated to 100% updated, where the old logic still applies but the network is operational. Ideally a mimir should have been in place to halt signing programmatically.
Once 67% had updated, the network restarted and began processing txIns. What wasn’t planned was that ETH LPs began withdrawing asymmetrically to ETH to take advantage of the fact that they were getting a claim on 13k ETH when there was only 700 ETH.
The correct response to this was to ask Nodes to shut down their Bifrosts to stop the withdrawals (the system was rapidly becoming insolvent), OR to have in place a mimir to halt withdrawals. This mimir setting hadn’t been built because of the system’s philosophy to never block withdrawals.
In the heat of the moment the on-duty mimir admin incorrectly inferred that the response was to enable trading to correct the ETH price to stop the abuse from ETH LPs. This was as per the brief, but it was premature, since the ETH price hadn’t yet been updated from the store migration. The end result was that trading being re-enabled caused arbitrage agents to buy cheap ETH, instead of selling expensive ETH. By buying the cheap ETH, the remaining ETH in the system was taken and the network went insolvent.
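Why the timing mattered can be shown with a toy rule for arbitrage direction. This is a simplification for illustration: the function name is hypothetical, and the prices are the approximate figures quoted in this post.

```python
def arb_direction(pool_price: float, market_price: float) -> str:
    """Direction in which an arbitrageur profits, ignoring fees and slippage."""
    if pool_price < market_price:
        return "buy ETH from the pool"
    if pool_price > market_price:
        return "sell ETH into the pool"
    return "no trade"


market = 2_000.0

# Before the store migration took effect the pool still priced ETH at ~$350.
print(arb_direction(350.0, market))    # buy ETH from the pool

# After the migration the pool would have priced ETH at ~$7000.
print(arb_direction(7_000.0, market))  # sell ETH into the pool
```

Enabling trading while the pool still showed the stale price therefore invited arbitrage in the wrong direction, draining ETH rather than refilling it.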
Nodes were asked again to halt.
Exploit 2— ERC-20s
The fixes to the issues above were then put in place and pushed out to the network. The fix also contained an ability to halt an entire chain and programmatically stop withdrawals. The fix also contained logic to divert the arb transactions to the treasury since the system was insolvent and could not fulfil them. This required the Bifrosts to be online to restart.
What was unknown at the time was that there was another critical vulnerability in the ETH Router. The attacker created a fake router that emitted a deposit event whenever it was sent ETH. The attacker then called returnVaultAssets() on the THORChain Router with a small amount of ETH, naming the fake router as the Asgard vault. The THORChain Router forwarded the ETH to the fake Asgard, which emitted a fake deposit event carrying a malicious memo. The Bifrost treated this as a normal deposit and, because the memo was deliberately bad, refunded the attacker.
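This post does not describe the fix for the second exploit. One plausible guard, shown here purely as an assumption, is to accept deposit events only when they were emitted by the genuine Router. The `Log` type, the addresses, and the function names below are hypothetical.

```python
from dataclasses import dataclass
from typing import List

REAL_ROUTER = "0xRealRouter"  # placeholder, not the real Router address


@dataclass
class Log:
    emitter: str  # contract that emitted the event
    name: str     # event name
    amount: int   # claimed amount, in wei
    memo: str


def deposits_any_emitter(logs: List[Log]) -> List[Log]:
    """Accepts every deposit event in the transaction, whoever emitted it."""
    return [log for log in logs if log.name == "Deposit"]


def deposits_from_router(logs: List[Log], router: str = REAL_ROUTER) -> List[Log]:
    """Accepts deposit events only when the emitter is the genuine Router."""
    return [log for log in logs if log.name == "Deposit" and log.emitter == router]


# Hypothetical logs: the fake router, passed in as the "Asgard vault", emits
# its own deposit event claiming a large amount with a deliberately bad memo.
logs = [Log(emitter="0xFakeRouter", name="Deposit", amount=10**21, memo="bad-memo")]

print(len(deposits_any_emitter(logs)))  # 1: the fake deposit is accepted
print(len(deposits_from_router(logs)))  # 0: the fake deposit is ignored
```

The point of the sketch is that an event's name and payload are attacker-controlled, whereas the address of the emitting contract is not.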
Last Transaction By An Attacker
Attack contract: 0x700196e226283671a3de6704ebcdb37a76658805
Attack wallet (spawned from Tornado Cash): 0x8c1944fac705ef172f21f905b5523ae260f76d62
Impact (~$8M USD)
- 966.62 ALCX
- 20,866,664.53 XRUNE
- 1,672,794.010 USDC
- 56,104 SUSHI
- 6.91 YFI
- 990,137.46 USDT
5-Pronged Recovery Plan
The problems above have simple solutions, but the real question is why, not really how.
It is unrealistic that THORChain will ever be free from attack, so big picture thinking is needed, beginning all the way from the code to the live network. Why were critical vulnerabilities in the code for so long? Why were they abused by black hats before white hats found them? Why was THORChain able to send out so much of the TVL so quickly? Why didn’t the system react faster?
They can be summarised as follows.
Problem 1: The ETH Bifrost Code was unaudited
The THORChain state machine and the BNB Bifrost Code were audited as part of Single Chain Chaosnet, but the updated MCCN state machine and its new MCCN Bifrosts were not. An audit was scheduled with Trail of Bits, which unfortunately had not begun at the time of the first Exploit.
Fix: Stop and Audit. Both Trail of Bits and Halborn Security are underway with two simultaneous audits.
Problem 2: There was no Official Bounty Program.
As part of Single Chain Chaosnet a bounty program had been released, but it was not refreshed as part of MCCN. This was overlooked. Thus there were no clear incentives and campaigns for white hats to be onboarded and find vulnerabilities.
Fix: Commission a Bounty Program with Immunefi.
Problem 3: There is no ongoing “Red Team”
THORChain is an inside-out exchange. Exchanges have active security teams, even with locked-down proprietary exchange engines. THORChain needs a continuously run, 24/7 Red Team to review each new PR line by line, as well as actively monitor the network.
Fix: Commission a Red Team with Halborn Security.
Problem 4: THORChain has no active security monitoring
THORChain’s autonomous and decentralised nature was the sword it fell on. It happily executed attack transactions and there was nothing anyone could do. The only response was for all nodes to shut down their machines.
- Automatic Solvency Checker: halts the network as soon as an insolvency is detected (proactively and reactively).
- Node Operator Timeout: any node can call a time-out of the network for 25 mins if they suspect anything. This gives each of the 36 Node Operators the ability to time out an attack when they observe it.
- Outbound Throttling: the txOut queue is throttled to artificially delay the settlement of transactions when there are sudden spikes.
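The three controls listed above can be sketched as small checks. This is an illustration under stated assumptions, not the implemented design: the function names, the delay formula, and the `max_delay` cap are hypothetical. Only the 25 minute timeout and the 13,000 versus 700 ETH figures come from this post.

```python
from typing import Dict, List

TIMEOUT_SECONDS = 25 * 60  # the 25 minute node operator timeout


def insolvent_assets(recorded: Dict[str, float], on_chain: Dict[str, float]) -> List[str]:
    """Assets whose on-chain vault balance is below what the ledger records."""
    return sorted(a for a, amt in recorded.items() if on_chain.get(a, 0) < amt)


def should_halt(recorded: Dict[str, float], on_chain: Dict[str, float]) -> bool:
    """Solvency checker: halt as soon as any asset is short."""
    return bool(insolvent_assets(recorded, on_chain))


def timeout_active(called_at: float, now: float) -> bool:
    """Node operator timeout: active for 25 minutes after a node calls it."""
    return 0 <= now - called_at < TIMEOUT_SECONDS


def outbound_delay_blocks(queued_usd: float, normal_usd: float, max_delay: int = 100) -> int:
    """Outbound throttling: delay grows with the excess over a normal queue value."""
    if queued_usd <= normal_usd:
        return 0
    excess_ratio = queued_usd / normal_usd - 1
    return min(max_delay, int(excess_ratio * 10))


recorded = {"ETH": 13_000, "USDC": 100}
on_chain = {"ETH": 700, "USDC": 100}

print(insolvent_assets(recorded, on_chain))  # ['ETH']
print(should_halt(recorded, on_chain))       # True
print(timeout_active(called_at=0, now=600))  # True, 10 minutes in
print(outbound_delay_blocks(300, 100))       # 20
```

Any one of these checks would have slowed or stopped the outflows described above without requiring every node to shut down its machine.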
Problem 5: There is no Protocol Insurance
Whilst the treasury is able to cover the insolvencies, the treasury won’t exist forever. The solution is to insure all non-RUNE TVL with a DeFi Insurance Provider, using collateral and income from the system’s own reserves.
Fix: Engage with DeFi Insurance Protocols to attempt to insure the entire protocol.
The network has a ~$16m insolvency to deal with. The plan is:
- 1/3rd ($5.3m) will be directly contributed from the treasury assets
- 1/3rd ($5.3m) will be loaned from Iron Bank using RUNE collateral and paid off later
- 1/3rd ($5.3m) will be arbed into the network after it is brought back online for trading.
To fund (2) and to partially cover (1), a large Public Fund Raising event will be commissioned after the network is operational, in the vicinity of $10m-$20m. This will be planned and executed in the public domain when the time comes.
Return to Operational.
The above Fixes need to be in place first; they will take 2–3 months to set up fully.
However the network can be brought online in stages once enough of the code is thoroughly checked and the bounty program has solicited enough of the major bugs (if any). The guided timelines are:
- Network Restart (send RUNE, Bond, receive Block Rewards) — early August
- BNB Chain online — August
- UTXO Chains online — September
- ETH Chain online — October
The Overview is detailed extensively here:
Assuming all the Fixes are in place, the network is brought back online, is solvent, and can achieve stability, the timeline to Mainnet should be expected to be EoY 2021 or early 2022.
Mainnet is simply the definition that the network is stable and secure.
To keep up to date, please monitor community channels, particularly Telegram and Twitter:
- Twitter: https://twitter.com/thorchain_org
- Telegram Community: https://t.me/thorchain_org
- Telegram Announcements: https://t.me/thorchain
- Reddit: https://reddit.com/r/thorchain
- Gitlab (primary): https://gitlab.com/thorchain
- Github (secondary): https://github.com/thorchain
- Medium: https://medium.com/thorchain