Issue 1203 Post-mortem — THORChain Unbond Bug

Summary of Issue 1203 — what happened, what went wrong and how the network recovered.

THORChain, Sep 6, 2020

Overview

A community member found a bug that allowed them to take funds out of the reserve. Nodes were informed and disconnected their Bifrosts, halting any further attacks and isolating all funds. A patch was released a few hours later, and the network upgraded later that day. Nodes then turned their Bifrosts back on, and a day later the network was fully operational again.

The team and node operators now have experience with their first major funds-at-risk bug, as well as the associated recovery. The community member who found the bug was granted a security bounty of 50k RUNE, since it was a critical bug. No other funds were compromised and the network is completely solvent.

The team will add some further features to make the process more orderly next time.

The Bug

At roughly 2am AEST on Saturday 5 September, a node operator known to the team discovered a security bug that allowed them to be double-credited bond whilst unbonding. They reached out to the team after successfully executing a trial attack, in which they were able to siphon 38k RUNE out of the reserve.

Out of an abundance of caution, the team published a security notice and active node operators elected to disconnect their Bifrosts (the service each node runs to “connect” THORChain to Binance Chain). Disconnecting the Bifrosts immediately isolates THORChain’s state machine from any inbounds, and thus isolates all funds. It also halts churning and interrupts all TSS routines.

The team quickly identified the problem and had a merge request a few hours later. This was then tested on a standalone network before applying a version update and releasing the new binary to the nodes at roughly 1pm AEST.

Since the Bifrosts were disconnected, churning (which relies on TSS) was not possible, so the network had to wait for 100% of nodes to upgrade. When the Bifrosts were turned back on, the network took some time to recover and re-sync all nodes.

Explanation of Bug

A rough explanation is that there was an edge case where four separate logic paths worked against each other:

  1. Nodes can request unbonds of any amount from their Bond when in standby.
  2. THORChain will add any payload attached to an UNBOND transaction to the Nodes’ bond.
  3. THORChain applies a 1 RUNE fee on all outbounds.
  4. THORChain will refund any inbound transactions that aren’t fully executed.

Normally a payload of 0.00000001 RUNE is used in order to send a transaction into the network. To sync balances, this payload is simply added to the node’s bond.

The bug in this case was that a node operator requested an unbond of less than 1 RUNE (1) with a payload of 10k RUNE (2). The payload was added to their bond, but since the requested amount was less than the 1 RUNE network fee in (3), the transaction was reverted and the payload refunded to the operator (4). THORChain did not, however, deduct the refunded payload from their bond. By repeating this, they were able to ratchet up their bond in the state machine and finally leave with more than they had actually put in. It required a simple fix.
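The interaction of the four paths can be sketched as a toy model. This is not the THORNode code: the names `Node`, `handle_unbond` and `run`, the starting balances, and the choice to refund the payload in full without a fee are all assumptions made for illustration. Amounts are integers in base units, with 1 RUNE taken as 10^8 units.

```python
from dataclasses import dataclass

RUNE = 10**8              # base units per RUNE (assumed from the 0.00000001 RUNE payload)
OUTBOUND_FEE = 1 * RUNE   # path 3: fee applied to outbounds


@dataclass
class Node:
    bond: int    # bond as recorded by the state machine
    wallet: int  # funds the operator holds outside the network


def handle_unbond(node: Node, unbond: int, payload: int, patched: bool) -> None:
    """Hypothetical UNBOND handler; `patched` toggles the missing deduction."""
    node.wallet -= payload
    node.bond += payload          # path 2: payload is credited to the bond

    if unbond < OUTBOUND_FEE:     # path 3: the request cannot cover the fee
        node.wallet += payload    # path 4: the inbound is refunded (fee ignored here)
        if patched:
            node.bond -= payload  # the fix: undo the credit when refunding
        return

    node.bond -= unbond           # path 1: normal unbond
    node.wallet += unbond - OUTBOUND_FEE


def run(patched: bool) -> int:
    """Return the operator's net gain in base units after three bad unbonds."""
    node = Node(bond=100_000 * RUNE, wallet=10_000 * RUNE)
    start = node.bond + node.wallet
    for _ in range(3):
        handle_unbond(node, unbond=RUNE // 2, payload=10_000 * RUNE, patched=patched)
    handle_unbond(node, unbond=node.bond, payload=1, patched=patched)  # leave
    return node.bond + node.wallet - start
```

In this model, each sub-fee unbond leaves the unpatched bond 10k RUNE higher while the operator gets the payload back, so the final unbond pays out more than was ever deposited. With `patched=True` the only change in the operator's total is the outbound fee on the final unbond.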

Recovery

The network had completed the upgrade by 9pm AEST, upgrading the vaults and logic of the state machine.

Immediately afterwards, nodes began restarting their Bifrosts, allowing transactions to flow back into the network.

Figure: Transactions began flowing immediately post upgrade.

However, it was not until 1pm the following day that all Bifrosts had recovered, allowing TSS to re-sync and the network to become fully operational. In future this should not take as long; the delay was only due to nodes shutting down infrastructure, which required re-syncs of external daemons.

New Features

As a result of this, the team will be building two new features:

  1. Halt Outbounds (https://gitlab.com/thorchain/thornode/-/issues/604)
  2. Ban Removes Node From Consensus (https://gitlab.com/thorchain/thornode/-/issues/605)

The first feature will allow outbound transactions to be halted. The state machine will stop processing all withdrawals, allowing the funds to be isolated in an orderly manner. This will prevent nodes from having to shut down infrastructure. Either Mimir or a super-majority of nodes can invoke this via thorcli tx.
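As a rough illustration of the idea, a halt switch can be modelled as a flag that the outbound processor checks on every block. The class and method names below are hypothetical and are not taken from the linked issue. How the flag gets set (Mimir or a node vote) is left out of the sketch.

```python
from collections import deque


class OutboundQueue:
    """Hypothetical outbound queue with a halt switch."""

    def __init__(self):
        self.halted = False
        self.pending = deque()  # outbounds waiting to be sent
        self.sent = []          # outbounds already released

    def schedule(self, tx):
        """Queue an outbound; scheduling still works while halted."""
        self.pending.append(tx)

    def process_block(self):
        """Release queued outbounds unless halted; return how many were sent."""
        if self.halted:
            return 0            # funds stay isolated, nothing is dropped
        count = 0
        while self.pending:
            self.sent.append(self.pending.popleft())
            count += 1
        return count
```

The design point is that halting pauses withdrawals without discarding them, so nodes can stay online and processing resumes from the queue once the flag is cleared.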

The second feature will allow a super-majority of nodes to ban an unresponsive node and eject it from consensus, allowing for faster updates. Since a super-majority controls the network anyway, it makes little sense to let a small minority veto network upgrades.
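A minimal sketch of a ban vote follows. It assumes that "super-majority" means strictly more than two-thirds of active nodes, which this post does not state. The `Consensus` class and its methods are hypothetical names, not the THORNode API.

```python
def has_super_majority(votes: int, total: int) -> bool:
    """Assumed threshold: strictly more than 2/3, using integer arithmetic."""
    return 3 * votes > 2 * total


class Consensus:
    """Hypothetical active set with super-majority ban votes."""

    def __init__(self, nodes):
        self.active = set(nodes)
        self.ban_votes = {}  # target node -> set of nodes that voted to ban it

    def vote_ban(self, voter, target) -> bool:
        """Record a vote; return True if this vote ejects the target."""
        if voter == target or voter not in self.active or target not in self.active:
            return False
        voters = self.ban_votes.setdefault(target, set())
        voters.add(voter)  # a set ignores repeat votes from the same node
        if has_super_majority(len(voters), len(self.active)):
            self.active.discard(target)
            del self.ban_votes[target]
            return True
        return False
```

With four active nodes, two ban votes are not enough under this threshold, while the third vote ejects the target and leaves three nodes in consensus.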

Emergency Procedures

A new document has been completed, giving both the team and community better clarity on what to do during an emergency.

Keep in mind that THORChain is a decentralised network run by pseudo-anonymous nodes. The team have kept no registry of chaosnet nodes, encourage nodes to stay anon and have no idea who is running them. All aspects of the network require super-majority coordination, so upgrades and network responses may take some time.

Conclusion

THORChain is a leap-change in the problem of decentralised exchanges. The team are confident that it is the *simplest* way to solve the problem of secure, scalable cross-chain liquidity, but it is not without complexity. Manned flight is an example of a leap-change in both capability and complexity. Flying machines are incredibly complex, but the airplane is the *simplest* way to solve the problem of powered flight. Such is THORChain.

The bug found above was an example of multiple logic paths colliding to generate an edge case. In this case, the community was lucky that the edge case was found by a node operator, who is invested in the network and was not malicious.

There will be more problems to find and resolve ahead, but the end state will be a secure, robust and resilient network that is *proven*. There is only one way to prove out a new system, and that is to get in and fly.

Community

To keep up to date, please monitor community channels, particularly Telegram and Twitter.
