THORChain
Published in THORChain

Hardening the THORChain Protocol

Steps taken to make THORChain more resilient to attacks and network uncertainty.

Overview

A number of changes are being made to the THORChain protocol to make it more resilient to attacks and better able to react quickly to save funds. These changes are informed by first-hand experience of several attacks, during which the behaviour of the attacker, node operators and community was observed. This experience is key to working out exactly what should be done and where.

The intent of these changes is to convince any would-be attacker that launching an attack on THORChain is not worth the attempt, and that they should instead file for a bounty and get paid. To achieve this, nodes are given more power, more checks run continuously, and funds leaving the system are throttled to slow down leaks.

The last point to stress is that these changes will have an interim negative effect on user experience (UX). Once these changes are live, the THORChain protocol is likely to pause more frequently, and swaps will take longer to settle. Wallet operators will need to monitor chain pause status continually, detecting a pause seamlessly and communicating it to users. Users on fast chains (BNB) will notice the settlement delay, and as a result volumes may drop on these chains.

Over time, as the network gains stability, its sensitivity will be tuned down to improve UX.

The changes are:

  1. Automatic Solvency Checker
  2. Granular Network Pause Controls
  3. Node Timeouts
  4. Outbound Throttling
  5. Node Broadcast Bot
  6. Live Monitoring

Automatic Solvency Checker (ASC)

There are two discrete parts to this: a reactive mode and a proactive mode. In both cases, the foundation is the ability of each node to scan wallet balances using its Bifrost and report negative discrepancies between the on-chain balance and what THORChain thinks it has.

THORChain builds an awareness of balances purely by adding the incoming funds and subtracting the outgoing funds, so the expected balance is the aggregate of the ins and outs. A discrepancy occurs when the actual on-chain balance diverges from this expected balance. For gas assets (assets used to pay gas), small intermittent divergences can occur, but they are normally less than +/- 1%.
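To make the bookkeeping concrete, here is a minimal sketch of the expected-balance calculation and a negative-discrepancy check. The function names and the 1% tolerance constant are illustrative assumptions (the article only says gas-asset divergences are "normally less than +/- 1%"), not THORNode's actual code or threshold.

```python
from decimal import Decimal

# Hypothetical tolerance: the article only says gas-asset divergences are
# normally within +/- 1%; the real threshold is not stated.
GAS_ASSET_TOLERANCE = Decimal("0.01")


def expected_balance(inbounds, outbounds):
    """THORChain's view of a vault: the sum of the ins minus the sum of the outs."""
    return sum(inbounds, Decimal(0)) - sum(outbounds, Decimal(0))


def shortfall_ratio(expected, on_chain):
    """Fraction by which the on-chain balance falls short of the expected one.

    Only negative discrepancies (on-chain below expected) matter, so a
    surplus is reported as zero.
    """
    if expected <= 0:
        return Decimal(0)
    return max(Decimal(0), (expected - on_chain) / expected)


def is_insolvent(expected, on_chain, tolerance=GAS_ASSET_TOLERANCE):
    """True when the shortfall exceeds the (assumed) tolerated divergence."""
    return shortfall_ratio(expected, on_chain) > tolerance
```

For example, a vault that received 10 and 5 and sent out 3 is expected to hold 12; an on-chain balance of 11.95 is within tolerance, while 6 would be flagged.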

Reactive Mode. THORNodes continually monitor inbound vaults (Asgard), and when the scanned balance diverges from the expected balance, nodes can witness the observed insolvency to THORChain. If more than 2/3 of nodes report it, that chain automatically pauses inbounds and outbounds. The scan cycle runs once every 1–2 minutes; it is throttled because scanning is resource-intensive for the node's RPC client. Reactive Mode ASC would be able to detect any loss of funds from vaults, such as when fake assets are used to make deposits.

https://gitlab.com/thorchain/thornode/-/merge_requests/1797
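The "more than 2/3rds" rule can be sketched as a strict supermajority count over insolvency reports. This is a simplified illustration only: the function names and the input shapes (a mapping from chain to reporting node addresses) are hypothetical, and the real logic lives in the merge request above.

```python
from fractions import Fraction

# Strictly more than two thirds of active nodes must report.
SUPERMAJORITY = Fraction(2, 3)


def should_pause_chain(reporting_nodes, active_nodes):
    """True when more than 2/3 of active nodes have witnessed an insolvency."""
    active = set(active_nodes)
    if not active:
        return False
    # Ignore reports from non-active nodes and collapse duplicate reports.
    reports = set(reporting_nodes) & active
    return Fraction(len(reports), len(active)) > SUPERMAJORITY


def chains_to_pause(insolvency_reports, active_nodes):
    """insolvency_reports maps a chain name to the nodes that reported it."""
    return {
        chain
        for chain, nodes in insolvency_reports.items()
        if should_pause_chain(nodes, active_nodes)
    }
```

Using exact fractions avoids floating-point edge cases at the boundary: with 9 active nodes, 6 reports (exactly 2/3) do not trigger a pause, but 7 do.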

Proactive Mode. This mode is more powerful and is intended to catch insolvencies before they even occur. When a node attempts to sign a txOut, it first calculates whether executing the txOut would make the vault insolvent. If so, it refuses to sign and reports an insolvency.

https://gitlab.com/thorchain/thornode/-/issues/1046
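A minimal sketch of the pre-signing check, assuming the node knows the vault's balances and a worst-case gas cost for the txOut. The `TxOut` fields, the asset strings and the injected `sign`/`report_insolvency` callables are hypothetical stand-ins, not the THORNode API.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TxOut:
    asset: str      # asset being sent (illustrative identifier)
    amount: int     # amount in the asset's smallest unit
    gas_asset: str  # asset the vault pays gas in
    max_gas: int    # hypothetical worst-case gas for this txOut


def balances_after(vault, tx):
    """Project the vault's balances as if the txOut had been executed."""
    after = dict(vault)  # copy so the real balances are untouched
    after[tx.asset] = after.get(tx.asset, 0) - tx.amount
    after[tx.gas_asset] = after.get(tx.gas_asset, 0) - tx.max_gas
    return after


def try_sign(vault, tx, sign, report_insolvency):
    """Sign only if no affected balance would go negative; otherwise report."""
    after = balances_after(vault, tx)
    short = {a: -after[a] for a in (tx.asset, tx.gas_asset) if after[a] < 0}
    if short:
        report_insolvency(tx, short)  # refuse to sign and report the shortfall
        return None
    return sign(tx)
```

When the sent asset is also the gas asset, both the amount and the gas are deducted from the same balance, which is the case most likely to tip a vault into insolvency.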

Granular Network Pause Controls

Previously, the only mimir control available was to “halt trading”, which stopped swaps and liquidity additions. It did not stop refunds or withdrawals, which allowed attackers to siphon funds even while trading was halted.

Note: Previously referred to as a “halt”, this state will now be called a “pause”, because it is likely to happen more frequently and describes a temporary network state that is likely to be resumed quickly. When paused, the network is still online and running; it simply withholds certain actions until resumed.

The new Network Pause controls:

PauseTrading. Pauses swaps and liquidity additions, either across the entire network or for a single chain. When set for a single chain, swaps to or from that chain are refunded.

PauseTradingGlobal | PauseTrading{CHAIN}

PauseChain. Pauses trading as well as refunds and withdrawals. The network stops signing the outbound queue and stops observing new inbounds, so nothing can enter the network. RUNE deposits are refunded.

This network state is akin to 1/3rd of nodes shutting down their bifrosts, but more graceful.

PauseChainGlobal | PauseChain{CHAIN}

PauseTHORChain. The THORChain ledger stops processing all SWITCH transactions, MsgDeposits and MsgSends. This effectively freezes the network so that attacks are trapped, while blocks continue to be produced. The only permitted action is mimir, so the network can still react to the changing situation.

This network state is akin to 1/3rd of nodes shutting down their thor-daemon, but more graceful.

PauseTHORChainGlobal

https://gitlab.com/thorchain/thornode/-/issues/1054
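The three levels of control nest inside one another, which a short sketch can illustrate. It assumes that `{CHAIN}` is replaced by the chain ticker (for example `PauseChainBNB`) and that a positive mimir value switches a control on; both conventions, and the action names, are hypothetical rather than taken from THORNode.

```python
TRADING_ACTIONS = frozenset({"swap", "add_liquidity"})
CHAIN_ACTIONS = TRADING_ACTIONS | {"refund", "withdraw", "sign_outbound", "observe_inbound"}


def _on(mimir, key):
    # Hypothetical convention: a positive mimir value means the pause is active.
    return mimir.get(key, 0) > 0


def paused_actions(mimir, chain):
    """Return the set of actions paused for `chain` under the given mimir values."""
    if _on(mimir, "PauseTHORChainGlobal"):
        return set(CHAIN_ACTIONS)  # ledger frozen: nothing proceeds except mimir
    if _on(mimir, "PauseChainGlobal") or _on(mimir, f"PauseChain{chain}"):
        return set(CHAIN_ACTIONS)  # trading, refunds, withdrawals, signing, observing
    if _on(mimir, "PauseTradingGlobal") or _on(mimir, f"PauseTrading{chain}"):
        return set(TRADING_ACTIONS)  # swaps and liquidity additions only
    return set()


def ledger_accepts(mimir, is_mimir_update):
    """Under PauseTHORChainGlobal, only mimir updates are processed."""
    return is_mimir_update or not _on(mimir, "PauseTHORChainGlobal")
```

Checking the broadest pause first means a chain-level or trading-level setting can never weaken a global freeze.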

Nodes will be granted full mimir access prior to mainnet, allowing them to tweak network parameters if they can coordinate. Admin mimir will be retained until Planned Obsolescence.

Node Timeouts

During past attacks, node operators and the community were very quick to notice, but either there was no mimir setting to react with, or admin mimir was hesitant to execute the halt.

A new feature now grants each node a unilateral ability to call PauseChainGlobal on the network, once per churn cycle, and only for 720 blocks (1 hour). Each node that calls it will increase the pause for a further 720 blocks. Each node can also call to resume, in which case 720 blocks are deducted. This is a rudimentary and leaderless way to converge to a pause period that allows the network to respond to a global threat, but does not give any node an ability to pause the network and keep it paused for malicious reasons.

https://gitlab.com/thorchain/thornode/-/merge_requests/1847
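The pause arithmetic can be sketched as a single "paused until" block height that each call moves by 720 blocks. The class and method names are hypothetical, and the sketch assumes each node gets one call (either pause or resume) per churn cycle; the actual rules are in the merge request above.

```python
BLOCKS_PER_CALL = 720  # ~1 hour, per the article


class NodePauseTracker:
    """Leaderless pause sketch: each call adds or removes 720 blocks of pause."""

    def __init__(self):
        self.paused_until = 0         # block height at which the pause ends
        self.used_this_churn = set()  # nodes that have used their call this churn

    def new_churn(self):
        self.used_this_churn.clear()

    def _use_call(self, node):
        # Assumption: one call (pause or resume) per node per churn cycle.
        if node in self.used_this_churn:
            raise PermissionError(f"{node} already used its call this churn")
        self.used_this_churn.add(node)

    def pause(self, node, height):
        self._use_call(node)
        # Extend an active pause, or start a new one at the current height.
        self.paused_until = max(self.paused_until, height) + BLOCKS_PER_CALL

    def resume(self, node, height):
        self._use_call(node)
        # Deduct 720 blocks, but never push the end of the pause into the past.
        self.paused_until = max(height, self.paused_until - BLOCKS_PER_CALL)

    def is_paused(self, height):
        return height < self.paused_until
```

Because every extension costs a node its one call for the churn, a single malicious node can pause the network for at most 720 blocks, and any honest node can cancel that extension.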

Outbound Throttling

This feature throttles the outbound queue so that, during spikes of large amounts of funds leaving the network, outbound swaps are delayed. If an attack is discovered, any node can call a pause and trap the outbounds. This feature does negatively impact the UX of large swaps, but for most users it won’t be an issue. As the network is hardened, the throttle will be loosened via mimir.

The following are rough approximations for delays.

  • $100/block: 4 seconds
  • $500/block: 20 seconds
  • $1000/block: 40 seconds
  • $5000/block: 2.3 minutes
  • $10k/block: 5 minutes

The maximum delay is 60 minutes. Breaking up a transaction won’t change how it is scheduled, since the aggregate value of all outbounds is counted. A smaller transaction that happens to be scheduled at the same time as a large spike in activity will also be delayed alongside everyone else.

Wallets and interfaces will need to monitor the outbound value queue and show users an estimate of the expected delay, which will change dynamically.
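The scheduling idea, where the delay depends on the aggregate queued value rather than on individual transactions, can be sketched as follows. The `value_per_block` throttle, the 5-second block time and the class itself are assumptions for illustration (720 blocks at ~5 seconds matches the 60-minute cap); this sketch does not reproduce the approximate delays listed above.

```python
import math

BLOCK_TIME_SECONDS = 5   # assumed block time; 720 blocks is ~60 minutes
MAX_DELAY_BLOCKS = 720   # the article's 60-minute maximum delay


class OutboundScheduler:
    """Hypothetical throttle: delay grows with the total value already queued."""

    def __init__(self, value_per_block):
        self.value_per_block = value_per_block  # assumed mimir-tunable throttle
        self.scheduled = {}  # block height -> total value released at that height

    def pending_value(self, height):
        """Value of every outbound still waiting to be released after `height`."""
        return sum(v for h, v in self.scheduled.items() if h > height)

    def delay_blocks(self, height, value):
        backlog = self.pending_value(height) + value
        return min(math.ceil(backlog / self.value_per_block), MAX_DELAY_BLOCKS)

    def schedule(self, height, value):
        """Queue an outbound and return the block height at which it is released."""
        target = height + self.delay_blocks(height, value)
        self.scheduled[target] = self.scheduled.get(target, 0) + value
        return target

    def estimated_wait_seconds(self, height, value):
        """What a wallet might show a user before submitting a swap."""
        return self.delay_blocks(height, value) * BLOCK_TIME_SECONDS
```

With a throttle of 100 per block, a single 10,000 outbound is released at block 100, and splitting it into ten 1,000 pieces still leaves the last piece at block 100. A small 100 swap queued right behind the spike waits until block 101.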

If this feature (plus Node Timeouts) had been in place during the previous attacks, the attacker would have been delayed a full 60 minutes, and nodes could have paused the network and saved the funds. Even if the attacker had split the theft into much smaller transactions, the most they could have gained in the first few minutes would have been in the vicinity of $100k-$200k. This is less than the bounty they could have earned, so they may have considered simply reporting the bug instead.

https://gitlab.com/thorchain/thornode/-/merge_requests/1844

Node Broadcast Bot

Nodes will be given the ability to broadcast a signed message directly from their machines. A Discord/Telegram bot can monitor for these messages and relay them to channels. The messages are not put anywhere on-chain; they are simply signed using node keys. The Discord bot checks that the node is active and retrieves its address, status and bond amount.

This feature allows node operators to stay anonymous while still broadcasting help requests or emergency messages to the rest of the dev community at any time.
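The bot's relay step can be sketched as: look up the node, confirm it is active, verify the signature, then post the message with the node's address, status and bond. Everything here is hypothetical: the data classes, the `"Active"` status string, the assumed 1e8 base units for bond, and the injected `verify` callable, which stands in for real signature verification against the node's public key.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Optional


@dataclass(frozen=True)
class Node:
    address: str
    status: str
    bond: int  # assumed to be in 1e8 base units of RUNE


@dataclass(frozen=True)
class SignedMessage:
    node_address: str
    text: str
    signature: bytes


def relay(
    msg: SignedMessage,
    nodes: Dict[str, Node],
    verify: Callable[[str, bytes, bytes], bool],
) -> Optional[str]:
    """Return the text the bot would post, or None if the message is rejected."""
    node = nodes.get(msg.node_address)
    if node is None or node.status != "Active":
        return None  # only active nodes may broadcast
    if not verify(msg.node_address, msg.text.encode(), msg.signature):
        return None  # signature does not match the node's key
    bond_rune = node.bond / 100_000_000
    return f"[{node.address} | {node.status} | bond {bond_rune:,.2f} RUNE] {msg.text}"
```

Keeping verification as a pluggable function means the sketch makes no claim about which key type or library a real bot would use.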

Live Monitoring

A second bot scans the network for abnormal activity and immediately broadcasts any issue it finds. This simply gives the network a way to flag unusual spikes so they can be investigated more closely.
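As an illustration of "monitoring unusual spikes", here is a deliberately simple rolling-average detector. The window size, warm-up length and 5x multiplier are arbitrary assumptions, and this is one possible heuristic, not the bot's actual logic. Flagged readings would then be broadcast for humans to investigate.

```python
from collections import deque


class SpikeDetector:
    """Flag a reading that exceeds `multiplier` times the rolling average."""

    def __init__(self, window=20, multiplier=5.0, warmup=5):
        self.history = deque(maxlen=window)  # most recent readings only
        self.multiplier = multiplier
        self.warmup = warmup  # readings needed before any alert can fire

    def observe(self, value):
        """Record a reading (e.g. outbound value per block); return True on a spike."""
        is_spike = (
            len(self.history) >= self.warmup
            and value > self.multiplier * (sum(self.history) / len(self.history))
        )
        # Every reading, spike or not, joins the rolling window.
        self.history.append(value)
        return is_spike
```

A steady run of 100-per-block readings followed by a 2,000 reading triggers exactly one alert.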

Conclusion

THORChain is an inside-out exchange. Nothing like it exists today. As a result, the THORChain community is learning on the fly and thinking on its feet. Collateral damage has been incurred, but the experiences are being used to fortify the foundations. Some of the measures above negatively impact the UX of the network, but THORChain must be able to survive into the coming decades. It can’t willfully sign away the entire TVL in its vaults within a few seconds of an attack, so a happy medium needs to be struck. Over time, these measures can be dialled back. It’s currently a battlefield, and THORChain is bringing its best armour until the battle is won.

Community

To keep up to date, please monitor community channels, particularly Telegram and Twitter:
