Agave exploit — Reentrancy in Liquidation Call

Stonky

Published in

AgaveFinance

6 min readApr 8, 2022

The following article is a post-mortem related to the reentrancy attack suffered by Agave on March 15th, 2022.

(This post-mortem was written by our devs)

The Event

On the 15th of March, Agave, along with Hundred.finance, was drained of all of its liquidity. The attack struck both of these projects at the same time using the same bug. How could this happen when Agave is based on a completely different codebase than Hundred, and both of them are based on codebases that, on mainnet Ethereum, secure billions?

Background

One of the most common bugs in any smart contract has to do with reentrancy. The general solution to these bugs is well known, and best practices have been established for quite some time. Essentially, there is a sequence of actions one must keep: Checks Effects Interactions. First Check the state of the world, to ensure it’s correct; then Effect your state change internally to reflect the action being taken; then Interact with other contracts (which might be reentrant). However, there aren’t currently any static analysis tools that can flag these errors, and it’s easy to miss one either initially or after a series of changes to the flow of a function.

Worse still, it turns out that blind application of this pattern can actually be your downfall. Such is the case for the Agave codebase, which contained a subtle error in this ordering in the liquidationCall function:

Vulnerability Details

In LendingPoolCollateralManager, line 222 we check that the liquidation ends up liquidating someone’s entire pool of collateral (vars.userCollateralBalance). If it does, then we turn off that collateral (effect), and then call the safeTransfer function to take the liquidated amount (interaction).

This sounds like we are following CEI — the steps are labeled in the previous paragraph. But actually, we are already dead. vars.userCollateralBalance was set on entry to this function; we calculated vars once at the beginning, to save gas. If it has changed somehow during this function, our check will generate incorrect results.

Specifically, if this user has deposited a bunch of this collateral during this function execution, but at the beginning of this function we were liquidating all of their collateral, then we will still turn off the collateral usage.

And so we come to line 212. It doesn’t seem like it, but this call to an agToken’s burn function results in a call to external code. If the token the agToken is wrapping has an on-transfer hook of some kind, the liquidator (if it’s a contract) will get to execute code here. So they can now do whatever they want just before we will turn off this asset as collateral.

Game Over

From here, just deposit a truckload of this collateral, then take out all the loans you can. We will give you whatever you can borrow against that amount of collateral you deposited. Then, once you return from that callback, we will turn off that asset as collateral. Now you can just withdraw all of it; you don’t have to pay back your loans because there’s nothing collateralizing them. You can even use flash loaned assets to do this, because once we return from liquidationCall, the collateral has been released and you can withdraw it in the same transaction.

But why?

Now, this code is unchanged from the original Aave codebase Agave took it from. And the token used in the exploit was WETH. So shouldn’t Aave be vulnerable to this on mainnet too? After all, the main purpose of using the Aave codebase entirely unchanged was to benefit from the battle-tested, heavily audited code on mainnet. And Agave was quite conservative in the assets it added; apart from STAKE and GNO, all assets on Agave are also on Aave (modulo WXDAI vs DAI). So how can this happen on Agave but not Aave?

It turns out that the answer lies in the chain itself, or more accurately, the chain’s bridge. On mainnet, of course, there is no bridge, and so all of these tokens are just “themselves”. WETH is just an ERC20 with deposit and withdraw functions, and the rest of the tokens are whatever they are on mainnet. However.

On other chains, if you want those tokens to exist on your chain, there has to be some kind of bridge. A bridge is made up of a contract on one side (mainnet) in which assets are deposited, and many contracts on the other side (Gnosis Chain) which represent those deposited assets. Those contracts’ implementations can be anything the bridge’s operators decide.

In this case, the original decision was poor, to say the least. They decided on an implementation that would always call a function on the target, if it was a contract, of any transfer. The intent of this was to enable the bridge to refuse to be sent tokens accidentally — but there was no restriction on this callback. Any contract would be called.

What that means is, while on mainnet you might have a vanilla ERC20 with no special behavior, once taken across the bridge this token becomes a reentrancy risk.

What’s worse, an audit of the contract was actually performed, in 2020. This audit called out the security risk in the contract.

The response on the part of the original deployers was to change this behavior in new tokens, but since there was no clean way to change the implementation of the existing tokens without a hardfork, they decided to leave them alone.

One large problem with this decision is that the tokens with the flawed implementations are the first ones to be taken across the bridge. The first tokens taken across the bridge will generally be the most commonly used, the “blue chips” so to speak. The foundational, “low risk” tokens, like WETH, WBTC, and USDC. And because of this decision, these tokens’ behavior would forever be surprising.

There’s a term for this sort of thing; it’s called a “footgun”. It’s a feature of your system that makes it very easy to shoot yourself in the foot. No one would expect tokens to sprout new and interesting security risks when being bridged to your chain. And even worse, if someone did check the behavior of a token by reading the contract in detail, it’s possible they’d read the contract of a new token and believe that they’d done their due diligence. But no, guess what? There’s an old version of the contract that you didn’t realize existed, so even tokens from the same bridge can’t be said to act the same!

This seems like the thing you’d want all of your developers to know, but most developers who have been calling this chain their home for quite a long time had no idea this issue existed. No one who knew — and there were many who knew, since there was a security audit and a new contract deployed — was volunteering the information.

Resolution

Finally, we come to the method of fixing the problem.

First of all, the most important thing is to update the tokens’ implementation to guarantee a safer environment for every user and developer in the network. That, in addition to a continued focus on vetting the tokens added, should suffice to make the protocol as safe as its counterparts in other networks.

However, since we know the specific error in the CEI practice in the liquidationCall function (and paid quite the bug bounty for it), we will upgrade the current implementation from the legacy contracts into a new one.
https://github.com/Agave-DAO/protocol-v2/commit/31922797ba110ddb3e908936b940b40221b7e190#diff-d237d9f48e3d6657a5f94c89b903c5003cba6f9a286e26eff509cb44a3f4ee8f