Testing the Limits of EVM Stack Depth

Published in Arbitrary Execution, Nov 1, 2021

Last week, a debate was brewing in the Arbitrary Execution virtual office. As security researchers and Ethereum enthusiasts, we often entertain ourselves and pick up some interesting bits of knowledge by following along with Solidity and EVM puzzles we find on the Internet (like Ethernaut’s tweets, for example). The question that led us down a rabbit hole this time seemed quite simple on the surface:

[Q5] Ethereum smart contracts do not run into the halting problem because:
(A) EVM is not Turing Complete
(B) EVM is Turing Complete
(C) EVM is Turing Complete but is bounded by gas sent in a transaction
(D) EVM is Turing Complete but is bounded by the stack depth

Well, we know the EVM is Turing Complete (that’s kind of the point), so we can toss away A. We also know that gas is an important part of Ethereum’s design as an anti-spam mechanism. Since we had all experienced transactions reverting due to out-of-gas errors, choice C was looking promising.

However, one of our researchers insisted that stack depth was, in fact, a reason the EVM can halt. Another team member was eager to take the other side of that challenge, insisting that the 63/64 gas forwarding rules implemented in EIP-150 actually made it impossible to hit the stack depth limit. From the EIP:

If a call asks for more gas than the maximum allowed amount (i.e. the total amount of gas remaining in the parent after subtracting the gas cost of the call and memory expansion), do not return an OOG error; instead, if a call asks for more gas than all but one 64th of the maximum allowed amount, call with all but one 64th of the maximum allowed amount of gas (this is equivalent to a version of EIP-90 plus EIP-114). CREATE only provides all but one 64th of the parent gas to the child call.

Then, in the rationale section, some further explanation:

Additionally, EIP 114 is introduced because, given that we are making the cost of a call higher and less predictable, we have an opportunity to do it at no extra cost to currently available guarantees, and so we also achieve the benefit of replacing the call stack depth limit with a “softer” gas-based restriction, thereby eliminating call stack depth attacks as a class of attack that contract developers have to worry about and hence increasing contract programming safety. Note that with the given parameters, the de-facto maximum call stack depth is limited to ~340 (down from ~1024), mitigating the harm caused by any further potential quadratic-complexity DoS attacks that rely on calls.

In plain English, the EIP regulates the amount of gas forwarded to a call made from inside another call. Often, when a function is called externally, the original caller is unaware of the child calls the parent will have to make in order to return. Therefore, the original caller will likely have a poor grasp of how much gas is needed to complete the function ahead of time. So, this EIP sets a ceiling: a call forwards at most 63/64 of the gas remaining.
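The forwarding rule quoted above can be sketched in a few lines of Python. This is only an illustration of the arithmetic, not client code: the function name `gas_forwarded` and its parameters are our own, and `available` is assumed to be the gas remaining in the parent after the call cost and memory expansion have already been subtracted.

```python
def gas_forwarded(requested: int, available: int) -> int:
    """Gas handed to a child call under the EIP-150 'all but one 64th' cap.

    `available` is assumed to already exclude the cost of the call itself.
    """
    cap = available - available // 64  # all but one 64th, using integer division
    return min(requested, cap)         # asking for more than the cap is not an error


# A call that asks for more than the cap is silently clamped.
print(gas_forwarded(1_000_000, 640_000))  # 630000
# A call that asks for less than the cap gets what it asked for.
print(gas_forwarded(100_000, 640_000))    # 100000
```

The important detail is that an oversized request is clamped rather than rejected, which is what turns the old hard depth limit into a soft, gas-based one.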

The second part explains a consequence of this decision: gas left in subsequent calls will asymptotically approach zero, and will do so prior to reaching the call stack depth limit of 1024. Mathematically, we can see that (63/64)¹⁰²⁴ is about 0.0000001. One would have to buy essentially all 30 million gas available in a block to do anything interesting by the 1024th call, which would be prohibitively expensive.
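To make that figure concrete, here is a quick back-of-the-envelope check in Python. It assumes an idealized model in which every call forwards exactly 63/64 of its gas and does no other work; the variable names are ours, and the 30 million figure is the block gas limit mentioned above.

```python
# Idealized model: each nested call keeps exactly 63/64 of the parent's gas
# and nothing else is spent along the way.
DEPTH_LIMIT = 1024
BLOCK_GAS = 30_000_000

fraction_left = (63 / 64) ** DEPTH_LIMIT
gas_left = BLOCK_GAS * fraction_left

print(f"fraction of gas left at depth {DEPTH_LIMIT}: {fraction_left:.2e}")
print(f"gas left when starting from a full block: {gas_left:.2f}")
```

Under these assumptions the fraction is on the order of 1e-7, so even a full block's worth of gas shrinks to a handful of gas units by the 1024th nested call.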

So, was it true that the stack depth limit was no longer reachable after the Tangerine Whistle hard fork from October 2016, which included EIP-150? There was only one way to find out.

We put a Hardhat repo together and wrote some contracts to test this in short order. The dissenting researcher’s opinion was that the stack depth limit could be reached by having a recursive internal function call that diverges. We implemented the following super simple contract to test this:

If you call b() and take a look at the stack trace, you should see 1023 calls to StackDepthTest.sol functions, with the 1024th call throwing the error. Unfortunately, the error that the Hardhat runtime environment throws in this case is not very helpful: “Error: Transaction reverted and Hardhat couldn’t infer the reason. Please report this to help us improve Hardhat.”

However, testing against a local ganache-cli gives a more edifying error message: “ProviderError: VM Exception while processing transaction: stack overflow”.

So, it seems certain that one can hit the stack depth limit by calling recursive internal functions. But how is this possible — what about the whole 63/64 gas forwarding thing and the added security EIP-150 was supposed to afford us…?

Well, we tested this by logging gasleft() after each recursive call inside our StackDepthTest contract. What we found was that the difference in gasleft() between successive calls was constant at 510. Internal function calls are jumps within the same execution context rather than message calls, so the 63/64 gas forwarding rule does not apply to them, and for that reason the stack depth limit could be reached before an out of gas error.
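The difference between the two regimes is easy to see numerically. The sketch below contrasts a constant cost per internal call (the 510 gas we measured) with an idealized 63/64 decay per external call. The function names and the 30 million starting budget are assumptions chosen for illustration, and the external model ignores any per-call overhead.

```python
INTERNAL_COST = 510  # gas per internal recursive call, as measured above


def gas_after_internal(start: float, depth: int) -> float:
    """Linear model: every internal call costs the same flat amount."""
    return start - INTERNAL_COST * depth


def gas_after_external(start: float, depth: int) -> float:
    """Geometric model: every external call keeps 63/64 of what is left."""
    return start * (63 / 64) ** depth


start = 30_000_000  # assumed starting budget
for depth in (100, 500, 1000):
    print(depth, gas_after_internal(start, depth), round(gas_after_external(start, depth), 2))
```

In the linear model, 1024 internal calls cost only about 522,000 gas in total, so gas is nowhere near exhausted at that depth. In the geometric model, the budget is effectively gone long before then.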

This is not to say that EIP-150 is just hot air. We introduced another contract to test the gas consumption and stack depth limits of chaining external calls between two different contracts, which really has more to do with what the EIP was addressing.

Next, we created two instances of StackDepthExt and pointed each one at the other’s address, so that when we call callOther() they just end up calling each other back and forth. This time, Hardhat threw a completely different error when testing callOther(): “Transaction reverted: contract call run out of gas and made the transaction revert”.

Exactly as suspected! Console logging the gasleft() inside each callOther() also confirms the expected behavior of the 63/64 forwarding mechanism outlined in EIP-150. The difference between successive gasleft() logs was not constant, but decreased according to the 63/64 rule. Additionally, the stack depth only reached about 380 calls, which is roughly in line with this section of EIP-150 from above: “Note that with the given parameters, the de-facto maximum call stack depth is limited to ~340 (down from ~1024), mitigating the harm caused by any further potential quadratic-complexity DoS attacks that rely on calls.”
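A rough simulation shows why the depth lands in the hundreds rather than at 1024. The model below is a simplification with assumed parameters, not a reproduction of our Hardhat test: each frame spends a fixed `overhead` (a hypothetical stand-in for the cost of the call plus the work done in the frame) and then forwards all but one 64th of what remains.

```python
def max_call_depth(start_gas: int, overhead: int) -> int:
    """Count nested calls until a frame can no longer afford the overhead."""
    gas = start_gas
    depth = 0
    while gas > overhead:
        gas -= overhead    # assumed fixed cost paid in this frame
        gas -= gas // 64   # one 64th is withheld, the rest is forwarded
        depth += 1
    return depth


# Both parameter sets are illustrative assumptions.
for overhead in (700, 2_000):
    print(overhead, max_call_depth(30_000_000, overhead))
```

With these assumed values the depth comes out in the low-to-mid hundreds, and it gets shallower as the per-frame overhead grows. The exact number depends on the starting gas and on how much each frame really costs, which would explain why a measured value can differ from the ~340 quoted in the EIP.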

OK, so we’ve learned that the expected behavior and base case is for the stack depth to reach a maximum of 300–400 calls, but that it is possible to get much deeper in the stack if one were to recursively call a bunch of internal functions. Could we use this to force some unexpected behavior?

To test this, we set up a stack depth attack contract, whose purpose is to call internal functions as many times as possible before calling into an external function and forcing a stack depth limit error instead of the expected out of gas error.

The results were surprising. When attack() is called with a value of 504 or less, i.e. if i=504 and callOther() is the 505th call, the transaction reverts with an out of gas error. With any value of i above 505, the transaction still reverts, but with the stack depth limit error that we expected. This was a bit puzzling, because the stack depth limit should have been a lot higher than ~500 based on the findings from the earlier set of test contracts.
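One hypothetical way to reason about the ~500 figure, offered as an untested assumption rather than a finding: the EVM operand stack holds at most 1024 words, and if the overflow comes from that limit, the reachable recursion depth depends on how many words each internal call leaves on the stack. The sketch below only does the division. `words_per_frame` is a made-up parameter, and we did not measure the real stack usage of our contracts.

```python
STACK_LIMIT = 1024  # maximum number of words on the EVM operand stack


def max_internal_depth(words_per_frame: int, base_words: int = 0) -> int:
    """Recursion depth that fits on the stack, given assumed words per frame."""
    return (STACK_LIMIT - base_words) // words_per_frame


# Hypothetical frame sizes: a bare return address vs. a return address plus one argument.
print(max_internal_depth(1))  # 1024
print(max_internal_depth(2))  # 512
```

If a recursive function that takes an argument keeps about two words per frame, the ceiling would sit near 512, which is close to what we observed. Confirming this would require inspecting the compiled bytecode.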

The takeaway is perhaps a new piece of knowledge for the Ethereum devs reading this article: one can indeed force a stack depth limit error when making an external function call — without buying all the gas in a block — by calling some number of internal functions before making that external function call. The implications of this are still fuzzy, but it is an area in which our team is excited to keep looking as we continue with our research.

The EVM is a young and complex thing, and we are still learning more about it every day. The researchers at Arbitrary Execution are happy to do our part to crack it open and understand its nuances, and hope to spread that knowledge to help keep blockchain apps secure and users’ funds safe!

Written by R B