Bug Disclosure — Solidity Code Generation Bug Can Cause Memory Corruption

Uri Kirstein
Certora
Published in
6 min readMay 27, 2021

In April 2020, we discovered a bug in the Solidity compiler (fixed in version 0.6.5) where the length of a dynamically sized array could overflow during allocation. Subsequent writes into the array were bounds-checked against this incorrect length so that writes past the end of the array were possible, potentially corrupting other data in memory.

This bug serves as a reminder that vulnerabilities can lurk in the gap between the source and the bytecode. Any program analysis that assumes writes into distinct memory allocations do not interfere was and is not sound on contracts compiled with older versions of Solidity. Indeed, this bug was discovered by John Toman of Certora while developing a bytecode-level memory analysis. We reported it to the Solidity team on March 27th, 2020, who released a patched version on April 6th.

Background

Memory Management with the EVM and Solidity

The memory model on the EVM is a flat array of bytes; there is no primitive allocation operation, nor is there a free operation. At the bytecode level, smart contracts are responsible for managing this array however they see fit, for example, by splitting it into distinct regions that contain the data for different logical objects.

In practice, the Solidity compiler abstracts away this segmentation process and generates the low-level memory management operations. The compiler uses a monotonically increasing “free pointer” to indicate the start index of unused space in the memory array. At each allocation, the compiler saves the current value of the free pointer, increments the pointer by the size of the allocated object, and then returns the saved value as the fresh “pointer” to the allocated memory.

Array Representation and Allocation

Arrays in Solidity are represented in memory with a block that starts with a 256-bit word containing the array’s length, followed by n array elements, which are always 32 bytes each (padding if necessary). To allocate the number of bytes required for a dynamically sized array, the Solidity compiler generates code that increments the free pointer by (n * 32) + 32; i.e., 32 bytes for each element of the array plus an extra 32 bytes for the length element.

Array Accesses

Reads and writes to an array element at index i is bounds checked by comparing i to the length stored in the first word of the array’s memory. If the check succeeds, the byte-addressed pointer to the i th element is computed as base_ptr + (i * 32) + 32, which skips the 32-byte length field at the beginning of the block. Assuming the Solidity compiler has properly managed the free pointer, the smart contract can be sure that reads and writes from the resulting memory index are independent accesses to all other objects allocated in memory.

General Description

The bug stems from the multiplication involved in computing the length in bytes of a dynamically sized array during allocation. If n is chosen to be near the maximum representable integer, then n * 32 will overflow, wrapping around. The result is a much smaller number of bytes being allocated than expected. The Solidity compiler did not check for this overflow in versions prior to 0.6.5. If a contract allows untrusted input to influence the length of a dynamically sized memory array, an attacker may be able to cause the length computation to overflow.

The result is a discrepancy between the array’s length as stored in the first 32 bytes of the newly allocated memory versus the amount of memory actually allocated by incrementing the free pointer. (Importantly, the stored array length represents the number of allocated elements rather than the number of allocated bytes). This causes later bounds checks to succeed even when the resulting EVM memory index of the element exceeds the range that was actually allocated. If an attacker can then cause writes to indices beyond the end of the allocated memory, they can potentially overwrite any other objects stored in memory.

Validation of the Bug

The code below contains a contrived example Solidity program, which demonstrates some of the potential effects of this compiler bug. This program is only intended as a simple demonstration of the bug; it does not reflect how smart contracts in reality typically use arrays.

pragma solidity >= 0.6.0 <= 0.6.5;
contract CorruptedState {
struct OwnedTokens {
uint owned;
uint promised;
}
mapping (address OwnedTokens) ownership;
// The following method allocates a memory array of uint256s of
// length `sz`. (Line 1)
// The length of the memory array in *bytes* is therefore sz * 32.
// If sz is large enough, the length in bytes not be representable
// in a uint256 and will overflow, resulting in a small number of
// bytes actually being allocated from the free memory pointer.
// Thus the length in elements stored in the first word of the
// array will not agree with the number of bytes actually
// allocated.
// Line 2 then copies struct into memory "after" the end of tmp
// (which, since only a few bytes were allocated, is actually
// to somewhere in the middle of tmp).
// Line 3 is a seemingly innocuous write to tmp, which can
// actually affect the struct t if sz is chosen correctly.
// Line 4 copies t back to storage.
function corruptMemory(uint sz, uint elem) public returns
(uint256) {
/* 1 */ uint[] memory tmp = new uint[](sz);
/* 2 */ OwnedTokens memory t = ownership[msg.sender];
/* 3 */ tmp[2] = elem;
/* 4 */ ownership[msg.sender] = t;
/* 5 */ return ownership[msg.sender].owned;
}
function getOwned() public returns (uint256) {
return ownership[msg.sender].owned;
}
}

The function corruptMemory accepts a size sz and an element element from outside. It then performs seemingly innocuous operations, but under buggy versions of the compiler, a carefully chosen size can cause element to be written into memory occupied by t. Since t is later copied back to storage, this has the effect of corrupting storage data. The memory layout after doing the memory writes is depicted below.

Figure: A potential memory layout for struct t in the contrived Solidity program.

Memory allocations start at offset 0x80. The first allocated variable (marked in red) is tmp which occupies 2 256 bytes. The length of tmp (marked in dark red) is located in offsets 0x80–0xa0 (32 bytes). The dark grey part denotes the copy of the OwnedTokens struct from the storage. Note that since the allocation of tmp overflows, the position the struct is written to is overriding a part of the memory that is supposed to belong to tmp. The memory of the variable t is marked in blue (it is a copy of the grey memory area). It coincides with the memory allocated for tmp, and importantly tmp[2] is the same memory slot as t.owned.
You can find full details about the vulnerability, its exploitation, and its reproduction on a local machine at Bug Disclosure.

Gas Implications and Feasibility

A reasonable question is whether the attack described above is feasible, given gas costs. At first glance, it would appear that initializing such a large array would require a prohibitively large amount of gas. However, it turns out that the initialization code also uses the length of the array in bytes to determine how many bytes to initialize. Since this length overflows, the initialization code only initializes those bytes that were allocated from the free pointer, resulting in reasonable gas costs.

Avoiding and Detecting Such Bugs

Over time, the security and reliability of the Solidity compiler will continue to improve. However, compiler bugs are common even in mature compilers, e.g., Toward Understanding Compiler Bugs in GCC and LLVM. This has led to an active area of research in applying formal methods to guarantee compiler correctness, like The CompCert C Compiler Validation Homepage.

The Certora verifier can automatically analyze all executions of a bytecode program. Since the analysis is done on the compiler’s output, it can detect the presence or absence of compiler bugs that may affect the contract’s correctness. In other words, the results do not depend on the correctness of the compiler itself. We are excited to continue using bytecode analysis to improve the security of contracts in the Ethereum community.

Acknowledgments

We thank the Solidity compiler team for helpful discussions about this issue and for quickly releasing a fix.

--

--