Non-deterministic Solidity Transactions — Certora Bug Disclosure
John Toman of the Certora development team discovered a previously unknown code generation bug in the Solidity compiler. This bug allowed malicious bytes/string storage fields to deceive Solidity into including unrelated information from memory into arrays generated via abi.encodePacked
and similar methods. The bug was fixed in Solidity version 0.8.0.
Background
Bytes and Strings in Storage
As described in our previous post, bytes, arrays, and strings use a specialized representation to save storage data. In particular, if the array/string is less than 31 elements (bytes) long, then Solidity will store the length of the array in the same storage slot as the element data itself. The least-significant byte stores the length of the array times two (for reasons explained below), while the remaining 31 bytes hold the array data.
This “packed” representation contrasts with how arrays are typically laid out in storage: the array’s length is stored in one slot, while the array elements are stored in contiguous slots beginning at a location computed via a hash function. Note that this representation is also used for bytes arrays of length greater than 31, albeit with a slight modification. Instead of storing the length of the bytes array directly, the Solidity compiler will store the length times two plus one. This addition ensures code can disambiguate which layout is used (“packed” or “standard” ): if the least significant bit is not set, then the packed representation must be used. However, if the least-significant bit is set, then the array data lies elsewhere in storage.
Understanding the Bump Allocator
As described in our post on memory corruption, Ethereum memory is a single, contiguous array of bytes. The Solidity compiler gives structure to this code by using a “free pointer”, which is monotonically incremented throughout the contract’s lifetime. In particular, when allocating a new memory object, the compiler generates code that reads the free pointer’s current value, increments it by a value large enough to hold the allocated object, and then updates the free pointer. The old value of the free pointer before incrementing it is then used as a “pointer” to a segment of Ethereum memory reserved exclusively for the allocated memory object. In other words, after allocation, the region of memory reserved by incrementing the free pointer may not be used for any other purpose than storing the allocated object.
However, the region of memory allocated by Solidity is not necessarily unused. Solidity often uses all memory after the free pointer as a “scratch” area. This scratch area is used to construct the inputs for inter-contract calls, log messages, etc. Unlike a managed language like Java, “allocated” memory is not necessarily zero and can initially hold arbitrary data. However, the Solidity compiler usually generates code to initialize freshly allocated data segments to contain default values.
An Exception: abi
methods
As mentioned above, the bump allocator will pre-compute the size of an in-memory object and increment the free pointer by that amount. However, unlike (regular) arrays or structs, the length of an array created with methods like abi.encodePacked
are not pre-computed. Instead, the array is constructed "in-place". For calls like abi.encodePacked
(including encodeWithSelector
, encodeWithSignature
, etc.) the compiler generates (roughly) the following pseudo-code:
var ptr = freePointer + 32; // reserve 32 bytes for the length field
for(x in argumentsToAbiFunction) {
copy(src=x, dst=ptr)
ptr += sizeOf(x)
}
val lengthInBytes = (ptr - freePointer) - 32; // don't count the length field!
memory[freePointer] = lengthInBytes;
freePointer = ptr;
Here copy
is a pseudo-operation that copies the representation of x
(one of the arguments to encodePacked
) to the region of memory beginning at ptr
. Note that the length of the generated array is computed by simply subtracting the current value of the free pointer from ptr
, which points immediately after the last byte that was copied into the generated array. After the length is computed, the free pointer is finally updated. Crucially, the memory of the generated array is not initialized by Solidity. Solidity assumes that the copy
operation will have overwritten any data left over from previous usages as scratch memory.
The Bug: From Storage to Memory via abi
When Solidity determines that an argument to an abi
function is a bytes array or string in storage, it will test the least significant bit of the storage slot of the bytes array. Under pragma experimental ABIEncoderV2
, if the bit is not set (indicating the bytes array is packed), then the entire slot is divided by two and then masked with 0x7f
, i.e., 127. Effectively, this operation extracts the length stored in the least-significant byte slot, capping the length at 127 (not 31). The upper 31 bytes of the storage slot are then copied into memory at ptr
, and finally, ptr
is incremented by the value yielded from masking with 127, i.e., sizeOf(x)
for a packed bytes array is (slot >> 1) & 0x7f
.
Suppose the storage field has been maliciously crafted to include a value greater than 62 in its least significant byte. In that case, the array generated by the call to abi
will include uninitialized memory: the value of ptr
will be "over incremented". For example, if the least significant byte has a value of 254, then ptr
will be incremented by 127 bytes. However, as mentioned above, only 31 bytes of data are copied from storage, leaving the remaining 96 bytes uninitialized and potentially containing whatever values were previously written to scratch memory.
Impact and Resolution
The Solidity compiler team has acknowledged the validity of this bug and fixed it in Solidity version 0.8.0. Triggering this bug requires a contract to maliciously set its own storage to include an incorrectly encoded least significant byte. Although we are unaware of any contracts that currently exploit this bug, a malicious contract can introduce non-determinism into its own execution or into trusted library code via delegate calls. Note that a malicious actor can first corrupt a contract’s storage and then “upgrade” it using well-known upgrading techniques to an apparently trustworthy or benign contract that still exhibits this behavior.