Memory Isolation Violation in Deserialization Code — Certora Bug Disclosure
John Toman, VP of R&D at Certora, discovered a previously unknown code generation bug in the Solidity compiler versions 0.8.3 and lower. This bug allows maliciously crafted byte buffers to subvert the ABI deserialization routines, access unexpected memory, and/or introduce non-determinism.
Background
ABI Specification
In the EVM, data is exchanged between contracts with unstructured buffers of bytes. To support structured types such as structs and arrays, the Ethereum community has developed the ABI specification, which defines a serialization format used to encode structured types into byte buffers. The receiver can deserialize the buffer to reconstruct the original typed data. Crucially, the serialized buffer does not contain any information about what type of data is encoded in the buffer. Instead, the receiver must know a priori what type of data is expected to be encoded in the buffer. If a buffer does not conform to its expected encoding type, deserialization should revert.
Serialization and Encoding Format
The ABI specification makes a distinction between dynamic and static types. A dynamic type is any type that (transitively) contains a dynamically sized array; static types encompass all other types. Dynamic and static types are encoded differently. Values of static types are written directly into the current position in the buffer.
In contrast, dynamic types are encoded by writing an offset into the current position in the buffer. The actual data for the dynamic type can be found later in the buffer, as indicated by the offset. The offset is not an absolute index within the encoded buffer. Instead, it is relative to the beginning of the encoding of the “parent” object.
For example, consider encoding the struct s
type defined below beginning at buffer index i
. Field a
is written directly at index i
. Field b
is encoded by writing an offset o
to the next index (i + 32
). The offset o
is relative to i
, since i
is the beginning index of the encoding of the parent type, s
. The actual data for the array b
is written at byte buffer index i + o
. See the figure below.
Decoding Routines
The Solidity compiler does not trust that a serialized buffer is correctly formed. At every step, the deserialization process should check that the encoding is well-formed and revert if it discovers otherwise. For example, if an encoded array of uint length is greater than the number of bytes remaining in the buffer being decoded, the deserialization process reverts.
These checks also apply to offsets generated by dynamic types: if an offset results in an out-of-bounds byte buffer index, deserialization should revert. This check is implemented using pointer arithmetic and comparison operations. At the beginning of the deserialization process, the code computes the “end pointer” of the encoded buffer, which is one past the last byte of the buffer. Then, to check whether a dynamic type offset is valid, the code first checks that the offset is less than 2**64 (to prevent overflow). Then the code computes a pointer to the location specified by the offset and checks if this pointer is less than the end pointer. If not, the code reverts. Finally, the expected size of the encoded data is computed, and the code checks that enough bytes are remaining in the buffer for this to make sense. Again, if the expected size goes past the end of the buffer, the code reverts.
The Bug
There are two related code generation bugs. First, the check that offsets are less than 2**64
is omitted within a dynamic type. For example, when encoding the type uint[][]
, the checks are omitted for the inner uint[]
offsets. In principle, maliciously constructed buffers could include large offsets designed to cause pointer arithmetic overflow, which would allow "decoding" arbitrary regions of memory.
The second error is that the expected size checks are performed incorrectly. When decoding a statically sized array of dynamic types, instead of checking the correct size of the statically sized array, the deserialization code only verifies that at least 32 bytes remain in the buffer. A maliciously constructed buffer could cause the deserialization code to read bytes from memory past the end of the buffer.
A related bug occurs when decoding an array of dynamic types: the decoding routine only checks that there are enough bytes in the buffer for the length field of the encoded array. A malicious buffer can cause the decoder to incorrectly think that arbitrary bytes from memory past the end of the buffer are offsets. Both of these bugs are exacerbated by the first error above since overflow checks on these fake offsets are omitted.
The bug’s reproduction for Solidity Compiler versions 0.8.3 and earlier is available at this link: https://github.com/Certora/MemoryIsolationPOC.
Impact
This bug affects any code that decodes serialized data in EVM memory. In particular, it can undermine code that relies on actors to perform independent computation: a malicious actor can craft malformed buffers that deceive the deserialization routine into sharing results stored elsewhere in the transaction’s memory.
The Solidity team has acknowledged this bug and fixed it in Solidity version 0.8.4. See the Solidity team’s blog post about this bug for more details.