GM fam, welcome to my first Medium post.
Last night I saw a tweet from the chad z0age and realized I don’t know enough about how ABI encoding works, so here’s my personal digest after reading up on the Solidity docs.
As you may already know, in Solidity the ABI is used to encode function calls and data structures for communication when interacting with smart contracts. The ABI specifies how function, error and event arguments are encoded and decoded.
When calling a smart contract function, there are two important things we must specify:
- The function we are calling, specified by the function selector
- The arguments we choose to pass with that call
In this article we’ll quickly skim through the former, to focus on the latter.
Function Selector
The first four bytes of calldata for a function call specify the function to be called: these are the first 4 bytes of the keccak256 hash of that function's signature.
The signature is defined as the string containing the function name, followed by the parenthesised list of parameter types, separated by single commas with no spaces.
e.g. :
1. "transferFrom(address,address,uint256)"
is the signature of the notorious ERC20 method.
2. "addressProcessBundle((uint256[2],address[],(uint256,(uint256,address,bytes)[])[]))"
is an example of a signature using a struct composed of a fixed-size array, a dynamic-size array and a dynamic-size array of a struct containing a number and a dynamic-size array of another struct containing a number, an address and a dynamic-size array of bytes.
Hashing the above strings and taking the first 4 bytes from the left yields their selectors. For transferFrom, that's 0x23b872dd.
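If you want to compute selectors yourself, here's a minimal pure-Python sketch. One gotcha: Python's hashlib.sha3_256 is NIST SHA-3, which pads differently from the original Keccak-256 that Ethereum uses, so the sketch carries its own small (and slow) Keccak implementation. The names keccak256 and selector are made up for this example and aren't any library's API.

```python
MASK64 = (1 << 64) - 1


def rol64(value, shift):
    """Rotate a 64-bit lane left by `shift` bits."""
    shift %= 64
    return ((value << shift) | (value >> (64 - shift))) & MASK64


def keccak_f1600(lanes):
    """Apply the 24-round Keccak-f[1600] permutation to a 5x5 matrix of 64-bit lanes."""
    lfsr = 1
    for _ in range(24):
        # theta: mix each column's parity into its neighbours
        c = [lanes[x][0] ^ lanes[x][1] ^ lanes[x][2] ^ lanes[x][3] ^ lanes[x][4] for x in range(5)]
        d = [c[(x + 4) % 5] ^ rol64(c[(x + 1) % 5], 1) for x in range(5)]
        lanes = [[lanes[x][y] ^ d[x] for y in range(5)] for x in range(5)]
        # rho and pi: rotate lanes and move them around
        x, y = 1, 0
        current = lanes[x][y]
        for t in range(24):
            x, y = y, (2 * x + 3 * y) % 5
            current, lanes[x][y] = lanes[x][y], rol64(current, (t + 1) * (t + 2) // 2)
        # chi: the non-linear step, applied row by row
        for y in range(5):
            row = [lanes[x][y] for x in range(5)]
            for x in range(5):
                lanes[x][y] = row[x] ^ (~row[(x + 1) % 5] & row[(x + 2) % 5])
        # iota: XOR in the round constant, generated by a small LFSR
        for j in range(7):
            lfsr = ((lfsr << 1) ^ ((lfsr >> 7) * 0x71)) % 256
            if lfsr & 2:
                lanes[0][0] ^= 1 << ((1 << j) - 1)
    return lanes


def keccak256(data: bytes) -> bytes:
    """Original Keccak-256 (0x01 padding byte), as used by Ethereum."""
    rate = 136  # bytes absorbed per permutation call
    padded = bytearray(data)
    pad_len = rate - (len(data) % rate)
    padded += b"\x01" + b"\x00" * (pad_len - 1)
    padded[-1] |= 0x80
    lanes = [[0] * 5 for _ in range(5)]
    for offset in range(0, len(padded), rate):
        block = padded[offset:offset + rate]
        for i in range(rate // 8):
            lanes[i % 5][i // 5] ^= int.from_bytes(block[8 * i:8 * i + 8], "little")
        lanes = keccak_f1600(lanes)
    return b"".join(lanes[i % 5][i // 5].to_bytes(8, "little") for i in range(4))


def selector(signature: str) -> str:
    """First 4 bytes of keccak256(signature), as a 0x-prefixed hex string."""
    return "0x" + keccak256(signature.encode("ascii"))[:4].hex()


print(selector("transferFrom(address,address,uint256)"))  # 0x23b872dd
```

Remember to pass the canonical signature: no spaces, no parameter names, structs written as tuples.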
Argument Encoding
Starting from the fifth byte of calldata, the encoded arguments follow.
An important distinction is made between static and dynamic types: static types are encoded in place, while dynamic types are encoded at a separately allocated location after the current “block” of arguments, with an offset to that location left in their place.
The dynamic types are:
- bytes
- string
- T[] for any T
- T[k] for any dynamic T and any k >= 0
- (T1, .., Tk) if Ti is dynamic for some 1 <= i <= k
All other types are static!
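The rules above are recursive, so they translate nicely into a tiny recursive function. This is just a sketch working on canonical type strings (the same ones used in signatures); is_dynamic and split_top_level are hypothetical helper names, and there is no validation of malformed input.

```python
def split_top_level(inner):
    """Split the inside of a tuple type on commas that are not nested in parentheses."""
    parts, depth, current = [], 0, ""
    for ch in inner:
        if ch == "," and depth == 0:
            parts.append(current)
            current = ""
            continue
        if ch == "(":
            depth += 1
        elif ch == ")":
            depth -= 1
        current += ch
    if current:
        parts.append(current)
    return parts


def is_dynamic(abi_type):
    """Return True if the canonical ABI type string denotes a dynamic type."""
    if abi_type.endswith("]"):
        base, _, size = abi_type[:-1].rpartition("[")
        if size == "":
            return True          # T[] is dynamic for any T
        return is_dynamic(base)  # T[k] is dynamic only if T is
    if abi_type.startswith("("):
        # a tuple is dynamic as soon as one of its members is
        return any(is_dynamic(t) for t in split_top_level(abi_type[1:-1]))
    return abi_type in ("bytes", "string")


print(is_dynamic("(address,bool,address,bool)"))                    # False
print(is_dynamic("(bytes32,uint8,address,address,uint256,bytes)"))  # True
```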
Before delving into the formal spec of the encoding, we define:
- len(a) as the number of bytes in a binary string a
- enc, the actual encoding, as a mapping of ABI values to binary strings, such that len(enc(a)) depends on the value of a if and only if a is dynamic
Note that, by the definition of enc, if a is of a static type, the length of its encoding does not depend on its value (and vice versa)!
Formal Specification of the Encoding
Deep breath, we’re going deep.
For any ABI value X, enc(X) is defined recursively, based on its type.
- For structs (T1, .., Tk) for k >= 0 and any types T1, .., Tk:
  the encoding is made of k “head” elements and k “tail” elements, as
  enc(X) = head(X(1)) .. head(X(k)) tail(X(1)) .. tail(X(k))
  where X = (X(1), .. , X(k)) and head and tail are defined for Ti as:
  - For Ti static: head(X(i)) = enc(X(i)) and tail(X(i)) = "" (static types are encoded in place, remember?)
  - For Ti dynamic: head(X(i)) = enc(len( head(X(1)) .. head(X(k)) tail(X(1)) .. tail(X(i-1)) )) and tail(X(i)) = enc(X(i)) (which is a complicated way of saying that, where you’d find the encoding of X(i) if it were static, you’ll find an offset from the base of the struct, where you’ll find the actual encoding.)
  We’ll get back to this with an example at the end.
- For fixed-size arrays T[k] for any T and k: enc(X) = enc((X[0], .., X[k-1]))
  i.e. it’s encoded as a tuple with k elements of the same type
- For dynamic-size arrays T[] of k elements: enc(X) = enc(k) enc((X[0], .. , X[k-1]))
  i.e. it’s encoded as a tuple with k elements of the same type, prefixed with the number of elements!
- bytes of length k: enc(X) = enc(k) pad_right(X)
  i.e. it’s encoded as the number of bytes followed by the actual byte sequence, padded to the right with zero bytes so that its length is a multiple of 32
- string: enc(X) = enc(enc_utf8(X))
  i.e. X is UTF-8 encoded and the result is interpreted as bytes
- uint<M>: enc(X) is the big-endian encoding of X, padded to the left with zero bytes such that len(enc(X)) == 32
- int<M>: enc(X) is the big-endian two’s complement encoding of X, padded to the left with 0xff bytes if X is negative and with zero bytes if X is non-negative, such that len(enc(X)) == 32
- bool: encoded as uint8, where 1 is used for true and 0 for false
- bytes<M>: enc(X) is the sequence of bytes padded with trailing zeros so that len(enc(X)) == 32
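To make the elementary rules concrete, here's a small sketch of the leaf encoders in plain Python. The function names (enc_uint, enc_int, and so on) are made up for this post, and there are no range checks against M, so treat it as an illustration rather than a real encoder.

```python
def enc_uint(value: int) -> bytes:
    """uint<M>: big-endian, left-padded with zero bytes to 32 bytes."""
    return value.to_bytes(32, "big")


def enc_int(value: int) -> bytes:
    """int<M>: two's complement, so negative values come out padded with 0xff."""
    return value.to_bytes(32, "big", signed=True)


def enc_bool(value: bool) -> bytes:
    """bool: encoded like a uint8 holding 1 or 0."""
    return enc_uint(1 if value else 0)


def enc_fixed_bytes(value: bytes) -> bytes:
    """bytes<M>: the raw bytes, right-padded with zeros to 32 bytes."""
    if len(value) > 32:
        raise ValueError("bytes<M> holds at most 32 bytes")
    return value.ljust(32, b"\x00")


def enc_bytes(value: bytes) -> bytes:
    """bytes: length word, then the data right-padded to a multiple of 32."""
    padded_len = -(-len(value) // 32) * 32  # round up to the next multiple of 32
    return enc_uint(len(value)) + value.ljust(padded_len, b"\x00")


def enc_string(value: str) -> bytes:
    """string: UTF-8 encode, then treat the result as bytes."""
    return enc_bytes(value.encode("utf-8"))


print(enc_int(-2).hex())
print(enc_string("gm").hex())
```

Notice how the static encoders always return exactly 32 bytes, while the length of enc_bytes and enc_string depends on the value: that's the static/dynamic split in action.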
I could also show the encoding of types like fixed and ufixed but won’t, given they still don’t have full support in Solidity v0.8.19.
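Back to the head/tail rule for structs, since it's the trickiest bit of the spec: here's a sketch of it in Python. To keep it short, enc_tuple (a hypothetical name) takes members that are already encoded, each tagged with whether its type is dynamic; enc_uint and enc_bytes are minimal helpers defined just for this example.

```python
def enc_uint(value: int) -> bytes:
    return value.to_bytes(32, "big")


def enc_bytes(value: bytes) -> bytes:
    padded_len = -(-len(value) // 32) * 32
    return enc_uint(len(value)) + value.ljust(padded_len, b"\x00")


def enc_tuple(members):
    """Encode a struct from (is_dynamic, encoded_member) pairs using heads and tails."""
    # A dynamic member's head is a single 32-byte offset word;
    # a static member's head is its whole encoding.
    head_size = sum(32 if dynamic else len(encoded) for dynamic, encoded in members)
    heads, tails = b"", b""
    for dynamic, encoded in members:
        if dynamic:
            # offset from the base of THIS tuple to where the tail will land
            heads += enc_uint(head_size + len(tails))
            tails += encoded
        else:
            heads += encoded
    return heads + tails


def enc_dynamic_array(elements):
    """T[]: the element count, followed by the elements encoded as a tuple."""
    return enc_uint(len(elements)) + enc_tuple(elements)


# (uint256, bytes) nested inside (struct, uint256)
inner = enc_tuple([(False, enc_uint(5)), (True, enc_bytes(b""))])
outer = enc_tuple([(True, inner), (False, enc_uint(1))])
for i in range(0, len(outer), 32):
    print(hex(i), outer[i:i + 32].hex())
```

In the printed words, the offset stored inside the inner struct is relative to the inner struct's own base, not to the start of the whole encoding. Keep that in mind for the example coming up.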
A practical example
Now, I’d like to guide you through an example of how you could read raw calldata and decode it manually (if you’re up for a challenge, try encoding it manually).
Let’s take the calldata of a call to Balancer’s Vault, in particular that of a call to its swap function, as it takes 2 structs and 2 uint256s as arguments.
Here you can find the tx we’re gonna be dissecting.
First off, let’s grab the calldata shown by Etherscan (which does a great job of giving us the calldata chunked in 32 byte slots):
Function: swap((bytes32,uint8,address,address,uint256,bytes), (address,bool,address,bool), uint256, uint256)
MethodID: 0x52bbbe29
(each 32-byte word below is followed by its offset from the start of the argument encoding block)
00000000000000000000000000000000000000000000000000000000000000e0 0x0000
0000000000000000000000008d7e58c0ebf988dbb31a993696286106964dd4f4 0x0020
0000000000000000000000000000000000000000000000000000000000000000 0x0040
0000000000000000000000008d7e58c0ebf988dbb31a993696286106964dd4f4 0x0060
0000000000000000000000000000000000000000000000000000000000000000 0x0080
0000000000000000000000000000000000000000000b3a7f984c82f6ffa3d428 0x00a0
ffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffff 0x00c0
929a9b6d40e4723f690db77a7ebb65d3254be1e00002000000000000000004d0 0x00e0
0000000000000000000000000000000000000000000000000000000000000000 0x0100
0000000000000000000000000000000000000000000000000000000000000000 0x0120
000000000000000000000000677d4fbbcdd9093d725b0042081ab0b67c63d121 0x0140
00000000000000000000000000000000000000000000000006f05b59d3b20000 0x0160
00000000000000000000000000000000000000000000000000000000000000c0 0x0180
0000000000000000000000000000000000000000000000000000000000000000 0x01a0
Let’s first run through every 32 byte word from top to bottom:
- From bytes 0x00 to 0x1f we find 0xe0, where we should find the head of the first argument, a struct. Remember what this means? It means that at least one field of the struct is dynamic! In fact, the first struct has a bytes member.
- In bytes from 0x20 to 0x3f, where we should find the head of the second struct, we find what looks like an address. This is indeed the first member of the second struct: all of its members are static, so the whole struct is encoded in place, and in the following positions, up to 0x9f, you can see all other members.
- In bytes from 0xa0 to 0xbf, we find the hex number 0x0b3a7f984c82f6ffa3d428 which is 13574434982555110814766120 in decimal base: the third function parameter.
- In bytes from 0xc0 to 0xdf, all 0xff bytes: this means that the fourth parameter was set to type(uint256).max
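Here’s what we know so far:
- first argument (struct): dynamic, its encoding starts at offset 0xe0
- second argument (struct): (0x8d7e58c0ebf988dbb31a993696286106964dd4f4, false, 0x8d7e58c0ebf988dbb31a993696286106964dd4f4, false)
- third argument (uint256): 13574434982555110814766120
- fourth argument (uint256): type(uint256).max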
Now that we’ve found 3 out of the 4 arguments, there aren’t many places where the last one can be hiding: reading the first struct’s head as an offset, we’re driven to the eighth word from the top (offset 0xe0), which is the first member of the first struct, a bytes32 element.
After this, each word holds one of the subsequent struct members, until we find a 0xc0 where the final bytes member should be.
At first, this might not make much sense, given that the word starting at 0xc0 holds the second uint256, so wtf?
What solves this confusion is understanding that this offset is not counted from the 0x00 byte of the argument encoding: it’s counted from where the first struct’s encoding starts, so from 0xe0.
So where is the bytes member? At the word starting at 0xe0 + 0xc0 = 0x01a0! Given that it’s an empty bytes array, this slot encodes its length, 0, and no subsequent data is listed.
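Here’s the full picture:
0x0000 offset of the first struct: 0xe0
0x0020 second struct, member 1 (address)
0x0040 second struct, member 2 (bool): false
0x0060 second struct, member 3 (address)
0x0080 second struct, member 4 (bool): false
0x00a0 third argument (uint256)
0x00c0 fourth argument (uint256): type(uint256).max
0x00e0 first struct, member 1 (bytes32)
0x0100 first struct, member 2 (uint8): 0
0x0120 first struct, member 3 (address): the zero address
0x0140 first struct, member 4 (address)
0x0160 first struct, member 5 (uint256)
0x0180 first struct, member 6 (bytes): offset 0xc0 from the struct’s base, so 0x01a0
0x01a0 length of the bytes member: 0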
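If you'd rather let a script do the walking, the same manual decoding can be sketched in a few lines of Python. The words are the ones from the Etherscan listing above, rebuilt with a small hypothetical helper w() that left-pads to 64 hex digits; decode_swap_args and the other helpers are names made up for this post, hardcoded to this one signature (selector already stripped).

```python
def w(hex_digits: str) -> str:
    """Left-pad a hex string with zeros to one 32-byte word (64 hex digits)."""
    return hex_digits.rjust(64, "0")


WORDS = [
    w("e0"),                                        # 0x0000
    w("8d7e58c0ebf988dbb31a993696286106964dd4f4"),  # 0x0020
    w("0"),                                         # 0x0040
    w("8d7e58c0ebf988dbb31a993696286106964dd4f4"),  # 0x0060
    w("0"),                                         # 0x0080
    w("0b3a7f984c82f6ffa3d428"),                    # 0x00a0
    "f" * 64,                                       # 0x00c0
    w("929a9b6d40e4723f690db77a7ebb65d3254be1e00002000000000000000004d0"),  # 0x00e0
    w("0"),                                         # 0x0100
    w("0"),                                         # 0x0120
    w("677d4fbbcdd9093d725b0042081ab0b67c63d121"),  # 0x0140
    w("06f05b59d3b20000"),                          # 0x0160
    w("c0"),                                        # 0x0180
    w("0"),                                         # 0x01a0
]
ARGS = bytes.fromhex("".join(WORDS))


def as_uint(buf, offset):
    return int.from_bytes(buf[offset:offset + 32], "big")


def as_bool(buf, offset):
    return as_uint(buf, offset) != 0


def as_address(buf, offset):
    # an address is the low 20 bytes of its 32-byte word
    return "0x" + buf[offset + 12:offset + 32].hex()


def as_bytes(buf, offset):
    # dynamic bytes: a length word followed by the data
    length = as_uint(buf, offset)
    return buf[offset + 32:offset + 32 + length]


def decode_swap_args(buf):
    """Decode ((bytes32,uint8,address,address,uint256,bytes),(address,bool,address,bool),uint256,uint256)."""
    base = as_uint(buf, 0x00)  # the first struct is dynamic: its head is an offset
    second = (as_address(buf, 0x20), as_bool(buf, 0x40),
              as_address(buf, 0x60), as_bool(buf, 0x80))
    third = as_uint(buf, 0xa0)
    fourth = as_uint(buf, 0xc0)
    bytes_offset = as_uint(buf, base + 0xa0)  # relative to the struct's base
    first = (buf[base:base + 32].hex(), as_uint(buf, base + 0x20),
             as_address(buf, base + 0x40), as_address(buf, base + 0x60),
             as_uint(buf, base + 0x80), as_bytes(buf, base + bytes_offset))
    return first, second, third, fourth


for argument in decode_swap_args(ARGS):
    print(argument)
```

The only subtle line is the one resolving bytes_offset: it adds the struct's base to the stored offset, which is exactly the 0xe0 + 0xc0 step we did by hand.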
Conclusion
Hope this was an interesting read for you and that you learned something new as I did.
If you want to keep going and try more exotic combinations of types (e.g. a dynamic-size array of structs which have arrays of structs holding bytes members), I recommend you pick up cast from the foundry toolchain: make up some random signature with these crazy types, pass it through cast abi-encode with whatever data you like, and try to complete the exercise we did today.
cya next time anon