ABI Encoding Deep Dive

ljmanini
Mar 16, 2023


GM fam, welcome to my first medium post.

Last night I saw the following tweet from the chad z0age and realized I don’t know enough about how ABI encoding works, so here’s my personal digest after reading up on the Solidity docs.

z0age’s nerd snipe

As you may already know, in Solidity the ABI is used to encode function calls and data structures for communication when interacting with smart contracts. The ABI specifies how function, error and event arguments are encoded and decoded.

When calling a smart contract function, there are two important things we must specify:

  1. The function we are calling, specified by the function selector
  2. The arguments we choose to pass with the call

In this article we’ll quickly skim through the former, to focus on the latter.

Function Selector

The first four bytes of calldata for a function call specify the function to be called: they are the first 4 bytes of the keccak256 hash of that function’s signature.
The signature is defined as the string containing the function name, followed by the parenthesised list of parameter types separated by commas, with no spaces. Structs show up in the signature as the parenthesised tuple of their member types.
e.g.:
1. "transferFrom(address,address,uint256)" is the signature of the notorious ERC20 method.
2. "addressProcessBundle((uint256[2],address[],(uint256,(uint256,address,bytes)[])[]))" is an example of a signature using a struct composed of a fixed-size array, a dynamic-size array and a dynamic-size array of a struct containing a number and a dynamic-size array of another struct containing a number, an address and a dynamic-size bytes value.
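To make the two pieces concrete, here’s a minimal Python sketch that builds a signature string and splits calldata into selector and arguments. The helper names `signature` and `split_calldata` are made up for this post, and the calldata is a hypothetical one (the well-known transferFrom selector followed by three zero words). The hashing step itself is left out on purpose: Python’s standard library has no keccak256 (`hashlib.sha3_256` uses different padding and gives a different result), so you’d need an external library or `cast sig` for that.

```python
def signature(name: str, param_types: list[str]) -> str:
    """Canonical signature: name + parenthesised, comma-separated types, no spaces."""
    return f"{name}({','.join(param_types)})"


def split_calldata(calldata_hex: str) -> tuple[str, str]:
    """Split hex calldata into (selector, encoded arguments as hex)."""
    data = bytes.fromhex(calldata_hex.removeprefix("0x"))
    return "0x" + data[:4].hex(), data[4:].hex()


sig = signature("transferFrom", ["address", "address", "uint256"])
print(sig)  # transferFrom(address,address,uint256)

# Hypothetical calldata: transferFrom selector followed by three all-zero words
selector, args = split_calldata("0x23b872dd" + "00" * 96)
print(selector, len(args) // 2)  # 0x23b872dd 96
```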

Hashing the above strings and taking the first 4 bytes from the left will yield their selector:

cast sig == gud tool

Argument Encoding

Starting from the fifth byte of calldata, the encoded arguments follow.
An important distinction is made between static and dynamic types: static types are encoded in place, while dynamic types are encoded at a separate location after the current “block” of arguments, and the block itself only holds an offset pointing to that location.

The dynamic types are: bytes , string , T[] for any T , T[k] for any dynamic T and k >= 0 , and (T1, .., Tk) if for some 1 <= i <= k , Ti is dynamic.
All other types are static!
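The classification rule above is recursive, so it fits nicely in a few lines of Python. This is only a sketch: `is_dynamic` and `split_top_level` are names I made up, and the parser assumes well-formed type strings written the way they appear in a signature (no spaces, structs as tuples).

```python
import re


def split_top_level(inner: str) -> list[str]:
    """Split 'a,(b,c),d' on commas that are not nested inside parentheses."""
    parts, depth, current = [], 0, ""
    for ch in inner:
        if ch == "," and depth == 0:
            parts.append(current)
            current = ""
            continue
        depth += (ch == "(") - (ch == ")")
        current += ch
    if current:
        parts.append(current)
    return parts


def is_dynamic(t: str) -> bool:
    """Return True if the ABI type string `t` is dynamic under the rules above."""
    array = re.fullmatch(r"(.*)\[(\d*)\]", t)
    if array:
        element, size = array.groups()
        # T[] is always dynamic; T[k] is dynamic only if T is
        return size == "" or is_dynamic(element)
    if t.startswith("("):
        # A tuple is dynamic as soon as one of its members is
        return any(is_dynamic(part) for part in split_top_level(t[1:-1]))
    return t in ("bytes", "string")


print(is_dynamic("(address,bool,address,bool)"))                    # False
print(is_dynamic("(bytes32,uint8,address,address,uint256,bytes)"))  # True
```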

Before delving into the formal spec of the encoding, we define:

  • len(a) as the number of bytes in a binary string a
  • enc , the actual encoding, as a mapping of ABI values to binary strings, such that len(enc(a)) depends on the value of a if and only if a is dynamic

Note that, by the definition of enc , if a is of a static type, the length of its encoding does not depend on its value (and vice versa: if the length can vary with the value, the type is dynamic)!

Formal Specification of the Encoding

Deep breath, we’re going deep.

For any ABI value X , enc(X) is defined recursively, based on its type.

  • For structs (T1, .., Tk) for k >= 0 and any types T1 , .., Tk:
    the encoding is made of k “head” elements and k “tail” elements, as enc(X) = head(X(1)) .. head(X(k)) tail(X(1)) .. tail(X(k)) where X = (X(1), .. , X(k)) and head and tail are defined for Ti as:
    - For Ti static: head(X(i)) = enc(X(i)) and tail(X(i)) = "" (static types are encoded in place, remember?)
    - For Ti dynamic: head(X(i)) = enc(len( head(X(1)) .. head(X(k)) tail(X(1)) .. tail(X(i-1)) )) and tail(X(i)) = enc(X(i)) (which is a complicated way of saying that, where you’d find the encoding of X(i) if it were static, you’ll instead find an offset, measured from the start of the struct’s encoding, pointing to where the actual encoding lives.)
    We’ll get back to this with an example at the end.
  • For fixed-size arrays T[k] for any T and k :
    enc(X) = enc((X[0], .., X[k-1])) i.e. it’s encoded as a tuple with k elements of the same type
  • For dynamic-size arrays T[] with k elements:
    enc(X) = enc(k) enc((X[0], .. , X[k-1])) i.e. it’s encoded as a tuple with k elements of the same type, prefixed with the number of elements!
  • bytes of length k :
    enc(X) = enc(k) pad_right(X) i.e. it’s encoded as the number of bytes followed by the actual byte sequence, padded to the right so that its length is a multiple of 32
  • string :
    enc(X) = enc(enc_utf8(X)) i.e. X is UTF-8 encoded and the result is interpreted as bytes, so the length prefix counts bytes, not characters
  • uint<M> :
    enc(X) is the big-endian encoding of X , padded on the left with zero bytes such that len(enc(X)) == 32
  • int<M> :
    enc(X) is the big-endian two’s complement encoding of X , padded on the left with 0xff bytes if X is negative and with zero bytes if X is non-negative, such that len(enc(X)) == 32
  • bool : encoded as uint8 where 1 is used for true and 0 for false
  • bytes<M>:
    enc(X) is the sequence of bytes padded with trailing zeros so that len(enc(X)) == 32
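If you’d rather read code than spec, here’s a rough Python sketch of the elementary rules listed above. All function names are my own, and it deliberately skips range checks on M (it won’t stop you from encoding 2**200 as a “uint8”), so take it as an illustration rather than a real encoder.

```python
WORD = 32  # every ABI word is 32 bytes


def enc_uint(x: int) -> bytes:
    """uint<M>: big-endian, left-padded with zero bytes to 32 bytes."""
    return x.to_bytes(WORD, "big")


def enc_int(x: int) -> bytes:
    """int<M>: two's complement; sign extension pads negatives with 0xff."""
    return x.to_bytes(WORD, "big", signed=True)


def enc_bool(b: bool) -> bytes:
    """bool: encoded like uint8, 1 for true and 0 for false."""
    return enc_uint(1 if b else 0)


def enc_bytes_fixed(b: bytes) -> bytes:
    """bytes<M>: right-padded with zero bytes to 32 bytes."""
    assert len(b) <= WORD
    return b.ljust(WORD, b"\x00")


def enc_bytes(b: bytes) -> bytes:
    """bytes: length word, then the data right-padded to a multiple of 32."""
    padded_len = -(-len(b) // WORD) * WORD  # round up to a multiple of 32
    return enc_uint(len(b)) + b.ljust(padded_len, b"\x00")


def enc_string(s: str) -> bytes:
    """string: UTF-8 encode, then treat the result as bytes."""
    return enc_bytes(s.encode("utf-8"))


def enc_uint_array(xs: list[int]) -> bytes:
    """uint256[]: element count, then the elements one after another."""
    return enc_uint(len(xs)) + b"".join(enc_uint(x) for x in xs)


print(enc_int(-2).hex())       # 31 bytes of ff followed by fe
print(enc_string("gm").hex())  # length word (2), then "gm" padded to 32 bytes
```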

I could also show the encoding of types like fixed and ufixed but won’t, given they still don’t have full support in Solidity v0.8.19.
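The head/tail rule for structs is the part that trips people up, so here’s a small Python sketch of it. It assumes each member has already been encoded and tagged as static or dynamic; `enc_tuple` and the other helpers are hypothetical names, not part of any library.

```python
WORD = 32


def enc_uint(x: int) -> bytes:
    return x.to_bytes(WORD, "big")


def enc_bytes(b: bytes) -> bytes:
    padded_len = -(-len(b) // WORD) * WORD  # round up to a multiple of 32
    return enc_uint(len(b)) + b.ljust(padded_len, b"\x00")


def enc_tuple(members: list[tuple[bool, bytes]]) -> bytes:
    """Encode a tuple from (is_dynamic, already-encoded member) pairs."""
    # A static member puts its whole encoding in the head section;
    # a dynamic member puts exactly one 32-byte offset word there.
    head_size = sum(WORD if dynamic else len(enc) for dynamic, enc in members)
    heads, tails = b"", b""
    for dynamic, enc in members:
        if dynamic:
            # Offset is counted from the start of this tuple's encoding
            heads += enc_uint(head_size + len(tails))
            tails += enc
        else:
            heads += enc
    return heads + tails


# (uint256, bytes, uint256) with values (1, "hi", 2)
out = enc_tuple([(False, enc_uint(1)), (True, enc_bytes(b"hi")), (False, enc_uint(2))])
for i in range(0, len(out), WORD):
    print(f"0x{i:02x}", out[i:i + WORD].hex())
```

The middle head word comes out as 0x60: three head words of 32 bytes each, and no earlier tails. A struct with five static words followed by an empty bytes member would get 0xc0 in its last head word by the same arithmetic.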

A practical example

Now, I’d like to guide you through an example of how you could read raw calldata and decode it manually (if you’re up for a challenge, try encoding it manually).

Let’s take the calldata of a call to Balancer’s Vault, in particular that of a call to its swap function, as it takes 2 structs and 2 uint256s as arguments.
Here you can find the tx we’re gonna be dissecting.

First off, let’s grab the calldata shown by Etherscan (which does a great job of giving us the calldata chunked in 32 byte slots):

Function: swap((bytes32,uint8,address,address,uint256,bytes), (address,bool,address,bool), uint256, uint256)

MethodID: 0x52bbbe29
32-byte word                                                     Offset from start of argument encoding block
00000000000000000000000000000000000000000000000000000000000000e0 0x0000
0000000000000000000000008d7e58c0ebf988dbb31a993696286106964dd4f4 0x0020
0000000000000000000000000000000000000000000000000000000000000000 0x0040
0000000000000000000000008d7e58c0ebf988dbb31a993696286106964dd4f4 0x0060
0000000000000000000000000000000000000000000000000000000000000000 0x0080
0000000000000000000000000000000000000000000b3a7f984c82f6ffa3d428 0x00a0
ffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffff 0x00c0
929a9b6d40e4723f690db77a7ebb65d3254be1e00002000000000000000004d0 0x00e0
0000000000000000000000000000000000000000000000000000000000000000 0x0100
0000000000000000000000000000000000000000000000000000000000000000 0x0120
000000000000000000000000677d4fbbcdd9093d725b0042081ab0b67c63d121 0x0140
00000000000000000000000000000000000000000000000006f05b59d3b20000 0x0160
00000000000000000000000000000000000000000000000000000000000000c0 0x0180
0000000000000000000000000000000000000000000000000000000000000000 0x01a0
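If you only have the raw calldata and no Etherscan at hand, the same chunked view is easy to reproduce. Here’s a sketch in Python: `w` and `split_words` are helper names I made up, and the calldata is rebuilt from the dump above with `w` doing the zero padding so nobody has to count zeros by hand.

```python
def w(h: str) -> str:
    """Left-pad a hex value with zeros to one 32-byte word (64 hex chars)."""
    return h.rjust(64, "0")


CALLDATA = "0x52bbbe29" + "".join([
    w("e0"),
    w("8d7e58c0ebf988dbb31a993696286106964dd4f4"),
    w("0"),
    w("8d7e58c0ebf988dbb31a993696286106964dd4f4"),
    w("0"),
    w("0b3a7f984c82f6ffa3d428"),
    "ff" * 32,
    "929a9b6d40e4723f690db77a7ebb65d3254be1e00002000000000000000004d0",
    w("0"),
    w("0"),
    w("677d4fbbcdd9093d725b0042081ab0b67c63d121"),
    w("06f05b59d3b20000"),
    w("c0"),
    w("0"),
])


def split_words(calldata_hex: str) -> tuple[bytes, list[bytes]]:
    """Return (4-byte selector, list of 32-byte argument words)."""
    data = bytes.fromhex(calldata_hex.removeprefix("0x"))
    selector, body = data[:4], data[4:]
    assert len(body) % 32 == 0, "argument block should be whole 32-byte words"
    return selector, [body[i:i + 32] for i in range(0, len(body), 32)]


selector, words = split_words(CALLDATA)
print("MethodID: 0x" + selector.hex())
for i, word in enumerate(words):
    print(f"{word.hex()} 0x{i * 32:04x}")
```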

Let’s first run through every 32 byte word from top to bottom:

  • From bytes 0x00 to 0x1f we find 0xe0, where we should find the head of the first argument, a struct. Remember what this means? It means that at least one field of the struct is dynamic! In fact, the first struct has a bytes member.
  • In bytes from 0x20 to 0x3f, where we should find the head of the second struct, we find what looks like an address. The second struct has only static members, so it’s encoded in place: this is indeed its first member, and in the following positions, up to 0x9f, you can see all the other members.
  • In bytes from 0xa0 to 0xbf, we find the hex number 0x0b3a7f984c82f6ffa3d428 which is 13574434982555110814766120 in decimal: the third function parameter.
  • In bytes from 0xc0 to 0xdf, all 0xff bytes: this means that the fourth parameter was set to type(uint256).max
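The walk through the head words can be replayed in Python. A caveat on naming: the field names below (sender, fromInternalBalance, recipient, toInternalBalance, limit, deadline) come from Balancer’s Vault interface as I remember it, not from the calldata itself, so treat them as an assumption. The helpers are made up for this sketch, and only the first seven words of the dump are included.

```python
def w(h: str) -> str:
    """Left-pad a hex value with zeros to one 32-byte word (64 hex chars)."""
    return h.rjust(64, "0")


# First seven words of the argument block (offsets 0x00 to 0xdf)
HEAD = bytes.fromhex("".join([
    w("e0"),
    w("8d7e58c0ebf988dbb31a993696286106964dd4f4"),
    w("0"),
    w("8d7e58c0ebf988dbb31a993696286106964dd4f4"),
    w("0"),
    w("0b3a7f984c82f6ffa3d428"),
    "ff" * 32,
]))


def as_uint(data: bytes, offset: int) -> int:
    return int.from_bytes(data[offset:offset + 32], "big")


def as_address(data: bytes, offset: int) -> str:
    # An address occupies the low 20 bytes of its word
    return "0x" + data[offset + 12:offset + 32].hex()


def as_bool(data: bytes, offset: int) -> bool:
    return as_uint(data, offset) == 1


first_struct_offset = as_uint(HEAD, 0x00)  # dynamic struct: only an offset here
funds = {
    "sender": as_address(HEAD, 0x20),
    "fromInternalBalance": as_bool(HEAD, 0x40),
    "recipient": as_address(HEAD, 0x60),
    "toInternalBalance": as_bool(HEAD, 0x80),
}
limit = as_uint(HEAD, 0xa0)
deadline = as_uint(HEAD, 0xc0)

print(hex(first_struct_offset))  # 0xe0
print(funds)
print(limit)                     # 13574434982555110814766120
print(deadline == 2**256 - 1)    # True
```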

Here’s what we know so far:

we found the heads

Now that we’ve found 3 out of the 4 arguments, there aren’t many places where the last one can be hidden: reading the first struct’s head as an offset, we’re driven to the eighth word from the top (offset 0xe0), which is the first member of the first struct, a bytes32 element.
After this, each following word holds the next struct member, until we find a 0xc0 where the final bytes member should be.

At first, this might not make much sense, given that the word starting at 0xc0 holds the second uint256, so wtf?
What solves this confusion is understanding that this offset is not measured from the 0x00 byte of the argument encoding: it’s measured from the start of the first struct’s own encoding, which is at 0xe0.
So where is the bytes member? At the word starting at 0xe0 + 0xc0 = 0x01a0 ! Given that it’s an empty bytes value, this word encodes a length of 0 and no data follows.
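Here’s the same offset chase as a Python sketch, mostly to show that the second offset is added to the struct’s base and not to 0x00. The helpers `w` and `uint_at` are names I made up, and the argument block is rebuilt from the dump above.

```python
def w(h: str) -> str:
    """Left-pad a hex value with zeros to one 32-byte word (64 hex chars)."""
    return h.rjust(64, "0")


ARGS = bytes.fromhex("".join([
    w("e0"),                                        # 0x000
    w("8d7e58c0ebf988dbb31a993696286106964dd4f4"),  # 0x020
    w("0"),                                         # 0x040
    w("8d7e58c0ebf988dbb31a993696286106964dd4f4"),  # 0x060
    w("0"),                                         # 0x080
    w("0b3a7f984c82f6ffa3d428"),                    # 0x0a0
    "ff" * 32,                                      # 0x0c0
    "929a9b6d40e4723f690db77a7ebb65d3254be1e00002000000000000000004d0",  # 0x0e0
    w("0"),                                         # 0x100
    w("0"),                                         # 0x120
    w("677d4fbbcdd9093d725b0042081ab0b67c63d121"),  # 0x140
    w("06f05b59d3b20000"),                          # 0x160
    w("c0"),                                        # 0x180
    w("0"),                                         # 0x1a0
]))


def uint_at(data: bytes, offset: int) -> int:
    return int.from_bytes(data[offset:offset + 32], "big")


struct_base = uint_at(ARGS, 0x00)                   # where the first struct starts
bytes_rel = uint_at(ARGS, struct_base + 5 * 32)     # sixth member: offset to bytes
bytes_pos = struct_base + bytes_rel                 # relative to the struct, not 0x00
length = uint_at(ARGS, bytes_pos)                   # length word of the bytes member
payload = ARGS[bytes_pos + 32:bytes_pos + 32 + length]

print(hex(struct_base), hex(bytes_rel), hex(bytes_pos))  # 0xe0 0xc0 0x1a0
print(length, payload)                                   # 0 b''
```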

Here’s the full picture:

breakdown of the calldata

Conclusion

Hope this was an interesting read for you and that you learned something new as I did.
If you want to keep going and try more exotic combinations of types (e.g. dynamic-size arrays of structs which have arrays of structs holding bytes members), I recommend you pick up cast from the foundry toolchain: make up some random signature with these crazy types, pass it through cast abi-encode with whatever data you like and try to repeat the exercise we did today.

cya next time anon
