Decoding the Storage of an Ethereum Contract
This article is part 1 of a two-part series that examines the storage mechanism of the Ethereum Virtual Machine (EVM).
Part 2 covers a storage decoding tool written using the concepts in this article.
The EVM executes smart contract code. A contract's persistent state, its storage (not to be confused with the EVM's temporary memory), is kept at the contract's address. This storage can be thought of as an array-like data structure of effectively unlimited length: 2^256 slots of 32 bytes each, belonging to the contract. Variables are laid out in this space according to a fixed set of rules designed to keep their locations from conflicting, and by applying those rules we can decode the state of any contract. The one catch is that decoding the data stored in a mapping requires knowing the keys that were used. All decoding is performed using the RPC call eth_getStorageAt.
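To make the "array of effectively unlimited length" picture concrete, the Python sketch below models one contract's storage as a sparse map from 256-bit slot numbers to 32-byte words, where any slot that was never written reads as zero. The `ContractStorage` class is a hypothetical illustration of the data structure, not how an Ethereum client actually stores state.

```python
# Hypothetical model of a single contract's storage: a sparse map from
# 256-bit slot numbers to 32-byte words. Unwritten slots read as zero.
SLOT_SPACE = 2 ** 256
ZERO_WORD = bytes(32)


class ContractStorage:
    def __init__(self):
        self._slots = {}  # slot number -> non-zero 32-byte word

    @staticmethod
    def _check_slot(slot):
        if not 0 <= slot < SLOT_SPACE:
            raise ValueError("slot number must fit in 256 bits")

    def store(self, slot, word):
        self._check_slot(slot)
        if len(word) != 32:
            raise ValueError("each storage slot holds exactly 32 bytes")
        if word == ZERO_WORD:
            self._slots.pop(slot, None)  # storing zero is the same as clearing
        else:
            self._slots[slot] = word

    def load(self, slot):
        self._check_slot(slot)
        return self._slots.get(slot, ZERO_WORD)


storage = ContractStorage()
storage.store(0, (42).to_bytes(32, "big"))
print(int.from_bytes(storage.load(0), "big"))      # 42
print(storage.load(SLOT_SPACE - 1) == ZERO_WORD)   # True: never written
```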
The slot position
The position of a variable in a smart contract's storage array is determined by the order in which it appears in the code and by its size. This position is known as a slot, and each slot holds 256 bits (32 bytes). If variables are smaller than 256 bits, the Solidity compiler tries to fit several of them into the same space, so more than one variable may occupy a single slot in the storage array. A mapping or dynamically-sized array always occupies a slot of its own. The locations of the elements of arrays and mappings follow a set of special hashing rules, which this article goes over. These rules are also described in the Ethereum documentation.
The table below (table 1) summarizes the storage allocation rules. We will look at two example contracts and decode them using the rules in table 1.
Simple Example with 256-bit variables
First, let us look at a simple example in which every variable is 256 bits (32 bytes) long. This lets us examine the allocation without having to consider variable packing.
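Note that when applying the Keccak-256 hash to a number (such as a slot position or an integer key), the number must first be left-padded with zeros to a 256-bit (32-byte) value, that is, 64 hex characters.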
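A minimal Python sketch of the padding step; the helper name `pad32` is our own. It turns a slot position or integer key into the 32-byte big-endian word (64 hex characters) that is fed to Keccak-256.

```python
def pad32(value):
    """Left-pad a non-negative integer with zeros to a 32-byte big-endian word."""
    if not 0 <= value < 2 ** 256:
        raise ValueError("value must fit in 256 bits")
    return value.to_bytes(32, "big")


word = pad32(2)
print(len(word))        # 32 bytes
print(len(word.hex()))  # 64 hex characters
print(word.hex())       # 62 zeros followed by "02"
```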
All decoding is performed with the Ethereum RPC call eth_getStorageAt, referred to as GetStorageAt in this article. Any language wrapper, such as Nethereum or web3j, can be used to make this RPC call.
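The sketch below shows what a raw eth_getStorageAt call looks like using only the Python standard library instead of a wrapper such as Nethereum or web3j. The parameters follow the Ethereum JSON-RPC specification (contract address, slot position as a hex quantity, block tag); the function names and the `rpc_url` argument are our own, and error handling is kept minimal.

```python
import json
import urllib.request


def storage_request(address, position, block="latest", req_id=1):
    """Build a JSON-RPC 2.0 request body for eth_getStorageAt."""
    return {
        "jsonrpc": "2.0",
        "id": req_id,
        "method": "eth_getStorageAt",
        # params: contract address, slot position as a hex quantity, block tag or number
        "params": [address, hex(position), block],
    }


def get_storage_at(rpc_url, address, position, block="latest"):
    """Send the request to a node and return the 32-byte word stored at `position`."""
    body = json.dumps(storage_request(address, position, block)).encode()
    request = urllib.request.Request(
        rpc_url, data=body, headers={"Content-Type": "application/json"}
    )
    with urllib.request.urlopen(request) as response:
        reply = json.load(response)
    if "error" in reply:
        raise RuntimeError(reply["error"])
    # The result is a "0x"-prefixed hex string; left-pad defensively to 32 bytes
    return bytes.fromhex(reply["result"][2:].rjust(64, "0"))


print(json.dumps(storage_request("0x" + "00" * 20, 0)))
```

Calling `get_storage_at(rpc_url, address, 0)` against a node of your choice returns the raw word at position 0; the all-zero address above is only a placeholder.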
The following diagram (figure 1) shows how GetStorageAt calls are made to the contract address and which position value is passed to each call. The numbers on the left side of figure 1 are the positions of the variables. For value types (uint, bool, address, etc.) this position can be passed directly to GetStorageAt to get the variable's value; a string shorter than 32 bytes is also stored inline at its position. For a dynamic array, the position returns the length of the array.
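Once GetStorageAt has returned a 32-byte word, turning it into a value can be sketched as below. The helpers are hypothetical; they assume the variable occupies the slot on its own (or is the first, lowest-order item in it). The string helper follows Solidity's documented encoding for strings shorter than 32 bytes: the characters are left-aligned and the last byte holds the length times two.

```python
def decode_uint(word):
    """A uint stored alone in its slot is the whole 32-byte big-endian word."""
    return int.from_bytes(word, "big")


def decode_bool(word):
    return word[-1] != 0  # a lone bool lives in the lowest-order byte


def decode_address(word):
    return "0x" + word[-20:].hex()  # an address is the 20 lowest-order bytes


def decode_short_string(word):
    """Decode a string of at most 31 bytes stored inline in its slot."""
    last = word[-1]
    if last & 1:
        # Odd marker: a long string. The slot holds length * 2 + 1 and the
        # characters are stored starting at keccak256(position) instead.
        raise ValueError("long string: data is not stored inline")
    length = last // 2
    return word[:length].decode("utf-8")


# A slot holding the short string "hello": 5 data bytes, zeros, then 5 * 2
word = b"hello" + bytes(26) + bytes([10])
print(decode_short_string(word))              # hello
print(decode_uint((42).to_bytes(32, "big")))  # 42
```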
Array elements are decoded by passing the Keccak-256 hash of the array's (padded) position to GetStorageAt, which returns element 0. Each subsequent element is located at that hash value plus the element's index. This can be thought of as taking a pointer to the start of the array and incrementing it to reach each element, similar to pointer arithmetic in C or C++.
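The element arithmetic can be sketched in Python as follows. To stay self-contained, the sketch hard-codes keccak256 of position 0 padded to 32 bytes, a well-known constant; for any other position you would compute the hash with a Keccak-256 implementation. The helper name `array_element_slot` is our own, and it assumes 256-bit elements (smaller elements are packed, as discussed below).

```python
SLOT_MOD = 2 ** 256  # slot numbers wrap around modulo 2**256

# keccak256 of position 0 left-padded to 32 bytes: where the elements of a
# dynamic array declared at position 0 begin.
DATA_START_POS_0 = int(
    "290decd9548b62a8d60345a988386fc84ba6bc95484008f6362f93160ef3e563", 16
)


def array_element_slot(data_start, index):
    """Slot holding element `index` of a dynamic array of 256-bit elements."""
    return (data_start + index) % SLOT_MOD


# GetStorageAt(0) returns the array length; the elements follow at:
for i in range(3):
    print(i, hex(array_element_slot(DATA_START_POS_0, i)))
```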
Mappings are a little more complex. The position passed to GetStorageAt for each key is the Keccak-256 hash of the key concatenated with the position of the mapping declaration, keccak256(key . position), with both values padded to 32 bytes. For multi-dimensional mappings the hash is applied recursively: the slot computed for the first key serves as the position for the second key, and so on. See the example in figure 1 for clarification.
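Mapping slots can be computed with the sketch below. Python's `hashlib.sha3_256` is the standardized SHA3-256, which pads its input differently from the Keccak-256 that Ethereum uses and therefore gives different results, so the sketch includes a compact, unoptimized pure-Python Keccak-256; in practice you would use a vetted library. `mapping_slot` and `nested_mapping_slot` are hypothetical helper names, and the sketch assumes integer (value-type) keys, which are padded to 32 bytes; an address key can be passed as `int(address, 16)`.

```python
MASK64 = (1 << 64) - 1


def _rol(v, n):
    """Rotate a 64-bit lane left by n bits."""
    return ((v << n) | (v >> (64 - n))) & MASK64


def _keccak_tables():
    # rho rotation offsets, indexed [x][y]
    rot = [[0] * 5 for _ in range(5)]
    x, y = 1, 0
    for t in range(24):
        rot[x][y] = ((t + 1) * (t + 2) // 2) % 64
        x, y = y, (2 * x + 3 * y) % 5

    # iota round constants, generated by the Keccak LFSR x^8+x^6+x^5+x^4+1
    def rc_bit(t):
        r = 1
        for _ in range(t % 255):
            r <<= 1
            if r & 0x100:
                r ^= 0x171
        return r & 1

    rcs = [sum(rc_bit(j + 7 * i) << ((1 << j) - 1) for j in range(7)) for i in range(24)]
    return rot, rcs


ROT, RC = _keccak_tables()


def _keccak_f1600(a):
    """Apply the 24-round Keccak-f[1600] permutation to the 5x5 lane state a[x][y]."""
    for rc in RC:
        c = [a[x][0] ^ a[x][1] ^ a[x][2] ^ a[x][3] ^ a[x][4] for x in range(5)]
        d = [c[(x - 1) % 5] ^ _rol(c[(x + 1) % 5], 1) for x in range(5)]  # theta
        b = [[0] * 5 for _ in range(5)]
        for x in range(5):
            for y in range(5):
                b[y][(2 * x + 3 * y) % 5] = _rol(a[x][y] ^ d[x], ROT[x][y])  # rho, pi
        for x in range(5):
            for y in range(5):
                a[x][y] = b[x][y] ^ (~b[(x + 1) % 5][y] & b[(x + 2) % 5][y])  # chi
        a[0][0] ^= rc  # iota


def keccak256(data):
    """Keccak-256 as used by Ethereum (original Keccak padding, not SHA3-256)."""
    rate = 136  # bytes absorbed per permutation
    msg = bytearray(data) + b"\x01"
    msg += bytes(-len(msg) % rate)
    msg[-1] |= 0x80
    a = [[0] * 5 for _ in range(5)]
    for off in range(0, len(msg), rate):
        for i in range(rate // 8):
            lane = msg[off + 8 * i : off + 8 * i + 8]
            a[i % 5][i // 5] ^= int.from_bytes(lane, "little")
        _keccak_f1600(a)
    return b"".join(a[i % 5][i // 5].to_bytes(8, "little") for i in range(4))


def pad32(value):
    return value.to_bytes(32, "big")


def mapping_slot(key, position):
    """Slot of m[key] for a mapping declared at `position` (integer key)."""
    return int.from_bytes(keccak256(pad32(key) + pad32(position)), "big")


def nested_mapping_slot(keys, position):
    """Slot of m[k1][k2]...: each hash result becomes the position for the next key."""
    slot = position
    for key in keys:
        slot = mapping_slot(key, slot)
    return slot


print(hex(mapping_slot(5, 1)))              # m[5], m declared at position 1
print(hex(nested_mapping_slot([5, 7], 1)))  # m[5][7]
```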
Next, let's look at an example where variable packing takes place. The things to remember about packing are:
- It applies only to value types (uint128, int, bool, address, etc.), in their order of appearance. The compiler packs as many consecutive variables into a 256-bit slot as will fit, in the order the variables are listed in the code.
- Each mapping and array variable starts a new slot.
- Array elements also follow the packing rules: if an element is smaller than 256 bits, multiple consecutive elements of the array occupy a single slot in the storage array.
These rules are explained in the Ethereum documentation as well.
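The ordering rules above can be sketched as a small slot-assignment routine. It is a simplified model under stated assumptions: it handles only value types, mappings, dynamic arrays and strings (structs and fixed-size arrays, which span several slots, are left out), and the `(name, size, kind)` input format and `assign_slots` name are our own.

```python
def assign_slots(variables):
    """Assign (slot, byte offset) to state variables in declaration order.

    `variables` is a list of (name, size_in_bytes, kind) tuples, where kind is
    "value" for packable value types and "own_slot" for mappings, dynamic
    arrays and strings, which always take a whole slot of their own.
    Byte offsets are counted from the low-order (rightmost) end of the slot.
    """
    layout = {}
    slot, used = 0, 0  # current slot and bytes already used in it
    for name, size, kind in variables:
        if kind == "own_slot":
            if used:  # close a partially filled slot first
                slot, used = slot + 1, 0
            layout[name] = (slot, 0)
            slot, used = slot + 1, 0  # the following variable starts a new slot
            continue
        if used + size > 32:  # does not fit: move to the next slot
            slot, used = slot + 1, 0
        layout[name] = (slot, used)
        used += size
    return layout


layout = assign_slots([
    ("a", 16, "value"),            # uint128
    ("b", 16, "value"),            # uint128
    ("c", 32, "value"),            # uint256
    ("d", 1, "value"),             # uint8
    ("owner", 20, "value"),        # address
    ("balances", 32, "own_slot"),  # mapping(address => uint256)
    ("e", 1, "value"),             # uint8
])
for name, (slot, offset) in layout.items():
    print(name, slot, offset)
```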
Figure 2 illustrates and explains the packing that takes place. When a type is shorter than 256 bits, the compiler attempts to pack additional variables into the same slot, picking them in the order they are listed. Mappings and arrays always start at a new position. However, the packing rules still apply when decoding array elements and when decoding structs stored in mappings. See Figure 2 for how the variables are stored in this case.
A note about inheritance: when a contract inherits from other contracts, the storage variables of the base contracts occupy the first slots of the storage array, in the order of inheritance starting with the most base-ward contract. The storage variables of the derived contract come afterwards.
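Under the simplifying assumption that every state variable is 256 bits, the inheritance rule reduces to numbering the variables contract by contract, base first. The sketch below does exactly that; the function name and the contract and variable names are hypothetical. With smaller types, the Solidity documentation notes that variables from different contracts in the chain may share a slot, so the packing rules would also have to be applied across the contract boundary.

```python
def inherited_slots(linearized):
    """Map "Contract.variable" to a slot number across an inheritance chain.

    `linearized` lists (contract_name, [variable names]) from the most
    base-ward contract to the most derived one. Every variable is assumed
    to be 256 bits, so each takes exactly one slot.
    """
    slots = {}
    next_slot = 0
    for contract, names in linearized:
        for name in names:
            slots[f"{contract}.{name}"] = next_slot
            next_slot += 1
    return slots


layout = inherited_slots([
    ("Base", ["a", "b"]),     # hypothetical base contract
    ("Derived", ["c", "d"]),  # hypothetical derived contract
])
for qualified_name, slot in layout.items():
    print(slot, qualified_name)
```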
As shown, the rules described in this article can be used to decode the storage of an Ethereum smart contract. Part 2 describes a tool written using these rules.