Lesson #0: Understanding the Fundamentals of Solidity Storage
If you’re not an advanced or expert Solidity developer, there’s a chance you haven’t fully grasped how Solidity handles storage. I didn’t really bother at first, but I’m glad I took some time to understand how it actually stores, remembers, and knows how to organize data on the blockchain.
I believe this can be compared to the shelving system of a library: a librarian needs to know where each book goes to serve visitors efficiently. As a developer, you need to understand how data is arranged to write optimized and effective smart contracts; as a security auditor, you need it to assess more complex data structures, especially when you deal with upgradeable contracts.
In this article, I’ll try to explain this concept clearly, with practical examples of how data is stored and managed in Solidity. This is the first article in my series of learnings related to smart contract security research.
A Quick Intro to the EVM’s Storage Structure
Let’s start with a visualization of how the EVM lays out data. When you deploy a smart contract, its code and state are stored on the Ethereum blockchain (or another EVM-compatible chain). A contract works with two main data areas. Storage is persistent and behaves like a vast array of 2^256 slots, where each slot holds 32 bytes and is identified by its slot number (0, 1, 2, and so on). Memory is temporary and is addressed byte by byte, although Solidity usually reads and writes it in 32-byte slots as well.
Slot numbers and memory addresses are usually written in hexadecimal, a base-16 numbering system that uses 16 distinct symbols: the digits 0 to 9 (for values 0–9) and the letters A to F (for values 10–15).
Because memory addresses count bytes and each memory slot is 32 bytes wide, the first memory slot starts at 0x00, the next at 0x20, and so on. For instance:
- the 28th byte of the first slot is at address 0x1b;
- the 6th byte of the second slot is at address 0x25.
Storage slots, by contrast, are numbered one by one: the first state variable of a contract normally lives in slot 0, the next in slot 1 (or in the same slot, if both fit within 32 bytes).
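To make the difference between the two addressing schemes concrete, here is a minimal sketch. The contract name, function names, and stored values are hypothetical and chosen only for illustration.

```solidity
// SPDX-License-Identifier: MIT
pragma solidity ^0.8.0;

// Hypothetical demo contract: names and values are illustrative only.
contract AddressingDemo {
    uint256 private first = 11;  // storage slot 0
    uint256 private second = 22; // storage slot 1

    // Storage is read one whole 32-byte slot at a time, by slot number.
    function readSlot(uint256 slotNumber) external view returns (uint256 value) {
        assembly {
            value := sload(slotNumber)
        }
    }

    // Memory is byte-addressed: write a single byte at 0x25
    // (the 6th byte of the second slot, inside the scratch space)
    // and read it back.
    function readByteAt0x25() external pure returns (uint256 value) {
        assembly {
            mstore8(0x25, 0xab)
            // mload(0x25) loads the 32 bytes starting at address 0x25;
            // byte(0, ...) extracts the first (most significant) of them.
            value := byte(0, mload(0x25))
        }
    }
}
```

With this sketch, readSlot(0) and readSlot(1) should return the two state variables, while readByteAt0x25() should return the byte that was just written.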
The Special Slots
Solidity reserves the first few slots of memory (not of storage) for specific purposes.
For a more contextualized explanation, refer to this comprehensive and extensive article, part of the Secureum Epoch0 Bootcamp:
- Rajeev, “Solidity 201”, 100 more key aspects of Solidity, Secureum, [especially 127. Reserved Memory ; also 117. Storage Layout & Structs/Arrays].
To break it down, there are four 32-byte slots reserved as follows:
a. Slots 1–2 (0x00–0x3f): scratch space for hashing
The first two slots, covering 64 bytes from 0x00 to 0x3f, are reserved for what is known as scratch space. This can be considered as a whiteboard where intermediate computations can be temporarily noted down.
Specifically, these 64 bytes are used as a short-term workspace by hashing operations such as keccak256 and sha256, for example to hold the inputs being hashed. This scratch space matters for efficiency: because it is a small, fixed area, code can write to it directly without allocating new memory or moving the free memory pointer. It’s also volatile, meaning that once the hash has been computed, the data in the scratch space can be overwritten or discarded.
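As an illustration, hashing two 32-byte values through the scratch space could look like the sketch below. The contract and function names are hypothetical; the second function is included only to compare against the high-level path.

```solidity
// SPDX-License-Identifier: MIT
pragma solidity ^0.8.0;

// Hypothetical demo contract: names are illustrative only.
contract ScratchSpaceDemo {
    // Hashes two 32-byte words using the scratch space at 0x00-0x3f,
    // without touching the free memory pointer.
    function hashPair(bytes32 a, bytes32 b) external pure returns (bytes32 h) {
        assembly {
            mstore(0x00, a) // first scratch slot
            mstore(0x20, b) // second scratch slot
            h := keccak256(0x00, 0x40) // hash the 64 bytes just written
        }
    }

    // Same hash through the high-level path, which normally builds
    // a temporary bytes array in freshly allocated memory.
    function hashPairHighLevel(bytes32 a, bytes32 b) external pure returns (bytes32) {
        return keccak256(abi.encode(a, b));
    }
}
```

Both functions should return the same hash for the same inputs, since abi.encode of two bytes32 values is simply their 64-byte concatenation.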
b. Slot 3 (0x40–0x5f): free memory pointer
The next slot, spanning from 0x40 to 0x5f, holds the currently allocated memory size, also known as the free memory pointer.
While storage is persistent and written on the blockchain, memory is temporary and is erased once the contract execution is complete. Memory is generally used for holding data temporarily before either discarding it or saving it into storage.
Consider how a librarian needs to keep an eye out for the next empty shelf to store fresh reads. In Solidity, the free memory pointer acts like a bookmark that indicates the first empty shelf where new books can be placed. This way, whenever the contract needs to store new data in memory temporarily, it knows exactly where to place it (= the next unallocated memory slot) without having to scan through the entire bookshelf.
In technical terms, the free memory pointer is an address. When you store something in memory, you put it at the address indicated by the free memory pointer, and then you advance the pointer by the size of what you stored, so it now points to the next unallocated byte. Once the computation is complete, the EVM doesn’t need the memory anymore and the data is discarded. The free memory pointer, being part of this temporary environment, starts over on the next call: Solidity initializes it to 0x80, the first address after the reserved slots. This process is similar to having a clean slate every time a contract function is executed.
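The following sketch reads the free memory pointer before and after a memory allocation. The contract and function names are hypothetical, and the exact values may depend on compiler version and optimizer settings.

```solidity
// SPDX-License-Identifier: MIT
pragma solidity ^0.8.0;

// Hypothetical demo contract: names are illustrative only.
contract FreeMemoryPointerDemo {
    function allocate() external pure returns (uint256 beforePtr, uint256 afterPtr) {
        assembly {
            beforePtr := mload(0x40) // current free memory pointer
        }

        // A 2-element array needs 32 bytes for its length
        // plus 2 * 32 bytes for its elements: 96 bytes in total.
        uint256[] memory data = new uint256[](2);
        data[0] = 1; // use the array so it is not just dead code

        assembly {
            afterPtr := mload(0x40) // pointer after the allocation
        }
    }
}
```

Under these assumptions, afterPtr is expected to be 96 bytes higher than beforePtr, reflecting the space taken by the array.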
c. Slot 4 (0x60–0x7f): zero slot
The zero slot, as its name implies, is a standard reference that is expected to always contain zeros. Think of it as a dictionary in a library that would always be in a constant place for everyone to refer to. It’s the known source for a particular kind of information; in this case, a string of zeros.
Having a known reference like this is convenient. For instance, in Solidity, a dynamic array in memory needs an initial value before anything is assigned to it. Instead of allocating and zeroing fresh memory every time such an array is declared, the compiler can simply point it at the zero slot, whose 32 zero bytes read as an array of length zero. This saves processing time and memory.
That is the use the Solidity documentation describes for this slot. Other ideas, such as using it as a source of zeros for resetting values, comparisons, or padding, are not something the language guarantees, so they are best avoided.
It’s important to understand that the zero slot should never be written to. Nothing in the EVM prevents inline assembly from doing so, but changing its contents would be similar to modifying a critical constant in a codebase: every empty dynamic memory array that points to it would suddenly appear to have a non-zero length, breaking the deterministic behavior and reliability of the operations within the contract.
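A minimal sketch of how this can be observed: an unassigned dynamic memory array is inspected with inline assembly. The contract and function names are hypothetical.

```solidity
// SPDX-License-Identifier: MIT
pragma solidity ^0.8.0;

// Hypothetical demo contract: names are illustrative only.
contract ZeroSlotDemo {
    function emptyArrayInfo() external pure returns (uint256 ptr, uint256 len) {
        uint256[] memory empty; // declared, but nothing is allocated for it

        assembly {
            ptr := empty        // memory address the variable refers to
            len := mload(empty) // first word at that address is the length
        }
        // ptr is expected to be 0x60 (the zero slot) and len to be 0.
    }
}
```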
Why Does This Matter?
As a Solidity developer or researcher, understanding these fundamentals is essential for writing optimized and secure smart contracts. Knowing how storage and memory are structured, and how the first few memory slots are reserved for special purposes, can give insights into how Solidity operates behind the scenes.
Security is paramount when significant assets are at stake. To ensure that contracts are resilient against attacks, both developers and auditors must have a deep understanding of the storage layout. For instance, knowing the significance and constraints of the zero slot is critical to prevent contracts from manifesting unintended behaviors. Specifically, upgradeability in smart contracts introduces a more complex management of the storage structure, as it requires the storage layout to remain compatible across different versions of the contract. Understanding how it works, to avoid data corruption or loss, makes it not only important but absolutely critical.
Besides, gas optimization is vital on Ethereum, as inefficient contracts can lead to exorbitant transaction fees for users. A solid understanding of memory allocation, including the roles of reserved slots, is imperative for developers to craft code that is optimized for minimal gas usage. One approach is to write code in assembly within Solidity, which provides finer control over the EVM by allowing direct memory and storage access. However, this approach completely bypasses the safeguards provided by the higher-level constructs of Solidity. Accessing memory and storage directly, especially the reserved slots, must be done with extreme caution. A simple mistake could lead to unintended overwrites or incorrect data reads, which could be catastrophic, especially in a contract handling valuable assets.
a. Efficient storage packing to minimize gas costs
Consider a simple example with two smart contracts, one with a suboptimal storage layout and the other with an optimized layout:
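A minimal sketch of such a pair follows. The struct name and its fields are hypothetical, chosen only to make the packing visible; the slot numbers in the comments follow Solidity’s rule that consecutive values smaller than 32 bytes share a slot when they fit.

```solidity
// SPDX-License-Identifier: MIT
pragma solidity ^0.8.0;

// Hypothetical struct fields: chosen only to illustrate packing.
contract SuboptimalStructLayout {
    struct Record {
        uint128 a; // slot 0 (only 16 of 32 bytes used)
        uint256 b; // slot 1 (needs a full slot, cannot share with a)
        uint128 c; // slot 2 (only 16 of 32 bytes used)
    }

    Record public record; // occupies 3 storage slots
}

contract OptimizedStructLayout {
    struct Record {
        uint128 a; // slot 0, first 16 bytes
        uint128 c; // slot 0, remaining 16 bytes
        uint256 b; // slot 1
    }

    Record public record; // occupies 2 storage slots
}
```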
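In the example above, reordering the variables within the struct in OptimizedStructLayout results in more efficient storage packing compared to SuboptimalStructLayout. Values smaller than 32 bytes that sit next to each other can share a slot, and every slot saved means fewer expensive storage reads and writes.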
b. Preserving data integrity in upgradeable smart contracts
Consider a deployed smart contract called TokenV1 that needs to be upgraded to add a new feature or fix a bug, while making sure the data remains intact. The proxy pattern is a common pattern used for upgradeable smart contracts. Essentially, a proxy contract delegates calls to an implementation contract. When upgrading, the address of the implementation contract will be updated to point to the new one.
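The storage layout must remain consistent across upgrades because, when the proxy delegates a call, the implementation’s code runs against the proxy’s own storage. The data stays where the previous version wrote it, so the new implementation must expect each variable in the same slot.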
Here’s a very basic example:
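As a minimal sketch, the two versions could look like this. Only totalSupply and maxTransferAmount are named in the text; the balances mapping is a hypothetical second variable added for illustration, and the proxy itself is omitted.

```solidity
// SPDX-License-Identifier: MIT
pragma solidity ^0.8.0;

// First implementation behind the proxy.
contract TokenV1 {
    uint256 public totalSupply;                  // slot 0
    mapping(address => uint256) public balances; // slot 1 (hypothetical)
}

// Upgraded implementation: existing variables keep their order,
// and the new one is appended at the end.
contract TokenV2 {
    uint256 public totalSupply;                  // slot 0 (unchanged)
    mapping(address => uint256) public balances; // slot 1 (unchanged)
    uint256 public maxTransferAmount;            // slot 2 (new)
}
```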
In this example, TokenV2 has a new state variable maxTransferAmount, but it is appended after the existing ones. This means the storage layout remains compatible.
However, if maxTransferAmount were placed at the beginning or between existing variables, every variable declared after it would shift to a different slot. This could cause serious issues, because the data already written through the proxy stays where it is, and the new implementation would misinterpret it. For instance, with maxTransferAmount declared first, it would occupy the slot that still holds the old totalSupply value, so reading maxTransferAmount would return the total supply, and totalSupply would be read from the wrong slot.
It’s important to note that this example focuses on the most basic concern related to storage layout, but when it comes to upgradeable contracts, the considerations and complexities go much further.
This is why an in-depth knowledge of the EVM’s storage layout, including how storage and memory can be directly and precisely retrieved and allocated, is a linchpin for the development of secure, efficient, and adaptable smart contracts. This expertise is invaluable not only for developers but also for security and gas optimization researchers during audits, as it enables them to meticulously scrutinize the contract’s storage handling for potential vulnerabilities or inefficiencies.
Mastering the fundamentals of Solidity storage is crucial in forging the backbone of robust decentralized applications, empowering us to confidently welcome the next wave of users with the assurance of a secure and seamless experience.