Learn Solidity lesson 13. Storage, memory, calldata and the stack.

João Paulo Morais
Coinmonks

In this lesson, we will look at the places where the EVM keeps data while a contract runs. There are four data locations: storage, memory, calldata and the stack.

Storage is where state variables are stored. Remember that we declare state variables in contracts, and they are permanent. Any changes we make to state variables during a transaction persist after the transaction ends, provided the transaction does not revert.

We can think of storage as a database (and indeed it is implemented by a database). Storage works like a key/value dictionary, where both the key and the value are 32 bytes long.

I will explain how storage works in more detail below. If you are not interested in those details, the most important thing to remember is that storage is a permanent location.

Memory is not permanent. Variables are placed in memory and used only during the execution of a function. At the end of the execution of the function, everything that was placed in the memory is erased.

Solidity has access to memory and the programmer can either create new variables in memory or read and change variables that have been created in memory.
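As a minimal sketch of creating and changing a variable in memory, consider the contract below. The names MemoryExample and squares are made up for this illustration and are not part of the lesson’s contract.

```solidity
// SPDX-License-Identifier: MIT
pragma solidity ^0.8.0;

// Hypothetical contract, used only to illustrate memory variables.
contract MemoryExample {
    // Builds an array in memory, fills it and returns it.
    // Nothing is written to storage, so the function can be pure.
    function squares(uint256 n) public pure returns (uint256[] memory) {
        // Allocate a new array of length n in memory.
        uint256[] memory result = new uint256[](n);
        for (uint256 i = 0; i < n; i++) {
            // Memory is writable: we can change each element.
            result[i] = i * i;
        }
        // The array is discarded once the call finishes.
        return result;
    }
}
```

Calling squares(3) should return the array [0, 1, 4], and no trace of the array remains in the contract afterwards.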

Calldata is where the arguments passed to functions are temporarily stored. It is not a place where we can create variables, because it is reserved for function arguments. It is also not possible to change the values in calldata: it is read-only.

Finally, we have the stack. The stack is where the EVM does its processing. The EVM pushes values onto the stack and pops them off it, and programs are executed by manipulating those values. Local variables of value types, such as uint256, are kept on the stack, but the compiler manages this for us: Solidity gives us no direct way to address the stack, and unless we want to program at a low level, we don’t need to worry too much about it at this time.

We will study a little bit more about the stack when we talk about using assembly language in Solidity, but this is a more advanced subject.

The storage

The storage is where state variables are stored. The compiler assigns each state variable a fixed position in the contract’s storage, and this layout cannot be changed after the contract is deployed.

As the storage layout of a contract is fixed before deployment, state variables can only be declared outside of functions. It is not possible to declare a new state variable while executing a function.

Let’s look at the following code.

contract Storage {
    uint256 public value1;
    uint128 public value2;
    uint128 public value3;
    string public value4;

    function newValue() public {
        uint256 value5;
    }
}

Outside the functions, four state variables are defined: value1, value2, value3 and value4. The values of these variables are kept in storage, and they are initialized at deploy time.

The variable value5 is defined inside the function, so it is a local variable and is not kept in storage. Because uint256 is a value type, the EVM holds it on the stack. Once the function newValue is invoked, a position on the stack holds the value of value5. When the function execution ends, that value is discarded, as well as any reference to it.

The storage is a key/value database, where both the key and the value are 32 bytes long. State variables are laid out sequentially, with the key starting at 0. Let’s understand this.

We can think of the storage as a place full of numbered containers, each with a fixed size of 32 bytes. The first state variable defined in the contract is value1, of type uint256. It will be stored in the first container, with the key 0, and will occupy all its space, as the uint256 type is 32 bytes long.

The second variable, value2, will be stored in the second container, with key 1. However, as it only takes up 16 bytes (128 bits), it will not take up all its space. The next variable, value3, is also of type uint128, that is, it also occupies only 16 bytes. As it fits in the space left over in the second container, it will also be stored in the second container.
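To see this packing in practice, here is a small sketch that reads a raw container and splits the second one back into its two values. The contract name PackedSlots, the initial values 1, 2 and 3, and the helper functions readSlot and unpackSlot1 are assumptions made for this illustration. It uses a short piece of inline assembly, a more advanced subject, only to read the container.

```solidity
// SPDX-License-Identifier: MIT
pragma solidity ^0.8.0;

// Hypothetical contract, used only to illustrate how variables are packed.
contract PackedSlots {
    uint256 public value1 = 1; // container 0, fills all 32 bytes
    uint128 public value2 = 2; // container 1, lower-order 16 bytes
    uint128 public value3 = 3; // container 1, higher-order 16 bytes

    // Reads the raw 32-byte content of the container with the given key.
    function readSlot(uint256 key) public view returns (bytes32 word) {
        assembly {
            word := sload(key)
        }
    }

    // Splits container 1 back into the two packed uint128 values.
    function unpackSlot1() public view returns (uint128 low, uint128 high) {
        uint256 word = uint256(readSlot(1));
        low = uint128(word);         // keeps the lower 128 bits: value2
        high = uint128(word >> 128); // keeps the upper 128 bits: value3
    }
}
```

With these initial values, unpackSlot1 should return 2 and 3, showing that both variables share the container with key 1.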

And the next variable, value4? It is of type string. This means that it can be anything from a small word to a long text. That is, it is not possible to know its size in advance. It is not possible to know how many containers it will occupy.

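This is why the string type is a reference type, not a value type. The third container, with key 2, holds the length of the string. If the string is short (at most 31 bytes), its bytes are stored in that same container, next to the length. If it is longer, its bytes are stored in other containers, whose position is computed from a hash of the key.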
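The sketch below peeks at the container of a short string. The contract name StringSlot, the initial value "Hello" and the two helper functions are assumptions made for this illustration, and the layout rules it relies on are those described in the Solidity documentation on storage layout.

```solidity
// SPDX-License-Identifier: MIT
pragma solidity ^0.8.0;

// Hypothetical contract, used only to illustrate how a string is stored.
contract StringSlot {
    uint256 public value1;          // container 0
    uint128 public value2;          // container 1
    uint128 public value3;          // container 1
    string public value4 = "Hello"; // container 2

    // Returns the raw content of the container that belongs to value4.
    function rawStringSlot() public view returns (bytes32 word) {
        assembly {
            word := sload(value4.slot)
        }
    }

    // Key of the first container that would hold the bytes of value4
    // if the string were 32 bytes or longer.
    function longStringDataKey() public pure returns (bytes32) {
        return keccak256(abi.encode(uint256(2)));
    }
}
```

For the 5-byte string "Hello", rawStringSlot should return the bytes of the text at the start of the word and the value 0x0a (the length multiplied by 2) in the last byte.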

Memory vs Storage

Let’s slightly modify the function of the last example as below.

function newValue() public view returns(uint256) {
    uint256 valueReturn = value1;
    valueReturn = 10;
    return valueReturn;
}

Every variable of a value type, when defined within a function, is a local variable that the compiler keeps on the stack, not in storage. Thus, the variable valueReturn is a local variable, which receives a copy of the value of the state variable value1.

On the next line, the value of the variable valueReturn is set to 10. As valueReturn is only a local copy, this value is discarded at the end of the function’s execution. The variable value1 is not modified, so it is perfectly legal to declare this function as view: it does not change any state variables.
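For contrast, the sketch below places the function above next to one that really writes to the state variable. The contract name ValueCopy, the initial value 5 and the function setValue are assumptions made for this illustration.

```solidity
// SPDX-License-Identifier: MIT
pragma solidity ^0.8.0;

// Hypothetical contract, used only to contrast a local copy with a state change.
contract ValueCopy {
    uint256 public value1 = 5;

    // Changes only a local copy, so value1 keeps its value
    // and the function can be declared view.
    function newValue() public view returns (uint256) {
        uint256 valueReturn = value1;
        valueReturn = 10;
        return valueReturn;
    }

    // Writes to the state variable itself, so it cannot be view.
    function setValue() public {
        value1 = 10;
    }
}
```

After calling newValue, reading value1 should still give 5. Only setValue, sent as a transaction, changes it to 10.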

Let’s now modify the function to use a variable of type string, instead of a variable of type unsigned integer.

function newValue() public view returns(string memory) {
    string valueReturn = value4; // compile error: no data location given
    valueReturn = 'Hello World';
    return valueReturn;
}

When trying to compile, the compiler will indicate an error: we must indicate the place where variables of reference type are located.

Unlike local variables of value types, whose location the compiler decides for us, variables of reference types can be in memory, in storage or in calldata, and we must say which.

Indicating that the variable valueReturn is in memory will create a new variable in memory, holding a copy of value4. Let’s see the code below.

function newValue() public view returns(string memory) {
    string memory valueReturn = value4;
    valueReturn = 'Hello World';
    return valueReturn;
}

The above function is perfectly legal and will compile. The function’s return will always be the string ‘Hello World’. However, if we indicate that the variable valueReturn is located in storage, we will get an error. Change the variable declaration as below.

string storage valueReturn = value4;

The line above is perfectly legal, but the next line, which assigns the literal ‘Hello World’ to valueReturn, will show an error.

The variable valueReturn is a pointer (a reference) to the storage. It indicates where the value of the variable value4 is located in the storage. A string literal does not live in storage, so we cannot assign it to the pointer; we can only use the pointer to reach data that is already in storage. The following code is perfectly valid.

function newValue() public view returns(string memory) {
    string storage valueReturn = value4;
    return valueReturn;
}

The return will be the value of the variable value4. Of course, in the above case, it would be more practical to directly return the variable value4.

In the case of strings, using pointers to storage is not very useful, but it can be useful when working with more complex types, such as mappings and arrays. We will return to this subject in another lesson.
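As a brief preview, here is a sketch of the difference between a memory copy and a storage pointer when working with an array. The contract name StoragePointer, the array numbers and both functions are assumptions made for this illustration.

```solidity
// SPDX-License-Identifier: MIT
pragma solidity ^0.8.0;

// Hypothetical contract, used only to contrast a memory copy with a storage pointer.
contract StoragePointer {
    uint256[] public numbers;

    constructor() {
        numbers.push(1);
        numbers.push(2);
        numbers.push(3);
    }

    // Works on a copy in memory: the state variable is left untouched.
    function changeCopy() public view returns (uint256) {
        uint256[] memory local = numbers; // copies the whole array to memory
        local[0] = 99;                    // changes only the copy
        return numbers[0];                // reads the untouched state variable
    }

    // Works through a pointer to storage: the state variable is changed.
    function changeOriginal() public {
        uint256[] storage stored = numbers; // no copy, only a reference
        stored[0] = 99;                     // writes directly to storage
    }
}
```

Right after deployment, changeCopy should return 1, because only the copy in memory was changed. After a call to changeOriginal, numbers(0) should return 99.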

Memory vs Calldata

Memory is a place where we can create and store variables temporarily. Calldata is where the arguments that functions receive live. Let’s see the code below.

function concatenate(string memory s1, string memory s2) public pure returns (string memory) {
    return string.concat(s1, s2);
}

The function concatenate joins two strings into one. Pay attention to the function parameters, s1 and s2, which we indicate are in memory. In doing so, we create the variables s1 and s2 in memory and manipulate them within the function. Let’s add a new line to the above function by explicitly changing the variable s1.

function concatenate(string memory s1, string memory s2) public pure returns (string memory) {
    s1 = "Hello";
    return string.concat(s1, s2);
}

As the variable s1 is in memory, we can change it at will. Let’s now change the above function, indicating that variables s1 and s2 point to calldata and not to memory.

function concatenate(string calldata s1, string calldata s2) public pure returns (string memory) {
    s1 = "Hello"; // compile error: a literal cannot be assigned to a calldata variable
    return string.concat(s1, s2);
}

The compiler will now throw an error, because calldata is a read-only space.

Calldata is a data location that can only be read, not changed, and in the code above we tried to give s1 a new value explicitly. This generated an error.

The advantage of using calldata is that, on many occasions, it avoids copying the arguments into a new variable in memory. This saves gas, which is often a key requirement in smart contract development.
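A typical case is a function that only reads its arguments. The sketch below assumes a made-up contract CalldataSum with a function sum. How much gas is actually saved depends on the size of the argument and on the compiler version.

```solidity
// SPDX-License-Identifier: MIT
pragma solidity ^0.8.0;

// Hypothetical contract, used only to illustrate a read-only calldata argument.
contract CalldataSum {
    // The array is read directly from calldata and never copied to memory.
    function sum(uint256[] calldata values) external pure returns (uint256 total) {
        for (uint256 i = 0; i < values.length; i++) {
            total += values[i]; // reading calldata is allowed, writing is not
        }
    }
}
```

Since sum never modifies values, declaring the parameter as calldata costs us nothing in flexibility. If we needed to change the array inside the function, we would have to declare it as memory instead.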

Thanks for reading!

Comments and suggestions about this article are welcome.

Any contribution is welcome. www.buymeacoffee.com/jpmorais
