Knownsec Blockchain Lab | In-depth understanding of the EVM storage mechanism and security issues

Knownsec Blockchain Lab
7 min read · Sep 30, 2021

Introduction

The EVM is a lightweight virtual machine that gives the Ethereum network a virtual execution environment for running smart contracts, independent of the underlying hardware, operating system, and other compatibility concerns.

Simply put, the EVM is a completely isolated sandbox. Code running in the EVM cannot access the network, the file system, or other processes, so buggy or malicious code cannot damage other smart contracts or affect the external environment.

On this basis, Knownsec Blockchain Lab takes an in-depth look at the EVM storage mechanism and its security issues.

EVM storage structure

The EVM stores data in two categories:

- Data stored in code and storage is non-volatile
- Data stored in the stack, args, and memory is volatile

Let’s look at the meaning of each storage location:

Code

Code is the space that holds the content of the contract. When a contract is deployed, the bytecode carried in the data field of the deployment transaction is stored here, so this is the space dedicated to the compiled binary code of the smart contract.

Storage

Storage is a persistent space in which each contract can read, write, and modify data that survives between calls. It is a large map with 2^256 slots of 32 bytes each, and the contract's state variables are placed into these slots according to their types.

Stack

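The stack, known as the “run stack,” holds the input and output data of EVM instructions. Using it incurs no cost beyond the gas of the instructions that operate on it. It also holds a function's local variables, and because instructions can directly reach only the top 16 items, the number of local variables is limited to 16.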

The maximum depth of the stack is 1024, where each cell is 32 bytes.

Args

Args, also called CallData, is a read-only addressable space that holds the parameters of a function call. Unlike the stack, to use the data in the call data, you must manually specify the offset and the number of bytes to read.
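
As a minimal sketch of what “specifying the offset” means, the contract below reads its first argument directly from the call data. The contract and function names are hypothetical, and the sketch assumes a Solidity 0.8.x compiler rather than the 0.4.x versions used in the rest of this article.

```solidity
pragma solidity ^0.8.0;

// Hypothetical demo contract, not part of the original article.
contract CalldataPeek {
    // Call data layout for peek(uint256):
    //   bytes 0..3  : function selector
    //   bytes 4..35 : ABI-encoded argument x
    function peek(uint256 x) external pure returns (bytes4 selector, uint256 firstArg) {
        selector = msg.sig;
        assembly {
            // CALLDATALOAD reads 32 bytes starting at the given byte offset.
            firstArg := calldataload(4)
        }
        // The manually read word matches the named parameter.
        assert(firstArg == x);
    }
}
```

The offset 4 skips the selector; each further 32-byte argument of a function like this would sit another 32 bytes along.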

Memory

Memory is a simple byte array that mainly holds data at run time, for example the arguments passed to internal functions. It is typically addressed in 32-byte words and expands in 32-byte increments.

Overview of EVM Datastores

Storage is where each contract keeps its data persistently, and the data is organized in slots. The following sections describe how each kind of variable is laid out:

State variables

1. Fixed-size variables of up to 32 bytes are assigned slot indexes in the order in which they are declared. The first variable is stored at key(0), the second at key(1), and so on.

2. Consecutive small values may be stored in the same slot. For example, if the first four state variables in the contract are of type uint64, their four values are packed into a single 32-byte value and stored at position 0.

Without packing (each uint256 occupies its own slot):

pragma solidity ^0.4.11;

contract C {

    uint256 a = 12;
    uint256 c = 12;
    uint256 b = 12;
    uint256 d = 12;

    function m() view public returns(uint256,uint256,uint256,uint256){
        return (a,b,c,d);
    }

}

With packing (four uint64 values share slot 0):

pragma solidity ^0.4.11;

contract C {

    uint64 a = 12;
    uint64 c = 12;
    uint64 b = 12;
    uint64 d = 12;

    function m() view public returns(uint64,uint64,uint64,uint64){
        return (a,b,c,d);
    }

}
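
To make the packing visible, the following sketch reads slot 0 directly with inline assembly. It is an illustration only: the contract name, the helper functions, and the values 1 to 4 (used instead of 12 so that each position can be told apart) are assumptions, and it targets Solidity 0.8.x rather than the 0.4.x compiler used above.

```solidity
pragma solidity ^0.8.0;

// Hypothetical demo contract: four uint64 values packed into slot 0.
contract PackedSlots {
    uint64 a = 1; // lowest-order 8 bytes of slot 0
    uint64 c = 2;
    uint64 b = 3;
    uint64 d = 4; // highest-order 8 bytes of slot 0

    // Returns the raw 32-byte word held in slot 0, which here is
    // 0x0000000000000004000000000000000300000000000000020000000000000001
    function rawSlot0() public view returns (bytes32 word) {
        assembly {
            word := sload(0)
        }
    }

    // Splits the word back into the four values, in declaration order.
    function unpack() public view returns (uint64, uint64, uint64, uint64) {
        uint256 word = uint256(rawSlot0());
        return (
            uint64(word),
            uint64(word >> 64),
            uint64(word >> 128),
            uint64(word >> 192)
        );
    }
}
```

The first declared variable lands in the lowest-order bytes of the slot, which is why the raw word reads 4, 3, 2, 1 from left to right.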

Structs

The members of a struct are also stored sequentially, each member of up to 32 bytes following the same rules as above. For example, if a struct variable is defined at position 0 and the struct has two uint256 members, the two members are stored at positions 0 and 1.

pragma solidity ^0.4.11;

contract C {

    struct Info {
        uint256 a;
        uint256 b;
    }

    // Declared as the first state variable, so it starts at position 0:
    // info.a is stored at position 0 and info.b at position 1.
    Info info;

    function m() external returns(uint256,uint256){
        info.a = 12;
        info.b = 24;
        return (info.a, info.b);
    }

}
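
The layout can be checked by reading the two slots directly. The sketch below is a hypothetical contract written for Solidity 0.8.x (where `info.slot` is available inside assembly); its names are not from the original article.

```solidity
pragma solidity ^0.8.0;

// Hypothetical demo contract: a struct whose members occupy slots 0 and 1.
contract StructSlots {
    struct Info {
        uint256 a;
        uint256 b;
    }

    Info info; // first state variable, so it starts at slot 0

    constructor() {
        info.a = 12;
        info.b = 24;
    }

    // Reads both members straight from storage; returns (12, 24).
    function rawMembers() public view returns (uint256 first, uint256 second) {
        assembly {
            first := sload(info.slot)          // slot 0 holds info.a
            second := sload(add(info.slot, 1)) // slot 1 holds info.b
        }
    }
}
```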

Map

The storage location of a map value is calculated as keccak256(bytes32(key) + bytes32(position)), where + denotes concatenation and position is the slot index of the map variable itself. Nothing is stored at the map's own position.

pragma solidity ^0.4.11;

contract Test {
    mapping(uint256 => uint256) knownsec; // position 0

    function go() public {
        // Stored at keccak256(bytes32(0x60) + bytes32(0))
        knownsec[0x60] = 0x40;
    }
}
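
The formula can be reproduced on-chain. The sketch below assumes Solidity 0.8.x, where `abi.encode` pads both values to 32 bytes and concatenates them; the helper names `slotOf` and `readRaw` are hypothetical.

```solidity
pragma solidity ^0.8.0;

// Hypothetical demo contract: locating a mapping value by hand.
contract MappingSlot {
    mapping(uint256 => uint256) knownsec; // position 0

    function go() public {
        knownsec[0x60] = 0x40;
    }

    // keccak256(bytes32(key) + bytes32(position)) with position = 0.
    function slotOf(uint256 key) public pure returns (bytes32) {
        return keccak256(abi.encode(key, uint256(0)));
    }

    // Reads the value directly from the computed slot.
    // After go() has been called, readRaw(0x60) returns 0x40.
    function readRaw(uint256 key) public view returns (uint256 value) {
        bytes32 slot = slotOf(key);
        assembly {
            value := sload(slot)
        }
    }
}
```

Because the hash depends on both the key and the map's position, two maps in the same contract never share slots for the same key.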

Array

Fixed-length array

The elements of a fixed-length array (each up to 32 bytes) are stored sequentially, starting at the array's own position. In addition, the compiler performs bounds checks at compile time to prevent out-of-bounds access.

pragma solidity ^0.4.11;

contract C {

    uint256[3] a = [12,24,48];

    function m() public view returns(uint256,uint256,uint256){
        return (a[0],a[1],a[2]);
    }

}

Variable-length array

Because the length of a variable-length array can change, the compiler reserves only a single slot for it at the position of the state variable, and that slot stores the array's length. The address of the first element is calculated as keccak256(bytes32(position)), and adding an element's offset to that address gives the location of that element.

pragma solidity ^0.4.11;

contract C {

    uint256[] a = [12,24,48]; // variable-length: position 0 stores the length

    function m() public view returns(uint256,uint256,uint256){
        return (a[0],a[1],a[2]);
    }

}
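
As a rough illustration of these two rules, the following sketch reads the length and the elements of a variable-length array directly from storage. It assumes Solidity 0.8.x and uint256 elements (one element per slot); the contract and function names are hypothetical.

```solidity
pragma solidity ^0.8.0;

// Hypothetical demo contract: raw layout of a variable-length array.
contract DynamicArraySlots {
    uint256[] a = [12, 24, 48]; // position 0

    // The array's own slot stores only its length; returns 3.
    function rawLength() public view returns (uint256 len) {
        assembly {
            len := sload(0)
        }
    }

    // Element i lives at keccak256(bytes32(0)) + i.
    // rawElement(1) returns 24. No bounds check is performed here,
    // so an index past the end simply reads an empty slot (0).
    function rawElement(uint256 i) public view returns (uint256 value) {
        uint256 base = uint256(keccak256(abi.encode(uint256(0))));
        assembly {
            value := sload(add(base, i))
        }
    }
}
```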

Byte arrays and strings

If the length is less than or equal to 31 bytes:

1. Fixed-length byte arrays (bytes1 to bytes32) follow the same rules as other fixed-size variables;

2. Variable-length byte arrays (bytes) and strings are stored in a single slot: the data is left-aligned and padded with zeros to 32 bytes, and the last (lowest-order) byte stores the encoded length, which is the number of bytes x 2.

pragma solidity ^0.4.4;

contract A{
    string public name0 = "knownsec";
    bytes8 public name=0x6b6e6f776e736563;
    bytes public g ;

    function test() public {
        g.push(0xAA);
        g.push(0xBB);
        g.push(0xCC);
    }

    function go() public view returns(bytes){
        return g;
    }
}

When byte arrays and strings are longer than 31 bytes:

1. The slot at the variable's position stores only the encoded length, and the formula becomes: encoded length = number of bytes x 2 + 1

2. The data itself starts at the slot given by keccak256(bytes32(position)). The remaining values are stored in the consecutive slots that follow, and the last slot is padded with zeros to 32 bytes.

string public name = "knownsecooooooooooooooooooooooooo";
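
The two encodings can be compared side by side. In this sketch the 40-byte string literal, the contract name, and the helper functions are all assumptions chosen for illustration, and the code targets Solidity 0.8.x.

```solidity
pragma solidity ^0.8.0;

// Hypothetical demo contract: short vs. long string encoding in storage.
contract StringSlots {
    string shortName = "knownsec";                                 // position 0, 8 bytes
    string longName = "0123456789012345678901234567890123456789"; // position 1, 40 bytes

    // Short string: the lowest-order byte of the slot holds length x 2.
    // Returns 16 (8 x 2).
    function shortLengthByte() public view returns (uint256) {
        uint256 word;
        assembly {
            word := sload(0)
        }
        return word & 0xff;
    }

    // Long string: the slot holds only length x 2 + 1.
    // Returns 81 (40 x 2 + 1).
    function longLengthSlot() public view returns (uint256 word) {
        assembly {
            word := sload(1)
        }
    }

    // Long string: the data starts at keccak256(bytes32(position)).
    // Returns the first 32 bytes of longName.
    function longFirstWord() public view returns (bytes32 word) {
        bytes32 start = keccak256(abi.encode(uint256(1)));
        assembly {
            word := sload(start)
        }
    }
}
```

The lowest bit of the slot distinguishes the two cases: it is 0 for the short encoding and 1 for the long one.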

Security issues

Now that the storage structure and storage mechanism of the EVM have been covered, we turn to their security.

Uninitialized variables

Principle of the vulnerability:

As the official Solidity documentation notes, local variables of struct, array, and map types are placed in storage by default, while the default location of other local variables declared in a function depends on their type. If such a storage variable is declared inside a function without being initialized, it acts as a pointer to the contract's existing state variables, so modifying it also modifies the variables it points to.

Vulnerable contract. The attacker's goal is to change the owner to their own address:

pragma solidity ^0.4.0;

contract testContract{

    bool public unlocked = false;

    address public owner = 0xCA35b7d915458EF540aDe6068dFe2F44E8fa733c;

    struct Person {
        bytes32 name;
        address mappedAddress;
    }

    function test(bytes32 _name , address _mappedAddress) public{
        // Never initialized: defaults to a storage pointer to position 0
        Person person;

        person.name = _name;                   // overwrites position 0
        person.mappedAddress = _mappedAddress; // overwrites position 1

        require(unlocked);
    }

}

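Analysis of the vulnerable contract:

The struct created inside the function is never initialized, so it points to position 0 and writes to its members overwrite the contract's state variables. We can therefore use this function to change the owner. The call also has to pass the require check, but that is not hard, because unlocked is under our control as well. Note that under the packing rule described earlier, unlocked (1 byte) and owner (20 bytes) would share position 0, so the exact value needed in _name depends on that layout.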

Steps:

Call the test function with the following arguments.

_name:

0x0000000000000000000000000000000000000000000000000000000000000001 (true)

_mappedAddress:

0xfb89ecb0188cb83c220aadda1468c1635208e821 (personal address)

Before passing the arguments:

After passing the arguments:

You can see that the address has been changed successfully.
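
One way to avoid the problem is to give the local struct an explicit memory location, so that it no longer aliases storage. The sketch below is an assumed fix rather than code from the original article; the contract name is hypothetical, and it targets Solidity 0.8.x. Since Solidity 0.5.0 the compiler rejects uninitialized storage pointers outright, so the vulnerable pattern no longer compiles.

```solidity
pragma solidity ^0.8.0;

// Hypothetical fixed version of the vulnerable contract.
contract TestContractFixed {
    bool public unlocked = false;

    address public owner = 0xCA35b7d915458EF540aDe6068dFe2F44E8fa733c;

    struct Person {
        bytes32 name;
        address mappedAddress;
    }

    function test(bytes32 _name, address _mappedAddress) public view {
        // Explicit memory location: writes go to a temporary copy
        // and can no longer touch unlocked or owner.
        Person memory person;
        person.name = _name;
        person.mappedAddress = _mappedAddress;

        // Still reverts, because unlocked stays false.
        require(unlocked);
    }
}
```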

Conclusion

EVM storage is a ‘key=>value’ database, and the stored data can be verified by checksum to ensure consistency.

However, storage also interacts with the rules of the smart contract language. Where those rules conflict, the gap is likely to be exploited by attackers.

Using the smart contract language in the standard, recommended way is therefore a necessary condition for avoiding vulnerabilities.
