EVM Opcodes: Basics
Understanding EVM bytecode is crucial for analyzing unverified deployed smart contracts or writing highly optimized ones. This article covers the basics of the EVM and its opcodes, providing a foundation for further learning.
Bytecode Entry Point
When developing smart contracts in languages like Solidity, you expose multiple external entry points (functions) that give access to different functionality. EVM bytecode itself, however, has no designated entry points: execution always starts at the first byte. The compiler therefore generates routing code that compares the first four bytes of calldata (the function selector) with the selectors of the contract's functions, where each selector is the first four bytes of the Keccak-256 hash of the function signature.
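Notice: Because the generated routing code compares selectors in sequence, identical functions can cost different amounts of Gas depending on where they fall in that sequence. Solidity orders these comparisons by selector value, not by the order in which functions appear in the source.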
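The routing described above can be modeled as follows. This is a hypothetical Python sketch, not compiler output: it extracts the selector the way `CALLDATALOAD(0)` followed by a 224-bit right shift does, then compares it against a table one entry at a time, counting comparisons as a rough stand-in for the Gas spent in routing. The selectors are the well-known ERC-20 ones; the purely linear search is an assumption, since compilers may split large tables into several branches.

```python
# Hypothetical model of a linear function dispatcher (not actual solc output).
# Well-known ERC-20 selectors: first 4 bytes of keccak256(signature).
SELECTORS = {
    0x095EA7B3: "approve(address,uint256)",
    0x18160DDD: "totalSupply()",
    0x70A08231: "balanceOf(address)",
    0xA9059CBB: "transfer(address,uint256)",
}


def calldataload(calldata: bytes, offset: int) -> int:
    """Read a 32-byte word like CALLDATALOAD: bytes past the end read as zero."""
    word = calldata[offset:offset + 32].ljust(32, b"\x00")
    return int.from_bytes(word, "big")


def dispatch(calldata: bytes) -> tuple[str, int]:
    """Return (matched signature, number of selector comparisons performed)."""
    selector = calldataload(calldata, 0) >> 224  # keep only the top 4 bytes
    comparisons = 0
    for candidate in sorted(SELECTORS):  # ascending selector order
        comparisons += 1
        if selector == candidate:
            return SELECTORS[candidate], comparisons
    return "fallback", comparisons  # no selector matched
```

Here a call to `transfer` needs four comparisons while `approve` needs one, purely because of where their selectors sort, even if both functions did the same amount of work.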
Deployment and Deployed Bytecodes
Contracts are deployed using deployment (creation) bytecode. This bytecode is executed once during contract creation, and the data it returns becomes the deployed (runtime) bytecode stored at the new address. Before returning, it can prepare the contract's storage, often setting the contract owner and other configuration.
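Notice: Even if a smart contract needs no constructor, its target bytecode must be wrapped in deployment bytecode that returns it. Sent on its own, the target bytecode would itself be executed as the constructor, and its return value would become the deployed code.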
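One common way to wrap runtime code is a short prefix that copies the code appended after it into memory with CODECOPY and returns it. The Python sketch below builds such a prefix. It is an illustrative alternative, not the construction used later in this article, and it assumes the runtime code is 1 to 255 bytes long so its length fits in a PUSH1 immediate.

```python
def wrap_with_codecopy(runtime: bytes) -> bytes:
    """Build deployment bytecode that returns `runtime` as the deployed code.

    Layout (9-byte prefix followed by the runtime code):
      PUSH1 n      ; size of the runtime code
      DUP1         ; keep a copy of n for RETURN
      PUSH1 0x09   ; offset of the runtime code inside this bytecode
      PUSH0        ; destination offset in memory
      CODECOPY     ; memory[0:n] = code[9:9+n]
      PUSH0        ; memory offset to return from
      RETURN       ; return memory[0:n]
    """
    n = len(runtime)
    if not 0 < n <= 0xFF:
        raise ValueError("this sketch only handles 1..255-byte runtime code")
    prefix = bytes([0x60, n, 0x80, 0x60, 0x09, 0x5F, 0x39, 0x5F, 0xF3])
    assert len(prefix) == 9  # the offset pushed above must equal the prefix size
    return prefix + runtime
```

Unlike a single PUSH, copying with CODECOPY does not limit the runtime code to 32 bytes; the 255-byte cap here comes only from using PUSH1 for the length.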
Conditional Statements
In Solidity, many statements depend on conditional logic, such as if statements and loops. At the bytecode level, the JUMPI opcode serves this purpose. It pops two values from the stack: the destination to jump to (the top item) and a condition, which is treated as true if it is non-zero. If the condition holds, execution continues at the destination, which must be a JUMPDEST instruction; otherwise execution falls through to the next instruction.
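The JUMPI rule can be sketched in Python. The function names are hypothetical; the sketch assumes the stack is a list with its top at the end, and it also shows that a 0x5B byte counts as a valid destination only when it is a real JUMPDEST instruction, not part of a PUSH immediate.

```python
def valid_jumpdests(code: bytes) -> set[int]:
    """Offsets of real JUMPDEST (0x5B) instructions, skipping PUSH immediate data."""
    dests = set()
    pc = 0
    while pc < len(code):
        op = code[pc]
        if op == 0x5B:
            dests.add(pc)
        if 0x60 <= op <= 0x7F:  # PUSH1..PUSH32 carry 1..32 immediate bytes
            pc += op - 0x5F
        pc += 1
    return dests


def jumpi(stack: list[int], pc: int, jumpdests: set[int]) -> int:
    """Apply JUMPI to `stack` (top = last element) and return the next pc."""
    dest = stack.pop()  # first popped: jump destination
    cond = stack.pop()  # second popped: condition
    if cond == 0:
        return pc + 1  # condition false: fall through to the next instruction
    if dest not in jumpdests:
        raise ValueError(f"invalid jump destination {dest}")
    return dest
```

For the bytecode `6001 6006 57 00 5B 00` (PUSH1 1, PUSH1 6, JUMPI, STOP, JUMPDEST, STOP), JUMPI at offset 4 jumps to offset 6; with a zero condition it would continue at offset 5 and stop.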
Notice: Unrolling fixed-size loops (writing every iteration out in full) removes the loop counter and its conditional jumps, which reduces cyclomatic complexity and saves Gas, as done in the OpenZeppelin Math library.
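The Python sketch below contrasts a loop with its unrolled form, using the fixed eight-step binary search for floor(log2(x)) over 256-bit values, in the style of OpenZeppelin's Math.log2. Python has no Gas, so this only illustrates the transformation: the unrolled version has no loop counter and no loop-condition check, which in EVM bytecode means fewer comparisons and JUMPI instructions.

```python
def log2_loop(x: int) -> int:
    """floor(log2(x)) for 0 < x < 2**256 using a loop; returns 0 for x == 0."""
    result = 0
    shift = 128
    while shift > 0:  # loop condition is re-checked on every iteration
        if x >> shift > 0:
            x >>= shift
            result += shift
        shift >>= 1
    return result


def log2_unrolled(x: int) -> int:
    """Same computation with all eight steps written out (no loop condition)."""
    result = 0
    if x >> 128 > 0:
        x >>= 128
        result += 128
    if x >> 64 > 0:
        x >>= 64
        result += 64
    if x >> 32 > 0:
        x >>= 32
        result += 32
    if x >> 16 > 0:
        x >>= 16
        result += 16
    if x >> 8 > 0:
        x >>= 8
        result += 8
    if x >> 4 > 0:
        x >>= 4
        result += 4
    if x >> 2 > 0:
        x >>= 2
        result += 2
    if x >> 1 > 0:
        result += 1
    return result
```

Both functions return the same results; the unrolled one simply trades code size for the removed loop overhead.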
Execution Contexts
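Transactions to a contract, as well as certain opcodes (CALL, CALLCODE, DELEGATECALL, STATICCALL, CREATE, and CREATE2), create an execution context with a clean stack and clean memory. For message calls, its calldata holds the transaction input or a region of the calling context's memory specified by the opcode. Each execution context is configured with the code to execute and the contract address whose transient and persistent storage it accesses. Once the current execution context ends, control returns to the calling (parent) context: its stack, memory, and calldata are restored, its returndata is replaced with the data returned by the sub-context (a successful creation leaves it empty, since its output becomes the new contract's code), and a result is pushed onto its stack (a success flag for calls; the new contract address, or zero on failure, for creations).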
Execution contexts are typically used to call or deploy other contracts, or to run another contract's code against the current contract's storage (DELEGATECALL), but they can also provide a clean environment for executing code safely.
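The hand-off between contexts can be modeled in a few lines. In this hypothetical Python sketch, `Context`, `message_call`, and the string addresses are illustrative names, not part of any Ethereum client. It ignores Gas, value transfers, and contract creation, and models a revert simply as `success = False`.

```python
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class Context:
    """Simplified execution context (hypothetical model, not a real client API)."""
    code_address: str     # whose code is executed
    storage_address: str  # whose transient/persistent storage is accessed
    calldata: bytes = b""
    stack: list[int] = field(default_factory=list)
    memory: bytearray = field(default_factory=bytearray)
    returndata: bytes = b""


def message_call(parent: Context, target: str, args: bytes,
                 body: Callable[[Context], tuple[bool, bytes]],
                 delegate: bool = False) -> None:
    """Run `body` in a fresh sub-context, then hand its result back to `parent`."""
    child = Context(
        code_address=target,
        # DELEGATECALL runs the target's code against the caller's storage.
        storage_address=parent.storage_address if delegate else target,
        calldata=args,  # stack and memory start empty via the field defaults
    )
    success, output = body(child)
    # The parent's stack and memory were never touched, so nothing to restore here.
    parent.returndata = output                 # replaced after every sub-call
    parent.stack.append(1 if success else 0)   # success flag pushed for the caller
```

Setting `delegate=True` keeps the caller's storage while running the target's code, which is the property DELEGATECALL-based proxies rely on.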
Storage Types
The EVM operates with various storage types, each serving different purposes:
- Stack: A mutable stack of up to 1024 32-byte words, cleared when the current execution context ends. It is the cheapest storage and is essential for processing data: arithmetic, comparisons, and most other opcodes read their operands from it.
- Calldata: Immutable storage containing the current execution context's input. Operations are limited to loading data to the stack (CALLDATALOAD), copying it to memory (CALLDATACOPY), and checking its length (CALLDATASIZE).
- Returndata: Immutable storage filled with the data returned by the most recent execution sub-context. Operations are limited to copying data to memory (RETURNDATACOPY) and checking its length (RETURNDATASIZE).
- Memory: Mutable, byte-addressable storage cleared when the current execution context ends. Each size increase costs Gas, and the cost grows quadratically with size. Operations include writing data to memory from the stack (MSTORE, MSTORE8), calldata, or returndata, and loading memory words onto the stack (MLOAD). Data returned from the current execution context (via RETURN or REVERT) must be located in memory.
- Transient Storage: Mutable storage linked to the contract of the execution context, cleared after the transaction ends, and accessible from any execution context linked to the contract. It can be used for reentrancy locks but has potential usage flaws described in the article.
- Persistent Storage: Mutable storage linked to the contract of the execution context, preserved between transactions, and accessible from any execution context linked to the contract. It works as a key-value store mapping 32-byte keys to 32-byte values (SLOAD/SSTORE); values at different keys do not affect each other.
These storage types differ in lifetime, mutability, and cost, so developers can manage Gas costs by choosing the cheapest location that fits each piece of data.
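The Memory entry notes that each size increase costs Gas. Per the Ethereum Yellow Paper, the total memory charge for `a` words is `3·a + floor(a²/512)`, and an expansion pays the difference between the new and old totals. The Python sketch below encodes that formula only (it omits each opcode's own base cost); function names are illustrative.

```python
G_MEMORY = 3  # Gas per 32-byte word (linear term of the memory cost function)


def memory_cost(size_bytes: int) -> int:
    """Total memory charge for `size_bytes`: 3*a + a*a // 512, where a = words."""
    words = (size_bytes + 31) // 32  # memory grows in whole 32-byte words
    return G_MEMORY * words + words * words // 512


def expansion_cost(old_size: int, new_size: int) -> int:
    """Gas charged when an access grows memory from old_size to new_size bytes."""
    return max(0, memory_cost(new_size) - memory_cost(old_size))
```

The first 32-byte expansion costs 3 Gas, and small buffers stay nearly linear, but at 32 KiB the quadratic term already adds 2048 Gas on top of the 3072 Gas linear part.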
Basic Pure-Opcode Smart Contract Example
Consider the following contract specification:
interface Example {
    // Should always return 42
    function action() external pure returns (uint256);
}
Since the interface declares a single function, no function routing is needed. We can simply return the desired value regardless of the provided calldata:
// In bytecode: 602A 5F 52 6020 5F F3
PUSH1 0x2a // [0x2a] Push 42 to the stack
PUSH0 // [0x00 0x2a] Push 0 to the stack
MSTORE // [] Store 42 (32-byte representation)
// in memory at address 0
// causing a 32-byte memory expansion
PUSH1 0x20 // [0x20] Push 32 to the stack
PUSH0 // [0x00 0x20] Push 0 to the stack
RETURN // [] Return 32 bytes from memory
// starting from address 0
// actually containing 42
// (32-byte representation)
This code returns 42 for calldata carrying the action() selector, fulfilling the specification. However, it does not strictly match the interface, because it also returns 42 for any other calldata, including calls to functions the interface does not declare.
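To check the runtime code above, the hypothetical Python sketch below interprets only the opcodes it needs (PUSH0, PUSH1..PUSH32, MSTORE, RETURN). It ignores Gas and never reads calldata, which mirrors the point that this contract answers 42 whatever the input.

```python
def _grow(memory: bytearray, size: int) -> None:
    """Zero-extend memory to at least `size` bytes, rounded up to whole words."""
    target = (size + 31) // 32 * 32
    if len(memory) < target:
        memory.extend(bytes(target - len(memory)))


def run(code: bytes) -> bytes:
    """Execute `code` and return its output (tiny subset of the EVM)."""
    stack: list[int] = []  # top of the stack is the last element
    memory = bytearray()
    pc = 0
    while pc < len(code):
        op = code[pc]
        if op == 0x5F:  # PUSH0
            stack.append(0)
        elif 0x60 <= op <= 0x7F:  # PUSH1..PUSH32
            n = op - 0x5F  # number of immediate bytes
            imm = code[pc + 1:pc + 1 + n].ljust(n, b"\x00")  # EVM zero-pads past end
            stack.append(int.from_bytes(imm, "big"))
            pc += n  # skip the immediate data
        elif op == 0x52:  # MSTORE: pops offset, then value
            offset, value = stack.pop(), stack.pop()
            _grow(memory, offset + 32)
            memory[offset:offset + 32] = value.to_bytes(32, "big")
        elif op == 0xF3:  # RETURN: pops offset, then size
            offset, size = stack.pop(), stack.pop()
            if size:
                _grow(memory, offset + size)
            return bytes(memory[offset:offset + size])
        else:
            raise NotImplementedError(f"opcode {op:#04x} not modeled")
        pc += 1
    return b""  # running off the end behaves like STOP
```

`run(bytes.fromhex("602A5F5260205FF3"))` returns a 32-byte word equal to 42. The same function can also run the deployment bytecode shown next, returning the 8-byte runtime code.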
Now we need to construct deployment bytecode to deploy this contract:
// In bytecode: 67602A5F5260205FF3 5F 52 6008 6018 F3
PUSH8 0x602A5F5260205FF3 // [0x..] Push deployed bytecode to stack
PUSH0 // [0x00 0x..] Push 0 to the stack
MSTORE // [] Store bytecode in memory at
// address 0 padded left with zeros
PUSH1 0x08 // [0x08] Push bytecode length to stack
PUSH1 0x18 // [0x18 0x08] Push bytecode offset to stack
RETURN // [] Return 8 bytes from memory
// starting from address 24
// containing deployed bytecode
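The same construction can be generated for any runtime code of 1 to 32 bytes, since the whole code must fit in a single PUSHn immediate. The hypothetical Python helper below reproduces the hex listed above for the 8-byte example.

```python
def wrap_in_push(runtime: bytes) -> bytes:
    """Deployment bytecode: PUSHn <runtime>, PUSH0, MSTORE, PUSH1 n, PUSH1 32-n, RETURN."""
    n = len(runtime)
    if not 1 <= n <= 32:
        raise ValueError("runtime code must be 1..32 bytes to fit in one PUSHn")
    return (
        bytes([0x5F + n]) + runtime  # PUSHn: 0x60 is PUSH1, 0x7F is PUSH32
        + bytes([0x5F, 0x52])        # PUSH0, MSTORE: word stored at memory[0:32]
        + bytes([0x60, n])           # PUSH1 n: number of bytes to return
        + bytes([0x60, 32 - n])      # PUSH1 32-n: code is right-aligned in the word
        + bytes([0xF3])              # RETURN
    )
```

For `602A5F5260205FF3` this yields `67602A5F5260205FF35F5260086018F3`, matching the listing above.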
Alternatively, to deploy arbitrary bytecode without constructing deployment bytecode by hand, the following contract can be used:
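contract AnyCode {
    constructor(bytes memory code) {
        assembly {
            // `code` points to a 32-byte length word followed by the bytes;
            // returning those bytes makes them the deployed code
            return(add(code, 0x20), mload(code))
        }
    }
}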
This contract produces larger deployment bytecode, but it lets you deploy arbitrary runtime bytecode with any smart contract development framework by passing the code as a constructor argument.
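When AnyCode is deployed, the desired runtime code is passed as the constructor argument, which is ABI-encoded and appended to AnyCode's creation bytecode (frameworks normally do this automatically). The hypothetical Python sketch below shows that encoding for a single `bytes` argument. After the constructor decodes it into memory, `code` points at a 32-byte length word followed by the data, which is why the assembly returns `mload(code)` bytes starting at `add(code, 0x20)`.

```python
def abi_encode_single_bytes(data: bytes) -> bytes:
    """ABI-encode one `bytes` argument: head offset, length, zero-padded data."""
    padded_len = (len(data) + 31) // 32 * 32
    return (
        (32).to_bytes(32, "big")           # offset of the dynamic data (after the 1-word head)
        + len(data).to_bytes(32, "big")    # length word
        + data.ljust(padded_len, b"\x00")  # payload, right-padded to a multiple of 32 bytes
    )
```

For the 8-byte runtime code from the example, the encoding is 96 bytes: the offset word, a length word of 8, and the code followed by 24 zero bytes.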
Conclusion
This article introduces basic EVM concepts and presents a simple smart contract written in pure opcodes.
Hint: A great resource to start learning EVM opcodes is evm.codes.