Understand EVM Opcodes, Write Better Smart Contracts

Daniel Yamagata
10 min read · Aug 25, 2022

Your good developer habits are leading you to write inefficient smart contracts. For typical programming languages, the only costs associated with state changes and computation are time and the electricity used by the hardware. However, for EVM-compatible languages, such as Solidity and Vyper, these actions explicitly cost money. This cost is in the form of the blockchain’s native currency (ETH for Ethereum, AVAX for Avalanche, etc.), which can be thought of as a commodity used to pay for these actions.

The cost for computation, state transitions, and storage is called gas. Gas is used to prioritize transactions, as a Sybil resistance mechanism, and to prevent attacks stemming from the halting problem.

Feel free to read my article on Solidity basics to learn more about gas.

These atypical costs lead to software design patterns that would seem both inefficient and strange in typical programming languages. To be able to recognize these patterns and grasp why they lead to cost efficiencies, you must first have a basic understanding of the Ethereum Virtual Machine, i.e. the EVM.

What is the EVM?

If you are already familiar with the EVM, feel free to skip to the section, What are EVM Opcodes?

A blockchain is a transaction-based state machine. It executes transactions one at a time, and each transaction transforms the current state into some new state. Therefore, each transaction on a blockchain is a transition of state.
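The state-machine idea can be sketched in a few lines of Python. This is only an illustration under heavy assumptions: the `Transfer` type and `apply_transaction` function are hypothetical names, and the state is reduced to a table of balances, whereas real Ethereum state also includes nonces, contract code, and storage.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class Transfer:
    """Hypothetical, simplified transaction: move `amount` between two accounts."""
    sender: str
    recipient: str
    amount: int


def apply_transaction(state: dict[str, int], tx: Transfer) -> dict[str, int]:
    """Return the new state produced by applying `tx`; the input state is not mutated."""
    if tx.amount < 0 or state.get(tx.sender, 0) < tx.amount:
        raise ValueError("invalid transaction: insufficient balance")
    new_state = dict(state)
    new_state[tx.sender] = new_state.get(tx.sender, 0) - tx.amount
    new_state[tx.recipient] = new_state.get(tx.recipient, 0) + tx.amount
    return new_state


genesis = {"alice": 10, "bob": 0}
transactions = [Transfer("alice", "bob", 4), Transfer("bob", "alice", 1)]

state = genesis
for tx in transactions:
    state = apply_transaction(state, tx)  # one state transition per transaction

print(state)  # {'alice': 7, 'bob': 3}
```

Each call takes the previous state and one transaction and yields the next state, which is all that "transaction-based state machine" means here.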

Simple blockchains, like Bitcoin, natively only support simple transfers. In contrast, smart-contract compatible chains, like Ethereum, implement two types of accounts, externally owned accounts and contract accounts, in order to support complex logic.

Externally owned accounts are controlled by users via private keys and have no code associated with them, while contract accounts are solely controlled by their associated code. EVM code is stored as bytecode in a virtual ROM.

The EVM handles the execution and processing of all transactions on the underlying blockchain. It is a stack machine in which each item on the stack is 256 bits (32 bytes). The EVM is embedded within each Ethereum node and is responsible for executing contract bytecode.

The EVM stores data in both storage and memory. Storage holds data permanently, while memory holds data only for the duration of a function call. You can also pass function arguments as calldata, which behaves like memory except that the data cannot be modified.

Learn more about Ethereum and the EVM in Preethi Kasireddy’s article, “How does Ethereum work, anyway?”

Smart contracts are written in higher-level languages, such as Solidity, Vyper, or Yul, and subsequently broken down into EVM bytecode via a compiler. However, there are times when it is more gas efficient to use bytecode directly in your code.

LooksRare’s TransferSelectorNFT smart contract

EVM bytecode is written in hexadecimal. It is the language that the virtual machine is able to interpret. This is somewhat analogous to how CPUs can only interpret machine code.

Example of Solidity Bytecode

What are EVM Opcodes?

All Ethereum bytecode can be broken down into a series of opcodes and operands. Opcodes are predefined instructions that the EVM interprets and executes. For example, the ADD opcode is represented as 0x01 in EVM bytecode. It removes two elements from the stack and pushes their sum.

The number of elements removed from and pushed onto the stack depends on the opcode. For example, there are thirty-two PUSH opcodes: PUSH1 through PUSH32. PUSH* places a *-byte item on the stack, where * ranges from 1 to 32. It does not remove any values from the stack and adds a single value. In contrast, the ADDMOD opcode represents the modulo addition operation: it removes three items from the stack and pushes the result. Notably, the PUSH opcodes are the only ones that come with operands embedded in the bytecode.
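To make the stack behavior concrete, here is a minimal sketch of an interpreter for just the opcodes mentioned above. It is a toy model written in Python, not how any real client is implemented: the `run` function and `WORD_MODULUS` constant are names made up for this example, and gas, stack-depth limits, and every other opcode are left out.

```python
WORD_MODULUS = 2**256  # every stack item is a 256-bit word

ADD, ADDMOD = 0x01, 0x08
PUSH1, PUSH32 = 0x60, 0x7F


def run(bytecode: bytes) -> list:
    """Execute a tiny subset of EVM opcodes and return the final stack (top is last)."""
    stack = []
    pc = 0  # program counter: index of the next byte to interpret
    while pc < len(bytecode):
        opcode = bytecode[pc]
        pc += 1
        if PUSH1 <= opcode <= PUSH32:
            size = opcode - PUSH1 + 1  # PUSH1 carries 1 operand byte, PUSH32 carries 32
            operand = bytecode[pc:pc + size]
            stack.append(int.from_bytes(operand, "big"))
            pc += size  # skip the operand bytes: they are data, not opcodes
        elif opcode == ADD:
            a, b = stack.pop(), stack.pop()  # removes two items
            stack.append((a + b) % WORD_MODULUS)  # pushes one
        elif opcode == ADDMOD:
            a, b, n = stack.pop(), stack.pop(), stack.pop()  # removes three items
            stack.append((a + b) % n if n != 0 else 0)  # pushes one
        else:
            raise NotImplementedError(f"opcode {opcode:#04x} is outside this sketch")
    return stack


# PUSH1 0x02, PUSH1 0x03, ADD
print(run(bytes.fromhex("6002600301")))  # [5]

# PUSH1 0x08, PUSH1 0x0a, PUSH1 0x0a, ADDMOD  ->  (10 + 10) % 8
print(run(bytes.fromhex("6008600a600a08")))  # [4]
```

Note how the interpreter has to skip over the bytes that follow a PUSH opcode. This is why a raw hex dump cannot be read one byte at a time as a list of instructions.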

The Opcodes of the Prior Bytecode Example

Each opcode is one byte and has its own cost. Depending on the opcode, these costs are either fixed or determined by a formula. For example, the ADD opcode costs 3 gas. In contrast, SSTORE, the opcode which saves data in storage, costs 20,000 gas when a storage value is set to a non-zero value from zero and costs 5,000 gas otherwise, i.e. when the value stays zero, stays non-zero, or is set to zero.

SSTORE’s cost actually varies further depending on if a value has been accessed or not. Full details of SSTORE’s and SLOAD’s costs can be found here

Why is understanding EVM Opcodes important?

Understanding EVM opcodes is extremely important for minimizing gas consumption and, in turn, reducing costs for your end user. Since the costs assigned to EVM opcodes differ widely and do not always track intuition, different coding patterns that achieve the same result can have very different costs. Knowing which opcodes are the most expensive will help you avoid using them unnecessarily. View the Ethereum documentation for a full list of EVM opcodes and their associated gas costs.

Below are concrete examples of unintuitive design patterns stemming from the cost of EVM opcodes:

Using Multiplication over Exponentiation: MUL vs EXP

The MUL opcode costs 5 gas and is used to perform multiplication. For example, the arithmetic behind 10 * 10 would cost 5 gas.

The EXP opcode is used to perform exponentiation, and its gas cost is determined by a formula: if the exponent is zero, the opcode costs 10 gas. However, if the exponent is greater than zero, it costs 10 gas plus 50 times the number of bytes in the exponent.

Since a byte is 8 bits, a single byte is used to represent values between 0 and 2⁸-1, two bytes would be used to represent values between 2⁸ and 2¹⁶-1, etc. For example, 10¹⁸ would cost 10 + 50 * 1 = 60 gas, while 10³⁰⁰ would cost 10 + 50 * 2 = 110 gas, since it takes one byte to represent 18 and two bytes to represent 300.
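The EXP pricing rule described above is easy to check with a small calculation. The following Python sketch simply encodes that rule; the function name `exp_gas` is made up for this example, and it covers only the cost of the opcode itself, not any surrounding instructions the compiler emits.

```python
MUL_GAS = 5  # fixed cost of the MUL opcode


def exp_gas(exponent: int) -> int:
    """Gas for EXP: 10 gas plus 50 gas per byte needed to represent the exponent."""
    byte_length = (exponent.bit_length() + 7) // 8  # 0 for an exponent of zero
    return 10 + 50 * byte_length


for exponent in (0, 2, 18, 255, 256, 300):
    print(exponent, exp_gas(exponent))
```

An exponent of 0 costs 10 gas, any exponent from 1 to 255 costs 60 gas, and 256 is the first exponent that needs a second byte, at 110 gas. Squaring with EXP therefore costs 60 gas for the arithmetic, compared with 5 gas for a single MUL.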

From the above, it is clear that there are certain times in which you should use multiplication over exponentiation. Here is a concrete example:

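contract squareExample {
    uint256 x;

    constructor(uint256 _x) {
        x = _x;
    }

    // squares x using exponentiation (the EXP opcode)
    function inefficcientSquare() external {
        x = x**2;
    }

    // squares x using multiplication (the MUL opcode)
    function efficcientSquare() external {
        x = x * x;
    }
}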

Both inefficcientSquare and efficcientSquare set the state variable, x, to the square of itself. However, the arithmetic of inefficcientSquare costs 10 + 1 * 50 = 60 gas while the arithmetic of efficcientSquare costs 5 gas.

For reasons in addition to the above cost of arithmetic, inefficcientSquare costs ~200 more gas than efficcientSquare on average.

Caching data: SLOAD & MLOAD

It is well known that caching data leads to far better performance at scale. On the EVM, caching matters even more: it leads to dramatic gas savings even for a small number of operations.

The SLOAD and MLOAD opcodes are used to load data from storage and memory, respectively. MLOAD costs 3 gas (plus any memory expansion cost), while SLOAD’s cost is determined by a formula: SLOAD costs 2,100 gas to initially access a value during a transaction and costs 100 gas for each subsequent access. This means that it is ≥97% cheaper to load data from memory than from storage.
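The "first access is expensive, later accesses are cheap" rule for SLOAD can be modeled as follows. This is a rough Python sketch using only the figures quoted above; the `AccessTracker` class is a made-up name, and it tracks a single contract's slots within a single transaction.

```python
COLD_SLOAD_GAS = 2100  # first access to a slot within a transaction
WARM_SLOAD_GAS = 100   # every later access to the same slot
MLOAD_GAS = 3          # ignoring memory expansion


class AccessTracker:
    """Remembers which storage slots have already been touched in this transaction."""

    def __init__(self):
        self._warm_slots = set()

    def sload_gas(self, slot: int) -> int:
        if slot in self._warm_slots:
            return WARM_SLOAD_GAS
        self._warm_slots.add(slot)
        return COLD_SLOAD_GAS


tracker = AccessTracker()
costs = [tracker.sload_gas(0) for _ in range(3)]
print(costs)  # [2100, 100, 100]

# Saving of one MLOAD over one warm SLOAD, in whole percent
saving_percent = (WARM_SLOAD_GAS - MLOAD_GAS) * 100 // WARM_SLOAD_GAS
print(saving_percent)  # 97
```

Even in the best case for storage (a slot that is already warm), reading from memory is 97% cheaper, and the gap is far larger on the first access.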

Below is some sample code and the potential gas savings:

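contract storageExample {
    uint256 sumOfArray;

    // reads and writes the storage variable on every iteration
    // (note that += adds onto whatever value is already stored)
    function inefficcientSum(uint256[] memory _array) public {
        for (uint256 i; i < _array.length; i++) {
            sumOfArray += _array[i];
        }
    }

    // accumulates in a local variable and writes to storage once
    function efficcientSum(uint256[] memory _array) public {
        uint256 tempVar;
        for (uint256 i; i < _array.length; i++) {
            tempVar += _array[i];
        }
        sumOfArray = tempVar;
    }
} // end of storageExample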

The contract, storageExample, has two functions: inefficcientSum and efficcientSum.

Both functions take _array, which is an array of unsigned integers, as an argument. They both set the contract’s state variable, sumOfArray, to the sum of the values in _array.

inefficcientSum uses the state variable, itself, for its calculations. Remember that state variables, such as sumOfArray, are kept in storage.

efficcientSum creates a temporary local variable, tempVar, that is used to calculate the sum of the values in _array. Because tempVar is a local value-type variable, it is held on the stack rather than in storage, so updating it never requires an SLOAD or SSTORE. sumOfArray is then assigned the value of tempVar.

efficcientSum is more than 50% more gas efficient than inefficcientSum when passing in an array of only 10 unsigned integers.

These efficiencies scale with the number of computations: efficcientSum is >300% more gas efficient than inefficcientSum when passing in an array of 100 unsigned integers.
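One way to see where the savings come from is to count storage operations rather than gas. The Python sketch below mirrors the two loops against a toy storage object; `CountingStorage`, `inefficient_sum`, and `efficient_sum` are hypothetical names for this model, and it assumes the compiler does not hoist the storage access out of the loop.

```python
class CountingStorage:
    """Toy key-value storage that counts how often it is read and written."""

    def __init__(self):
        self.slots = {}
        self.loads = 0   # stands in for SLOAD
        self.stores = 0  # stands in for SSTORE

    def load(self, slot):
        self.loads += 1
        return self.slots.get(slot, 0)

    def store(self, slot, value):
        self.stores += 1
        self.slots[slot] = value


SUM_SLOT = 0  # slot standing in for the sumOfArray state variable


def inefficient_sum(storage, values):
    # Touches storage twice per element: one read and one write
    for value in values:
        storage.store(SUM_SLOT, storage.load(SUM_SLOT) + value)


def efficient_sum(storage, values):
    # Accumulates locally, then writes to storage exactly once
    temp = 0
    for value in values:
        temp += value
    storage.store(SUM_SLOT, temp)


values = list(range(1, 11))  # 10 unsigned integers

slow, fast = CountingStorage(), CountingStorage()
inefficient_sum(slow, values)
efficient_sum(fast, values)

print(slow.loads, slow.stores)  # 10 10
print(fast.loads, fast.stores)  # 0 1
```

The number of storage operations grows with the array length in the first version and stays constant in the second, which is why the gap widens as the array gets longer.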

Avoid Object Oriented Programming: the CREATE Opcode

The CREATE opcode is used when creating a new account with associated code (i.e. a smart contract). It costs at least 32,000 gas and is the most expensive opcode on the EVM.
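As a rough guide to what "at least" means, a lower bound on the cost of a deployment can be estimated from two figures: the 32,000 gas base cost of CREATE and the 200 gas charged per byte of deployed code. The Python sketch below is only that lower bound. The function name is made up for this example, and it ignores the cost of running the constructor, memory expansion, and any storage writes the new contract performs.

```python
CREATE_BASE_GAS = 32_000         # base cost of the CREATE opcode
CODE_DEPOSIT_GAS_PER_BYTE = 200  # charged for each byte of code stored on-chain


def create_gas_lower_bound(deployed_code_size: int) -> int:
    """Lower bound on gas for CREATE given the size in bytes of the deployed code."""
    return CREATE_BASE_GAS + CODE_DEPOSIT_GAS_PER_BYTE * deployed_code_size


for size in (0, 100, 1_000):
    print(size, create_gas_lower_bound(size))
```

Under these assumptions, even a tiny 100-byte contract costs at least 52,000 gas to create, before it has stored a single value.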

It is best to minimize the number of smart contracts used when possible. This is unlike typical object-oriented programming in which the separation of classes is encouraged for reusability and clarity.

Here is a concrete example:

Below is some code to create a “vault” using an object-oriented approach. Each vault contains a uint256, which is set in its constructor.

contract Vault {
    uint256 private x;

    constructor(uint256 _x) {
        x = _x;
    }

    function getValue() external view returns (uint256) {
        return x;
    }
} // end of Vault

interface IVault {
    function getValue() external view returns (uint256);
} // end of IVault

contract InefficcientVaults {
    address[] public factory;

    constructor() {}

    function createVault(uint256 _x) external {
        address _vaultAddress = address(new Vault(_x));
        factory.push(_vaultAddress);
    }

    function getVaultValue(uint256 vaultId) external view returns (uint256) {
        address _vaultAddress = factory[vaultId];
        IVault _vault = IVault(_vaultAddress);
        return _vault.getValue();
    }
} // end of InefficcientVaults

Each time that createVault() is called, a new Vault smart contract is created. The value stored in the Vault is determined by the argument passed into createVault(). The address of the new Vault contract is then stored in an array, factory.

Now here is some code that accomplishes the same goal but uses a mapping in place of creating a new smart contract:

contract EfficcientVaults {
    // vaultId => vaultValue
    mapping(uint256 => uint256) public vaultIdToVaultValue;

    // the next vault's id
    uint256 nextVaultId;

    function createVault(uint256 _x) external {
        vaultIdToVaultValue[nextVaultId] = _x;
        nextVaultId++;
    }

    function getVaultValue(uint256 vaultId) external view returns (uint256) {
        return vaultIdToVaultValue[vaultId];
    }
} // end of EfficcientVaults

Each time that createVault() is called, its argument is stored in a mapping, and its ID is determined by the state variable, nextVaultId, which is incremented each time that createVault() is called.

This difference in implementation leads to a dramatic reduction in gas costs.

EfficcientVaults’ createVault() is 61% more efficient and costs ~76,300 less gas than that of InefficcientVaults on average.

It should be noted that there are certain times when creating a new contract from within a contract is desirable and is typically done for immutability and efficiency. The transaction cost for all interactions with a contract will increase with the size of a contract. Therefore, if you expect to store massive amounts of data on-chain, it’s likely better to separate this data via separate contracts. However, if this is not the case, creating new contracts should be avoided.

Storing Data: SSTORE

SSTORE is the EVM opcode to save data to storage. As a generalization, SSTORE costs 20,000 gas when setting a storage value to non-zero from zero and 5,000 gas when a storage value is set to zero.

Due to this cost, storing data on-chain is highly inefficient and costly. It should be avoided whenever possible.
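To get a feel for the scale, the generalized SSTORE costs can be turned into a back-of-the-envelope estimate. The Python sketch below is deliberately simplified: the function name is made up, it charges 5,000 gas for every write that is not zero to non-zero, and it ignores the access-dependent adjustments and refunds that the real rules apply.

```python
SLOT_SIZE_BYTES = 32  # one storage slot holds a 256-bit word


def sstore_gas_simplified(current_value: int, new_value: int) -> int:
    """Simplified SSTORE cost: 20,000 gas for zero -> non-zero, 5,000 gas otherwise."""
    if current_value == 0 and new_value != 0:
        return 20_000
    return 5_000


def fresh_storage_gas(num_bytes: int) -> int:
    """Gas to write `num_bytes` of non-zero data into previously empty slots."""
    slots = (num_bytes + SLOT_SIZE_BYTES - 1) // SLOT_SIZE_BYTES  # round up
    return slots * sstore_gas_simplified(0, 1)


print(sstore_gas_simplified(0, 7))  # 20000
print(sstore_gas_simplified(7, 0))  # 5000
print(fresh_storage_gas(1024))      # 640000
```

Under this model, writing a single kilobyte of new data takes 32 slots and 640,000 gas, which is why anything larger than a few values is usually kept off-chain.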

This practice is most common with NFTs. Developers will store an NFT’s metadata (its image, attributes, etc.) on a decentralized storage network, like Arweave or IPFS, in place of storing it on-chain. The only data that is kept on-chain is a link to the metadata on the respective decentralized storage network. This link is queryable by the tokenURI() function found in all ERC721s that contain metadata.

A standard implementation of a tokenURI() function. (Source: OpenZeppelin)

For example, take the Bored Ape Yacht Club smart contract. Calling the tokenURI() function with the tokenId, 0, returns the following link: ipfs://QmeSjSinHpPnmXmspMjwiXyN6zS4E9zccariGR3jxcaWtq/0

If you go to this link, you will find the JSON file that contains the BAYC #0’s metadata:

These attributes are easily verifiable on OpenSea:

It should also be noted that certain data structures are simply infeasible in the EVM due to the cost of storage. For example, representing a graph using an adjacency matrix would be completely infeasible due to its O(V²) space complexity.

All of the above code can be found on my GitHub.

Thank you for reading, and I hope you enjoyed this article!

There are so many more gas optimizations and nuances that I did not have a chance to cover. To learn more, I suggest the following resources:

Please reach out to me and my team at Bloccelerate VC if you are building in Web3. We are always looking to back great founders.

Feel free to also drop me a note if you have any suggestions for tooling or topics that I should cover in the future.
