Learn EVM in depth #2. Executing the bytecode step by step in the deployment of a contract.

João Paulo Morais
Coinmonks
7 min read · Apr 17, 2023


In this lesson, we will generate a contract’s bytecode and analyze its execution by the Ethereum virtual machine. With this, we will understand the flow of execution of a program and the use of some opcodes. In the following lessons, we will look in more detail at the opcodes defined by the EVM and at the relationship between the opcodes, Solidity and Yul.

Let’s start by writing the simplest possible contract: an empty contract.

pragma solidity ^0.8.18;

contract Empty {}

Although the contract body is empty, bytecode is still generated. This contract may seem like it does nothing, but it does at least one thing: it does not allow anyone to send Ether to it. If someone sends a transaction to that contract with Ether attached, the transaction will be rolled back, because the REVERT opcode will be executed.

The contract was compiled with Solidity 0.8.18 with the optimizer enabled and set to 200 runs. The result is shown below.

6080604052348015600f57600080fd5b50603f80601d6000396000f3fe6080604052600080fdfea26469706673582212203e36aa9cdce1afe76ba43f8db331ebffbf999f00a4b256986e35ed233425b4fc64736f6c63430008120033

Deploying a contract to Ethereum

To deploy a contract on Ethereum, we must send a transaction without specifying a recipient address. This triggers the Ethereum virtual machine to execute the payload, which is precisely the bytecode. In this lesson, we will study exactly the execution of this bytecode.

The EVM has a program counter, a register that holds the position of the next opcode to be executed. The bytecode is an indexed sequence in which every byte has a position: the first byte is at 0x00, the second byte is at 0x01, and so on, until the last byte. Our example bytecode is 92 bytes long, so it has 92 “lines.” Its first byte is on “line” 0x00, while its last byte is on line 0x5b, which is 91 in decimal.
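The byte numbering can be checked with a few lines of Python. This is only a sketch: the name `CREATION_HEX` is ours, and the hex string is the bytecode shown above, split into pieces purely for readability.

```python
# Bytecode from the article; the split into pieces is only for readability.
CREATION_HEX = (
    "6080604052348015600f57600080fd5b50603f80601d6000396000f3fe"
    "6080604052600080fdfe"
    "a2646970667358221220"
    "3e36aa9cdce1afe76ba43f8db331ebffbf999f00a4b256986e35ed233425b4fc"
    "64736f6c63430008120033"
)

code = bytes.fromhex(CREATION_HEX)

length = len(code)         # number of bytes, i.e. number of "lines"
last_offset = length - 1   # position of the last byte

print(length, hex(last_offset))          # 92 0x5b
print(hex(code[0x00]), hex(code[0x0f]))  # 0x60 0x5b
```

Indexing `code` with a hexadecimal offset is the same lookup the program counter performs: `code[0x0f]` is the byte on “line” 0x0f.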

A great place to see this is at https://www.evm.codes/playground. It is possible to paste the bytecode, and it will show all the opcodes, in their respective lines, with the operator and the operands.

The figure below shows the result of the bytecode with its respective opcodes and its line number.

Each opcode is on a specific line.
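If the playground is not at hand, a similar listing can be produced with a tiny disassembler. The sketch below is a simplification under our own assumptions: it knows only the opcodes that appear in the first 29 bytes of this bytecode, treats PUSH1 as the only instruction with an operand, and uses names (`OPCODES`, `disassemble`) that are ours.

```python
# First 29 bytes of the article's bytecode (offsets 0x00 to 0x1c).
INIT_HEX = "6080604052348015600f57600080fd5b50603f80601d6000396000f3fe"

# Only the opcodes used in this fragment; not a complete EVM table.
OPCODES = {
    0x15: "ISZERO", 0x34: "CALLVALUE", 0x39: "CODECOPY", 0x50: "POP",
    0x52: "MSTORE", 0x57: "JUMPI", 0x5B: "JUMPDEST", 0x60: "PUSH1",
    0x80: "DUP1", 0xF3: "RETURN", 0xFD: "REVERT", 0xFE: "INVALID",
}

def disassemble(code: bytes) -> list:
    """Return (offset, text) pairs, one per instruction."""
    listing = []
    pc = 0
    while pc < len(code):
        op = code[pc]
        name = OPCODES.get(op, f"UNKNOWN_{op:02x}")
        if op == 0x60 and pc + 1 < len(code):
            # PUSH1 takes the next byte as its operand, so it spans two "lines".
            listing.append((pc, f"PUSH1 0x{code[pc + 1]:02x}"))
            pc += 2
        else:
            listing.append((pc, name))
            pc += 1
    return listing

listing = disassemble(bytes.fromhex(INIT_HEX))
for offset, text in listing:
    print(f"0x{offset:02x}  {text}")
```

The first printed line is `0x00  PUSH1 0x80` and the JUMPDEST lands on `0x0f`, matching the walk-through in the sections that follow.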

Setting the free memory pointer

Let’s analyze the first few bytes of the bytecode: 60 80 60 40 52.

60 is the byte for the PUSH1 opcode, which indicates that the next byte should be pushed onto the stack. Therefore, 60 80 means: put byte 0x80 on the stack.

Likewise, 60 40 puts byte 0x40 on the stack. We can see the stack configuration in the illustration below.

After that, byte 52 is executed. It is the MSTORE opcode, which puts data in memory. We will see more about memory in a following lesson; for now, understand memory as a place where we can store data. This opcode does not expect operands but uses 2 values on the stack. The first value on the stack indicates where to store the data in memory, and the second value indicates what to store.

The first value on the stack is 0x40 and the second is 0x80, so this instruction will be: store 0x80 in memory location 0x40. After that, both pieces of information are removed from the stack and it becomes empty again.

What we are doing here is registering the free memory pointer. It indicates which memory location is free to be used.
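The three instructions can be mimicked in Python to make the stack and memory effects concrete. This is a sketch under simplifying assumptions: the stack is a list whose top is its last element, memory is a fixed zeroed buffer of 0x60 bytes, and MSTORE is modeled as writing the value as a 32-byte big-endian word.

```python
stack = []                 # top of the stack is the END of the list
memory = bytearray(0x60)   # zeroed memory, large enough for this example

stack.append(0x80)         # 60 80 -> PUSH1 0x80
stack.append(0x40)         # 60 40 -> PUSH1 0x40

# 52 -> MSTORE: first value is the memory location, second is the data.
offset = stack.pop()       # 0x40
value = stack.pop()        # 0x80
memory[offset:offset + 32] = value.to_bytes(32, "big")

free_memory_pointer = int.from_bytes(memory[0x40:0x60], "big")
print(stack, hex(free_memory_pointer))   # [] 0x80
```

Because the word is big-endian, the byte 0x80 ends up in the last position of the word, at memory address 0x5f.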

Checking if Ether was sent in the transaction

The following opcode is number 34, CALLVALUE. It puts the value sent by the transaction on the stack. This value will be dynamic, as each transaction will have its own callvalue. Let’s say the transaction didn’t send any value, so the callvalue will be zero.

The following opcode is number 80, DUP1, duplicating the stack’s first value. The stack had only one item worth 0x00 (the call value), now it will have 2 equal items. We can see this in the illustration below.

The following opcode is the number 15, ISZERO. It compares the first value on the stack with the value 0. If it is zero, it will remove the value and put 1. If it is not zero, it will remove the value and put 0. Since the first value on the stack is 0, it will be replaced by 1.

In this round, we checked if any value was sent to the contract. Doing this check was necessary because this contract doesn’t have a payable constructor, so it can’t receive Ether at deploy time. Now we must enter a conditional. The transaction should roll back if Ether was sent (0x00 will be on top of the stack). If Ether has not been sent (0x01 at the top of the stack) the deployment must proceed.
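The check can be traced for both cases with a small Python function. It is a sketch, with `callvalue_check` being a name of our own and the stack represented as a list whose top is its last element.

```python
def callvalue_check(callvalue: int) -> list:
    """Mimic CALLVALUE, DUP1, ISZERO and return the resulting stack."""
    stack = []
    stack.append(callvalue)                      # 34 -> CALLVALUE
    stack.append(stack[-1])                      # 80 -> DUP1
    stack.append(1 if stack.pop() == 0 else 0)   # 15 -> ISZERO
    return stack

no_ether = callvalue_check(0)
one_ether = callvalue_check(10**18)   # 1 Ether expressed in wei

print(no_ether)    # [0, 1]
print(one_ether)   # [1000000000000000000, 0]
```

In both cases the original call value stays underneath the result of ISZERO, because DUP1 made a copy before ISZERO consumed it.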

Conditional jump

The next instruction is 60 0f, which pushes the value 0f onto the stack. Now the stack has 3 items, from top to bottom: 0f 01 00. It’s time to run the conditional. Assembly doesn’t work with if/else conditions, but with jumps. The next byte is 57, which represents the JUMPI opcode.

JUMPI requires 2 stack values. The first value (the top of the stack) is the destination, and the second value is the condition that decides whether the program counter jumps to that destination or continues to the following line. If the second value is 0, execution goes to the following line. If it is anything other than 0, execution jumps to the destination at the top of the stack.

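Since our second value on the stack is 01, the program counter will change to 0f. Remember that each byte is on a line? Now the line to be executed will be number 0f. Had Ether been sent, the second value would be 00 and execution would continue on the following line, 0b, where the bytes 60 00 80 fd push 0, duplicate it, and execute REVERT.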

Since 0x0f is 15 in decimal, the next opcode to be executed will be the sixteenth byte of the bytecode. In this case, it is byte 5b, which represents the JUMPDEST opcode. This opcode does nothing; however, the destination of every JUMPI (or JUMP) must be a JUMPDEST. If the program jumps to an opcode that is not a JUMPDEST, the transaction will be rolled back.
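The jump rule can be sketched as a function that returns the next value of the program counter. The name `jumpi` and the use of `ValueError` are our own choices, and the check is simplified: a real EVM also rejects a 5b byte that is merely the operand of a PUSH instruction, which this sketch does not detect.

```python
# First 29 bytes of the article's bytecode (offsets 0x00 to 0x1c).
INIT = bytes.fromhex("6080604052348015600f57600080fd5b50603f80601d6000396000f3fe")
JUMPDEST = 0x5B

def jumpi(code: bytes, pc: int, stack: list) -> int:
    """Return the next program counter after a JUMPI located at `pc`."""
    destination = stack.pop()   # top of the stack
    condition = stack.pop()     # second value
    if condition == 0:
        return pc + 1           # no jump: continue on the following line
    if destination >= len(code) or code[destination] != JUMPDEST:
        raise ValueError("invalid jump destination")
    return destination

# Stacks are lists whose top is the last element.
taken = jumpi(INIT, 0x0A, [0, 1, 0x0F])          # no Ether sent
not_taken = jumpi(INIT, 0x0A, [10, 0, 0x0F])     # Ether sent

print(hex(taken), hex(not_taken))   # 0xf 0xb
```

Calling `jumpi(INIT, 0x0A, [0, 1, 0x10])` raises the error, because the byte at 0x10 is 50 (POP), not a JUMPDEST.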

Preparing to write the deployed bytecode into memory

The next byte is 50, which represents the POP opcode. It removes an item from the top of the stack. Our stack is now empty. The next instruction, 60 3f, pushes the value 3f onto the stack. Then byte 80 is DUP1, which duplicates the top of the stack. After that, 60 1d puts the value 1d on the stack. Finally, 60 00 puts the value 0 on the stack. Our stack now contains 4 items, from top to bottom: 00 1d 3f 3f. The stack flow can be seen in the illustration below.

The next instruction is byte 39, which is the CODECOPY opcode. It copies part of the bytecode into memory. This opcode requires 3 values, which will be the first 3 values on the stack. The first value is the memory location where the bytecode will be copied. The second value is the line where the code to be copied begins. The last value is the number of bytes to be copied. After use, these 3 items are removed from the stack.

So the instruction is: copy 3f bytes of the bytecode, starting at line 1d, to memory address 0. The program is preparing to write the deployed bytecode to Ethereum, and to do this it first needs to get the deployed bytecode into memory.
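The copy itself can be reproduced in Python. This is a sketch with a few assumptions of our own: memory is a fresh zeroed 64-byte buffer (the earlier MSTORE is ignored here), the stack is a list whose top is its last element, and a copy that runs past the end of the code is padded with zeros.

```python
# Bytecode from the article; the split into pieces is only for readability.
CREATION_HEX = (
    "6080604052348015600f57600080fd5b50603f80601d6000396000f3fe"
    "6080604052600080fdfe"
    "a2646970667358221220"
    "3e36aa9cdce1afe76ba43f8db331ebffbf999f00a4b256986e35ed233425b4fc"
    "64736f6c63430008120033"
)
code = bytes.fromhex(CREATION_HEX)

stack = [0x3F, 0x3F, 0x1D, 0x00]   # top to bottom: 00 1d 3f 3f
memory = bytearray(64)

# 39 -> CODECOPY: memory location, position in the code, number of bytes.
memory_offset = stack.pop()   # 0x00
code_offset = stack.pop()     # 0x1d
size = stack.pop()            # 0x3f
chunk = code[code_offset:code_offset + size].ljust(size, b"\x00")
memory[memory_offset:memory_offset + size] = chunk

print(stack == [0x3F])          # True
print(bytes(memory[:10]).hex()) # 6080604052600080fdfe
```

Since 0x1d + 0x3f equals 0x5c, which is 92, the copied range runs exactly to the end of the bytecode.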

In case you wonder what happened to the free memory pointer, it just got ignored. The compiler doesn’t need it anymore because it’s already at the end of program execution.

Writing the deployed bytecode to the blockchain

The stack now has only 1 item, 3f, which is the size of the copied code in memory. The next instruction is 60 00, i.e. put 00 on the stack. It is followed by f3, the RETURN opcode, which ends the execution. Let’s understand what it does.

The RETURN opcode ends the execution successfully and returns a certain amount of data. The data to be returned is in memory and is indicated by 2 values on the stack. The first value indicates where in memory the returned data starts, and the second value indicates the size in bytes to be returned. Our stack, from top to bottom, is 00 3f. Therefore, what will be returned is at memory address 00 and has a size of 3f bytes. It is exactly the deployed bytecode that we wrote in memory!

That’s why it’s called deployed bytecode: it’s the part of the bytecode that will be stored on Ethereum. This is what happens in a contract account creation transaction: whatever is returned by the program will be recorded as the contract code.
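The whole walk-through fits in a toy interpreter. This is a sketch, not an EVM: the function name `run_creation` is ours, it supports only the opcodes this bytecode uses, it ignores gas, it does not check whether a JUMPDEST sits inside PUSH data, and it models REVERT as a Python exception.

```python
# Bytecode from the article; the split into pieces is only for readability.
CREATION_HEX = (
    "6080604052348015600f57600080fd5b50603f80601d6000396000f3fe"
    "6080604052600080fdfe"
    "a2646970667358221220"
    "3e36aa9cdce1afe76ba43f8db331ebffbf999f00a4b256986e35ed233425b4fc"
    "64736f6c63430008120033"
)

def run_creation(code: bytes, callvalue: int = 0) -> bytes:
    """Execute the creation bytecode and return the data passed to RETURN."""
    stack, memory, pc = [], bytearray(), 0

    def grow(end: int) -> None:
        # Extend memory with zeros so that addresses below `end` exist.
        if len(memory) < end:
            memory.extend(bytes(end - len(memory)))

    while pc < len(code):
        op = code[pc]
        if op == 0x60:                                   # PUSH1
            stack.append(code[pc + 1])
            pc += 2
            continue
        if op == 0x52:                                   # MSTORE
            offset, value = stack.pop(), stack.pop()
            grow(offset + 32)
            memory[offset:offset + 32] = value.to_bytes(32, "big")
        elif op == 0x34:                                 # CALLVALUE
            stack.append(callvalue)
        elif op == 0x80:                                 # DUP1
            stack.append(stack[-1])
        elif op == 0x15:                                 # ISZERO
            stack.append(1 if stack.pop() == 0 else 0)
        elif op == 0x57:                                 # JUMPI
            dest, cond = stack.pop(), stack.pop()
            if cond != 0:
                if dest >= len(code) or code[dest] != 0x5B:
                    raise RuntimeError("invalid jump destination")
                pc = dest
                continue
        elif op == 0x5B:                                 # JUMPDEST does nothing
            pass
        elif op == 0x50:                                 # POP
            stack.pop()
        elif op == 0x39:                                 # CODECOPY
            dest, offset, size = stack.pop(), stack.pop(), stack.pop()
            grow(dest + size)
            memory[dest:dest + size] = code[offset:offset + size].ljust(size, b"\x00")
        elif op == 0xF3:                                 # RETURN
            offset, size = stack.pop(), stack.pop()
            grow(offset + size)
            return bytes(memory[offset:offset + size])
        elif op == 0xFD:                                 # REVERT
            raise RuntimeError("REVERT")
        else:
            raise RuntimeError(f"unsupported opcode 0x{op:02x}")
        pc += 1
    return b""

creation = bytes.fromhex(CREATION_HEX)
deployed = run_creation(creation)
print(len(deployed), deployed[:10].hex())   # 63 6080604052600080fdfe
```

Calling `run_creation(creation, callvalue=1)` raises `RuntimeError("REVERT")`, which mirrors the claim that the deployment fails when Ether is sent.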

Summary

We’ve come a long way. We analyzed the entire execution of the bytecode that runs when a new contract account is created. We saw that each instruction is on a particular line and that it is possible to use the JUMPI opcode to jump to a certain line conditionally. We have also seen that, in a contract creation transaction, what is returned by the execution will be recorded as the contract code.

Thanks for reading!

Comments and suggestions about this article are welcome.

Any contribution is welcome. www.buymeacoffee.com/jpmorais.


Astrophysicist, full-stack developer, blockchain enthusiast. Technical Writer @RareSkills.