EVM Part II: The Journey of Smart Contracts from Solidity code to Bytecode

Zaryab Afser
Coinmonks
17 min read · Mar 10, 2023


Note:
This is going to be a multi-part series covering the essential details of the Ethereum Virtual Machine, opcodes, bytecode, etc.

The complete articles of this series, with important add-ons, will only be published on the official website of Decipher with Zaryab.

zaryabs.com

While the first part of this article series was more about the overall idea of EVM, in this section, we dive a bit deeper into the technical side of the Ethereum Virtual Machine.

The purpose of this part is to build a better understanding of the entire journey of a smart contract, from compilation to deployment to execution, and to create strong mental models around it.

A bird’s-eye view of a smart contract’s journey reveals these 4 steps:

  • Development & Compilation of Smart Contracts
  • Deployment of Smart Contract
  • Initialization of Smart Contract (Execution of init code)
  • Execution of Smart Contract (More about this in the next parts of this series)

This article aims to provide not just high-level details but also dive deep into the technical aspects of smart contract compilation, bytecode, ABI, opcodes, instructions, etc.

In simpler terms, we will understand everything that happens from the moment you compile your smart contract until the time you deploy it, execute its constructor, and initialize the contract’s state.

Let’s get started.

Journey of a Smart Contract

Let’s start from the very basics and write a super simple smart contract, Test, that allows setting and retrieving a uint256 state variable called pointer.

We will use this contract as an example as we witness its journey from simple Solidity code to bytecode. (We shall tweak the contract if need be, but not in this part).

// SPDX-License-Identifier: GPL-3.0

Now that you have your contract ready, this is where the next step comes in, i.e., Compilation.

Compilation

What happens when a Smart Contract compiles?

As soon as you compile a smart contract, it produces two very significant items:

  1. Bytecode
  2. Application Binary Interface (ABI)

So, when we compile our above-mentioned Test contract, it leads to:

  • This ByteCode👇
608060405234801561001057600080fd5b50606460008190555061017f806100286000396000f3fe608060405234801561001057600080fd5b50600436106100415760003560e01c80632f5f3b3c14610046578063a32a3ee414610064578063acfee28314610082575b600080fd5b61004e61009e565b60405161005b91906100d0565b60405180910390f35b61006c6100a4565b60405161007991906100d0565b60405180910390f35b61009c6004803603810190610097919061011c565b6100ad565b005b60005481565b60008054905090565b8060008190555050565b6000819050919050565b6100ca816100b7565b82525050565b60006020820190506100e560008301846100c1565b92915050565b600080fd5b6100f9816100b7565b811461010457600080fd5b50565b600081359050610116816100f0565b92915050565b600060208284031215610132576101316100eb565b5b600061014084828501610107565b9150509291505056fea2646970667358221220a1012465f7be855f040e95566de3bbd50542ba31a7730d7fea2ef9de563a9ac164736f6c63430008110033

Bytecode of Test Contract

  • And, this ABI 👇
[
  {
    "inputs": [],
    "stateMutability": "nonpayable",
    "type": "constructor"
  },
  {
    "inputs": [],
    "name": "getPointer",
    "outputs": [
      {
        "internalType": "uint256",
        "name": "",
        "type": "uint256"
      }
    ],
    "stateMutability": "view",
    "type": "function"
  },
  {
    "inputs": [],
    "name": "pointer",
    "outputs": [
      {
        "internalType": "uint256",
        "name": "",
        "type": "uint256"
      }
    ],
    "stateMutability": "view",
    "type": "function"
  },
  {
    "inputs": [
      {
        "internalType": "uint256",
        "name": "_num",
        "type": "uint256"
      }
    ],
    "name": "setPointer",
    "outputs": [],
    "stateMutability": "nonpayable",
    "type": "function"
  }
]

ABI of Test Contract

So Far So Good — But what exactly are these?

Bytecode and ABI

Before understanding ABI, let’s quickly understand the concept of bytecodes. This will eventually help us get better clarity on the significance of ABI as well.

Bytecode, in simple terms, is a sequence of instructions (opcodes and their operands) that defines how a smart contract should be executed by the EVM. It is the core component that the EVM uses to process and execute smart contracts. (more on this in detail later)

Bytecode, as can be seen from our example contract above, is not at all human-readable.

This hexadecimal string is meant for machines: the EVM decodes it byte by byte and acts on each opcode.

It goes without saying that interacting with raw bytecode to perform a certain action in the smart contract isn’t practical for humans, simply because it’s not readable.

Well, then how do we perform any action on a smart contract?

Enter the Application Binary Interface (ABI)

Ever heard of APIs (Application Programming Interfaces)?
An API, in the world of computer science, defines how two pieces of software interact with each other. This is what allows you to interact with any given network, backend service, or library.

ABI, in the world of Ethereum blockchain, represents something very similar.

ABIs define a standard mechanism for interacting with smart contracts. These are human-readable interfaces that enable us to interact with the complicated EVM bytecode of a smart contract.

These interfaces are extremely crucial as they enable interactions between applications and smart contracts or even contracts to contracts.

For instance, in our example contract above, we can see how the ABI of the Test contract defines every detail about the function names, their stateMutability, the argument types, etc.

These details are then used to encode contract calls that are made to the EVM so that the virtual machine can read, understand and execute these instructions. Solidity provides very clear specifications on encoding and decoding of contract ABIs which we will explore later.
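To make the encoding step a little more concrete, here is a minimal Python sketch of how the ABI above can be turned into function signatures and calldata. It is only an illustration under a few assumptions: the helper names (function_signatures, encode_uint256, build_calldata) are made up for this example, the ABI is a trimmed copy of the one shown earlier, and the selector 0xacfee283 is one of the three 4-byte values visible in the runtime bytecode. Treating it as the selector of setPointer(uint256) is an assumption based on reading the bytecode, not something stated in the ABI. Deriving a selector yourself would require a Keccak-256 hash of the signature, which is left out because it is not part of the Python standard library.

```python
import json

# Trimmed copy of the Test contract's ABI (only the fields used below).
ABI_JSON = """
[
  {"inputs": [], "stateMutability": "nonpayable", "type": "constructor"},
  {"inputs": [], "name": "getPointer", "outputs": [{"type": "uint256"}],
   "stateMutability": "view", "type": "function"},
  {"inputs": [], "name": "pointer", "outputs": [{"type": "uint256"}],
   "stateMutability": "view", "type": "function"},
  {"inputs": [{"name": "_num", "type": "uint256"}], "name": "setPointer",
   "outputs": [], "stateMutability": "nonpayable", "type": "function"}
]
"""


def function_signatures(abi):
    """Return the canonical signature string of every function in the ABI."""
    signatures = []
    for entry in abi:
        if entry["type"] != "function":
            continue  # the constructor has no name and no selector
        arg_types = ",".join(arg["type"] for arg in entry["inputs"])
        signatures.append(f'{entry["name"]}({arg_types})')
    return signatures


def encode_uint256(value):
    """ABI-encode an unsigned integer as one 32-byte big-endian word."""
    if not 0 <= value < 2**256:
        raise ValueError("value does not fit in uint256")
    return value.to_bytes(32, "big")


def build_calldata(selector_hex, *uint_args):
    """Concatenate a 4-byte selector with the encoded uint256 arguments."""
    selector = bytes.fromhex(selector_hex)
    if len(selector) != 4:
        raise ValueError("a function selector is exactly 4 bytes")
    return selector + b"".join(encode_uint256(arg) for arg in uint_args)


signatures = function_signatures(json.loads(ABI_JSON))
print(signatures)

# Assumption: 0xacfee283 is the selector of setPointer(uint256).
calldata = build_calldata("acfee283", 42)
print(len(calldata), calldata.hex())
```

The resulting calldata is 36 bytes long: 4 bytes that tell the contract which function to run, followed by one 32-byte word holding the argument.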

To quickly summarize the ABI vs Bytecode discussion:

While bytecode is the complex machine-readable instructions for EVM to execute smart contracts, ABIs are human-readable interfaces that provide a standard procedure to enable contract interactions either from off-chain or contract-to-contract interaction.

Understanding Bytecode

We already discussed the basics of bytecode in the section above. Now we shall dive a bit deeper into bytecode and try to understand a lot of the fun stuff that happens behind the scenes.

Basics of Bytecode

Humans understand Solidity,
EVM understands bytecode

In simple terms, bytecode is the low-level code that our smart contracts, written in Solidity (a high-level programming language), get translated to.

It is a long sequence of opcodes, i.e., individual instructions that together define how a particular smart contract is supposed to behave.

Most importantly, these instructions are directly understood by the EVM, which allows it to interpret and execute smart contracts accurately.

Every single opcode represents a certain operation or action that must be performed on the EVM stack to get the desired outputs.

💡

Fun Fact: Each opcode is 1 byte long, hence the name bytecode. (The PUSH1 to PUSH32 opcodes are additionally followed by 1 to 32 bytes of data to push.)

Buckle up! Things start to get super-interesting from here on. 🦾

Creation and Runtime Code

Although bytecodes seem to be some complex machine-readable gibberish, they can further be categorized into 2 different types:

  1. Creation Code, and
  2. Runtime Code

Let’s decipher both of them.

Creation Code

Creation code, as the name depicts, is part of the bytecode that is responsible for the creation of the contract.

The sole purpose of the creation code is to initialize and set up the contract being deployed and make it ready for further execution.

This is the part of the bytecode that includes the constructor logic, its parameters, the free memory pointer setup (more on this later), and the initialization of state variables.

An important point that one must know about creation code is that it’s only executed by the EVM once.

Creation code mainly acts as a set of instructions (To-Dos) that the EVM must perform to deploy the contract adequately as well as initialize state variables as per the constructor.

💡

The ‘AHA’ Moment
The constructor logic of a smart contract is part of the Creation Code. And the creation code can only be executed once by the EVM.

Therefore, constructors of any smart contract are One-Time executable functions and cannot be called once executed.

For instance, in our Test contract example, do you remember how the pointer state variable was set to 100 inside the constructor? Well, that specific action happens during the execution of the creation code.

constructor() {
pointer = 100;
}

One more crucial point.

The creation code also includes the logic to return the runtime code of the contract, which is then stored on-chain at the deployed contract’s address.

This means that the creation code itself is never stored as the contract’s code. It’s the runtime bytecode that is stored on-chain for further execution of the contract.

As we can see, there are a few major actions that are performed by the creation code.

So let’s summarize to get a better idea.

There are 3 significant details about this creation code that we must keep in mind:

  • Creation code is executed only once, at the time of contract deployment. And never again.
  • It includes the constructor logic, arguments, etc. This is part of the bytecode that instructs the EVM to set up the constructor, initialize state variables in the constructor, etc.
  • It is responsible for returning the runtime bytecode and storing it on-chain.

While these 3 are the main actions performed by the creation code of a smart contract, there are a couple of other interesting procedures that take place during the execution of the creation code.

We will learn about it in the next sections below.

Runtime Bytecode

Runtime bytecode, unlike creation code, is part of the bytecode that actually gets stored on-chain and defines the smart contract.

Unlike creation code, this part of the bytecode doesn’t contain the constructor logic.

Since this is the part that is stored on-chain, it mainly includes every other opcode necessary for the EVM to interpret and execute the smart contract whenever there is an external call triggering the contract.

In other words, any on-chain interaction you do with a smart contract technically means an interaction with the runtime bytecode of the smart contracts which gets executed behind the scenes by the EVM.

Before You Proceed Further — Read This 👇

When it comes to bytecode, you might come across different terminologies around, which can be very confusing.

Ideally, there are just 2 ways you can categorize bytecodes, i.e., Creation Code & Runtime Code.

However, there are a few other terminologies like Deployed bytecode, init code, etc, in the Ethereum world, often used to define similar things.

Therefore, it’s highly recommended to read this article by Shane to eliminate any confusion around different terminologies.

Quick Comparison of Creation & Runtime Bytecode

Let’s take a quick look at the bytecode of our very own Test contract, mentioned above.

You can easily get them using simple solc commands.

  • Set up a hardhat project
  • Paste the Test contract(Test.sol) into its contract folder
  • Simply run the following solc commands:

To get the Complete Bytecode (Creation + Runtime Bytecode), run 👇

solc --bin contracts/Test.sol

To get only the Runtime Bytecode, run

solc --bin-runtime contracts/Test.sol

1. Creation Code

0x608060405234801561001057600080fd5b50606460008190555061017f806100286000396000f3fe608060405234801561001057600080fd5b50600436106100415760003560e01c80632f5f3b3c14610046578063a32a3ee414610064578063acfee28314610082575b600080fd5b61004e61009e565b60405161005b91906100d0565b60405180910390f35b61006c6100a4565b60405161007991906100d0565b60405180910390f35b61009c6004803603810190610097919061011c565b6100ad565b005b60005481565b60008054905090565b8060008190555050565b6000819050919050565b6100ca816100b7565b82525050565b60006020820190506100e560008301846100c1565b92915050565b600080fd5b6100f9816100b7565b811461010457600080fd5b50565b600081359050610116816100f0565b92915050565b600060208284031215610132576101316100eb565b5b600061014084828501610107565b9150509291505056fea26469706673582212206a433c2968ca8580b1ef7783748d3a3732df8255700b5fd10744fdad4a1cd50364736f6c63430008110033

2. Runtime Code

0x608060405234801561001057600080fd5b50600436106100415760003560e01c80632f5f3b3c14610046578063a32a3ee414610064578063acfee28314610082575b600080fd5b61004e61009e565b60405161005b91906100d0565b60405180910390f35b61006c6100a4565b60405161007991906100d0565b60405180910390f35b61009c6004803603810190610097919061011c565b6100ad565b005b60005481565b60008054905090565b8060008190555050565b6000819050919050565b6100ca816100b7565b82525050565b60006020820190506100e560008301846100c1565b92915050565b600080fd5b6100f9816100b7565b811461010457600080fd5b50565b600081359050610116816100f0565b92915050565b600060208284031215610132576101316100eb565b5b600061014084828501610107565b9150509291505056fea26469706673582212206a433c2968ca8580b1ef7783748d3a3732df8255700b5fd10744fdad4a1cd50364736f6c63430008110033

If you observe, the creation code is slightly longer than the runtime code.

That’s because the creation code has a bunch of extra opcodes at the very beginning which aren’t a part of the runtime bytecode.
Those extra opcodes, 40 bytes in total, can be seen below. 👇

608060405234801561001057600080fd5b50606460008190555061017f806100286000396000f3fe

What are these extra opcodes in the creation code? Any guesses? 🤔

Yes, you perhaps guessed it right.

This is part of the creation code that deals with the constructor logic, its parameters, generation & storage of bytecode, and a few other things that we will discuss soon.

This part of the creation code is the first one to be executed by EVM, during any contract deployment and never becomes part of the runtime bytecode that sits on-chain.

Note: From here on, we may refer to these extra opcodes as init code.

init code simply means the part of the creation bytecode that deals with initializing and setting up the contract’s constructor.
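The relationship between the two pieces can be checked mechanically: the runtime code is a suffix of the creation code, and whatever precedes it is the init code. Below is a small Python sketch of that idea. The function name split_creation_code is made up for this example, and the runtime bytes used in the demo are a short placeholder rather than the real 383-byte runtime code of the Test contract.

```python
# The 40 extra bytes at the start of the Test contract's creation code.
INIT_CODE_HEX = (
    "608060405234801561001057600080fd5b5060646000"
    "8190555061017f806100286000396000f3fe"
)


def strip_0x(hex_string):
    """Drop an optional 0x prefix from a hex string."""
    return hex_string[2:] if hex_string.startswith("0x") else hex_string


def split_creation_code(creation_hex, runtime_hex):
    """Return (init_code, runtime_code) as bytes.

    Raises ValueError if the runtime code is not the tail of the creation code.
    """
    creation = bytes.fromhex(strip_0x(creation_hex))
    runtime = bytes.fromhex(strip_0x(runtime_hex))
    if not creation.endswith(runtime):
        raise ValueError("runtime code is not a suffix of the creation code")
    return creation[: len(creation) - len(runtime)], runtime


# Placeholder runtime code, used only to keep the demo short.
placeholder_runtime_hex = "6080604052600080fd"

init_code, runtime_code = split_creation_code(
    "0x" + INIT_CODE_HEX + placeholder_runtime_hex,
    "0x" + placeholder_runtime_hex,
)
print(len(init_code), hex(len(init_code)))  # 40 0x28
```

Feeding the two full hex strings from the solc output into the same function should yield the same 40-byte init code.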

Deployment and Initialization of Smart Contract

Now that you have your contract written and compiled and you understand ABI and Bytecode that are achieved after compilation, it’s time to proceed to the next steps.

The next step in the journey of your smart contract is the deployment and initialization of its state.

It’s time to expand our understanding of every single action that takes place during the execution of the init code(those extra bytecodes at the beginning of the creation code).

Tools you may need

In order to follow along and simulate some of the steps mentioned below, you can either use the Remix Debugger or even better, the EVM Playground.

Note: this section is explained in detail in the original article at zaryabs.com

Coming back to our Init Code…

We are now going to decipher every single opcode that is part of the init code, i.e., part of the Creation code responsible for initializing the constructor and a few other crucial actions.

Let’s break these down into a more readable format. If we translate each opcode into its mnemonic, this is what we get.

The following group of opcodes (init code) 👇

608060405234801561001057600080fd5b50606460008190555061017f806100286000396000f3fe

can also be displayed as 👇

[00] PUSH1 80
[02] PUSH1 40
[04] MSTORE
[05] CALLVALUE
[06] DUP1
[07] ISZERO
[08] PUSH2 0010
[0b] JUMPI
[0c] PUSH1 00
[0e] DUP1
[0f] REVERT
[10] JUMPDEST
[11] POP
[12] PUSH1 64
[14] PUSH1 00
[16] DUP2
[17] SWAP1
[18] SSTORE
[19] POP
[1a] PUSH2 017f
[1d] DUP1
[1e] PUSH2 0028
[21] PUSH1 00
[23] CODECOPY
[24] PUSH1 00
[26] RETURN
[27] INVALID
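A listing like the one above can be produced with a very small disassembler. The Python sketch below is an illustration, not a complete tool: its opcode table only contains the opcodes that occur in this init code, and the function name disassemble is made up for the example. The one rule it has to know is that PUSH1 to PUSH32 (0x60 to 0x7f) are followed by 1 to 32 bytes of immediate data, which must be skipped when moving to the next instruction.

```python
INIT_CODE_HEX = (
    "608060405234801561001057600080fd5b5060646000"
    "8190555061017f806100286000396000f3fe"
)

# Only the opcodes used by this init code; a full table has many more entries.
OPCODES = {
    0x15: "ISZERO", 0x34: "CALLVALUE", 0x39: "CODECOPY", 0x50: "POP",
    0x52: "MSTORE", 0x55: "SSTORE", 0x57: "JUMPI", 0x5B: "JUMPDEST",
    0x80: "DUP1", 0x81: "DUP2", 0x90: "SWAP1", 0xF3: "RETURN",
    0xFD: "REVERT", 0xFE: "INVALID",
}


def disassemble(code):
    """Return a list of (offset, mnemonic, immediate_hex) tuples."""
    listing = []
    pc = 0
    while pc < len(code):
        opcode = code[pc]
        if 0x60 <= opcode <= 0x7F:
            # PUSH1..PUSH32: the next n bytes are data, not instructions.
            n = opcode - 0x5F
            immediate = code[pc + 1 : pc + 1 + n]
            listing.append((pc, f"PUSH{n}", immediate.hex()))
            pc += 1 + n
        else:
            name = OPCODES.get(opcode, f"UNKNOWN_{opcode:02x}")
            listing.append((pc, name, ""))
            pc += 1
    return listing


listing = disassemble(bytes.fromhex(INIT_CODE_HEX))
for offset, name, immediate in listing:
    print(f"[{offset:02x}] {name} {immediate}".rstrip())
```

The bracketed numbers are byte offsets into the code, which is why they jump by 2 or 3 after a PUSH instruction.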

Let’s convert all of these opcodes into something that we humans can understand.😑

In short, these opcodes(init code) basically instruct the EVM to do 4 main tasks:

  • Assigning a Free-Memory Pointer
  • Validating Non-Payable Constructor Check
  • Initializing state variables as per the constructor
  • Returning and storing the Runtime Bytecode

We will now dive deep and understand each of these actions.

1. Assigning a Free memory Pointer

Instruction [00] to [04]

[00] PUSH1 80
[02] PUSH1 40
[04] MSTORE

This is one of the most crucial parts of the bytecode of almost every smart contract that is deployed.

This is where the free memory pointer is set up: the value 0x80 is stored at memory location 0x40.

Wait, What exactly is a Free Memory Pointer? 🤔

The free memory pointer can be defined as the pointer to the portion of memory that is unused and available for writing data.

It plays a significant role in preventing data from being overwritten at any given part of the memory.

At any given point in time, this pointer tells us which part of the memory is available and can be used to store data without overwriting anything already there.

At this point, a very obvious question that might pop up in your mind is:

What happens after a Free Memory is used?

Well, the code that the Solidity compiler generates is quite careful when dealing with memory.

Whenever there is a need to store any given data in memory, it performs this action in two steps:

  1. Fetch the Free Memory (using the free memory pointer)
    In order to store the data, the location of the free memory is fetched first. It’s easy to do because the position of the free memory is always stored at location 0x40 (the free memory pointer).
  2. Update the Free Memory Pointer to a new position
    After the free memory is used, the free memory pointer is updated to the next position in memory that is free and ready to use. This ensures that the pointer always refers to the next free space and, therefore, never leads to any memory overwriting (unless we make the terrible mistake of doing so ourselves while writing assembly code).
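The two steps above can be sketched in a few lines of Python. This is a simplified model, not how the EVM is implemented: the Memory class and the allocate helper are hypothetical names for this example, memory is modelled as a growing bytearray, and the rounding of sizes to 32-byte multiples that compiled code often performs is ignored.

```python
FREE_MEM_PTR_SLOT = 0x40  # memory offset where the pointer itself is kept


class Memory:
    """A toy model of EVM memory: a zero-initialised, growable byte array."""

    def __init__(self):
        self.data = bytearray()

    def _grow(self, end):
        if end > len(self.data):
            self.data.extend(bytes(end - len(self.data)))

    def mstore(self, offset, value):
        """Write a 32-byte big-endian word at the given offset (MSTORE)."""
        self._grow(offset + 32)
        self.data[offset : offset + 32] = value.to_bytes(32, "big")

    def mload(self, offset):
        """Read a 32-byte big-endian word from the given offset (MLOAD)."""
        self._grow(offset + 32)
        return int.from_bytes(self.data[offset : offset + 32], "big")


def allocate(memory, size):
    """Reserve size bytes and return the offset where they start."""
    start = memory.mload(FREE_MEM_PTR_SLOT)          # step 1: fetch the pointer
    memory.mstore(FREE_MEM_PTR_SLOT, start + size)   # step 2: move it forward
    return start


memory = Memory()
memory.mstore(0x40, 0x80)  # PUSH1 80, PUSH1 40, MSTORE

first = allocate(memory, 32)
second = allocate(memory, 64)
print(hex(first), hex(second), hex(memory.mload(0x40)))  # 0x80 0xa0 0xe0
```

Because every allocation moves the pointer past the bytes it handed out, two allocations never overlap as long as all writes go through the pointer.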

We shall learn about an example of fetching and updating free memory in the next parts of this series. Stay Tuned.

2. The Non-Payable Constructor Check

Instruction [05] to [11]

[05] CALLVALUE
[06] DUP1
[07] ISZERO
[08] PUSH2 0010
[0b] JUMPI
[0c] PUSH1 00
[0e] DUP1
[0f] REVERT
[10] JUMPDEST
[11] POP

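Quick Note on JUMP, JUMPI & JUMPDEST 📝

JUMP sets the program counter to the destination taken from the top of the stack. JUMPI does the same only if the second stack item (the condition) is non-zero; otherwise execution continues with the next instruction. JUMPDEST marks a valid destination for both.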

Deciphering the Opcodes

  • [05] CALLVALUE: Fetches the wei amount sent with the transaction and pushes it onto the stack.
  • [06] DUP1: Duplicates the 1st (topmost) stack item.
  • [07] ISZERO: Pops the topmost item and pushes 1 onto the stack if it is zero, otherwise 0.
  • [08] PUSH2 0010: Pushes 2 bytes onto the stack. In this case, it pushes 0x0010 (16 in decimal).

Note: 0x10 is the location of the JUMPDEST instruction. Therefore, we push 0x10 to the stack here.

  • [0b] JUMPI: Pops the destination (0x10) and the condition (the ISZERO result) from the stack and jumps to the destination only if the condition is non-zero, i.e., only if no wei was sent.
  • [0c] PUSH1 00: Skipped when the jump is taken. Otherwise, pushes the memory offset of the revert data.
  • [0e] DUP1: Skipped when the jump is taken. Otherwise, duplicates the zero to serve as the length of the revert data.
  • [0f] REVERT: Skipped when the jump is taken. Otherwise, reverts with empty return data.
  • [10] JUMPDEST: Control reaches here after the jump, i.e., the jump destination.
  • [11] POP: Pops the topmost item, i.e., the leftover copy of the call value, off the stack.

All of these opcodes can basically be boiled down to the following actions:

  • Check if the msg.value (amount of wei) sent with the contract transaction is greater than zero.
  • If the msg.value is NOT greater than zero, then proceed with further steps.
  • However, if the msg.value is actually greater than ZERO (ether was sent with contract creation transaction) but the constructor isn’t marked as payable, then REVERT and stop any further execution.
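To see both paths through this check, here is a hand-written Python walk-through of instructions [05] to [11]. It is a sketch for illustration only: the function name run_callvalue_check is made up, the stack is a plain Python list whose last element is the top, and each line mirrors one opcode rather than decoding real bytecode.

```python
def run_callvalue_check(callvalue):
    """Mirror instructions [05]..[11]; return (outcome, remaining_stack)."""
    stack = []  # the END of the list is the top of the stack

    stack.append(callvalue)                      # [05] CALLVALUE
    stack.append(stack[-1])                      # [06] DUP1
    stack.append(1 if stack.pop() == 0 else 0)   # [07] ISZERO
    stack.append(0x10)                           # [08] PUSH2 0010

    # [0b] JUMPI pops the destination first, then the condition.
    destination = stack.pop()
    condition = stack.pop()

    if condition == 0:
        # No jump: wei was sent to a non-payable constructor.
        stack.append(0x00)                       # [0c] PUSH1 00
        stack.append(stack[-1])                  # [0e] DUP1
        stack.pop()                              # [0f] REVERT pops the offset...
        stack.pop()                              # ...and the size (both zero)
        return "revert", stack

    # Jump taken: execution resumes at [10] JUMPDEST.
    assert destination == 0x10
    stack.pop()                                  # [11] POP drops the call value
    return "continue", stack


print(run_callvalue_check(0))   # ('continue', [])
print(run_callvalue_check(5))   # ('revert', [5])
```

With no wei attached, the stack ends up empty and the init code moves on to the constructor body. With any non-zero value, execution never reaches the JUMPDEST.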

💡Fun Fact:
All these checks exist because the constructor is not marked as payable. Therefore, the compiler adds these extra opcodes, and the EVM has to execute them, to ensure we aren’t passing non-zero ETH values to a non-payable constructor.

It means, if we mark the constructor as payable, we eliminate these extra opcodes. This means less work for EVM and eventually less gas consumption.

Thus marking a constructor as payable can actually save GAS.
But should we do that, just for saving GAS? 🤔

Read more about this interesting fact in the article.

Curious about what might happen if the constructor had a payable keyword?

No worries, we shall cover that in the next part of this series.

Note 📝

Although the next 2 remaining tasks of the init code are mentioned below, they are explained in much more extensive detail in the original article. Check out the article here to get a better understanding of the entire procedure.

3. Initialization of States

Instruction [12] to [19]

[12] PUSH1 64
[14] PUSH1 00
[16] DUP2
[17] SWAP1
[18] SSTORE
[19] POP

----------------------

Deciphering the Opcodes

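  • [12] PUSH1 64: Pushes 0x64 to the stack, i.e., 100 in decimal
  • [14] PUSH1 00: Pushes 00 to the stack, i.e., 0 in decimal
  • [16] DUP2: Duplicates the 2nd stack item (0x64) and pushes the copy on top of the stack
  • [17] SWAP1: Swaps the 1st stack item with the 2nd one, so the slot number (0) is on top and the value (0x64) is right below it
  • [18] SSTORE: Pops the slot number and the value, and stores 0x64 at slot zero
  • [19] POP: Pops the one remaining stack item, i.e., the original 0x64.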

Now that the EVM is done with its initial procedure of allocating free memory pointers, validating the constructor’s payable checks, etc, it’s time for it to start looking at our constructor body.

Recall our test contract’s constructor body.

The test contract’s constructor assigned a value of 100 to the pointer state variable.

And that’s exactly what the above-mentioned opcodes help us achieve, i.e., initialize the pointer state variable with the value 100.
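The stack movements in this step are easier to follow as a trace. The Python sketch below mirrors instructions [12] to [19] one by one and records the stack after each of them. It is an illustration under simple assumptions: run_state_init is a made-up name, storage is modelled as a dictionary from slot number to value, and the recorded stacks are shown with the top item first.

```python
def run_state_init():
    """Mirror instructions [12]..[19]; return (storage, trace)."""
    stack = []     # the END of the list is the top of the stack
    storage = {}   # slot number -> stored value
    trace = []     # (instruction, stack with the top item first)

    def record(instruction):
        trace.append((instruction, list(reversed(stack))))

    stack.append(0x64)                              # [12] PUSH1 64
    record("PUSH1 64")
    stack.append(0x00)                              # [14] PUSH1 00
    record("PUSH1 00")
    stack.append(stack[-2])                         # [16] DUP2 copies the 2nd item
    record("DUP2")
    stack[-1], stack[-2] = stack[-2], stack[-1]     # [17] SWAP1 swaps the top two
    record("SWAP1")
    slot = stack.pop()                              # [18] SSTORE pops the slot...
    value = stack.pop()                             # ...then the value
    storage[slot] = value
    record("SSTORE")
    stack.pop()                                     # [19] POP drops the leftover 0x64
    record("POP")
    return storage, trace


storage, trace = run_state_init()
for instruction, stack_top_first in trace:
    print(f"{instruction:<9} {[hex(item) for item in stack_top_first]}")
print(storage)  # {0: 100}
```

After the last instruction the stack is empty and slot 0 holds 100, which is the value assigned to pointer in the constructor.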

4. Returning and storing Runtime Bytecode

Instruction [1a] to [26]

[1a] PUSH2 017f
[1d] DUP1
[1e] PUSH2 0028
[21] PUSH1 00
[23] CODECOPY
[24] PUSH1 00
[26] RETURN

-----------------------------

Deciphering the Opcodes:

  • [1a] PUSH2 017f: Pushes 0x17f (383 in decimal) onto the stack
  • [1d] DUP1: Duplicates the topmost stack item (0x17f)
  • [1e] PUSH2 0028: Pushes 0x0028 (40 in decimal) onto the stack
  • [21] PUSH1 00: Pushes zero onto the stack
  • [23] CODECOPY: Executes the CODECOPY opcode, consuming 3 arguments from the top of the stack
  • [24] PUSH1 00: Pushes zero onto the stack
  • [26] RETURN: Halts execution and returns data from a specific portion of the EVM’s memory

As you can clearly see, there are some new decimal values (like 383 or 40) in this portion of the opcodes and one completely strange opcode, i.e., CODECOPY.

Do you recall that one of those tasks was to return the runtime bytecode that the EVM stores on-chain?

Well, that’s precisely what’s happening in the above-mentioned instructions, i.e.,

  • Getting and Returning the Runtime portion of the bytecode
  • Storing this piece of bytecode as the runtime code on-chain.

Since all initial tasks of creation code are now done, it’s now ready to perform its final task which is to return the runtime part of the bytecode that can be further used to execute the smart contract.

Alright. Let’s get back to deciphering the mysterious opcodes between Instruction [1a] and Instruction [26].

1. PUSH2 0x17f (383 in decimal) at Instruction [1a]

  • 0x17f (383) represents the length of the runtime bytecode.
  • This means that the runtime bytecode is 383 bytes long.
  • This instruction simply pushes the length of the runtime bytecode, i.e., 383 bytes, onto the stack.

2. PUSH2 0x0028 (40 in decimal) at Instruction [1e]

  • After the DUP1 opcode, which simply duplicates the top of the stack, i.e., 0x17f, we move to the next opcode, which is PUSH2 0x0028.
  • 0x0028 (40) represents the offset in the contract code from where we can start copying the runtime bytecode.
  • In simpler terms, the entire creation bytecode is basically init code + runtime code.
  • Therefore, in our Test contract, the first 40 bytes (offsets 0x00 to 0x27) are the init code. The runtime code starts right after that, at offset 0x28.
  • This instruction pushes the offset, i.e., the specific location from where the runtime bytecode starts.

✍️

Note: Offset simply means the specific location of a piece of data with respect to another location. Read more here

3. PUSH1 00 at Instruction [21]

  • This pushes the destination offset in memory, i.e., position 0.

Well, now comes the fun part.

Why do we put all these hex values onto the stack?

We put all these values onto the stack for a very specific opcode, i.e., the CODECOPY opcode.

4. What is CODECOPY at Instruction [23]?

  • In one line, the CODECOPY opcode copies code from the currently running environment to memory.
  • However, in order to do that, this opcode requires 3 arguments (information):

▶️ Number of bytes of code to copy,
▶️ Offset(location) in the bytecode from where it should start to copy,
▶️ Destination/Target memory position where it should copy the code to.

Now, all of it starts to make sense, doesn’t it?

a. PUSH2 0x17f instruction provides the number of bytes to copy, i.e., 383

b. PUSH2 0x0028 instruction provides offset in contract code from where to start copying the runtime bytecode, i.e., 40.

c. PUSH1 00 provides the destination offset in memory to which the runtime code should be copied, i.e., 0.

As soon as the CODECOPY opcode gets executed, it stores the runtime bytecode in the memory as expected.

[Image: the runtime bytecode of our Test contract]

So, in a nutshell, between Instructions [1a] and [26], the EVM does the following:

  • Copies 383 bytes of code, starting at offset 40 in the bytecode, to offset 0 in memory.
  • Next, Instruction [24] pushes 00 onto the stack. This represents the starting offset in memory where the runtime bytecode is stored.
  • And finally, Instruction [26], i.e., the RETURN opcode, returns the entire 383 bytes from memory. This is the runtime bytecode that is stored on-chain for further execution of the smart contract code and its functions.
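The copy-and-return sequence can be modelled in Python as well. The sketch below mirrors instructions [1a] to [26] with a list as the stack and a bytearray as memory. It rests on a few assumptions: codecopy and run_copy_and_return are made-up names, and the code passed in is a stand-in of the right shape (40 filler bytes followed by 383 patterned bytes), not the real creation code of the Test contract.

```python
INIT_CODE_LENGTH = 0x28      # 40 bytes of init code
RUNTIME_CODE_LENGTH = 0x17F  # 383 bytes of runtime code


def codecopy(memory, code, dest_offset, code_offset, size):
    """Copy size bytes of code into memory; reads past the end yield zeros."""
    chunk = code[code_offset : code_offset + size]
    chunk = chunk + bytes(size - len(chunk))
    if len(memory) < dest_offset + size:
        memory.extend(bytes(dest_offset + size - len(memory)))
    memory[dest_offset : dest_offset + size] = chunk


def run_copy_and_return(code):
    """Mirror instructions [1a]..[26]; return the bytes handed back by RETURN."""
    memory = bytearray()
    stack = []  # the END of the list is the top of the stack

    stack.append(RUNTIME_CODE_LENGTH)   # [1a] PUSH2 017f
    stack.append(stack[-1])             # [1d] DUP1
    stack.append(INIT_CODE_LENGTH)      # [1e] PUSH2 0028
    stack.append(0x00)                  # [21] PUSH1 00

    # [23] CODECOPY pops destination offset, code offset, then size.
    dest_offset = stack.pop()
    code_offset = stack.pop()
    size = stack.pop()
    codecopy(memory, code, dest_offset, code_offset, size)

    stack.append(0x00)                  # [24] PUSH1 00

    # [26] RETURN pops the memory offset, then the size (the duplicated 0x17f).
    return_offset = stack.pop()
    return_size = stack.pop()
    return bytes(memory[return_offset : return_offset + return_size])


# Stand-in creation code: filler init code followed by patterned runtime bytes.
stand_in_runtime = bytes(i % 256 for i in range(RUNTIME_CODE_LENGTH))
stand_in_creation = bytes(INIT_CODE_LENGTH) + stand_in_runtime

returned = run_copy_and_return(stand_in_creation)
print(len(returned), returned == stand_in_runtime)  # 383 True
```

This also shows why DUP1 is there: the length 0x17f is needed twice, once by CODECOPY and once by RETURN.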

That’s it.

You just witnessed the entire journey of how the init code is executed by the EVM, a procedure that almost every smart contract goes through while being deployed.

Let’s Create a Mind Map

In order to get a good mental model of the entire procedure, here is a quick flowchart to visualize the tasks performed by the init code part of the bytecode. 👇

Wrapping it up

That brings us to the end of the 2nd part of the EVM series.

As per the title, the aim of this article was to take you on a journey of smart contracts from plain and readable Solidity code to complex bytecode and all the important EVM actions that happen in between.

You should now have a very clear idea of:

  • A bird’s-eye view of a smart contract’s life cycle,
  • ABIs and Bytecodes,
  • Difference between Creation and Runtime bytecode,
  • Basics of Opcodes and EVM stack,
  • Free Memory Pointer and its significance,
  • How constructors are executed,
  • The Non-Payable Check in constructors,
  • How the state variables are initialized in a constructor, etc

Prepare yourself for the next part of this EVM series. Cheers, Stay Tuned.


Zaryab Afser
Coinmonks

Lead Smart Contract Engineer @ Push Protocol| Smart Contract Security Auditor | Educating the World about Web3, Smart Contracts & Security in DeFi