Ethereum Yellow Paper Deep Dive

Antony Silvetti-Schmitt
Published in Illini Blockchain
57 min read · Apr 9, 2023

Table of Contents

  1. Blockchain Paradigm
  2. Conventions
  3. Blocks, States, and Transactions
  4. Gas and Payment
  5. Transaction Execution
  6. Contract Creation
  7. Message Call
  8. Execution Model
  9. Blocktree to Blockchain
  10. Block Finalization
  11. Implementing Contracts
  12. Future Directions

Blockchain Paradigm

As a whole, Ethereum can be viewed as a giant transaction-based state machine. Each state includes information such as account balances, trust arrangements, and any other data that can be represented by a computer. Before we dive deeper into this paradigm, we first need to understand what a state machine is.

State Machines

A state machine is a model that can be used to formally describe a system or algorithm. The machine moves from one state to another based on its input. To start off, let us consider a state machine with a finite number of states, or a finite-state machine.

Finite-state Machines

A finite-state machine can be represented by 5 key components:

  • Σ: An arbitrary finite set representing the possible inputs.
  • Q: An arbitrary finite set containing each state of the machine.
  • δ: An arbitrary transition function which produces the output state depending on the current state and input.
  • s: The start state, which is an element of Q.
  • A: The set of accepting states, which is a subset of Q.

The best way to understand what this all really means is through an example. As we mentioned before, state machines are ways to formally describe a system or algorithm. So let's consider an algorithm that takes in a string composed of only 1's and 0's, and only accepts it if it contains at least one 1.

To start off with, we can define the set of possible inputs of our state machine as being Σ = {0, 1} since our string can only be composed of 1's and 0's. Additionally, due to the simplicity of the algorithm and alphabet, we only need two states for this state machine: a start state, s, and a terminating state, t. Thus, Q = {s, t}, where s is the start state and the set of accepting states is A = {t}.

Finally, our state transition function can be described as such: we stay on the start state as long as we read a 0 and move to the accepting state as soon as we read a 1. Once we reach the accepting state, we don't care what the rest of the inputs are since we've already satisfied the accepting condition of our initial algorithm. Therefore, the transition function can be defined as follows: δ(s, 0) = s, δ(s, 1) = t, δ(t, 0) = t, and δ(t, 1) = t.

A visual representation of the state machine can be seen below.

So to recap, we've just formally defined a state machine that takes in a stream of characters which are either 0 or 1, and only accepts the stream if there is at least one 1. Otherwise, it forever stays at the non-accepting state s.
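The machine described above can be sketched in a few lines of Python. This is only an illustration: the names SIGMA, Q, START, ACCEPTING, DELTA and accepts are chosen here for readability and do not come from the yellow paper.

```python
# Finite-state machine that accepts binary strings containing at least one '1'.
SIGMA = {"0", "1"}      # input alphabet
Q = {"s", "t"}          # set of states
START = "s"             # start state, an element of Q
ACCEPTING = {"t"}       # accepting states, a subset of Q

# Transition function: (current state, input symbol) -> next state
DELTA = {
    ("s", "0"): "s",    # stay on the start state while reading 0
    ("s", "1"): "t",    # move to the accepting state on the first 1
    ("t", "0"): "t",    # once accepted, the remaining input does not matter
    ("t", "1"): "t",
}


def accepts(stream: str) -> bool:
    """Run the machine over the stream and report whether it ends in an accepting state."""
    state = START
    for symbol in stream:
        if symbol not in SIGMA:
            raise ValueError(f"symbol {symbol!r} is not in the alphabet")
        state = DELTA[(state, symbol)]
    return state in ACCEPTING


print(accepts("0000"))  # False
print(accepts("0010"))  # True
```

Storing the transition function as a dictionary keeps the code a direct transcription of the formal definition.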

State Machines in Ethereum

Now that we have a rough idea of how finite-state machines work, let's look at Ethereum as a state machine. Two key aspects of Ethereum can be viewed as state machines: transaction processing, where each transaction is a transition between states, and mining (proof-of-work).

Transactions as state machines

Let's first look at transactions as a state machine. As with finite-state machines, we first need to define a few key components:

  • ϒ: The Ethereum state transition function, which allows for arbitrary computation
  • σ: The state, which can store arbitrary data between transactions
  • T: A transaction

A valid state transition can be represented by the following:

σ_{t+1} ≡ ϒ(σ_t, T)

At a high level, the next state of the Ethereum state machine is determined by its current state with an arbitrary computation performed on it based on the input transaction.

Mining as a state machine

Now let's look at mining as a state machine. For this, we need to define a few more key components on top of what we've previously defined:

  • B: Block of transactions
  • Π: Block-level state transition function. It produces the new state by applying the block's reward function to the resultant state of the block's final transaction.
  • Ω: Block's reward function
  • σ: The resultant state of the final transaction of a block

The Ethereum yellow paper provides us with the basis of the blockchain paradigm through the following three definitions:

σ_{t+1} ≡ Π(σ_t, B)
B ≡ (..., (T_0, T_1, ...), ...)
Π(σ, B) ≡ Ω(B, ϒ(ϒ(σ, T_0), T_1) ...)

Let's start from the top. The first formula shows how the next state in the proof-of-work state machine is obtained by applying the block-level state transition function to the current state, given a block to be added.

From here, we can see that each block contains a series of transactions. This is represented by the second formula.

Finally, the last formula shows what exactly the block-level state transition function does. The function begins by applying the Ethereum state transition function repeatedly, processing each transaction in the block in order and feeding each resulting state into the next application. The resulting state is then used as an input for the block's reward function. The result of this represents the next state in the proof-of-work state machine.
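To make the nesting concrete, here is a minimal Python sketch of the block-level transition as a fold over the block's transactions followed by a reward step. The toy state (a dictionary of balances), the transaction shape, and the functions upsilon, omega and pi are all illustrative assumptions, not the real protocol rules.

```python
from functools import reduce
from typing import Dict, Tuple

State = Dict[str, int]               # toy world state: address -> balance
Transaction = Tuple[str, str, int]   # toy transaction: (sender, recipient, amount)


def upsilon(state: State, tx: Transaction) -> State:
    """Toy stand-in for the transaction-level state transition function."""
    sender, recipient, amount = tx
    if state.get(sender, 0) < amount:
        raise ValueError("invalid transaction: insufficient balance")
    new_state = dict(state)  # never mutate the previous state
    new_state[sender] = new_state.get(sender, 0) - amount
    new_state[recipient] = new_state.get(recipient, 0) + amount
    return new_state


def omega(block: dict, state: State) -> State:
    """Toy stand-in for the block reward function: credit the beneficiary."""
    new_state = dict(state)
    beneficiary = block["beneficiary"]
    new_state[beneficiary] = new_state.get(beneficiary, 0) + block["reward"]
    return new_state


def pi(state: State, block: dict) -> State:
    """Block-level transition: apply every transaction in order, then the reward."""
    after_transactions = reduce(upsilon, block["transactions"], state)
    return omega(block, after_transactions)


genesis = {"alice": 10}
block = {
    "beneficiary": "miner",
    "reward": 2,
    "transactions": [("alice", "bob", 3), ("bob", "carol", 1)],
}
print(pi(genesis, block))
```

The call to reduce plays the role of the nested applications of the transaction-level function: each transaction sees the state left behind by the previous one.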

Value

In order to incentivize people to contribute to the network as miners, they must be paid in some capacity for using their computational resources. To this end, each network uses a form of currency to pay miners. For Ethereum, the currency is Ether. Much like how dollars have fractional representations in the form of cents, Ether also has fractional representations in the form of Wei, Szabo and Finney. Due to the large value of Ether, payments are usually expressed in Wei, the smallest subdenomination of Ether. Expressed in Wei, the denominations are worth: Wei 10⁰, Szabo 10¹², Finney 10¹⁵, and Ether 10¹⁸.
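As a small illustration, the denominations can be kept in a lookup table and converted with integer arithmetic. The multipliers are those of the yellow paper's denomination table. The helper names to_wei and from_wei are made up for this sketch.

```python
from fractions import Fraction

# Value of each denomination expressed in Wei.
WEI_PER_UNIT = {
    "wei": 10**0,
    "szabo": 10**12,
    "finney": 10**15,
    "ether": 10**18,
}


def to_wei(amount: int, unit: str) -> int:
    """Convert a whole number of `unit` into Wei."""
    return amount * WEI_PER_UNIT[unit]


def from_wei(amount_wei: int, unit: str) -> Fraction:
    """Convert Wei into `unit`, as an exact fraction to avoid rounding errors."""
    return Fraction(amount_wei, WEI_PER_UNIT[unit])


print(to_wei(2, "ether"))          # 2000000000000000000
print(from_wei(10**15, "ether"))   # 1/1000
```

Working in integer Wei, as the protocol does, avoids the rounding problems that floating-point Ether amounts would introduce.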

History

Because the network is decentralized, multiple nodes can create new blocks to chain onto the same older block. As a result, blocks must be stored as a tree, with one agreed-upon path representing the actual blockchain. If nodes aren't able to reach consensus on which path is valid, the result is a fork in which multiple states of the Ethereum state machine exist simultaneously. This would cause serious issues, as the true state of each account would be impossible to determine.

The method of consensus used to determine the correct path can be changed. When nodes don’t agree on a protocol change, it results in a permanent fork. To distinguish between the two diverged blockchains, a chain ID β is used.

Conventions

Earlier, you probably saw symbols used to describe different aspects of Ethereum, and you will see many more throughout the remainder of this article. The purpose of this section is to explain the expressions and symbols used in this article so that you can follow the content. We encourage you to refer back to this section whenever an expression or symbol is unclear!

Firstly, there are two state values denoted by bolded lowercase Greek letters.

  • World State: σ
  • Machine State: µ

Next, the Ethereum State Transition function is denoted by an uppercase Greek letter.

  • Ethereum State Transition Function: ϒ

Other than these symbols, usually an uppercase letter is used to define a function. Take the general cost function for example, denoted by an uppercase C.

These functions are sometimes subscripted to distinguish specific variants. For example, the cost function for the SSTORE operation is written C_SSTORE.

Externally defined functions are typewritten and abbreviated. For example, the Keccak-256 hash function is denoted KEC and the Keccak-512 hash function is denoted KEC512.

Tuples are denoted using an uppercase letter. For example, T is used to represent an Ethereum transaction. Tuples can be subscripted to refer to a specific element. For example, T_n denotes the nonce of the transaction.

Scalars and fixed-size byte sequences (also known as arrays) are denoted using a lowercase letter. For example, n denotes the nonce in the yellow paper. Elements of this type that have a special meaning may be denoted by a lowercase Greek letter. For example, the yellow paper uses δ for the number of items required on the stack for a given operation.

Sequences of arbitrary length are typically denoted using a bolded lowercase letter. For example, o is used to denote the byte sequence from the output data of a message call. If there are any particularly important values of this type, then an uppercase bolded letter may be used. For example, O.

Next, there are some important things to remember while reading further. Throughout the paper, we assume that scalars are non-negative integers and thus belong to the set N, the natural numbers. The set of all byte sequences is denoted by B, the formal definition of which can be found in Appendix B of the yellow paper. When the set of byte sequences is restricted to a certain length, it is subscripted. For example, B₃₂ is the set of all byte sequences of length 32.

Additionally, the set of all non-negative integers smaller than 2²⁵⁶ is denoted N₂₅₆. This is formally defined in section 4.3 of the Ethereum Yellow Paper.

The bracket operator [] is used to reference individual elements or subsequences of a larger sequence. For example, µ_s[0] denotes the first item on the machine's stack.

For subsequences, ellipses are used to represent ranges. For example, µ_m[0..31] denotes the first 32 items of the machine's memory.

The same applies to the world state σ, which is a sequence of accounts: the bracket operator is used to refer to an individual account, as in σ[a].

There are also conventions for variants of existing values. For example, say x is the original value:

  • x is the unmodified input value.
  • x' is the modified and utilizable value.
  • x* is an intermediate value.
  • x** is another intermediate value.

Here is a common function that is used in the yellow paper: ℓ(x) ≡ x[‖x‖ − 1], which returns the last item in a given sequence x.
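The bracket and last-item conventions map naturally onto Python indexing. The sketch below uses made-up values for the stack and memory. Note that the paper's ranges are inclusive while Python slices exclude the upper bound.

```python
def last(x):
    """Return the last item of a sequence, i.e. the item at index len(x) - 1."""
    return x[len(x) - 1]


stack = [7, 8, 9]            # hypothetical machine stack
memory = bytes(range(64))    # hypothetical machine memory (64 bytes)

first_stack_item = stack[0]        # first item on the stack
first_32_items = memory[0:32]      # items 0..31 inclusive, written 0:32 in Python

print(first_stack_item, len(first_32_items), last(stack))  # 7 32 9
```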

Blocks, States, and Transactions

Introduction

With introductions, history, and conventions out of the way, it is time to dive into the technical specification of Ethereum. Blocks, States, and Transactions are the building blocks of any blockchain, not just Ethereum. In this section, we dive into how Ethereum implements these concepts. Other blockchains may implement them differently, but the general idea is the same. A fundamental understanding of these concepts is necessary to understand the more complex portions of the protocol.

World State

The first, and arguably most fundamental, concept of Ethereum is the World State, denoted by σ. Earlier, the concept was mentioned in the context of the transition function. The world state is simply a database that stores the current state of the blockchain (how that is actually represented in Ethereum will be discussed in further detail below). A world state can be generated by taking some genesis state and applying a series of valid transitions (in Ethereum's case, transactions) to it. The whole point of a blockchain is to come to consensus on the current world state, and how it was reached.

Ok, so we have covered what world state is, but how is it represented, and how is it stored?

Ethereum represents its world state as a mapping between addresses (20 byte identifier derived from the account’s public key) and account states (discussed in “Account States”).

So how is this mapping stored? There are currently ~225 million unique addresses on the Ethereum blockchain¹. Storing all those account states on the blockchain would be incredibly storage intensive and inefficient, as any party wishing to verify the Ethereum blockchain would have to store a copy of the entire mapping. Some nodes, called “Light Nodes”, simply want to participate in verifying transactions without having to worry about storing the entire state. So, we need a way to quickly verify the world state without storing the entire thing.

If some sort of hashing solution jumps out at you, you'd be right! The solution is that the world state is not actually stored on the blockchain. Rather, it is assumed that full implementations of the Ethereum protocol will store this mapping in a modified Patricia Merkle Tree. Only the root node of this tree is actually stored on the blockchain, which is essentially a hash of all account states. Since this root node is dependent on all of the account states in the leaf nodes, it can serve as an identifier for a unique set of account states. World state looks something like this:

Simplified diagram showing how world state is stored. See the section on Merkle Trees for more detail.

So why not just use a hash of the list of key-value pairs instead of this fancy Patricia Merkle Tree? It is because the Patricia Merkle Tree has a really nice property that “allows us to verify the inclusion of a key-value pair without the access to the entire key-value pairs”². Suppose a light node would like to verify the account balance returned to it for its own account. A Patricia Merkle Tree proof ties together the Merkle root hash, an account address, and an account balance, using only the handful of intermediate hashes along the path from the account to the root. This means that a light node can ask for the Merkle root hash and use its own account address to verify its own balance (a node must know its own address and balance). If the tree was updated or tampered with after the root hash was generated, even if the changes were to other accounts, the verification would fail. This is a powerful concept because it allows a node to check the World State it is shown against the agreed root while only needing to know its own balance, its address, and a Merkle root hash.

The Merkle Tree

Modified Patricia Merkle Trees are powerful and appear all throughout Ethereum, so let's dig a little into how they actually work. For the purpose of this paper we will only cover how Merkle Trees work. Modified Patricia Merkle Trees³, the efficient extension of Merkle Trees that Ethereum uses, are out of the scope of this paper. See references to learn more. Understanding the concept of Merkle Trees is enough to understand the rest of the yellow paper.

The Merkle Tree is a complete tree in which each leaf is a hash of some data block. Every non-leaf node is simply a hash of its children. The top hash is called the Merkle tree root. Here is a good visualization:

While this picture is an oversimplification, it does a good job of demonstrating the concept. It should be apparent that altering the value of any of the data blocks will change the Merkle tree root. Essentially, the Merkle tree root acts as a digital fingerprint for a given world state. This makes it easy to verify in O(1) time that one world state is the same as another, without access to all the individual account states.
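A minimal sketch of computing a Merkle root follows. Two loud assumptions: Python's standard library has no Keccak-256, so hashlib.sha3_256 (a related but different function) is used as a stand-in, and an odd node at any level is paired with a copy of itself, which is just one common convention. This is a plain binary Merkle tree, not Ethereum's modified Patricia Merkle Tree.

```python
import hashlib
from typing import List


def h(data: bytes) -> bytes:
    # Stand-in hash. Ethereum uses Keccak-256, which produces different
    # digests than the standardized SHA3-256 used here.
    return hashlib.sha3_256(data).digest()


def merkle_root(blocks: List[bytes]) -> bytes:
    """Hash every data block into a leaf, then hash pairs upward until one node remains."""
    if not blocks:
        raise ValueError("need at least one data block")
    level = [h(block) for block in blocks]
    while len(level) > 1:
        if len(level) % 2 == 1:
            level.append(level[-1])  # assumed convention: duplicate the odd node
        level = [h(level[i] + level[i + 1]) for i in range(0, len(level), 2)]
    return level[0]


accounts = [b"alice:10", b"bob:5", b"carol:7", b"dave:0"]
root = merkle_root(accounts)

# Changing any single data block changes the root.
tampered = merkle_root([b"alice:10", b"bob:5", b"carol:8", b"dave:0"])
print(root != tampered)  # True
```

Comparing two roots is a fixed-size comparison of 32 bytes regardless of how many data blocks the trees contain.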

So, it is easy to compare one state with another. Once again, simple hashing of all the account states would give us the same power though. So why the Merkle Tree? As mentioned earlier, the Merkle Tree's advantage is that it can prove inclusion of a specific account and account balance. Why might this be useful?

Suppose a light node would like to participate in verifying transactions, but doesn't want to store the world state. In order to verify transactions it needs to know the pertinent account balances. To get these it must query a full node. But how does it know if it can trust the balances returned by the full node?

This is why the Merkle tree is so powerful! All the light node needs to know for certain is one account address and its account balance. Using this information and the readily available Merkle root hash from the current consensus, the light node can verify that the known account is included with the correct balance. More importantly, proving this fact also proves that the entire tree is trustworthy. You see, even altering other account states will cause the verification of the known account to fail. This means that a light node only needs three pieces of information, the Merkle root, a known account address, and a known account balance, to ascertain whether it can trust the world state returned by a full node.

This property is essential to Ethereum’s security because it allows anyone to verify the current world state without intensive storage costs.
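The inclusion check can be sketched as follows. In this illustration the full node hands the light node the sibling hashes along the path from the leaf to the root (the proof), and the light node recomputes the root from its own known leaf. The sketch assumes a plain binary Merkle tree whose leaf count is a power of two, and uses hashlib.sha3_256 as a stand-in for Keccak-256. All function names are made up.

```python
import hashlib
from typing import List, Tuple


def h(data: bytes) -> bytes:
    # Stand-in for Keccak-256, which is not in Python's standard library.
    return hashlib.sha3_256(data).digest()


def build_levels(leaves: List[bytes]) -> List[List[bytes]]:
    """Full-node side: build every level of the tree, from leaf hashes up to the root."""
    n = len(leaves)
    if n == 0 or n & (n - 1):
        raise ValueError("this sketch assumes a power-of-two number of leaves")
    levels = [[h(leaf) for leaf in leaves]]
    while len(levels[-1]) > 1:
        cur = levels[-1]
        levels.append([h(cur[i] + cur[i + 1]) for i in range(0, len(cur), 2)])
    return levels


def make_proof(levels: List[List[bytes]], index: int) -> List[Tuple[bytes, bool]]:
    """Full-node side: collect (sibling hash, sibling_is_left) for each level below the root."""
    proof = []
    for level in levels[:-1]:
        sibling = index ^ 1  # the other child of the same parent
        proof.append((level[sibling], sibling < index))
        index //= 2
    return proof


def verify(leaf: bytes, proof: List[Tuple[bytes, bool]], root: bytes) -> bool:
    """Light-node side: recompute the root from the known leaf and the sibling hashes."""
    node = h(leaf)
    for sibling, sibling_is_left in proof:
        node = h(sibling + node) if sibling_is_left else h(node + sibling)
    return node == root


leaves = [b"alice:10", b"bob:5", b"carol:7", b"dave:0"]
levels = build_levels(leaves)
trusted_root = levels[-1][0]          # obtained from consensus in practice
proof = make_proof(levels, 2)         # proof for carol's entry

print(verify(b"carol:7", proof, trusted_root))  # True
print(verify(b"carol:8", proof, trusted_root))  # False
```

The proof contains one hash per tree level, so its size grows with the logarithm of the number of accounts rather than with the number of accounts itself.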

Account State

So world state is simply a mapping between addresses and account states, but what is an account state?

An account state, denoted by σ[a], comprises the following 4 fields:

  • nonce: A scalar value equal to the number of transactions sent from this account or, for accounts with associated code (smart contracts), the number of contract-creations made by this account. Denoted by σ[a]_n
  • balance: A scalar value equal to the number of Wei owned by this address. Denoted by σ[a]_b
  • storageRoot: The Merkle tree root hash of the storage contents of the account. This is where all of an account's storage is kept (for instance, a smart contract maintaining a balances mapping). It is hashed into a Merkle tree in the same way as the world state. Empty by default. Denoted by σ[a]_s
  • codeHash: The KEC hash of the EVM code of this account. This is the code that is executed if the address receives a message call. Only smart contract accounts have associated code. Denoted by σ[a]_c

A good diagram that gives an overview of what comprises account state

In this section the paper begins to delve into some math. Quite frankly, most of it isn't necessary for understanding and can be described less formally in words.

For an account to be valid, it must either be empty, or have an account address that is part of Β₂₀ (the set of all possible 20 byte sequences). Additionally, the account state must be valid, meaning that the nonce and balance must fall into the set Ν₂₅₆ and the storageRoot and codeHash must be a part of Β₃₂.

There are two types of accounts in Ethereum: Externally Owned Accounts and Contract Accounts. Externally Owned Accounts, otherwise known as simple accounts, are accounts managed by outside actors (such as humans). If the codeHash field is the hash of empty code, in other words σ[a]_c = KEC(()), then the account is a simple account. Otherwise it is a contract account, which stores smart contract state and code.

An account is empty if it has no code, zero nonce, and zero balance. An account is called dead when its account state is non-existent or empty.
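The account definitions above translate into a few predicates. This is a sketch under stated assumptions: hashlib.sha3_256 stands in for KEC, the all-zero storage_root default is only a placeholder for the empty storage root, and the class and function names are invented for illustration.

```python
import hashlib
from dataclasses import dataclass
from typing import Dict, Optional


def kec(data: bytes) -> bytes:
    # Stand-in for Keccak-256 (KEC), which is not in Python's standard library.
    return hashlib.sha3_256(data).digest()


EMPTY_CODE_HASH = kec(b"")  # hash of the empty byte sequence, i.e. "no code"


@dataclass(frozen=True)
class AccountState:
    nonce: int = 0
    balance: int = 0                      # in Wei
    storage_root: bytes = bytes(32)       # placeholder for the empty storage root
    code_hash: bytes = EMPTY_CODE_HASH


def is_simple(account: AccountState) -> bool:
    """Externally owned (simple) account: no associated code."""
    return account.code_hash == EMPTY_CODE_HASH


def is_empty(account: AccountState) -> bool:
    """Empty account: no code, zero nonce, zero balance."""
    return is_simple(account) and account.nonce == 0 and account.balance == 0


def is_dead(world_state: Dict[bytes, AccountState], address: bytes) -> bool:
    """Dead account: its state is non-existent or empty."""
    account: Optional[AccountState] = world_state.get(address)
    return account is None or is_empty(account)


world = {
    b"\x01" * 20: AccountState(nonce=3, balance=10**18),
    b"\x02" * 20: AccountState(code_hash=kec(b"\x60\x00")),  # has code
    b"\x03" * 20: AccountState(),
}
print(is_dead(world, b"\x03" * 20), is_dead(world, b"\x04" * 20))  # True True
```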

The Transaction

Transactions are the method by which state changes occur in Ethereum, or more generally on any blockchain. A transaction, denoted as T, is “a single cryptographically signed instruction constructed by an actor externally to the scope of Ethereum”⁴. These actors are assumed to be human, although software tools are often used in their construction (Metamask, for instance, abstracts away most of the work in constructing a transaction).

As of the Berlin version, there are two transaction types: 0 (legacy) and 1 (EIP-2930). For the purposes of this summary, we will focus on the new EIP-2930 transaction, although the two are very similar.

There are two subtypes of transactions: those which result in message calls (how contract methods are called, and how ETH is sent between non-contract accounts), and those which result in the creation of new accounts with associated code. Essentially, you can think of it as transactions which create smart contracts and those which don't.

The (type 1) transaction comprises the following fields:

  • transaction type: 0 or 1. T_x
  • nonce: Scalar value equal to the number of transactions sent by the sender. T_n
  • gasPrice: A scalar value equal to the amount of Wei to pay per unit of gas for computational costs incurred by the network as a result of this transaction. T_p
  • gasLimit: A scalar value equal to the maximum amount of gas that should be used in executing this transaction. Paid up-front, before computation is done, and cannot be increased later. T_g
  • to: 20 byte address of the message call recipient. For contract creation ∅ is used. T_t
  • value: A scalar value equal to the number of Wei to be transferred to the message call recipient, or in the case of contract creation, to the newly created account. T_v
  • r,s: Values corresponding to the signature used to sign the transaction. They can be used to determine the sender of the transaction. T_r, T_s
  • accessList: List of addresses and storage keys that the transaction plans to access. Specifying this before the transaction is executed allows for more secure and more efficient storage access⁵. T_A
  • chainId: Must equal the Chain ID of the network the transaction is sent on. T_c
  • yParity: Signature Y parity. Used in combination with the r,s values to determine the sender. T_y

If the transaction is a contract creation it contains:

  • init: A byte array of unlimited size specifying the EVM code for the account initialization procedure (contract creation). Init returns the body, which is the fragment of code to be executed whenever the account receives a message call. Init is only executed once at creation and discarded afterwards.

On the other hand, if the transaction is a message call, it contains:

  • data: A byte array of unlimited size specifying the input data of the message call. It can be used to call smart contract methods, for example by passing their function signature and input parameters.
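As a rough illustration, the fields above can be collected into a small record type. Everything here is a sketch: the class, the snake_case field names, the single payload field standing for either init or data, and the use of None for the empty "to" field are modelling choices made for this example. Signature handling is left out entirely.

```python
from dataclasses import dataclass
from typing import Optional, Tuple

# One access-list entry: (address, storage keys the transaction plans to touch)
AccessEntry = Tuple[bytes, Tuple[bytes, ...]]


@dataclass(frozen=True)
class Transaction:
    tx_type: int                 # 0 (legacy) or 1 (EIP-2930)
    nonce: int
    gas_price: int               # Wei per unit of gas
    gas_limit: int
    to: Optional[bytes]          # None models the empty "to" of a contract creation
    value: int                   # Wei
    payload: bytes = b""         # init code for creation, input data for a message call
    access_list: Tuple[AccessEntry, ...] = ()
    chain_id: int = 1            # hypothetical default
    y_parity: int = 0            # signature fields are placeholders here
    r: int = 0
    s: int = 0


def is_contract_creation(tx: Transaction) -> bool:
    """A transaction with an empty recipient creates a new account with code."""
    return tx.to is None


transfer = Transaction(tx_type=1, nonce=0, gas_price=10**9, gas_limit=21000,
                       to=b"\x11" * 20, value=10**18)
deploy = Transaction(tx_type=1, nonce=1, gas_price=10**9, gas_limit=200000,
                     to=None, value=0, payload=b"\x60\x00")

print(is_contract_creation(transfer), is_contract_creation(deploy))  # False True
```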

The Block

The block in Ethereum, and more generally in most blockchains, is what stores the history of transitions between states. At a high level, a block generally must hold 2 types of information:

  1. Information about the previous block's state, which confirms that this block is a continuation of the previous chain. This enforces security because it means a bad actor cannot alter previous blocks without it being immediately apparent. This information is generally stored in the block header. Formally denoted as H.
  2. An ordered list of transactions that specify the changes to world state that will occur in this block. Formally denoted as T.

Let’s dive into specifics.

The Header

  • parentHash: The Keccak-256 hash of the parent block's header. This is where the “chain” aspect of blockchain comes from. Every block includes a reference to its predecessor. This ensures that earlier blocks cannot be tampered with without altering the current block.
  • ommersHash: The Keccak-256 hash of the ommers list portion of this block. See appendix⁶ for more detail.
  • beneficiary: The address to which all fees from successful mining of this block are to be transferred. This is the incentive for proof of work. If you are the first to guess the right nonce, you get to set this field and send yourself the reward. This may have changed with proof of stake.
  • stateRoot: The Keccak-256 hash of the root node of the state Merkle trie after all transactions in the block have been executed and changes applied. See the World State and Merkle Tree sections above for background.
  • transactionsRoot: The Keccak-256 hash of the root node of the Merkle tree structure populated with each transaction in the transaction list portion of the block. Since the block header only stores the hash of the previous block header, it is important that the block header contains information which identifies the transactions contained in that block.
  • receiptsRoot: The Keccak-256 hash of the root node of the Merkle tree structure populated with the receipts of each of the transactions in the transaction list. Another identifier for the transaction information in the block header.
  • logsBloom: “The bloom filter composed from indexable information (logger addresses and log topics) contained in each log entry from the receipt of each transaction in the transactions list”¹
  • difficulty: A scalar value corresponding to the difficulty level of this block (the difficulty “setting” for finding the right nonce). Under proof of work, it goes up over time to keep up with advances in processing power and increased miner activity.
  • number: A scalar value equal to the number of ancestors of this block. The genesis block, for example, has zero ancestors.
  • gasLimit: A scalar value equal to the current limit of gas expenditure per block. It exists to stop denial of service attacks in which the network is flooded with computationally expensive transactions.
  • gasUsed: A scalar value equal to the total gas used by all the transactions in this block.
  • timestamp: A scalar value equal to the Unix time at this block's inception.
  • extraData: A byte array that contains any extra data relevant to this block, to be set by the miner. Must be 32 bytes or fewer.
  • mixHash: A 256 bit hash which, when combined with the nonce, proves the computational work that was carried out to mine this block. Might be gone with proof of stake.
  • nonce: A 64 bit value which, combined with the mix hash, proves the computational work carried out to mine this block. Might be gone with proof of stake.

There is a set of constraints on each of these fields that determines a block's validity. Please see the yellow paper for more detail¹.

Transaction List, Ommers

In addition to the block header, each block contains a list of transactions to be included in the block. This determines which transactions will occur within this block, and ultimately how world state will be mutated in this block. Additionally, a list of ommers, also known as “uncle blocks”, is contained in the block. Ommers are simply a set of other block headers that are known to have the same parent as the present block's parent. Ommers are Ethereum's way of dealing with “orphaned blocks”⁶.

So overall we can refer to a block B as:

B ≡ (B_H, B_T, B_U)

The block header, transaction list, and ommers list
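The way parentHash links blocks together can be sketched with a toy chain. Several simplifications are assumed here: headers are serialized with JSON rather than Ethereum's actual encoding, hashlib.sha3_256 stands in for Keccak-256, transactionsRoot is a flat hash of the list rather than a trie root, and only three header fields are modelled.

```python
import hashlib
import json
from typing import List, Optional


def digest(obj) -> str:
    # Stand-in hashing: JSON serialization plus SHA3-256, not Ethereum's real scheme.
    encoded = json.dumps(obj, sort_keys=True).encode()
    return hashlib.sha3_256(encoded).hexdigest()


def make_block(parent: Optional[dict], transactions: List[list]) -> dict:
    """Build a toy block made of a header, a transaction list and an ommers list."""
    header = {
        "parentHash": digest(parent["header"]) if parent else "00" * 32,
        "number": parent["header"]["number"] + 1 if parent else 0,
        "transactionsRoot": digest(transactions),  # flat hash as a stand-in for the trie root
    }
    return {"header": header, "transactions": transactions, "ommers": []}


def chain_is_linked(chain: List[dict]) -> bool:
    """Check that every block's parentHash matches the hash of its predecessor's header."""
    return all(
        chain[i]["header"]["parentHash"] == digest(chain[i - 1]["header"])
        for i in range(1, len(chain))
    )


genesis_block = make_block(None, [])
block_1 = make_block(genesis_block, [["alice", "bob", 3]])
block_2 = make_block(block_1, [])
chain = [genesis_block, block_1, block_2]
print(chain_is_linked(chain))  # True

# Rewriting history in block 1 breaks the link stored in block 2.
block_1["header"]["transactionsRoot"] = digest([["alice", "mallory", 3]])
print(chain_is_linked(chain))  # False
```

Because each header commits to its transactions and to its parent's header, a change anywhere in an earlier block propagates into a mismatch further up the chain.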

Gas and Payment

All computation in Ethereum is subject to fees. Fees are specified in units of gas, and the amount of gas depends on the instructions being executed. For example, a transfer transaction requires 21000 units of gas, whereas a SELFDESTRUCT operation costs 5000 gas. To calculate the amount of ETH consumed, multiply the gas used by the price per unit of gas. Gas prices are usually quoted in gwei, where one gwei is 10⁹ wei and one ETH contains 10⁹ gwei.

Transactions have specific gas values associated with them. The gasLimit is the maximum amount of gas the sender is willing to pay for. The price paid per unit of gas is referred to as the gasPrice. Since the London Upgrade, this price consists of a base fee set by the network plus a tip to the validator, referred to as the priority fee. The sender reserves gasLimit * (base fee + priority fee) up front, and the total amount the user ends up paying can be represented by this formula:

total = gas used * (base fee + priority fee)

The priority fee was added in August 2021 as part of the London Upgrade. Before this upgrade, the entire gas fee was delivered to the beneficiary address, which is the address under the control of a miner. After the upgrade, the base fee is actually burned, and the priority fee is what goes to the miner.

Suppose the gasLimit is set to 50000 and the transaction only needs 30000 units of gas. The EVM will reserve payment for 50000 units from the sender, use 30000, and return payment for the remaining 20000 to the sender. However, if the gasLimit is set to a value below the actual gas needed, all the gas will be used and the transaction will still fail. The fee is not returned because computational resources have already been spent by the time the transaction is reverted. Transactors may set any gasPrice and priority fee, but miners may ignore transactions that do not pay enough. Setting a higher priority fee may allow a transaction to be executed sooner, as miners will prioritize it.
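The worked example can be turned into a short calculation. This sketch is deliberately simplified: it treats the per-unit price as base fee plus priority fee, ignores fee caps and execution refunds, and the function settle_fee and the 10 gwei and 2 gwei prices are invented for illustration.

```python
GWEI = 10**9  # one gwei expressed in wei


def settle_fee(gas_limit: int, gas_needed: int, base_fee: int, priority_fee: int) -> dict:
    """Simplified fee settlement. Prices are in wei per unit of gas."""
    price = base_fee + priority_fee
    succeeded = gas_needed <= gas_limit
    # A transaction that runs out of gas still consumes its whole gas limit.
    gas_used = gas_needed if succeeded else gas_limit
    return {
        "succeeded": succeeded,
        "gas_used": gas_used,
        "reserved_up_front": gas_limit * price,
        "refund": (gas_limit - gas_used) * price,
        "burned": gas_used * base_fee,            # base fee is burned since London
        "to_validator": gas_used * priority_fee,  # priority fee goes to the block producer
    }


ok = settle_fee(gas_limit=50000, gas_needed=30000, base_fee=10 * GWEI, priority_fee=2 * GWEI)
print(ok["gas_used"], ok["refund"] // GWEI)  # 30000 240000

too_low = settle_fee(gas_limit=20000, gas_needed=30000, base_fee=10 * GWEI, priority_fee=2 * GWEI)
print(too_low["succeeded"], too_low["refund"])  # False 0
```

In the first call the sender gets back the price of the 20000 unused units. In the second call nothing is returned even though the transaction failed.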

Transaction Execution

Introduction

Transaction execution is one of the fundamental components of the Ethereum network. Transactions are the means by which users interact with the network, and they are executed by the Ethereum Virtual Machine (EVM) in a trustless, decentralized manner. In this section, we will discuss the transaction execution process in Ethereum as described in the Ethereum Yellow Paper, including the important components of a transaction, the gas mechanism, and the steps involved in transaction execution.

Gas Mechanism

Gas is the unit used to measure the computational cost of executing a transaction in Ethereum. The gas mechanism is designed to prevent infinite loops, denial of service attacks, and other types of malicious behavior that could harm the network. The gas mechanism is implemented by setting a gas limit for each transaction, which specifies the maximum amount of gas that the sender is willing to use for the transaction.

The gas limit is multiplied by the gas price to determine the maximum cost of executing the transaction:

maximum cost = gasLimit × gasPrice

The gas price is the amount of ether that the sender is willing to pay for each unit of gas used in the transaction. This cost is deducted from the sender's account balance, and the fee is used to pay the miners who execute the transaction.

If the gas limit is reached before the transaction is complete, the transaction is reverted, and any changes made to the state of the network are rolled back. This ensures that the network remains in a consistent state and prevents any malicious behavior from damaging the network.

Transaction Execution Process

The transaction execution process in Ethereum involves several steps. These steps are described below:

  1. Nonce Validation: Before a transaction is executed, the EVM checks the nonce of the sender's account. If the nonce of the transaction does not equal the current nonce of the sender's account, the transaction is rejected.
  2. Gas Estimation: The EVM estimates the amount of gas required to execute the transaction. Gas is used to measure the computational cost of executing a transaction. The EVM calculates the gas required based on the complexity of the transaction and the amount of data being processed.
  3. Gas Payment: The sender of the transaction pays for the gas required to execute the transaction. The gas payment is deducted from the sender’s account balance.
  4. Contract Invocation: If the transaction is a contract invocation, the EVM invokes the contract and passes the input data to the contract. The contract executes the input data and updates the state of the network.
  5. Gas Refund: If the execution of the transaction results in any unused gas, the remaining gas is refunded to the sender of the transaction.
  6. State Transition: The EVM updates the state of the network based on the result of the transaction execution. If the transaction results in any changes to the state of the network, the changes are added to the state trie, which is a data structure used to store the current state of the Ethereum network.
  7. Gas Calculation: The EVM calculates the total gas used by the transaction based on the gas consumed during execution and any gas refunds.
  8. Transaction Receipt: After the transaction is executed, the EVM generates a transaction receipt. The transaction receipt contains information about the gas used, the status of the transaction, and the output data of the transaction.

The steps involved in transaction execution are described in more detail below.

Nonce Validation

Before a transaction is executed, the EVM checks the nonce of the sender's account. The nonce is a sequence number that is associated with each account and is incremented for each transaction sent from that account. If the nonce of the transaction does not equal the current nonce of the sender's account, the transaction is rejected. This prevents replay attacks, where an attacker attempts to resend a transaction that has already been executed: by then the account nonce has moved past the nonce of the old transaction.
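A minimal sketch of the nonce check, assuming the yellow paper's validity rule that a transaction's nonce must equal the sender account's current nonce. The dictionary of nonces and the function name are illustrative only.

```python
from typing import Dict


def check_and_bump_nonce(account_nonces: Dict[str, int], sender: str, tx_nonce: int) -> bool:
    """Accept the transaction only if its nonce equals the sender's current nonce."""
    current = account_nonces.get(sender, 0)
    if tx_nonce != current:
        return False          # stale (replayed) or out-of-order transaction
    account_nonces[sender] = current + 1
    return True


nonces = {"alice": 0}
print(check_and_bump_nonce(nonces, "alice", 0))  # True: first transaction
print(check_and_bump_nonce(nonces, "alice", 0))  # False: replay of the same transaction
print(check_and_bump_nonce(nonces, "alice", 2))  # False: skips nonce 1
print(check_and_bump_nonce(nonces, "alice", 1))  # True: next in sequence
```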

Gas Estimation

The EVM estimates the amount of gas required to execute the transaction. Gas is used to measure the computational cost of executing a transaction. The EVM calculates the gas required based on the complexity of the transaction and the amount of data being processed. Gas is consumed by the EVM when it executes instructions, writes to the state trie, or performs other operations that require computational resources.

Gas Payment

The sender of the transaction pays for the gas required to execute the transaction. The gas payment is deducted from the sender’s account balance. The upfront cost of executing the transaction is calculated by multiplying the gas limit by the gas price, as denoted by the equation above. If the sender does not have sufficient funds to cover this cost, along with any ether being transferred, the transaction is rejected.
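
As a rough sketch of this check, the snippet below computes the upfront cost and compares it with the sender’s balance. It assumes the Yellow Paper’s definition of the upfront cost (gas limit times gas price, plus the transferred value); the function names and the example numbers are illustrative only.

```python
GWEI = 10**9  # 1 gwei expressed in wei


def upfront_cost(gas_limit: int, gas_price: int, value: int = 0) -> int:
    """Wei the sender must hold before execution: gas_limit * gas_price + value."""
    return gas_limit * gas_price + value


def can_pay(balance: int, gas_limit: int, gas_price: int, value: int = 0) -> bool:
    """True if the balance covers the upfront cost of the transaction."""
    return balance >= upfront_cost(gas_limit, gas_price, value)


# A plain transfer with a 21,000 gas limit at a hypothetical price of 20 gwei.
cost = upfront_cost(gas_limit=21_000, gas_price=20 * GWEI)
```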

Contract Invocation

If the transaction is a contract invocation, the EVM invokes the contract and passes the input data to the contract. The contract’s code runs against that input and updates the state of the network. The contract execution is performed by the EVM, which executes the bytecode of the contract one instruction at a time. Each instruction can read or write to the contract storage, memory or the stack. The execution of each instruction consumes a certain amount of gas.

Gas Refund

If the execution of the transaction results in any unused gas, the remaining gas is refunded to the sender of the transaction. Because the sender was charged for the full gas limit up front, this refund ensures that they ultimately pay only for the gas that was actually consumed.
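
The settlement can be illustrated as follows. This is a simplified sketch that only returns unused gas; it deliberately ignores the separate refund counter for cleared storage, and the function name and numbers are hypothetical.

```python
from typing import Tuple


def settle_gas(gas_limit: int, gas_used: int, gas_price: int) -> Tuple[int, int]:
    """Return (refund_to_sender, fee_paid) in wei once execution has finished."""
    if not 0 <= gas_used <= gas_limit:
        raise ValueError("gas_used must be between 0 and gas_limit")
    refund = (gas_limit - gas_used) * gas_price  # unused gas goes back to the sender
    fee = gas_used * gas_price                   # consumed gas is what is actually paid
    return refund, fee


refund, fee = settle_gas(gas_limit=100_000, gas_used=60_000, gas_price=10)
```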

State Transition

The EVM updates the state of the network based on the result of the transaction execution. If the transaction results in any changes to the state of the network, the changes are added to the state trie, which is a data structure used to store the current state of the Ethereum network. The state trie is a modified Merkle Patricia trie that stores the key-value pairs representing the current state of the network. Each node is referenced by the hash of its contents, so the root hash commits to the entire state and any tampering is detectable.

Gas Calculation

The EVM calculates the total gas used by the transaction based on the gas consumed during execution and any gas refunds. The total gas used is subtracted from the gas limit, and if the gas limit is reached before the transaction is complete, the transaction is reverted, and any changes made to the state of the network are rolled back.

The equation is as follows:
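
Written out, and assuming the Berlin-era form of the Yellow Paper that the component descriptions in this section correspond to, the intrinsic gas g₀ takes roughly the following shape:

```latex
g_0 \equiv \sum_{i \in T_{\mathbf{i}},\, T_{\mathbf{d}}}
      \begin{cases}
        G_{\mathrm{txdatazero}}    & \text{if } i = 0 \\
        G_{\mathrm{txdatanonzero}} & \text{otherwise}
      \end{cases}
    + \begin{cases}
        G_{\mathrm{txcreate}} & \text{if } T_{\mathbf{t}} = \varnothing \\
        0                     & \text{otherwise}
      \end{cases}
    + G_{\mathrm{transaction}}
    + \sum_{j=0}^{\lVert T_{\mathbf{A}} \rVert - 1}
      \left( G_{\mathrm{accesslistaddress}}
           + \lVert T_{\mathbf{A}}[j]_{\mathbf{s}} \rVert \, G_{\mathrm{accessliststorage}} \right)
```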

The first component represents the cost of the data carried by the transaction. Tᵢ is the initialization code of a contract-creating transaction and T_d is the input data of a message call; only one of the two is present in any given transaction. Gtxdatazero and Gtxdatanonzero represent the gas costs for a zero byte and a non-zero byte, respectively. This component sums over every byte of the data, charging Gtxdatazero for each zero byte and Gtxdatanonzero for each non-zero byte.

The second component represents the cost of contract creation, and it is equal to Gtxcreate if the transaction creates a new contract (i.e., the recipient field Tₜ is empty), and 0 otherwise.

The third component, Gtransaction, represents the base gas cost for executing a transaction. This cost is fixed and does not depend on the specifics of the transaction being executed.

The fourth and final component represents the cost of the transaction’s access list. T_A is the list of accounts and storage keys that the transaction declares in advance that it intends to access, and ||T_A|| is the number of accounts in that list. The term Gaccesslistaddress represents the gas cost for each listed account address, and ||T_A[j]ₛ|| Gaccessliststorage represents the gas cost for the storage keys listed for account T_A[j]. This component sums these costs over every entry in the access list.

Overall, this equation represents the intrinsic gas cost of a transaction on the Ethereum network: the amount that must be paid before any code is executed. It takes into account the cost of the transaction data, the cost of creating a new contract, the cost of the access list, and the base gas cost for executing a transaction.
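
The four components can be made concrete with a short Python sketch. The constants below are the fee schedule values from the Berlin-era Yellow Paper, which is an assumption about the version being discussed; the function name and the way the access list is represented are illustrative only.

```python
from typing import Optional, Sequence, Tuple

# Fee schedule values (assumed: Berlin-era Yellow Paper).
G_TXDATAZERO = 4
G_TXDATANONZERO = 16
G_TXCREATE = 32_000
G_TRANSACTION = 21_000
G_ACCESSLISTADDRESS = 2_400
G_ACCESSLISTSTORAGE = 1_900

# Each access list entry is (account address, storage keys for that account).
AccessList = Sequence[Tuple[bytes, Sequence[bytes]]]


def intrinsic_gas(data: bytes, to: Optional[bytes], access_list: AccessList = ()) -> int:
    """Gas charged before any code runs; `to=None` marks a contract creation."""
    data_cost = sum(G_TXDATAZERO if b == 0 else G_TXDATANONZERO for b in data)
    create_cost = G_TXCREATE if to is None else 0
    access_cost = sum(
        G_ACCESSLISTADDRESS + len(keys) * G_ACCESSLISTSTORAGE
        for _address, keys in access_list
    )
    return data_cost + create_cost + G_TRANSACTION + access_cost


recipient = b"\x11" * 20                                   # hypothetical address
plain_transfer = intrinsic_gas(b"", recipient)             # base cost only
with_data = intrinsic_gas(b"\x00\x01\x02", recipient)      # one zero, two non-zero bytes
creation = intrinsic_gas(b"\x60\x00", None)                # one non-zero, one zero byte
with_access = intrinsic_gas(b"", recipient, [(recipient, [b"\x00" * 32, b"\x01" * 32])])
```

A plain ether transfer with no data therefore costs exactly the base amount, which is why 21,000 gas is the familiar minimum for a transaction.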

Transaction Receipt

After the transaction is executed, the EVM generates a transaction receipt. The transaction receipt contains information about the gas used, the status of the transaction, and the output data of the transaction. The transaction receipt is stored in the blockchain and can be accessed by anyone who has access to the network. The transaction receipt is important because it provides a record of the transaction and allows users to track the status of their transactions.

The transaction receipt contains the following information:

  • Transaction Hash: The hash of the transaction.
  • Gas Used: The amount of gas used by the transaction.
  • Contract Address: If the transaction creates a new contract, the address of the new contract.
  • Logs: A log is a record of an event that occurred during the transaction. Logs let applications outside the EVM track what happened during execution; contracts themselves cannot read them.
  • Status: The status of the transaction. If the transaction was successful, the status is 1. If the transaction failed, the status is 0.

The Bottom Line

The transaction execution process is a fundamental component of the Ethereum network. Transactions are the means by which users interact with the network, and they are executed by the Ethereum Virtual Machine in a trustless, decentralized manner. The gas mechanism is used to prevent malicious behavior and to ensure that the network remains in a consistent state. The transaction execution process involves several steps, including nonce validation, gas estimation, gas payment, contract invocation, gas refund, state transition, gas calculation, and transaction receipt generation. The Ethereum Yellow Paper provides a detailed description of the transaction execution process and the technical specifications of the Ethereum Virtual Machine.

Contract Creation

Introduction

Smart contracts are self-executing agreements that run on the Ethereum blockchain and are designed to automate the process of verifying, executing, and enforcing the terms of a contract. In this section, we will explore the contract creation process in Ethereum in detail, as described in the Ethereum Yellow Paper. We will discuss the components involved in creating a contract, the bytecode of a contract, and the steps involved in deploying a contract to the Ethereum network.

Components of a Contract

A contract in Ethereum is defined as a collection of code (bytecode) and data that resides at a specific address on the Ethereum blockchain. A contract has the following components:

  1. Code: This is the bytecode that defines the functionality of the contract. It is written in a low-level programming language called EVM bytecode, which is executed by the Ethereum Virtual Machine (EVM). The EVM bytecode is a binary format that is not human-readable.
  2. Data: This is the storage space used by the contract to store information. The data is stored in a key-value store called the contract storage. Each contract has its own storage space, which is initialized to an empty state when the contract is created.
  3. Address: This is a unique identifier that is assigned to the contract when it is created. The address is generated using the account nonce and the sender’s address.
  4. Balance: This is the amount of ether held by the contract. When the contract is created, the balance starts at the value sent with the deployment transaction, plus any ether that was already held at the contract’s address.

Bytecode of a Contract

The bytecode of a contract is the low-level form of the contract that the EVM executes. It is a binary format that is not human-readable, generated by compiling contract code written in a high-level programming language such as Solidity or Vyper. The bytecode is a sequence of instructions that are executed by the EVM.

The EVM bytecode is made up of opcodes and operands. Opcodes are the instructions that the EVM executes, and operands are the data that the opcodes operate on. The EVM supports a set of opcodes that can be used to perform various operations such as arithmetic, bitwise operations, memory operations, stack operations, and control flow operations.
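
To see how opcodes and operands are laid out, here is a minimal disassembler sketch. It covers only a handful of real opcodes (the table is intentionally incomplete) and relies on the fact that the PUSH1 through PUSH32 instructions are the ones that carry an operand directly in the bytecode; other instructions take their inputs from the stack. The function name is hypothetical.

```python
from typing import List

# A small subset of EVM opcodes, enough for the example below.
OPCODES = {
    0x00: "STOP",
    0x01: "ADD",
    0x02: "MUL",
    0x52: "MSTORE",
    0x56: "JUMP",
    0x5B: "JUMPDEST",
    0xF3: "RETURN",
}


def disassemble(code: bytes) -> List[str]:
    """Turn raw bytecode into a readable list of instructions."""
    listing = []
    pc = 0
    while pc < len(code):
        op = code[pc]
        if 0x60 <= op <= 0x7F:               # PUSH1 .. PUSH32
            size = op - 0x5F                 # number of operand bytes that follow
            operand = code[pc + 1 : pc + 1 + size]
            listing.append(f"PUSH{size} 0x{operand.hex()}")
            pc += 1 + size
        else:
            listing.append(OPCODES.get(op, f"UNKNOWN(0x{op:02x})"))
            pc += 1
    return listing


# Bytecode that pushes 1 and 2 onto the stack and adds them.
listing = disassemble(bytes.fromhex("6001600201"))
```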

The deployment transaction carries the contract’s initialization code in its input data field. When that initialization code is executed, it returns the runtime bytecode, which is then stored in the state as the code of the new contract account.

Contract Creation Process

The process of creating a contract in Ethereum involves the following steps:

  1. Contract Deployment: To create a contract, a deployment transaction is created and broadcast to the Ethereum network. The deployment transaction contains the bytecode of the contract in the input data field.
  2. Contract Address Generation: When the deployment transaction is processed by the Ethereum network, a new contract is created and assigned a unique address. The address is generated using the sender’s address and the nonce of the sender’s account.
  3. Contract Initialization: After the contract address is generated, the EVM creates an instance of the contract code and initializes the contract storage. The contract storage is a key-value store that is used to store the contract data.
  4. Contract Execution: Once the contract is initialized, it can be executed by sending transactions to the contract address. The contract executes the bytecode in response to the transactions and updates its storage accordingly.

Let us discuss these steps in detail.

Contract Deployment

To deploy a contract to the Ethereum network, a deployment transaction is created that contains the bytecode of the contract in the input data field. The deployment transaction is a special type of transaction that creates a new contract on the Ethereum blockchain. It contains the following information:

  • Nonce: The nonce is a counter that is used to prevent replay attacks. It is incremented for each transaction sent by an account.
  • Gas Limit: The gas limit is the maximum amount of gas that can be consumed by the contract deployment transaction. Gas is the unit used to measure the computational cost of executing a transaction.
  • Gas Price: The gas price is the amount of ether that the sender is willing to pay for each unit of gas used in the contract deployment transaction.
  • Value: The value is the amount of ether sent along with the contract deployment transaction. This value is added to the contract’s balance.
  • Input Data: The input data is the bytecode of the contract that is being deployed.

When the deployment transaction is broadcast to the Ethereum network, it is validated and executed by the nodes of the network. In the proof-of-work design that the Yellow Paper describes, miners compete with each other to add the transaction to the blockchain by solving a cryptographic puzzle. Since the Merge in September 2022, Ethereum instead uses proof of stake, in which validators propose and attest to blocks. Either way, once the transaction is included in a block, it becomes part of the ledger.

Contract Address Generation

When the deployment transaction is processed by the miners, a new contract is created and assigned a unique address. The address is generated using the sender’s address and the nonce of the sender’s account. The address is a 160-bit hexadecimal number and is represented in the Ethereum network as a hexadecimal string with the prefix “0x”. The address is unique and cannot be changed once it is generated.

The contract address is generated using the following formula:

a = B₉₆..₂₅₅(KEC(RLP((s, n))))

where:

  • KEC stands for Keccak-256, which is a cryptographic hash function that is used to generate a 256-bit hash value. It is used to generate a deterministic hash of the input data.
  • RLP stands for Recursive Length Prefix. It is an encoding scheme used to encode complex data structures such as lists and trees. The sender’s address and nonce are encoded using RLP.
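  • The B notation (written B₉₆..₂₅₅ in the Yellow Paper) indicates that only bits 96 through 255 of the hash value, that is, its last 20 bytes (160 bits), are used as the contract address.
  • The s variable stands for the sender’s address.
  • The n variable stands for the sender’s nonce at the time the transaction was sent, before it is incremented.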
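
The derivation can be sketched in Python as follows. The RLP helpers cover only the short inputs needed here (a 20-byte address and a nonce), and the hash function is passed in as a parameter because Python’s standard library does not provide Keccak-256: hashlib.sha3_256 is the standardized SHA-3, which pads its input differently. The stand-in hash at the end only demonstrates the slicing and does not produce real Ethereum addresses; all function names are hypothetical.

```python
import hashlib
from typing import Callable


def rlp_bytes(b: bytes) -> bytes:
    """RLP-encode a byte string shorter than 56 bytes."""
    if len(b) == 1 and b[0] < 0x80:
        return b                              # a single small byte encodes as itself
    if len(b) >= 56:
        raise ValueError("long strings are outside the scope of this sketch")
    return bytes([0x80 + len(b)]) + b


def rlp_int(n: int) -> bytes:
    """RLP encodes integers as big-endian bytes without leading zeros (0 is empty)."""
    return rlp_bytes(n.to_bytes((n.bit_length() + 7) // 8, "big"))


def rlp_sender_nonce(sender: bytes, nonce: int) -> bytes:
    """RLP-encode the two-item list (sender, nonce)."""
    payload = rlp_bytes(sender) + rlp_int(nonce)
    if len(payload) >= 56:
        raise ValueError("long lists are outside the scope of this sketch")
    return bytes([0xC0 + len(payload)]) + payload


def contract_address(sender: bytes, nonce: int, keccak256: Callable[[bytes], bytes]) -> bytes:
    """Keep bits 96..255 of the 256-bit hash, i.e. the last 20 of its 32 bytes."""
    return keccak256(rlp_sender_nonce(sender, nonce))[12:]


sender = bytes.fromhex("00" * 19 + "01")      # hypothetical 20-byte address
encoded = rlp_sender_nonce(sender, 0)

# Stand-in hash for demonstration only: this is SHA-3, NOT Keccak-256.
stand_in_hash = lambda data: hashlib.sha3_256(data).digest()
address = contract_address(sender, 0, stand_in_hash)
```

With a genuine Keccak-256 implementation supplied as the third argument, the same function would follow the formula above.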

Contract Initialization

After the contract address is generated, the EVM creates an instance of the contract code and initializes the contract storage. The contract storage is a key-value store that is used to store the contract data. The contract storage is initialized to an empty state when the contract is created.

The contract initialization process involves the following steps:

  • Contract Creation: The EVM creates a new instance of the contract code and assigns it to the contract address.
  • Memory Allocation: The EVM allocates memory for the contract and initializes it with the contract bytecode.
  • Stack Initialization: The EVM initializes the stack and sets the program counter to zero.
  • Contract Storage Initialization: The EVM initializes the contract storage to an empty state.

Contract Execution

Once the contract is initialized, it can be executed by sending transactions to the contract address. The contract executes the bytecode in response to the transactions and updates its storage accordingly.

The contract execution process involves the following steps:

  • Transaction Verification: When a transaction is sent to the contract address, the EVM verifies the transaction by checking its validity and authenticity.
  • Gas Calculation: The EVM calculates the gas required to execute the transaction based on the complexity of the contract code and the amount of data being processed.
  • Gas Payment: The sender of the transaction pays for the gas required to execute the transaction. The gas payment is deducted from the sender’s account balance.
  • Execution: The EVM executes the bytecode of the contract in response to the transaction. The bytecode is executed one instruction at a time. Each instruction can read or write to the contract storage, memory or the stack. The execution of each instruction consumes a certain amount of gas.
  • Contract Storage Update: When the bytecode writes to the contract storage, the changes are persisted to the blockchain. The contract storage is a key-value store that is used to store the contract data. Each key-value pair represents a piece of data stored by the contract. When the contract storage is updated, the changes are added to the state trie, which is a data structure used to store the current state of the Ethereum network.
  • Gas Refund: If the execution of the contract results in any unused gas, the remaining gas is refunded to the sender of the transaction.
  • Transaction Receipt: After the contract execution is complete, the EVM generates a transaction receipt. The transaction receipt contains information about the gas used, the contract address, and the output data of the transaction. The output data is the result of the contract execution.

Contract Creation Equations

The variable σ represents the state of the network before the contract creation, and σ∗ represents the intermediate state after the new account has been set up and the value has been transferred, but before the initialization code runs. The variable v represents the amount of ether sent with the transaction, and v’ represents any balance that already existed at the new contract’s address (zero if the address was previously unused).

The first component of the equation, σ*[a] = (1, v + v’, TRIE(∅), KEC(())), sets the account being created (denoted by a) to a new account with a nonce of 1, a balance of v + v’ ether, an empty storage trie, and the hash of empty code.

The second component of the equation, σ*[s] = (∅ if σ[s] = ∅ ∧ v = 0, a* otherwise), updates the account of the sender (denoted by s), from which the value v is deducted.

  • If the sender’s account does not exist (i.e., σ[s] = ∅) and no ether is sent with the transaction (i.e., v = 0), then the sender’s account remains empty (denoted by ∅). There is nothing to deduct, so no account needs to be created for the sender.

Otherwise, the sender’s account is set to a∗, where a∗ is defined as (σ[s]ₙ, σ[s]_b − v, σ[s]ₛ, σ[s]_c). This leaves the sender’s account unchanged except for its balance, which is reduced by the amount transferred to the new contract. The components of the a* tuple are:

  • The nonce of the sender (denoted by σ[s]ₙ), which is unchanged.
  • The balance of the sender (denoted by σ[s]_b − v), which is equal to the previous balance minus the amount of ether sent with the transaction.
  • The storage state of the sender (denoted by σ[s]ₛ), which is unchanged.
  • The code hash of the sender (denoted by σ[s]_c), which is unchanged.
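
These two updates can be sketched in Python. The sketch assumes the Yellow Paper’s reading in which a is the new contract account and s is the sender; the storage root and code hash are kept as symbolic placeholder strings, and the Account class and function name are hypothetical.

```python
from dataclasses import dataclass, replace
from typing import Dict

EMPTY_STORAGE_ROOT = "TRIE(∅)"  # symbolic placeholder for the empty storage trie root
EMPTY_CODE_HASH = "KEC(())"     # symbolic placeholder for the hash of empty code


@dataclass(frozen=True)
class Account:
    nonce: int
    balance: int
    storage_root: str
    code_hash: str


State = Dict[str, Account]


def begin_creation(state: State, sender: str, new_address: str, value: int) -> State:
    """Return the intermediate state: new account set up, value moved from the sender."""
    existing = state.get(new_address)
    v_prime = existing.balance if existing is not None else 0
    result = dict(state)  # copy, so the original state stays available for a revert
    result[new_address] = Account(1, value + v_prime, EMPTY_STORAGE_ROOT, EMPTY_CODE_HASH)

    sender_account = state.get(sender)
    if sender_account is None and value == 0:
        return result     # a non-existent sender with nothing to deduct stays absent
    if sender_account is None or sender_account.balance < value:
        raise ValueError("sender cannot cover the transferred value")
    result[sender] = replace(sender_account, balance=sender_account.balance - value)
    return result


before = {"alice": Account(7, 100, EMPTY_STORAGE_ROOT, EMPTY_CODE_HASH)}
after = begin_creation(before, "alice", "new-contract", 30)
```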

Equation (105)

This equation represents the gas that is left over after the contract is created. The variable F represents the failure condition: contract creation is successful only if F is false.

If the condition F is true, then the equation sets the leftover gas g’ to 0, which means that all of the gas provided in the transaction was used during contract creation. This can happen if the input bytecode is invalid or if there is not enough gas provided in the transaction to cover the gas cost of contract creation.

If the condition F is false, then the equation sets the leftover gas g’ to g** − c, the gas remaining after the initialization code has run (g**) minus the cost of depositing the resulting contract code (c). This leftover gas is returned to the caller.

Equation (106)

This equation represents the updated state of the network after the contract creation transaction is executed.

If the condition F is true or if the initialization code halted exceptionally (denoted by an empty resulting state, σ** = ∅), then the updated state of the network σ’ is set to be the same as the original state σ, so every change made during contract creation is discarded.

If the condition F is false and σ** is not empty, then the updated state of the network σ’ is set to be the same as σ**, with one adjustment to the new account. If the account is dead (it has no code, a zero nonce, and a zero balance), it is removed from the state. Otherwise, the code hash of the new account is set to the KEC hash of the output data o returned by the initialization code, which becomes the code of the contract.

Equation (107)

This equation represents the updated accrued substate after a contract is created. The accrued substate A is not an account balance; it collects information gathered during execution, such as logs and refunds.

If the condition F is true or if the initialization code halted exceptionally (σ** = ∅), then the updated substate A’ is set to A*, the substate as it was before the initialization code ran.

If the condition F is false and σ** is not empty, then the updated substate A’ is set to A**, the substate produced by running the initialization code.

Equation (108)

This equation represents the success indicator for the contract creation process. The variable z is the success indicator, which indicates whether the contract creation was successful or not.

If the condition F is true or if the initialization code halted exceptionally (σ** = ∅), then the success indicator z is set to 0. This means that the contract creation was not successful and that the contract was not deployed to the Ethereum network.

If the condition F is false and σ** is not empty, then the success indicator z is set to 1. This means that the contract creation was successful and that the contract was deployed to the Ethereum network.

Equation (109)

This equation defines the condition F. Contract creation fails if any part of F is true.

The first component of F checks for an address collision. It is true when both of the following hold:

  • The account being created (denoted by a) already exists in the current state (denoted by σ) of the Ethereum network, i.e., σ[a] =/= ∅.
  • That existing account either has non-empty code (σ[a]_c is not KEC(()), the hash of the empty byte array) or has a non-zero nonce (σ[a]ₙ =/= 0). Either one indicates that the address is already in use by a contract or by an account that has sent transactions.

An existing account that only holds a balance does not count as a collision; its balance is carried over into the new contract as v’.

The remaining components of F check the outcome of running the initialization code. There are three possible cases:

  • If the resulting state (denoted by σ**) is empty and the output data (denoted by o) is an empty byte array, which indicates that the initialization code halted exceptionally, then the contract creation fails.
  • If the gas remaining after the initialization code has run (denoted by g**) is less than the cost of depositing the contract code (denoted by c), then the contract creation fails.
  • If the size of the output data (denoted by ||o||) is greater than 24576 bytes, then the contract creation fails.

Overall, the equation lists the ways in which contract creation can fail: the target address is already in use, the initialization code halts exceptionally, there is not enough gas left to pay for storing the code, or the resulting code exceeds the size limit set by the Ethereum network.
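
Equations (105), (108), and (109) can be combined into a small Python sketch. It assumes that the code deposit cost c is 200 gas per byte of output, as in the Yellow Paper fee schedule, and that the address collision check has already been reduced to a boolean; the function and parameter names are hypothetical.

```python
from typing import Tuple

G_CODEDEPOSIT = 200     # gas per byte of deployed code (assumed fee schedule value)
MAX_CODE_SIZE = 24_576  # maximum size of the deployed code in bytes


def creation_failed(collision: bool, halted: bool, output: bytes, gas_after_init: int) -> bool:
    """The condition F: True if contract creation must fail."""
    deposit_cost = G_CODEDEPOSIT * len(output)
    return (
        collision                              # target address already in use
        or (halted and output == b"")          # initialization code halted exceptionally
        or gas_after_init < deposit_cost       # cannot pay for storing the code
        or len(output) > MAX_CODE_SIZE         # resulting code is too large
    )


def creation_outcome(collision: bool, halted: bool, output: bytes, gas_after_init: int) -> Tuple[int, int]:
    """Return (leftover gas g', success indicator z)."""
    failed = creation_failed(collision, halted, output, gas_after_init)
    gas_left = 0 if failed else gas_after_init - G_CODEDEPOSIT * len(output)
    success = 0 if (failed or halted) else 1
    return gas_left, success


# A 100-byte contract with 50,000 gas left after its initialization code ran.
ok = creation_outcome(False, False, b"\x00" * 100, 50_000)
too_little_gas = creation_outcome(False, False, b"\x00" * 100, 10_000)
too_large = creation_outcome(False, False, b"\x00" * 24_577, 10_000_000)
```

In the successful case, the 100 bytes of code cost 20,000 gas to deposit, which leaves 30,000 gas to be returned to the caller.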

Benefits of Contract Creation in Ethereum

The contract creation process in Ethereum has several benefits. Some of the benefits are as follows:

  • Decentralized: The contract creation process is decentralized, which means that anyone can create a contract without the need for a central authority. This makes the Ethereum network more secure and less prone to censorship or control by a central authority.
  • Immutable: Once a contract is created, its code cannot be modified. This ensures the integrity and transparency of the contract.
  • Trustless: The contract execution process is trustless, which means that it does not require trust between the parties involved. The contract is executed automatically based on predefined rules, which reduces the need for intermediaries and increases the efficiency of the process.
  • Programmable: Contracts in Ethereum are programmable, which means that they can be customized to suit specific use cases. Smart contracts can be designed to automate complex business processes, such as supply chain management, insurance, and voting.

The Bottom Line

The contract creation process in Ethereum is an essential feature that enables the creation of smart contracts and decentralized applications. The process involves the deployment of a contract to the Ethereum network, the generation of a unique contract address, the initialization of the contract storage, and the execution of the contract bytecode. The process is decentralized, immutable, trustless, and programmable, which makes it a powerful tool for building innovative applications on the Ethereum blockchain. The Ethereum Yellow Paper provides a detailed description of the contract creation process and the technical specifications of the Ethereum Virtual Machine.

Message Call

Executing a Message

In order to execute a message, several parameters are needed. These parameters are best outlined by the Ethereum Yellow Paper, which states that the parameters are the “sender (s), transaction originator (o), recipient (r), the account whose code is to be executed (c, usually the same as recipient), available gas (g), value (v) and gas price (p) together with an arbitrary length byte array, d, the input data of the call, the present depth of the message-call/contract-creation stack (e) and finally the permission to make modifications to the state (w).” Additionally, message calls can return output data, denoted by the byte array o. Now, let’s get into the specifics of each parameter.

The Sender (s)

The sender is the address of the account that directly initiates the message call. This can be an external account or a contract that is making a call of its own. Any value transferred with the message is deducted from the sender’s balance, so the sender must hold enough ether to cover it.

The Transaction Originator (o)

The transaction originator is the address of the user or external account that signs a transaction and broadcasts it to the Ethereum network. This could be different from the sender because the sender may have been another smart contract that was called by a previous message call, instead of an external account. The originator is the account that ultimately pays the gas fees associated with the message call.

The Recipient (r)

The recipient is the address of the account that receives the message call. Like every Ethereum address, it is 20 bytes long, whether it belongs to an external account or a contract.

The Executed Code Account Address (c)

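The executed code account address identifies the account whose code runs when the message call is executed. It is usually the same as the recipient, but it differs for instructions such as DELEGATECALL, which run the code of another contract in the context of the current contract.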

Available Gas (g)

The parameter specifies the maximum amount of gas that can be used for executing the message call. It helps prevent infinite loops and other malicious behavior by limiting the amount of computational resources that a message call can consume.

The Value (v)

This parameter specifies the amount of ether that should be transferred with the message call. This is essential during messages which facilitate transactions between users.

The Gas Price (p)

The gas price specifies the price in ether that the sender is willing to pay for each unit of gas used in the message call. It is used to find the total cost of a message call, and it gives block producers an incentive to process the transaction.

Input Data (d)

This parameter is an arbitrary length byte array that contains the input data for the message call. This data includes any parameters necessary for smart contract execution during message calls.

Present depth of the message-call/contract-creation stack (e)

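Message calls and contract creations can be nested: a contract that is executing may itself call another contract or create a new one. The depth e counts how many such nested calls are currently active, and it is capped at 1024. This limit helps prevent malicious or unintended behavior such as unbounded recursion.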

Permission to make modifications to the state (w)

This parameter specifies if a message call can change the state of the Ethereum state machine. If the permission is granted (w=1), the message call can modify the state variables of the contract or transfer ether between accounts. If the permission is not granted (w=0), the message call is read-only.

Contract Output (o)

The message call can produce an output that represents the result of the smart contract execution. This output may include a return value, an error message, or other data that is specific to the contract being executed. It is returned to the sender of the call, who can use it to determine the success or failure of the contract execution. For the CALL instruction, the output data is written to the caller’s memory at the offset given by the sixth item on the stack.

Message Execution as a State Machine

A message is executed in the EVM through the following formula:

vbar represents the value apparent in the execution context, which is the amount of ether that the executing code sees as having been sent with the call. It is normally equal to the transferred value v. The two differ for the DELEGATECALL instruction, which is an instruction used to call a function in another smart contract while accessing the storage and balance of the current contract: no ether is actually transferred, yet the called code still sees the value of the original call.

In a message that transfers funds, the resulting state of the sender and recipient accounts can be represented by the following formula:

In the event the message execution halts due to an exception (e.g., running out of gas), the state is reverted to the state prior to the balance transfer and no gas is refunded. There are nine exceptions to this general execution model: the precompiled contracts, whose behavior is defined directly by the protocol instead of by EVM code.
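
The transfer and the revert on failure can be sketched as follows. This is a deliberately simplified model that tracks only balances and treats the called code as an opaque function; ExecutionHalted, message_call, and the sample accounts are hypothetical names used for illustration.

```python
from typing import Callable, Dict, Tuple

Balances = Dict[str, int]


class ExecutionHalted(Exception):
    """Stand-in for any exceptional halt, such as running out of gas."""


def message_call(
    balances: Balances,
    sender: str,
    recipient: str,
    value: int,
    gas: int,
    run_code: Callable[[Balances, int], int],
) -> Tuple[Balances, int, bool]:
    """Return (resulting balances, gas left, success flag)."""
    if balances.get(sender, 0) < value:
        raise ValueError("sender cannot cover the transferred value")
    working = dict(balances)                       # the original acts as a checkpoint
    working[sender] = working.get(sender, 0) - value
    working[recipient] = working.get(recipient, 0) + value
    try:
        gas_left = run_code(working, gas)          # execute the recipient's code
    except ExecutionHalted:
        return balances, 0, False                  # revert the transfer, keep no gas
    return working, gas_left, True


start = {"alice": 10, "bob": 0}
ok_state, ok_gas, ok = message_call(start, "alice", "bob", 4, 1_000, lambda state, gas: gas - 700)


def always_fails(state: Balances, gas: int) -> int:
    raise ExecutionHalted("out of gas")


bad_state, bad_gas, bad_ok = message_call(start, "alice", "bob", 4, 1_000, always_fails)
```

Working on a copy of the state is one simple way to make the revert trivial: on failure, the untouched original is returned.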

Execution Model

https://cdn-images-1.medium.com/max/1200/0*vwWLr-nLqYJahP5s.jpg

We know that executing transactions changes the state of Ethereum, and there is a formal technical specification for how this works. In this section we dive into that specification all the way down to its core, addressing technical nuances and details along the way.

The Ethereum Virtual Machine is a software environment that executes given bytecode instructions and then updates the state of the Ethereum blockchain accordingly. In this section, we will refer to the Ethereum Virtual Machine using its abbreviated form: EVM. The EVM is designed to be a virtual state machine, meaning that it is specified by a formal model which describes how the system state changes as a result of executing smart contract code.

The EVM is quasi-Turing complete. To understand this let us look at the definition of a Turing-Complete machine. It is defined as a theoretical machine that can execute any algorithm or program expressed as a sequence of logical operations, provided there is enough memory and computational resources to cater to these operations. The reason that the EVM is quasi-Turing complete is because it can execute operations in the same way that a Turing complete machine could, except it has a limitation. Gas. Gas as defined earlier is a unit of computational cost in Ethereum that is used to pay for the execution of each instruction, and the EVM deducts gas for each instruction that is executed. If an execution were to run out of gas before it completes, the transaction is reverted, and any state changes are thrown out. This mechanism in part can help prevent Denial-of-Service (DoS) attacks due to limiting the amount of computational resources that can be used by a single transaction or contract.
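
The effect of gas on an otherwise endless program can be shown with a toy interpreter. The sketch supports only four real opcodes, with gas costs taken from the Yellow Paper fee tiers, and is not a faithful EVM; the OutOfGas exception and the run function are hypothetical names.

```python
class OutOfGas(Exception):
    """Raised when the next instruction costs more gas than is left."""


STOP, JUMP, JUMPDEST, PUSH1 = 0x00, 0x56, 0x5B, 0x60
GAS_COST = {STOP: 0, JUMP: 8, JUMPDEST: 1, PUSH1: 3}


def run(code: bytes, gas: int) -> int:
    """Execute the toy instruction subset and return the remaining gas."""
    stack = []
    pc = 0
    while pc < len(code):
        op = code[pc]
        if op not in GAS_COST:
            raise ValueError(f"unsupported opcode 0x{op:02x}")
        if gas < GAS_COST[op]:
            raise OutOfGas(f"needed {GAS_COST[op]} gas at pc={pc}, only {gas} left")
        gas -= GAS_COST[op]                  # gas is deducted for every instruction
        if op == STOP:
            break
        if op == PUSH1:
            stack.append(code[pc + 1] if pc + 1 < len(code) else 0)
            pc += 2
        elif op == JUMP:
            destination = stack.pop()
            if destination >= len(code) or code[destination] != JUMPDEST:
                raise ValueError("invalid jump destination")
            pc = destination
        else:                                # JUMPDEST is a marker and does nothing
            pc += 1
    return gas


# JUMPDEST; PUSH1 0x00; JUMP  -> jumps back to the start forever.
INFINITE_LOOP = bytes([JUMPDEST, PUSH1, 0x00, JUMP])

try:
    run(INFINITE_LOOP, gas=100)
except OutOfGas as error:
    print("halted:", error)
```

Each pass through the loop costs 12 gas, so a budget of 100 gas is exhausted during the ninth pass and execution halts instead of running forever.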

The EVM is also deterministic. This means that it will produce the same output when given the same input, and this is important so that the blockchain can be trusted and that it operates predictably.

Basics

We will now dive into the technical aspects of the EVM but first, some basic information to know.

  • The EVM is designed to work with 256-bit words, which is the same size as the output of the hash function that Ethereum uses: KECCAK-256.
  • The EVM has a memory model that works like a simple list of bytes, and a stack with a maximum size of 1024 items. It also has a storage model that works like a list of words. Note this is different from memory because storage is non-volatile and is maintained as a part of the system state.
  • All locations in storage and memory are initially defined as zero.

The EVM does not follow the standard von Neumann architecture used by most computers, in which code and data share the same memory. Instead, it stores program code in a separate virtual read-only memory, which can be accessed only through special instructions. Why is this important?

By isolating program code in a virtual read-only memory, it prevents other parts of the system from accidentally or intentionally modifying the code, which in turn reduces the risk of bugs, vulnerabilities, and other issues that could compromise the integrity of the system. Because the code can only be read through dedicated instructions, a running contract cannot overwrite its own instructions, which helps to prevent malicious behavior that could compromise Ethereum.

The EVM is also capable of halting execution and reporting any errors that it encounters, for example, running out of gas or executing an invalid instruction. When this happens, the state changes are discarded and the error is handled separately by the execution agent (the transaction processor, or the spawning execution environment).

Fees

In the Ethereum network, fees are charged for certain actions in the form of gas, which we have defined earlier to be the measurement for the computational effort required to execute an operation. Specifically, there are three different circumstances in which gas fees are charged.

  1. Gas is paid intrinsic to the computation of the operation itself.
  2. Gas is deducted to pay for subordinate message calls or contract creation.
  3. Gas is paid because of an increase in memory usage.

The total fee for memory usage grows with the smallest multiple of 32 bytes required to cover all memory indices used in the execution, whether they are read or written. As such, increases in memory usage result in more fees.
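As a rough illustration, the memory fee can be sketched in a few lines of Python. The constant and the formula below follow the fee schedule in the Yellow Paper's appendices (3 gas per word plus a quadratic term), while the function names are our own and purely illustrative.

```python
G_MEMORY = 3  # gas per 32-byte word, taken from the Yellow Paper fee schedule


def words_for_bytes(byte_count: int) -> int:
    """Smallest number of 32-byte words that covers `byte_count` bytes."""
    return (byte_count + 31) // 32


def memory_cost(words: int) -> int:
    """Total memory cost for `words` active words: linear plus quadratic term."""
    return G_MEMORY * words + (words * words) // 512


def expansion_fee(old_words: int, new_words: int) -> int:
    """Gas charged when an instruction grows memory from old_words to new_words."""
    return max(0, memory_cost(new_words) - memory_cost(old_words))
```

Only growth is charged: touching memory that is already covered by the active word count costs nothing extra in this sketch.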

Storage fees are also charged to incentivize minimizing the use of storage. However, if a contract clears an entry in its storage, the fee for that operation is waived, and a refund is given for the initial usage of that storage location. The Ethereum network does this to further encourage its users to minimize the amount of storage they use, which in turn improves the quality and performance of the network as a whole.

Execution Environment

There are several pieces of important information used in the execution environment:

  1. The system state σ
  2. The remaining gas for computation g
  3. The accrued substate A
  4. The tuple I, containing information provided by the execution agent

The accrued substate A records information that accumulates during execution and is acted upon once the transaction completes. In the Yellow Paper it includes the set of accounts scheduled for self-destruction, the series of logs, the set of touched accounts, and the refund balance.

The tuple I contains relevant information that will be used in the EVM for execution, such as account addresses, the input data, the bytecode to be executed, etc.

The execution model defines a function Ξ, which is shown below:

https://cdn-images-1.medium.com/max/1200/1*rdJfpOsjvVk9R8_8-tvApA.png
As seen at (134), the tuple on the left (the resulting state σ′, the remaining gas g′, the resulting accrued substate A′, and the output o) is defined as the result of applying Ξ to the system state, the remaining gas, the accrued substate, and the tuple I listed above.

Execution Overview

We must now define the function Ξ. In practical implementations, this function is modeled as an iterative process over a pair consisting of the full system state and the machine state, but the Yellow Paper defines it formally as a recursive function, so that is how we will look at it. The full system state includes all account data stored on the Ethereum blockchain, whereas the machine state is the transient data of the current execution, such as the remaining gas, the program counter, the memory, and the stack.

We define Ξ recursively using a function called X, which uses an iterator function called O.

The O function defines the result of a single cycle of the state machine, which is executed by the EVM. This function also uses two other functions called Z and H.

The Z function determines whether the machine is in an exceptional halting state, which means that execution cannot proceed and must stop immediately.

The H function specifies the output data of the instruction if and only if the present state is a normal halting state of the machine (not an exceptional one). Note that the empty sequence, denoted by (), is not equal to the empty set, denoted by Ø. This matters when we interpret the output of H: it evaluates to Ø when execution should continue, but it returns a series (possibly empty) when execution should halt.

To better understand the full specification, we will first look at these inner functions. To do that, we need to examine the machine state.

The machine state µ is defined as the tuple (g, pc, m, i, s) which are defined as follows:

  • g = gas available
  • pc = program counter (keeps track of next instruction to be executed)
  • m = memory contents
  • i = number of active words in memory (counted continuously from position 0)
  • s = contents on the stack

The memory contents µ_m are initially a series of zeroes of size $2^{256}$.

Now for the purposes of defining the Z, H, and O functions, we define the current operation to be executed as w as shown below:

https://cdn-images-1.medium.com/max/1200/1*KOjA1iOz6pqzgqB0R8Qy9w.png
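In code, this definition amounts to reading the byte at the program counter, and treating anything past the end of the code as STOP. The sketch below assumes the code is a Python `bytes` object and that the program counter is nonnegative; the function name is ours.

```python
STOP = 0x00  # opcode value of STOP


def current_operation(code: bytes, pc: int) -> int:
    """Return the opcode at pc, or STOP if pc is past the end of the code."""
    return code[pc] if pc < len(code) else STOP
```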

The Exceptional Halting Function (Z)

The Z function is defined as follows:

https://cdn-images-1.medium.com/max/1200/1*YLZ2ELteYByE3ByWVgEhKA.png

The execution is considered to be in an exceptional halting state if any of the following conditions are met.

  1. Insufficient Gas: If there is not enough gas left to complete the execution of the current instruction.
  2. Invalid Instruction: If an invalid instruction is encountered during smart contract execution.
  3. Insufficient Stack Terms: If there are not enough items on the stack to complete the execution of the current instruction.
  4. Invalid JUMP/JUMPI Destination: If the destination of a JUMP or JUMPI instruction is invalid (JUMP destinations will be covered in the following subsection)
  5. Stack Size Limit Exceeded: If the size of the stack would exceed 1024 items after executing the current instruction.
  6. State Modification During Static Call: If a smart contract attempts to modify the state during a static call. (Static calls do not allow any modifications to the state whereas regular calls do allow it)

These are the only circumstances (apart from the special case below) under which the execution of a smart contract can halt exceptionally.

The special case occurs when a contract tries to execute an SSTORE instruction (the instruction used to store a value at a specific location in the contract's storage) and the remaining gas available for execution is less than or equal to the call stipend.

The call stipend is a small amount of gas (2300) that is given to the called contract when a message call transfers a nonzero value. It is intended to cover only very simple operations in the called contract. If the remaining gas available for execution is less than or equal to the call stipend when the SSTORE instruction is executed, we have an exceptional halt, so the stipend alone can never pay for a write to storage.
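The conditions above can be sketched as a single predicate. This is a simplified illustration rather than the Yellow Paper's exact definition: the `Op` record, the `writes_state` flag, and the function name are hypothetical, the stack is a Python list with its top at the end, and the set of valid jump destinations is passed in from outside.

```python
from dataclasses import dataclass
from typing import Optional

STACK_LIMIT = 1024
G_CALLSTIPEND = 2300  # call stipend from the Yellow Paper fee schedule


@dataclass
class Op:
    name: str
    pops: int                  # items removed from the stack
    pushes: int                # items added to the stack
    writes_state: bool = False


def is_exceptional_halt(op: Optional[Op], gas: int, cost: int, stack: list,
                        valid_dests: set, static: bool) -> bool:
    if op is None:                      # invalid instruction
        return True
    if gas < cost:                      # insufficient gas
        return True
    if len(stack) < op.pops:            # insufficient stack items
        return True
    if op.name == "JUMP" and stack[-1] not in valid_dests:
        return True
    # JUMPI only needs a valid destination when its condition is nonzero
    if op.name == "JUMPI" and stack[-2] != 0 and stack[-1] not in valid_dests:
        return True
    if len(stack) - op.pops + op.pushes > STACK_LIMIT:
        return True
    if static and op.writes_state:      # state change during a static call
        return True
    if op.name == "SSTORE" and gas <= G_CALLSTIPEND:
        return True
    return False
```

The stack-size check runs before the jump checks so that indexing the stack is always safe.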

JUMP Destination Validity

To understand JUMP Destination Validity, we first must understand what JUMP instructions are.

JUMP instructions are used to transfer program execution flow to a different location in the code, specified by a destination address. When a JUMP instruction is encountered, the top value on the stack is popped and used as the destination. JUMPI is similar to JUMP, the only difference being that it also pops a second value, the condition, and jumps only if that condition is nonzero.

JUMP instructions are important in Ethereum because they allow the program to transfer control to another location in the code, based on a condition. This allows for the creation of loops in code, conditional statements, more complex structures, etc. Additionally, JUMP instructions are used to implement function calls in Ethereum smart contracts. By jumping to a particular section of code that corresponds to a function, the EVM can execute that function and then return back to the original part of the program that it was executing before.

As seen above, the D function is used inside of the Z function to check if a JUMP instruction is to a valid destination. The D function is formally defined as follows:

https://cdn-images-1.medium.com/max/1200/1*6HLBmPyV8HuaNejRGFI1lQ.png

The D function determines which destinations a JUMP instruction may validly target. A destination is valid only if it holds a JUMPDEST instruction that satisfies two conditions.

  1. The JUMPDEST instruction must be located on a valid instruction boundary, meaning that it must be located at a position in the code where instructions can be executed.
  2. The JUMPDEST instruction must be located within the explicitly defined portion of the code. It should not be located within the data portion of PUSH operations (PUSH operations are used to push a value onto the stack) or within the implicitly defined STOP operations (STOP operations are used to stop the execution of the code) that follow the explicitly defined portion of the code. This ensures no unexpected jumps or errors.
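One way to picture these two conditions is a single pass over the bytecode that skips the data bytes of PUSH instructions. The sketch below is an iterative version written for illustration (the Yellow Paper defines D recursively); the function name is ours.

```python
JUMPDEST = 0x5B
PUSH1, PUSH32 = 0x60, 0x7F  # PUSH1..PUSH32 occupy a contiguous opcode range


def valid_jump_destinations(code: bytes) -> set:
    """Collect positions of JUMPDEST opcodes that sit on instruction boundaries."""
    dests = set()
    i = 0
    while i < len(code):
        op = code[i]
        if op == JUMPDEST:
            dests.add(i)
        if PUSH1 <= op <= PUSH32:
            # Skip the opcode plus its 1..32 bytes of immediate data
            i += (op - PUSH1 + 1) + 1
        else:
            i += 1
    return dests
```

A 0x5B byte that appears inside PUSH data is never visited as an instruction, so it is never added to the set.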

All of the conditions and rules for the Exceptional Halting Function ensure that the execution of smart contracts in Ethereum remains secure and deterministic, so that no unexpected or potentially malicious behavior can undermine the integrity of the network.

The Normal Halting Function (H)

The Normal Halting Function H is defined as follows:

https://cdn-images-1.medium.com/max/1200/1*6FPPWGGKYFNjoNWBgk9hcg.png

In Ethereum, there are two data-returning halt operations called RETURN and REVERT. RETURN ends execution and returns data to the caller, while REVERT ends execution, returns data, and undoes the state changes made by the current call. For these operations, H evaluates to a special function called H_RETURN, which reads the output data from a range of memory.

  • When the RETURN operation is executed, it pops two values from the stack: a memory offset and a length. The EVM halts execution and returns the bytes in that range of memory to the caller.
  • When the REVERT operation is executed, it likewise pops a memory offset and a length, and the bytes in that range of memory typically encode the reason for the revert. The EVM halts execution and the state is reverted to what it was before the current call began, while the remaining gas is returned to the caller.
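A minimal sketch of H in Python might look as follows. Here `None` stands in for the empty set Ø (keep executing) and `b""` for the empty sequence (), the stack is a list with its top at the end, and memory is a `bytearray`. These representation choices and the function name are assumptions made for the example.

```python
STOP, RETURN, REVERT, SELFDESTRUCT = 0x00, 0xF3, 0xFD, 0xFF


def normal_halt_output(op: int, memory: bytearray, stack: list):
    """Return output bytes if `op` halts normally, or None to keep executing."""
    if op in (RETURN, REVERT):
        offset, length = stack[-1], stack[-2]
        data = bytes(memory[offset:offset + length])
        # Memory beyond what has been written reads as zero
        return data.ljust(length, b"\x00")
    if op in (STOP, SELFDESTRUCT):
        return b""   # halts with an empty output
    return None      # not a halting instruction
```

Because `b""` is falsy in Python, callers should test the result with `is None` rather than with a plain truth check.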

These instructions matter to the network because they play a key role in contract-to-contract communication. When a smart contract executes a RETURN or REVERT instruction, it can provide data as output to other contracts or external parties. This data can then be used to trigger other functions and actions on the Ethereum network. Overall, the ability of smart contracts to communicate with one another has played a huge role in the rise of decentralized applications and complex functionalities on the Ethereum blockchain.

The Execution Cycle (O)

The O function defines the result of a single cycle of the state machine: it takes the current state, executes one instruction, and produces the next state. It is defined as follows:

https://cdn-images-1.medium.com/max/1200/1*Dr7F2gpocs4NC7oCvL-B6g.png

On each cycle, the function looks up the current operation w, removes the items that operation consumes from the stack, pushes the items it produces, and deducts its gas cost from the remaining gas. For example, the ADD operation has a gas cost of 3. It instructs the EVM to pop the top two items off the stack, add them together, and push the result back onto the stack.

The O function is applied once for each operation that is executed. For example, if the sequence of operations includes an ADD followed by another ADD, then O is applied to both of these operations, and the resulting gas cost is the sum of the individual gas costs, here 6.

Each instruction has a gas cost associated with it, and the EVM deducts gas from the total gas available to the transaction. In addition, the program counter is incremented on each cycle for most instructions (the program counter points to the current instruction being executed in the program). Then after incrementing the program counter, the EVM moves onto the next instruction in the code and continues executing.

However, there are three instructions that are exceptions to this rule. For two of them, JUMP and JUMPI, the Yellow Paper uses a special function J, which is defined as follows:

https://cdn-images-1.medium.com/max/1200/1*9kqA6kf_dNoUTMF4S9KlLA.png

The J function for JUMP and JUMPI

The three exceptions to the rule are:

  1. JUMP: The J function sets the program counter to the destination of the JUMP instruction instead of incrementing it by the usual amount.
  2. JUMPI: As seen earlier, this instruction is very similar to JUMP, but it only jumps to the new location if the condition value on the stack (the second item from the top) is nonzero. In that case, the J function sets the program counter to the destination of the JUMPI instruction; otherwise the program counter is incremented as usual.
  3. STOP: This instruction stops the execution of the program code, so the program counter never advances to another instruction. This halt is signaled by the H function rather than by J.

The J function ensures that the program counter points to the correct instruction after a jump, so that execution continues from the intended location.
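To make the cycle concrete, here is a toy single-step function covering only five opcodes. It assumes the exceptional-halt checks have already passed, uses the gas costs from the Yellow Paper fee schedule for these opcodes, and is in no way a full interpreter; the `Machine` class and `step` function are names made up for this sketch.

```python
from dataclasses import dataclass, field

PUSH1, ADD, JUMP, JUMPI, JUMPDEST = 0x60, 0x01, 0x56, 0x57, 0x5B
GAS = {PUSH1: 3, ADD: 3, JUMP: 8, JUMPI: 10, JUMPDEST: 1}


@dataclass
class Machine:
    gas: int
    pc: int = 0
    stack: list = field(default_factory=list)  # top of stack is the last item


def step(code: bytes, m: Machine) -> None:
    """Execute one instruction: charge gas, update the stack, then update pc."""
    op = code[m.pc]
    m.gas -= GAS[op]
    next_pc = m.pc + 1                    # default: move to the next byte
    if op == PUSH1:
        m.stack.append(code[m.pc + 1])
        next_pc = m.pc + 2                # skip the one byte of push data
    elif op == ADD:
        a, b = m.stack.pop(), m.stack.pop()
        m.stack.append((a + b) % 2**256)  # arithmetic wraps at 256 bits
    elif op == JUMP:
        next_pc = m.stack.pop()           # unconditional jump
    elif op == JUMPI:
        dest, cond = m.stack.pop(), m.stack.pop()
        if cond != 0:
            next_pc = dest                # jump only on a nonzero condition
    m.pc = next_pc
```

Running `step` three times on the code PUSH1 2, PUSH1 3, ADD leaves 5 on the stack and deducts 9 gas in total.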

In general, it is assumed that the memory, accrued substate, and system state do not change:

https://cdn-images-1.medium.com/max/1200/1*laPe5b5r_YJ7wj6q_fdnxA.png

However, it is important to note that individual instructions can still alter these components. For example, an instruction could modify the system state by transferring value between accounts or changing the storage of a smart contract.

Even though instructions have the capability to alter these components, the assumption that they remain constant is useful in understanding the execution model of Ethereum.

Putting it all together

Now that we have examined the inner workings of the EVM, let’s put it all together.

Here is a picture from the Yellow Paper showing the definition of Ξ, and a quick overview of all of its moving parts:

https://cdn-images-1.medium.com/max/1200/1*FPKg6kWmspapDCnH9Uforg.png

This is the definition of Ξ

The X function is the recursive definition of Ξ. This function is cycled until one of the following two conditions is met.

  1. The Z function becomes true, indicating that the current state of the EVM is exceptional and that the machine must be halted. When this happens, any changes made to the state during the execution of the smart contract are discarded.
  2. The H function returns a series, rather than the empty set, indicating that the machine has reached a controlled halt. This means that the execution of the smart contract has finished, and any changes made to the state during the execution of the smart contract are retained (unless the halt was caused by REVERT, in which case they are discarded).

Until one of these conditions is met, our O function advances the machine by one instruction on each cycle.
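The overall control flow can be sketched as a loop that takes Z, H, and O as plain Python callables. This is a deliberately simplified picture: the state is an opaque value, `None` plays the role of Ø, the REVERT case is left out, and the name `execute` is ours.

```python
def execute(state, Z, H, O):
    """Cycle O until Z signals an exceptional halt or H yields an output."""
    while True:
        if Z(state):
            return None, None          # exceptional halt: state is discarded
        output = H(state)
        if output is not None:
            return O(state), output    # controlled halt: final state + output
        state = O(state)               # otherwise run one more cycle


# Toy usage: state is (gas, steps_left); each cycle costs 1 gas.
Z = lambda s: s[0] < 1
H = lambda s: b"done" if s[1] == 0 else None
O = lambda s: (s[0] - 1, max(0, s[1] - 1))
```

With these toy functions, a run that has enough gas ends with an output, while a run that exhausts its gas first returns no state at all.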

Blocktree to Blockchain

Because the network is decentralized, multiple nodes can create new blocks to chain onto an older block. As a result, blocks must be stored as a tree with one agreed upon path representing the actual Blockchain. Consensus over this path is determined by which path has the most work done on it, also known as the heaviest path. The longer the path, the more work done on it. The header alone can validate the computation done (see section 4, Blocks, State, and Transactions, for more details on the information the header contains).

The total difficulty, or total computation, of a block is recursively defined as the total difficulty of the parent block added to the difficulty of the current block.
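As a small illustration of this definition, the sketch below sums difficulties back to the genesis block and picks the heaviest chain tip. The dictionary layout (`parent` and `difficulty` keys, genesis with parent `None`) is a made-up stand-in for real block headers, and the loop is an iterative equivalent of the recursive definition.

```python
def total_difficulty(block_hash, blocks: dict) -> int:
    """Sum the difficulty of a block and all of its ancestors."""
    total = 0
    while block_hash is not None:
        block = blocks[block_hash]
        total += block["difficulty"]
        block_hash = block["parent"]
    return total


def heaviest_tip(tips: list, blocks: dict):
    """Choose the chain tip whose path back to genesis has the most work."""
    return max(tips, key=lambda h: total_difficulty(h, blocks))


blocks = {
    "g": {"parent": None, "difficulty": 1},
    "a": {"parent": "g", "difficulty": 5},
    "b": {"parent": "g", "difficulty": 3},
    "c": {"parent": "b", "difficulty": 3},
}
```

In this toy tree, the path ending at "c" carries a total difficulty of 7 against 6 for "a", so "c" is selected.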

Block Finalization

In the Ethereum blockchain, Block Finalization consists of four steps: validate ommers (also known as uncles), validate transactions, apply rewards, and verify state and block nonce.

Ommer Validation

Ommers are created when multiple valid blocks are submitted in quick succession. Since only one block can be selected to be added to the blockchain, one of these valid blocks is chosen and the rest become ommer blocks. Validators (or miners) are rewarded for submitting ommer blocks, though not with the full block reward. After the Ethereum Proof of Stake merge, ommers are no longer possible, since the PoS consensus protocol chooses which validator can propose a block rather than having all miners compete to find and submit a block. Ommers are validated by checking each ommer's header, comparing the hashes and values it contains against the blockchain.

Example of a blockchain with a genesis block (yellow), multiple ommer blocks (blue), and normal blocks (black).

List of Ommers/Uncles⁷ (the last one was on the date of the merge, 9/15/22)

More information on Ommers/Uncles⁶.

Transaction Validation

Transaction validation simply consists of comparing the block’s stated gas usage with the sum of the gas use of all individual transactions. If these two values match, the block’s transactions are valid.
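In code, the check described here reduces to a single comparison. The function and parameter names below are illustrative only.

```python
def transactions_valid(header_gas_used: int, tx_gas_used: list) -> bool:
    """Compare the header's stated gas usage with the per-transaction total."""
    return header_gas_used == sum(tx_gas_used)
```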

Reward Distribution

The first step in reward distribution is to reward the block beneficiary. The block beneficiary is the address that submitted the selected block (also known as the block winner). The block beneficiary’s rewards are determined by the following equation.

The final balance of the block beneficiary is equal to the initial balance plus the block reward plus 1/32 of the block reward for each valid ommer block.

The second step in reward distribution is to reward the beneficiary of each ommer block. We will call these people ommer beneficiaries. Ommer beneficiaries are only rewarded if they submit a valid ommer block within a short period of time after the current block height.

In the above equation, the final balance of each ommer beneficiary is equal to the initial balance plus R, the reduced reward.

Rewards are determined depending on how many blocks back the ommer was submitted. For example, an ommer beneficiary is rewarded more for submitting an ommer for the most recent block than an ommer for 3 blocks ago. An ommer’s reward starts at ⅞ of the block reward and decays ⅛ of the block reward per additional block until it reaches a reward of zero, at which point the ommer is discarded.

The reduced reward $R$ is the base block reward multiplied by one plus ⅛ of the difference between the ommer height and the current block height. It is important to note here that the ommer height is always at least 1 less than the block height, so that difference is negative and $R$ ranges from 0 to ⅞ of the base block reward.
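The two reward rules described in this section can be worked through with integer arithmetic in Wei. The base reward of 2 ETH used below is an assumption for the example (the value in force on the proof-of-work chain after the Constantinople fork), and the function names are ours.

```python
WEI_PER_ETH = 10**18
BLOCK_REWARD = 2 * WEI_PER_ETH  # assumed base reward, in Wei


def beneficiary_reward(num_ommers: int, block_reward: int = BLOCK_REWARD) -> int:
    """Block reward plus 1/32 of the block reward per included ommer."""
    return block_reward + num_ommers * block_reward // 32


def ommer_reward(ommer_height: int, block_height: int,
                 block_reward: int = BLOCK_REWARD) -> int:
    """R = (1 + (ommer_height - block_height) / 8) * block_reward."""
    return (8 + ommer_height - block_height) * block_reward // 8
```

For a block at height 100, an ommer at height 99 earns ⅞ of the base reward (1.75 ETH here), and the reward shrinks by ⅛ for each additional block of distance.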

The base block reward is expressed in Wei ($10^{18}$ Wei = 1 ETH). The base block reward was altered in several previous hard forks.

State and Nonce Validation

Finally, the block’s state and nonce are validated by checking the root hash of the trie of the current state and a proof of work (PoW), respectively. PoW is no longer utilized after the Ethereum merge and transition to proof of stake (PoS), but the yellow paper does not describe the specific implementation of the new PoS consensus. Understanding the motivation behind the PoW system is still important.

Motivation

According to the yellow paper, there are two important goals of the PoW function: ease of access and avoiding super-linear profits. Specifically, the algorithm should run on common hardware that many people already own, allowing for a very open distribution model around the world. Additionally, it should not be possible to make super-linear profits, especially not with a high initial barrier. The yellow paper states that “such a mechanism allows a well-funded adversary to gain a troublesome amount of the network’s total mining power and as such gives them a super-linear reward (thus skewing distribution in their favor) as well as reducing the network security.” Bitcoin’s SHA-256 based algorithm is particularly scalable and has given rise to dedicated mining hardware known as Application Specific Integrated Circuits (ASICs).

Ethash

To avoid a similar fate, Ethereum’s PoW algorithm is a custom-built variant of Dagger-Hashimoto called Ethash, which is designed to run on a Graphics Processing Unit (GPU). It uses memory-intensive arithmetic along with a continuously growing generated dataset, built as a Directed Acyclic Graph (DAG), to make it extremely difficult (but not impossible) to make efficiency gains with dedicated hardware or ASICs. The algorithm’s DAG gets larger every epoch (30000 blocks) and would eventually grow larger than the memory of any dedicated hardware.

Bitcoin s19J Pro ASIC

Implementing Contracts

Some contracts are used to allow for specific, yet useful behavior. The first of these behaviors is a data feed. A data feed contract simply provides information from sources outside of Ethereum. A contract, when sent a message call, would always return some information. To update this information from the real world’s data, an external server would be run to create new transactions to the contract and update the value in the contract’s storage. Of course, when using data feeds for very sensitive tasks, trust must be placed into this external environment.

Random numbers are impossible to produce in a deterministic system such as the EVM. However, a pseudo-random number may be created by using information such as the block’s hash, timestamp, and other addresses. There is a potential for malicious miners to control these values in some way, as they can view a block’s hash before publishing the block. As a result, miners may calculate blocks but withhold them from the network until the hash produces their desired random number. For some purposes, a pseudo-random number is sufficient, but creating truly random numbers continues to be an issue for blockchains.

Future Directions

As the Ethereum blockchain grows, downloading or referencing sections of the blockchain will become increasingly cumbersome. One way to speed up this process is by saving several checkpoint nodes, possibly every 10,000 blocks, so a specific block can be found by starting at the closest checkpoint and traversing. These checkpoint nodes could be a part of creating a compressed archive of Ethereum, so that reading the chain or setting up validator nodes would be much easier. If the barrier to creating a node is lower (i.e. the storage cost is lower), more nodes can be deployed, leading to a more decentralized system.

Another direction Ethereum developers are keen to follow is increasing Ethereum’s scalability. Since Ethereum has a general transition function, it is difficult to optimize for each type of instruction and parallelize when applicable. One solution already in practice is to have separate blockchains (known as side chains) that interact with Ethereum but can execute operations much faster off-chain. As for the main Ethereum blockchain, a priority is minimizing wasted space and load from duplicate and invalid transactions by consolidating and removing certain transactions.

Authors

  • Sid Wanjara
  • Antony Silvetti-Schmitt
  • Varun Siva
  • Jacob Stolker
  • Ananya Sehgal
  • Aadi Mukherjee

References

  1. https://etherscan.io/chart/address
  2. https://medium.com/@chiqing/merkle-patricia-trie-explained-ae3ac6a7e123
  3. https://ethereum.org/en/developers/docs/data-structures-and-encoding/patricia-merkle-trie/
  4. https://ethereum.github.io/yellowpaper/paper.pdf
  5. https://eips.ethereum.org/EIPS/eip-2930
  6. https://www.investopedia.com/terms/u/uncle-block-cryptocurrency.asp
  7. https://etherscan.io/uncles
