Understanding The Ethereum Yellow Paper
Like most people, I really didn’t understand the Ethereum Yellow Paper (YP), even after reading it many times. Here’s my humble attempt at understanding it; I hope you find it helpful.
“So, I have borrowed notes and diagrams from those that have come before me, and hopefully, I leave a trail for those that will come after me.”
(Disclaimer: This post is based on the current version of the Yellow Paper, PETERSBURG VERSION fa00ff1 – 2021-05-12).
Introduction
In this section, the authors describe Ethereum as a state machine. In computer science, a state machine is something that reads a series of inputs and, based on those inputs, transitions to a new state (e.g. a computer). This is why Ethereum is often explained as a new kind of decentralized computer that anyone can participate in.
The authors also write about the driving factors behind Ethereum. One key goal of Ethereum is to facilitate transactions between consenting individuals who would otherwise have no means to trust each other. They also review the previous work that the project builds on.
This is perhaps the easiest section of the Yellow Paper to understand (please don’t get too comfortable!).
The Blockchain Paradigm
Ethereum as a whole can be viewed as a transaction-based state machine: it begins with a genesis state and incrementally executes transactions to morph it into some current state.
Formally, this can be denoted as:
σ_{t+1} ≡ Υ(σ_t, T), where:
σ_t = the current world state
T = a transaction
Υ = the Ethereum state transition function
The result, σ_{t+1}, is the next world state.
Basically, it can be seen as a state transition machine, where Transaction ‘T’ is the arc between the current state and the next world state.
Transactions are collated into blocks, and blocks are chained together using a cryptographic hash as a means of reference. These blocks act as a journal, recording a series of transactions together with a reference to the previous block and an identifier for the final state.
B = a block containing a series of transactions, (…, (T_0, T_1, …), …)
The transaction series is punctuated with incentives for nodes to mine. This incentivization takes place as a state transition function, adding value to a nominated account (i.e. the miner’s).
Miners engage in mining, which is the process of dedicating effort to bolster one series of transactions (a block) over any other potential competitor block, through a cryptographically secure proof known as proof-of-work (PoW). This can be formally expanded to:
B ≡ (..., (T_0, T_1, ...), ...)
Π(σ, B) ≡ Ω(B, Υ(Υ(σ, T_0), T_1) ...)
Where:
Ω = the block finalization state transition function (a function that rewards a nominated party)
B = the block, which contains a series of transactions amongst some other components
Π = the block-level state transition function
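To make the two formulas above a bit more concrete, here is a minimal Python sketch of the same shape: a toy Υ applied to each transaction in turn, followed by a toy Ω that rewards the miner. The names Tx, apply_tx, finalize and apply_block are my own (hypothetical), the "world state" is just a dictionary of balances, and the reward is an arbitrary number, so treat this as an illustration of the folding pattern rather than of the real protocol.

```python
from dataclasses import dataclass
from functools import reduce
from typing import Dict, List


@dataclass(frozen=True)
class Tx:
    sender: str
    recipient: str
    value: int  # amount in Wei


def apply_tx(state: Dict[str, int], tx: Tx) -> Dict[str, int]:
    """Toy stand-in for Υ: returns a new state and never mutates the old one."""
    if state.get(tx.sender, 0) < tx.value:
        return state  # in this toy model an unaffordable transfer changes nothing
    new_state = dict(state)
    new_state[tx.sender] -= tx.value
    new_state[tx.recipient] = new_state.get(tx.recipient, 0) + tx.value
    return new_state


def finalize(state: Dict[str, int], beneficiary: str, reward: int) -> Dict[str, int]:
    """Toy stand-in for Ω: credits the nominated account with the reward."""
    new_state = dict(state)
    new_state[beneficiary] = new_state.get(beneficiary, 0) + reward
    return new_state


def apply_block(state: Dict[str, int], txs: List[Tx], beneficiary: str, reward: int) -> Dict[str, int]:
    """Toy stand-in for Π: Ω(B, Υ(Υ(σ, T_0), T_1) ...)."""
    after_txs = reduce(apply_tx, txs, state)
    return finalize(after_txs, beneficiary, reward)


genesis = {"alice": 10}
block_txs = [Tx("alice", "bob", 3), Tx("bob", "carol", 1)]
next_state = apply_block(genesis, block_txs, beneficiary="miner", reward=2)
```

Note that each call returns a fresh dictionary, which mirrors the idea that Υ maps one state to the next rather than editing it in place.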
This is the basis that forms the blockchain paradigm, a model that is the backbone of not only Ethereum but all decentralized consensus-based transaction systems to date.
Value: The network incentivizes computation and transmits value in its own currency, Ether (ETH). This value can be broken down into smaller units: Wei is the smallest denomination, and one Ether equals 10^18 Wei.
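As a quick illustration of the denominations, here is a small Python sketch that converts between units using integer arithmetic only. The Wei, Szabo, Finney and Ether multipliers are the ones listed in the Yellow Paper; the helper names to_wei and from_wei are my own and not part of any library.

```python
from typing import Tuple

# Multipliers as given in the Yellow Paper's denomination table.
WEI_PER_UNIT = {
    "wei": 1,
    "szabo": 10**12,
    "finney": 10**15,
    "ether": 10**18,
}


def to_wei(amount: int, unit: str) -> int:
    """Convert a whole number of `unit` into Wei."""
    return amount * WEI_PER_UNIT[unit]


def from_wei(wei: int, unit: str) -> Tuple[int, int]:
    """Split a Wei amount into (whole units, leftover Wei)."""
    return divmod(wei, WEI_PER_UNIT[unit])
```

Working in integers avoids the rounding problems that floating point would introduce with 18 decimal places.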
Which History?
Because anyone can build a new block on top of some older, pre-existing block, the resulting structure is a tree of blocks. The blockchain is one path through this tree, from the root (the genesis block) to a leaf (the block containing the most recent transactions).
The blockchain must maintain a single source of truth that everyone accepts; otherwise no one will trust the system, which defeats its purpose. This is the canonical chain: the path that everyone accepts as the main, correct one.
Having multiple competing states (or chains, or paths) must be avoided, as it becomes almost impossible to determine which one is correct and valid. When nodes disagree about which chain is the correct one, a fork occurs. We typically want to avoid forks, as they disrupt the system.
When a fork happens, we need consensus on which chain is the canonical (i.e. correct or genuine) one. For this, the Yellow Paper uses a simplified version of the GHOST protocol, which stands for Greedy Heaviest Observed Subtree. It says we must pick the path that has had the most computation done upon it.
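Here is a minimal sketch of "pick the path with the most computation", assuming each block simply carries a difficulty number and that total difficulty is the sum along the path. The dictionary layout and the function names are hypothetical, and real clients track far more than this.

```python
from typing import Dict, List

Block = Dict[str, int]  # hypothetical minimal block: {"number": ..., "difficulty": ...}


def total_difficulty(chain: List[Block]) -> int:
    """Sum of the difficulty of every block on one path from genesis to leaf."""
    return sum(block["difficulty"] for block in chain)


def canonical_chain(candidates: List[List[Block]]) -> List[Block]:
    """Choose the candidate path with the most accumulated work."""
    return max(candidates, key=total_difficulty)


long_but_light = [{"number": n, "difficulty": 10} for n in range(5)]
short_but_heavy = [{"number": n, "difficulty": 30} for n in range(3)]
winner = canonical_chain([long_but_light, short_but_heavy])
```

In this example the shorter path wins because it has more accumulated difficulty, which is the point: the heaviest chain is not necessarily the longest one.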
Conventions
There are a number of typographical conventions for the formal notation used in the Yellow Paper (YP), some of which are:
World state: σ
Machine state: μ
Ethereum state transition function: Υ
Cost function: C, such as C_SSTORE, the cost function for the SSTORE storage operation.
Keccak hash function: KEC denotes the Keccak-256 hash function (the winning entry of the SHA-3 contest, which is why it is sometimes loosely called SHA-3, although its output differs from the final SHA3-256 standard). Keccak is a versatile cryptographic function that can be used for authentication, encryption, and pseudo-random number generation. Before Keccak-256 hashes of data structures can be calculated, they must be converted to byte sequences using Recursive Length Prefix (RLP) encoding.
Tuples: a tuple is a finite ordered list of elements, typically denoted with an upper-case letter (e.g. T for a transaction). An n-tuple has n elements, where n is a non-negative integer; the empty tuple is the 0-tuple.
A subscript can refer to an individual component of a tuple: T_n denotes the nonce of transaction T (not a list of transactions).
δ: the number of items required on the stack for a given operation.
Scalars and fixed-size byte sequences: scalars are assumed to be non-negative integers and so belong to the set N. The set of all byte sequences is B. If such a set is restricted to sequences of a particular length, it is denoted with a subscript (e.g. B_32 denotes the set of all byte sequences of length 32).
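Since RLP comes up repeatedly, here is a compact Python sketch of the encoding rules as I understand them: a single byte below 0x80 encodes as itself, short byte strings get a 0x80-based length prefix, and lists get a 0xc0-based prefix over their concatenated encoded items. The function names are my own; this is a learning aid, not a replacement for a tested RLP library.

```python
def _to_minimal_bytes(n: int) -> bytes:
    """Big-endian bytes of a non-negative integer with no leading zeros (0 -> b'')."""
    return n.to_bytes((n.bit_length() + 7) // 8, "big")


def _length_prefix(length: int, offset: int) -> bytes:
    """Prefix for a payload of `length` bytes; offset is 0x80 (string) or 0xc0 (list)."""
    if length < 56:
        return bytes([offset + length])
    length_bytes = _to_minimal_bytes(length)
    return bytes([offset + 55 + len(length_bytes)]) + length_bytes


def rlp_encode(item) -> bytes:
    """Encode bytes, non-negative ints, or (nested) lists of those."""
    if isinstance(item, int):
        # Scalars are encoded as their minimal big-endian byte string.
        return rlp_encode(_to_minimal_bytes(item))
    if isinstance(item, bytes):
        if len(item) == 1 and item[0] < 0x80:
            return item  # a single small byte is its own encoding
        return _length_prefix(len(item), 0x80) + item
    if isinstance(item, list):
        payload = b"".join(rlp_encode(element) for element in item)
        return _length_prefix(len(payload), 0xC0) + payload
    raise TypeError("rlp_encode accepts bytes, int, or list")
```

For example, rlp_encode(b"dog") gives the four bytes 0x83 followed by "dog", and a list of b"cat" and b"dog" gives 0xc8 followed by the two encoded strings.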
Blocks, State, and Transactions
The world state (state) is a mapping between addresses (160-bit identifiers) and account states (a data structure serialized as RLP, i.e. Recursive Length Prefix).
It is assumed that the implementation will maintain this mapping in a modified Merkle Patricia tree (trie).
A Merkle Patricia Trie is one of the key data structures for Ethereum’s storage layer. It combines a Patricia (radix) trie, which is essentially a key-value mapping, with the hashing idea of a Merkle tree (also called a hash tree), which allows us to verify data integrity.
One can compute the Merkle root hash of the trie with the hash function, such that if any key-value pair is updated, the root hash changes. In the simplest case, a binary Merkle tree, the hashes of the bottom row are referred to as “leaves”, the intermediate hashes as “branches”, and the hash at the top as the “root”.
For example, in a binary Merkle tree, hashes 0–0 and 0–1 are the hash values of data blocks L1 and L2, respectively, and hash 0 is the hash of the concatenation of hashes 0–0 and 0–1.
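The L1/L2 example above can be sketched in a few lines of Python. Two loud assumptions: this is a plain binary Merkle tree, not Ethereum’s modified Merkle Patricia trie, and it uses SHA-256 from the standard library as a stand-in because Keccak-256 is not available in hashlib. Duplicating the last hash on odd-sized levels is one common convention that I picked for simplicity.

```python
import hashlib
from typing import List


def h(data: bytes) -> bytes:
    """Stand-in hash function (SHA-256, not Keccak-256)."""
    return hashlib.sha256(data).digest()


def merkle_root(data_blocks: List[bytes]) -> bytes:
    """Root hash of a binary Merkle tree built over the given data blocks."""
    if not data_blocks:
        raise ValueError("need at least one data block")
    level = [h(block) for block in data_blocks]  # the leaves
    while len(level) > 1:
        if len(level) % 2 == 1:
            level.append(level[-1])  # pad odd levels by repeating the last hash
        # Each parent is the hash of the concatenation of its two children.
        level = [h(level[i] + level[i + 1]) for i in range(0, len(level), 2)]
    return level[0]


root = merkle_root([b"L1", b"L2"])
tampered_root = merkle_root([b"L1", b"L2 (edited)"])
```

Changing any data block changes its leaf hash, which changes every hash on the path up to the root, so comparing roots is enough to detect the edit.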
Account state
There are two types of accounts: externally owned accounts (controlled by private keys, e.g. through wallets) and contract accounts.
An externally owned account (EOA) can send messages to other EOAs or to a contract account (CA) by creating a transaction and signing it with its private key. A message between two EOAs is simply a value transfer (i.e. sending Ether). A message from an EOA to a contract account activates the code within the CA (for example, to create new contracts, mint tokens, or transfer tokens).
Contract accounts can’t initiate transactions on their own. Instead, a CA only sends messages to other accounts in response to a transaction that was initiated by an EOA.
The account state σ[a] comprises the following four fields:
- nonce: For an externally owned account, this is the number of transactions sent from this address; for an account with associated code (i.e. a contract account), it is the number of contract creations made by this account.
- balance: The number of Wei owned by this address.
- storageRoot: Hash of the root node of the account’s storage trie. The storage trie is empty by default.
- codeHash: The hash of the EVM code of this account, for contract accounts. For EOAs, this is the hash of the empty string.
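The four fields above map naturally onto a small record type. The sketch below is my own illustration (the class name and the is_contract helper are hypothetical). The two constants are the commonly quoted Keccak-256 hash of the empty string and the root hash of an empty trie; I have hard-coded them rather than computed them, so please verify them against a client before relying on them.

```python
from dataclasses import dataclass

# Commonly quoted constants (hard-coded here as an assumption, not computed).
EMPTY_CODE_HASH = bytes.fromhex(
    "c5d2460186f7233c927e7db2dcc703c0e500b653ca82273b7bfad8045d85a470"
)
EMPTY_TRIE_ROOT = bytes.fromhex(
    "56e81f171bcc55a6ff8345e692c0f86e5b48e01b996cadc001622fb5e363b421"
)


@dataclass
class AccountState:
    nonce: int = 0                         # transactions sent / contracts created
    balance: int = 0                       # in Wei
    storage_root: bytes = EMPTY_TRIE_ROOT  # root hash of the storage trie
    code_hash: bytes = EMPTY_CODE_HASH     # hash of the account's EVM code

    def is_contract(self) -> bool:
        """An account with no code has the empty-code hash."""
        return self.code_hash != EMPTY_CODE_HASH


fresh_eoa = AccountState(balance=10**18)
```

A freshly funded EOA therefore differs from the defaults only in its balance.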
A transaction is a single cryptographically-signed instruction constructed by an actor external to the scope of Ethereum; the actor is assumed to be human, using software tools to construct and disseminate the instruction. Remember that in Ethereum, transactions are what move the state from the current state to the next state.
There are two types of transactions: those that result in message calls, and those which result in the creation of new accounts with associated code (otherwise known as contract creation).
Block
A block in Ethereum is a collection of relevant pieces of information: the block header and the block body (i.e. the set of transactions included in the block, and a set of other block headers for the current block’s ommers).
An ommer is a block whose parent is equal to the current block’s parent’s parent.
When mining, there are many miners trying to mine the same set of transactions at the same time. Since the block time is very short (about 15 seconds in the case of Ethereum), there is a possibility that more than one block is mined within a very short interval. The block mined first is added to the main chain, but the effort of the miner who mined the other block is not wasted. These other blocks are called “orphaned blocks”.
The purpose of ommers is to reward miners for these orphaned blocks when they are referenced from the main chain. The ommers that miners include must be “valid”, meaning within six generations of the present block. After six generations, stale orphaned blocks can no longer be referenced.
Miners receive smaller rewards for ommer blocks than full blocks.
A block header consists of the following:
- parentHash: a hash of a parent block’s header.
- ommersHash: a hash of the current block’s list of ommers.
- beneficiary: the account address that will get the fees for mining this block.
- stateRoot: the hash of the root node of the state trie, after transactions have been executed and finalized.
- transactionsRoot: hash of the root node of the trie, that contains all transactions listed in the block
- receiptsRoot: Whenever a transaction is executed, a transaction receipt is generated. This is the hash of the root node of the transaction receipt trie.
- logsBloom: a bloom filter (a data structure) that consists of log information generated on transactions in the block.
- difficulty: the difficulty level of this block. This is a measure of how hard it was to mine the block. The difficulty level is always changing based on the time it took to mine the previous block.
- number: number of ancestor blocks. Starting from zero, with the genesis block and incrementing by one with each subsequent block.
- gasLimit: the current limit of gas expenditure per block
- gasUsed: sum of the gas used in transactions in the block.
- timestamp: the unix timestamp of this block’s inception
- extraData: this is extra data related to this block. When a miner is creating the block, it can choose to add anything in this field.
- mixHash: a 256-bit hash which, combined with the nonce, proves that a sufficient amount of computation has been carried out on this block. It is used to verify that a block has been mined properly.
- nonce: a 64-bit value which, combined with the mixHash, proves that a sufficient amount of computation has been carried out on this block.
Every block header contains the root hashes of three trie structures:
- state (stateRoot), which is state trie: it is where all information about accounts are stored and you can retrieve information by querying it.
- transactions (transactionsRoot), which is transaction trie: it records transactions in Ethereum
- receipts (receiptsRoot), which is the transaction receipt trie: it records the receipts (outcomes) of transactions. A receipt is the result of executing a transaction.
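Holistic Validity: We can assert a block’s validity if and only if it satisfies several conditions: it must be internally consistent with the ommer and transaction block hashes, and the given transactions, when executed in order on the base state (derived from the final state of the parent block), must result in a new state whose root matches H_r. The header fields checked this way are H_r (stateRoot), H_o (ommersHash), H_t (transactionsRoot), H_e (receiptsRoot) and H_b (logsBloom).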
Gas and Payment
All programmable computation in Ethereum is subject to fees (denominated in gas). Gas is essential to the Ethereum network. It is the fuel that allows it to operate, in the same way that a car needs gasoline to run.
The gas cost of an operation represents the computational work needed to perform it (measured in time) and the amount of permanent storage it requires (when writing to storage).
gasLimit: Every transaction has a specific amount of gas associated with it. It is the maximum amount of gas that the sender is willing to pay to execute the transaction.
gasPrice: This is the value that the transaction sender is willing to pay per unit of gas.
Out of gas: This occurs when there is not enough gas left to finish running the transaction.
Mining: Mining is the process of creating a block of transactions to be added to the Ethereum blockchain. Mining is an expensive process, so if miners didn’t get anything in return, no one would do it. Therefore, a miner receives the fees from all transactions included in the block. Miners tend to set and advertise a minimum gas price, and are at liberty to ignore transactions that don’t meet it.
Transaction Execution
The execution of a transaction is the most complex part of the Ethereum protocol: it defines the state transition function, Υ. It is assumed that any transaction executed first passes the initial tests of intrinsic validity. These include:
- The transaction is a well-formed RLP, with no additional trailing bytes.
- The transaction signature is valid.
- The transaction nonce is valid (equivalent to the sender account’s current nonce).
- The gas limit is no smaller than the intrinsic gas used by the transaction.
- The sender account balance contains at least the cost required in up-front gas payment.
There is one more rule, which concerns the block rather than the transaction alone: a transaction must not be included in a block if, by including it, the total gas limit of all transactions in the block would exceed the block’s gas limit.
This is a great walkthrough example by Preethi Kasireddy. The intrinsic gas for a transaction is made up of:
- A predefined cost of gas for executing the transaction
- A gas fee for the data sent with the transaction (4 gas for every byte of data or code that equals zero, and a higher per-byte fee for every non-zero byte of data or code)
- If the transaction is a contract-creating transaction, an additional 32,000 gas.
In addition, the sender account balance must contain at least the “upfront” cost of execution. To calculate the upfront cost, the transaction’s gasPrice is multiplied by the transaction’s gasLimit to determine the maximum gas cost. This is then added to the total value being transferred from the sender to the recipient.
If the transaction meets all of the above requirements for validity, then we move onto the next step.
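The intrinsic gas and upfront cost checks can be sketched as follows. The constant names follow the Yellow Paper’s fee schedule, but the values are my assumptions about the schedule in force at the time: 21,000 base gas, 4 gas per zero byte, 16 gas per non-zero byte (it was 68 before EIP-2028), and 32,000 extra for contract creation. Access-list costs are ignored, and the function names are my own.

```python
G_TRANSACTION = 21_000   # assumed base cost of any transaction
G_TXCREATE = 32_000      # extra cost for a contract-creating transaction
G_TXDATAZERO = 4         # per zero byte of data or code
G_TXDATANONZERO = 16     # per non-zero byte (assumed post-EIP-2028 value)


def intrinsic_gas(data: bytes, is_creation: bool) -> int:
    """Gas a transaction must pay before any code runs."""
    zero_bytes = data.count(0)
    nonzero_bytes = len(data) - zero_bytes
    gas = G_TRANSACTION
    gas += zero_bytes * G_TXDATAZERO + nonzero_bytes * G_TXDATANONZERO
    if is_creation:
        gas += G_TXCREATE
    return gas


def upfront_cost(gas_limit: int, gas_price: int, value: int) -> int:
    """Maximum gas payment plus the value being transferred, in Wei."""
    return gas_limit * gas_price + value


def passes_gas_checks(balance: int, gas_limit: int, gas_price: int,
                      value: int, data: bytes, is_creation: bool) -> bool:
    """The two gas-related validity tests described above."""
    enough_gas = gas_limit >= intrinsic_gas(data, is_creation)
    enough_balance = balance >= upfront_cost(gas_limit, gas_price, value)
    return enough_gas and enough_balance
```

For a plain Ether transfer with no data, the intrinsic gas is just the base cost, which is why 21,000 is the familiar gas limit for simple transfers.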
First, the upfront cost of execution is deducted from the sender’s balance, and the nonce of the sender’s account is increased by 1 to account for the current transaction. At this point, we can calculate the gas remaining as the total gas limit for the transaction minus the intrinsic gas used.
Next, the transaction starts executing. Throughout the execution of a transaction, Ethereum keeps track of the “substate.” This substate is a way to record information accrued during the transaction that will be acted upon immediately after the transaction completes. Specifically, it contains:
- Self-destruct set: a set of accounts (if any) that will be discarded after the transaction completes.
- Log series: archived and indexable checkpoints of the virtual machine’s code execution.
- Refund balance: the amount to be refunded to the sender account after the transaction. Storage in Ethereum costs money, so Ethereum refunds (rewards) a sender for clearing up storage. Ethereum keeps track of this using a refund counter, which starts at zero and increments every time the contract deletes something in storage.
Next, the various computations required by the transaction are processed.
Once all the steps required by the transaction have been processed and assuming there is no invalid state, the state is finalized by determining the amount of unused gas to be refunded to the sender. In addition to the unused gas, the sender is also refunded some allowance from the “refund balance” that was described above.
Once the sender is refunded:
- The Ether for the gas is given to the miner.
- The gas used by the transaction is added to the block gas counter (which keeps track of the total gas used by all transactions in the block, and is useful when validating a block).
- All accounts in self-destruct set (if any) are deleted.
Finally, we’re left with the new state and a set of logs created by the transaction.
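The settlement step can be sketched with a few lines of arithmetic. One assumption to flag: in the version of the Yellow Paper I am reading, the allowance from the refund counter is capped at half of the gas used (as far as I know this cap was later reduced by EIP-3529). The function name and return shape are my own.

```python
from typing import Tuple


def settle_gas(gas_limit: int, gas_remaining: int,
               refund_counter: int, gas_price: int) -> Tuple[int, int]:
    """Return (Wei refunded to the sender, Wei paid to the miner)."""
    gas_used = gas_limit - gas_remaining
    # Assumed rule: the storage refund cannot exceed half of the gas used.
    storage_refund = min(gas_used // 2, refund_counter)
    refunded_gas = gas_remaining + storage_refund
    sender_refund = refunded_gas * gas_price
    miner_fee = (gas_limit - refunded_gas) * gas_price
    return sender_refund, miner_fee


sender_refund, miner_fee = settle_gas(
    gas_limit=100_000, gas_remaining=40_000, refund_counter=50_000, gas_price=2
)
```

In this example the transaction used 60,000 gas, so only 30,000 of the 50,000 refund counter is honoured. The sender gets back 70,000 gas worth of Wei and the miner keeps the rest, and the two amounts add up to the upfront gas payment.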
Contract Creation
Remember we have two types of accounts, EOAs and CAs. A contract-creation transaction creates a new contract account. There are a number of intrinsic parameters used when creating an account:
- sender
- original transactor
- available gas
- gas price
- endowment
- an arbitrary-length byte array containing the initialisation EVM code
- the present depth of the message-call/contract-creation stack
- the salt for the new account’s address
- the permission to make modifications to the state
Code execution depletes gas, and gas must not go below zero before execution is complete. If the gas runs out first, execution exits before the code has come to a natural halting state; this is an out-of-gas (OOG) situation. If execution halts in an exceptional fashion (e.g. due to an exhausted gas supply, stack underflow, invalid jump destination, or invalid instruction), then no gas is refunded to the caller (i.e. the sender), and the state is reverted to the point immediately prior to the balance transfer.
In a case where all goes well and code executes (i.e contracts created), any remaining unused gas is refunded to the original sender of the transaction, and the altered state is allowed to persist.
Note that the intention is that the result is either a successfully created new contract with its endowment or no new contract with no transfer of value.
Message Call
Executing a message call requires several parameters, similar to contract creation with a few exceptions:
- sender
- transaction originator
- recipient
- the account whose code is to be executed, usually the same as the recipient
- available gas
- value and gas price
- an arbitrary-length byte array d, the input data of the call
- the present depth of the message-call/contract creation stack
- the permission to make modifications to the state.
Execution Model
The execution model looks at how a transaction actually executes within the virtual machine (VM). It specifies how the system state is altered given a series of bytecode instructions and a small tuple of environment data. This is specified through a formal model of a virtual state machine, known as the Ethereum Virtual Machine (EVM).
Basics: The EVM is the runtime environment for smart contracts in Ethereum, and it has a simple stack-based architecture. It is a quasi-Turing-complete machine: unlike other Turing-complete machines, it is intrinsically bounded by gas, i.e. the total amount of computation that can be done is limited by the amount of gas provided.
The word size of the machine (and thus the size of stack items) is 256 bits. The stack has a maximum size of 1024 items. The memory model is a simple word-addressed byte array. There is also a storage model which, instead of a byte array, is a word array. Unlike memory, which is volatile, storage is non-volatile and is maintained as part of the system state. All locations in both memory and storage are well-defined initially as zero.
The machine can hit exceptional conditions such as stack underflows and invalid instructions. Like the out-of-gas exception, these do not leave state changes intact. Rather, the machine halts immediately and reports the issue to the execution agent (either the transaction processor or, recursively, the spawning execution environment), which will deal with it separately.
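To get a feel for a gas-bounded stack machine, here is a deliberately tiny interpreter. It is not the EVM: it knows only four instructions, takes them as Python tuples instead of bytecode, and has no memory or storage. The gas costs I use (3 for PUSH and ADD, 5 for MUL, 0 for STOP) are the values I believe the fee schedule assigns, and all class and function names are hypothetical.

```python
from typing import List, Tuple

WORD_MODULUS = 2**256   # stack items are 256-bit words
STACK_LIMIT = 1024      # maximum number of items on the stack
GAS_COST = {"PUSH": 3, "ADD": 3, "MUL": 5, "STOP": 0}  # assumed costs


class OutOfGas(Exception):
    pass


class StackError(Exception):
    pass


class InvalidInstruction(Exception):
    pass


def run(program: List[tuple], gas: int) -> Tuple[List[int], int]:
    """Execute the toy program and return (final stack, gas remaining)."""
    stack: List[int] = []
    for op, *args in program:
        if op not in GAS_COST:
            raise InvalidInstruction(op)
        if gas < GAS_COST[op]:
            raise OutOfGas(op)  # the fee is a prerequisite to executing the operation
        gas -= GAS_COST[op]
        if op == "PUSH":
            if len(stack) >= STACK_LIMIT:
                raise StackError("stack overflow")
            stack.append(args[0] % WORD_MODULUS)
        elif op in ("ADD", "MUL"):
            if len(stack) < 2:
                raise StackError("stack underflow")
            a, b = stack.pop(), stack.pop()
            result = a + b if op == "ADD" else a * b
            stack.append(result % WORD_MODULUS)  # arithmetic wraps at 256 bits
        elif op == "STOP":
            break
    return stack, gas


final_stack, gas_left = run([("PUSH", 2), ("PUSH", 3), ("ADD",), ("STOP",)], gas=10)
```

Here the program leaves 5 on the stack with 1 gas remaining. Running the same program with only 5 gas raises OutOfGas at the second PUSH, and the caller never sees a partial result, which is the "does not leave state changes intact" behaviour in miniature.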
Fees Overview: Fees (denominated in gas) are charged under three distinct circumstances, all three as prerequisites to the execution of an operation. The first and most common is the fee intrinsic to the computation of the operation. Secondly, gas may be deducted in order to form the payment for a subordinate message call or contract creation; this forms part of the payment for CREATE, CREATE2, CALL, and CALLCODE. Finally, gas may be paid due to an increase in the usage of memory.
To incentivize minimization of the use of storage (which corresponds directly to a larger state database on all nodes), the execution fee for an operation that clears an entry in the storage is not only waived, a qualified refund is given; in fact, this refund is effectively paid up-front since the initial usage of a storage location costs substantially more than normal usage.
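The third kind of fee, the one paid for growing memory, is worth a small worked example. As I read the Yellow Paper, the total cost for a words of memory is G_memory · a + ⌊a² / 512⌋ with G_memory = 3, and an operation pays only the difference between the new and the old total. The function names below are my own.

```python
G_MEMORY = 3  # assumed per-word memory cost from the fee schedule


def memory_cost(words: int) -> int:
    """Total gas cost of having `words` 32-byte words of active memory."""
    return G_MEMORY * words + words * words // 512


def expansion_cost(old_words: int, new_words: int) -> int:
    """Gas charged when an operation grows memory from old_words to new_words."""
    if new_words <= old_words:
        return 0  # memory never shrinks, and touching existing memory is free here
    return memory_cost(new_words) - memory_cost(old_words)
```

The linear term dominates for small sizes, while the quadratic term makes very large memory use expensive: 32 words cost 98 gas in total, while 1024 words cost 5120.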
Block Finalisation
Block finalization can mean two different things, depending on whether the block is new or already exists. For a new block, it means the process required for mining the block. For an existing block, it means the process of validating the block.
The process of finalizing a block involves four stages:
- Validate, (or, if mining, determine) ommers;
Each ommer header must be valid and be within the sixth generation of the present block. (In the Yellow Paper’s formulas, B(H) and P(H) are the block and the parent block of the corresponding header H, respectively.)
- Validate (or if mining, determine) transactions;
The given gasUsed must correspond faithfully to the transactions listed. The total gas used in the block must be equal to the accumulated gas used according to the final transaction.
- Apply rewards;
The application of rewards to a block involves raising the balance of the accounts of the beneficiary address of the block and each ommer by a certain amount. Also for each ommer, the current block’s beneficiary is awarded an additional 1/32 of the current block reward.
- Verify (or if mining, compute a valid) state and block nonce;
The block transition function maps an incomplete block B to a complete block B′. It makes sure that all transactions and resultant state changes are applied, and defines the block’s final state as the state after the block reward has been applied.
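The "apply rewards" stage above can be sketched numerically. My assumptions: the block reward is 2 Ether (the value I believe applies from Constantinople/Petersburg onwards), and each ommer’s own beneficiary receives (8 + ommer number − block number) / 8 of the block reward, which is how I read the Yellow Paper’s formula. The function name and return shape are hypothetical.

```python
from typing import List, Tuple

BLOCK_REWARD = 2 * 10**18  # assumed block reward in Wei


def block_rewards(block_number: int, ommer_numbers: List[int]) -> Tuple[int, List[int]]:
    """Return (reward for the block's beneficiary, reward for each ommer's beneficiary)."""
    beneficiary_reward = BLOCK_REWARD
    ommer_rewards = []
    for ommer_number in ommer_numbers:
        depth = block_number - ommer_number
        if not 1 <= depth <= 6:
            raise ValueError("ommer must be within six generations of the block")
        # The including block earns an extra 1/32 of the block reward per ommer.
        beneficiary_reward += BLOCK_REWARD // 32
        # The ommer's own reward shrinks the older the ommer is.
        ommer_rewards.append((8 + ommer_number - block_number) * BLOCK_REWARD // 8)
    return beneficiary_reward, ommer_rewards


miner_reward, ommer_rewards = block_rewards(block_number=100, ommer_numbers=[99, 97])
```

With these assumptions, the miner of block 100 earns 2.125 Ether, the ommer at height 99 earns 1.75 Ether, and the older ommer at height 97 earns 1.25 Ether.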
Mining Proof-of-Work: The mining proof-of-work (PoW) exists as a cryptographically secure nonce that proves beyond reasonable doubt that a particular amount of computation has been expended in the determination of some token value n. Since mining new blocks comes with an attached reward, the proof-of-work not only functions as a method of securing confidence that the blockchain will remain canonical into the future, but also as a wealth distribution mechanism.
For both reasons, there are two important goals of the proof-of-work function:
- Firstly, it should be as accessible as possible to as many people as possible. Participation is open to everyone, the use of specialized and uncommon hardware should be minimized.
- Secondly, it should not be possible for a single party (i.e miner) to make super-linear profits.
The mining PoW serves both as a security mechanism and as a wealth distribution mechanism.
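The idea of "a nonce that proves computation was expended" can be shown with a toy example. This is not Ethash, the algorithm Ethereum actually used: I am substituting plain SHA-256 from the standard library and keeping only the final comparison, where a candidate is accepted if its hash value is at most 2^256 divided by the difficulty. All names are my own.

```python
import hashlib


def pow_value(header: bytes, nonce: int) -> int:
    """Toy proof-of-work value: SHA-256 over the header and a 64-bit nonce."""
    digest = hashlib.sha256(header + nonce.to_bytes(8, "big")).digest()
    return int.from_bytes(digest, "big")


def verify(header: bytes, nonce: int, difficulty: int) -> bool:
    """Cheap check: one hash and one comparison against the target."""
    return pow_value(header, nonce) <= 2**256 // difficulty


def mine(header: bytes, difficulty: int) -> int:
    """Expensive search: try nonces in order until one meets the target."""
    nonce = 0
    while not verify(header, nonce, difficulty):
        nonce += 1
    return nonce


toy_header = b"toy block header"
found_nonce = mine(toy_header, difficulty=1000)
```

Finding the nonce takes on the order of `difficulty` attempts on average, while checking it takes a single hash. That asymmetry is what lets everyone else verify the miner’s work cheaply.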
NOTE: Ethereum is transitioning from the Proof-of-Work(PoW) consensus mechanism to the Proof-of-Stake (PoS) consensus. PoS is a different topic entirely that might be explored in a future post.
In the Yellow Paper, there are also topics such as Implementing Contracts & Future Directions of the Ethereum ecosystem. This will be an easy-breezy read.
Wow! It has been a thrill learning and writing about the Ethereum Yellow Paper (YP). If it takes you multiple reads to fully understand what is going on, I totally get it. It has taken me multiple reads, speaking to experienced people, looking into the codebase and I still find myself going back to the YP to double-check something.
Anyway, I hope you find this helpful. If you find errors and mistakes (you most likely will), please do not hesitate to write to me, and hopefully we get to correct them.
Thank you, and see you soon!
https://ethereum.github.io/yellowpaper/paper.pdf
https://ethereum.org/en/whitepaper/
https://www.lucassaldanha.com/
https://takenobu-hs.github.io/downloads/ethereum_evm_illustrated.pdf
https://medium.com/codechain/modified-merkle-patricia-trie-how-ethereum-saves-a-state-e6d7555078dd
https://ethereum.org/en/developers/docs/gas/
https://www.preethikasireddy.com/post/how-does-ethereum-work-anyway