How Ethereum Works

An in-depth-but-not-too-in-depth technical overview of the Ethereum platform

Philip Shen
Jun 14, 2018

Introduction

The intention of this article is to provide a technical overview of Ethereum: how it works, key concepts, and concerns over it. It is not an opinion piece and will not cover philosophical/social/legal issues surrounding the topic.

This article also assumes knowledge of basic blockchain principles and an idea of how Bitcoin works.

How it works

“What Ethereum intends to provide is a blockchain with a built-in fully fledged Turing-complete programming language that can be used to create “contracts” that can be used to encode arbitrary state transition functions, allowing users to create any of the systems described above, as well as many others that we have not yet imagined, simply by writing up the logic in a few lines of code”

Simply put: a decentralized computer.

That quote was taken from Ethereum’s white paper, and it’s a great place to start when learning about Ethereum. It summarizes the most important element of Ethereum: its Turing-completeness.

While scripting is possible in Bitcoin, it’s not Turing-complete; all scripts must finish executing within a finite period of time. This is because the creators wanted to be sure that denial-of-service (DoS) attacks, for example where an attacker takes down a node by making it run a script containing an infinite loop, would not be possible. What Ethereum did was provide an alternative way of preventing DoS attacks while still allowing Turing-complete programs on the blockchain, which lets the blockchain support, well, anything. Any computer program, from FizzBuzz to Ask Jeeves, can theoretically run on the Ethereum blockchain.

Let’s get into how it works.

Ethereum Virtual Machine

The Ethereum Virtual Machine (EVM) comes up a lot, and simply refers to the (distributed) runtime environment that handles state and computation for Ethereum contracts. It has a simple architecture: a word size of 256 bits, an execution stack, storage (a word-addressable word array), and memory (a word-addressable byte array).

Accounts

Just like in Bitcoin, Ethereum accounts are created using ECDSA key pairs: the account address is derived from the public key, and the private key is used to sign transactions, which proves the identity of the account owner.

There are 2 types of accounts in Ethereum: contract accounts and externally owned accounts. Contracts are bodies of code (known as EVM code) that live on the blockchain and are controlled by the code they contain. Externally owned accounts, as you’d expect, are controlled by their owner(s).

Every Ethereum account contains the following state:

  • A nonce, a transaction counter to prevent replay attacks––in other words, to prevent transactions from being processed twice
  • An ether balance. Ether is the main currency of Ethereum
  • Contract code (in contract accounts only)
  • Storage
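The account state listed above can be pictured as a small record. Here is a minimal sketch; the class name, field names, and example values are made up for illustration, and the real protocol keeps hashes of the code and storage in the account rather than the raw values:

```python
from dataclasses import dataclass, field
from typing import Dict


@dataclass
class Account:
    """Illustrative account record mirroring the four items listed above."""
    nonce: int = 0        # transaction counter, used to prevent replays
    balance: int = 0      # ether balance, counted in wei (1 ether = 10**18 wei)
    code: bytes = b""     # EVM code; empty for externally owned accounts
    storage: Dict[int, int] = field(default_factory=dict)  # word -> word

    def is_contract(self) -> bool:
        # In this sketch, an account is a contract if it carries code.
        return len(self.code) > 0


# Hypothetical example accounts
alice = Account(balance=5 * 10**18)
counter = Account(code=b"\x60\x00", storage={0: 42})
print(alice.is_contract(), counter.is_contract())
```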

Gas

Gas is the key to how Ethereum can support Turing-complete programs while still preventing denial of service attacks. Every transaction has a set limit––specified by the sender––of gas that it can expend. Additionally, every unit of gas costs the sender some amount of Ether. If the transaction fails to complete within the specified amount of gas, it fails; if it does successfully complete, well, it successfully completes. By requiring the specification of a gas limit, Ethereum prevents DoS caused by infinite loops; by having gas cost Ether, Ethereum rewards maximally efficient code.
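To make the idea concrete, here is a toy sketch of gas metering. It is not the EVM: the three-instruction machine, the `run` function, and the cost table are assumptions made up for this example. What it shows is that charging gas before every operation guarantees that even an infinite loop halts.

```python
class OutOfGas(Exception):
    """Raised when a program needs more gas than the sender allowed."""


# Illustrative per-operation costs for this toy machine
GAS_COST = {"PUSH": 3, "ADD": 3, "JUMP": 8}


def run(program, gas_limit):
    """Run a list of (op, arg) pairs, charging gas before each operation."""
    stack, pc, gas = [], 0, gas_limit
    while pc < len(program):
        op, arg = program[pc]
        cost = GAS_COST[op]
        if gas < cost:
            raise OutOfGas(f"out of gas at instruction {pc}")
        gas -= cost
        if op == "PUSH":
            stack.append(arg)
            pc += 1
        elif op == "ADD":
            b, a = stack.pop(), stack.pop()
            stack.append((a + b) % 2**256)  # 256-bit words wrap around
            pc += 1
        elif op == "JUMP":
            pc = arg
    return stack, gas_limit - gas  # final stack and gas actually used


# A finite program completes and reports the gas it used.
stack, used = run([("PUSH", 2), ("PUSH", 3), ("ADD", None)], gas_limit=100)

# An infinite loop cannot run forever: it stops once the gas is spent.
halted = False
try:
    run([("JUMP", 0)], gas_limit=100)
except OutOfGas:
    halted = True
```

The sender of the looping program still pays for the gas that was burned, which is what makes this kind of attack expensive rather than free.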

Gas is similar to transaction fees in Bitcoin in that they both provide miners with an incentive to validate the transaction.

The calculation of gas can be complicated and is not important for understanding how Ethereum works. For anyone interested, refer to appendices G and H of the yellow paper.

Some more things to know about gas are:

  1. Storage is more expensive (i.e. it costs more gas) than computing because it increases the overall state of Ethereum. Additionally, freeing up storage will give you a gas refund.
  2. Operations are sorted into sets based on their computational complexity. Predictably, gas cost varies relative to this complexity.

So that you can get a feel for gas costs, I’ll note here that the base cost of a transaction is currently 21,000 gas, simple operations such as arithmetic take 3–5 gas, and complex operations such as account creation can cost tens of thousands of gas. Additionally, certain operations (namely, those that free up storage on the blockchain) actually result in a gas refund to the transaction sender.

Transactions

In Ethereum, transactions are sent by externally owned accounts, while contract accounts send messages (which I’ll get to later).

Transactions in Ethereum contain the following data:

  • The recipient (can be either a contract account or an externally owned account)
  • The digital signature of the sender
  • The amount of Ether being transferred
  • Data (optional)
  • A STARTGAS value: the maximum amount of gas the transaction is allowed to consume before it is terminated
  • A GASPRICE value: the amount of Ether the sender pays per unit of gas
  • An init field, only if this is a contract-creating transaction, that contains EVM code for the account initialization procedure

The first 3 fields are standard, same as Bitcoin. The following 3 (Data, STARTGAS, and GASPRICE), though, are what set Ethereum apart and allow it to support so much other functionality.

The STARTGAS value is the maximum amount of gas that the transaction is allowed to consume. The GASPRICE value is the amount of Ether that the sender pays per unit of gas consumed. Both are specified by the sender when submitting a transaction.

The Data field is used to pass any raw data necessary for the transaction.
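As a rough illustration of how these fields fit together, here is a sketch of a transaction record. The class, field names, address, and numbers are hypothetical; amounts are counted in wei (1 ether = 10**18 wei), and the signature is only a placeholder.

```python
from dataclasses import dataclass

WEI_PER_ETHER = 10**18


@dataclass
class Transaction:
    """Illustrative transaction record; not a real wire format."""
    recipient: str      # contract account or externally owned account
    signature: bytes    # sender's digital signature (placeholder here)
    value: int          # amount of Ether being transferred, in wei
    start_gas: int      # STARTGAS: maximum gas the transaction may consume
    gas_price: int      # GASPRICE: wei paid per unit of gas
    data: bytes = b""   # optional raw data
    init: bytes = b""   # EVM init code, only for contract creation

    def max_fee(self) -> int:
        # The most the sender can be charged for gas.
        return self.start_gas * self.gas_price

    def upfront_cost(self) -> int:
        # What the sender must be able to cover before execution starts.
        return self.value + self.max_fee()


tx = Transaction(
    recipient="0xRecipientAddress",  # hypothetical address
    signature=b"",
    value=1 * WEI_PER_ETHER,
    start_gas=21_000,
    gas_price=20 * 10**9,            # arbitrary example price
)
print(tx.max_fee(), tx.upfront_cost())
```

With these example numbers, the gas fee is capped at 21,000 × 20,000,000,000 wei, or 0.00042 ether, on top of the 1 ether being transferred.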

Transaction Execution

The execution of an Ethereum transaction is, as the yellow paper says, the most complicated part of the Ethereum protocol. Let’s break it down (but not too much).

Before an Ethereum transaction can be distributed for validation, a few things must be verified:

  1. The transaction is properly encoded (as a well-formed RLP)
  2. The transaction signature is valid, i.e. matches the sender
  3. The nonce — which, if you recall, is the transaction counter for a given account — is equal to the sender account’s current nonce
  4. The StartGas (or the gas limit) is at least as large as the intrinsic gas. Intrinsic gas is the amount of gas that this transaction requires prior to execution; it is equal to (21000 gas) + (a fee for data included in the transaction) + (32000 gas, if the transaction creates a contract)
  5. The sender has sufficient ether to pay for the transaction
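Checks 3 through 5 can be sketched in a few lines. This is a simplified illustration, not client code: the function names and the dictionary layout are made up, encoding and signature checks are left out, and the per-byte data fees (4 gas per zero byte, 68 per non-zero byte) are the values from the yellow paper's fee schedule at the time of writing, so treat them as an assumption.

```python
G_TRANSACTION = 21000      # base cost of every transaction
G_TX_CREATE = 32000        # extra cost if the transaction creates a contract
G_TX_DATA_ZERO = 4         # assumed fee per zero byte of data
G_TX_DATA_NONZERO = 68     # assumed fee per non-zero byte of data


def intrinsic_gas(data: bytes, creates_contract: bool) -> int:
    """Gas a transaction requires before any code is executed."""
    data_fee = sum(G_TX_DATA_ZERO if b == 0 else G_TX_DATA_NONZERO for b in data)
    return G_TRANSACTION + data_fee + (G_TX_CREATE if creates_contract else 0)


def check_transaction(tx: dict, sender_nonce: int, sender_balance: int) -> list:
    """Return the list of failed checks; an empty list means the tx passes."""
    errors = []
    if tx["nonce"] != sender_nonce:
        errors.append("nonce mismatch")
    # A start gas exactly equal to the intrinsic gas is accepted here,
    # so a plain transfer with 21000 gas passes.
    if tx["start_gas"] < intrinsic_gas(tx["data"], tx["creates_contract"]):
        errors.append("start gas below intrinsic gas")
    if sender_balance < tx["value"] + tx["start_gas"] * tx["gas_price"]:
        errors.append("insufficient balance")
    return errors


# Hypothetical plain transfer
transfer = {"nonce": 0, "start_gas": 21000, "gas_price": 1,
            "value": 100, "data": b"", "creates_contract": False}
print(check_transaction(transfer, sender_nonce=0, sender_balance=21100))
```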

As the transaction is being executed, transaction substate is accrued. This substate is executed/acted upon following the transaction and contains 3 things:

  1. The self-destruct set (also known as the suicide set), a set of accounts to be deleted
  2. The log series, a series of indexable ‘checkpoints’ in EVM code execution that allow contract calls made during the transaction to be easily tracked
  3. The refund balance for the sender

The result of the transaction is a new state; Ethereum is, after all, a state machine. Once the transaction has been processed, the state is finalized by acting upon the substate (giving the refund to the transaction sender and deleting the accounts in the self-destruct set), rewarding the miner for gas consumed, and refunding unused gas to the sender.
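The gas accounting at the end of a transaction comes down to a little arithmetic. The sketch below uses a made-up `finalize` function and assumes the sender was charged STARTGAS × GASPRICE up front. It also assumes the storage refund is capped at half of the gas used, which is my reading of the yellow paper at the time; treat that cap as an assumption rather than a guarantee.

```python
def finalize(start_gas: int, gas_used: int, gas_price: int, refund_counter: int):
    """Split the prepaid gas fee between the sender and the miner.

    refund_counter is the refund balance accrued in the substate
    (for example, from freeing storage), measured in gas.
    """
    # Assumed rule: the refund cannot exceed half of the gas used.
    refund_gas = min(refund_counter, gas_used // 2)
    gas_paid_for = gas_used - refund_gas
    sender_refund = (start_gas - gas_paid_for) * gas_price  # unused gas + refund
    miner_reward = gas_paid_for * gas_price
    return sender_refund, miner_reward


# Hypothetical numbers: 100,000 gas prepaid, 60,000 used, 15,000 refund accrued
sender_refund, miner_reward = finalize(100_000, 60_000, 2, 15_000)
print(sender_refund, miner_reward)
```

Whatever the inputs, the two amounts add up to the prepaid fee, so no Ether is created or lost during finalization.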

Messages

Messages in Ethereum are essentially the same as transactions, except sent by contract accounts. They exist only within the Ethereum execution environment and do not affect anything outside of the blockchain.

Every message contains the following data:

  • The recipient
  • Amount of ether to transfer
  • Data (optional)
  • A STARTGAS value

And the sender is implicit.

The STARTGAS value is determined by the STARTGAS of the transaction that triggered this message. For example, if external actor A sends a transaction to contract B with a STARTGAS of 1000, and B consumes 600 gas before sending a message to C, then the STARTGAS of that message will be 400––equal to the remaining gas.

Uncles/Ommers

According to the glossary, an uncle is:

A child of a parent of a parent of a block that is not the parent, or more generally a child of an ancestor that is not itself an ancestor. If A is an ommer of B, B is a nibling (niece/nephew) of A.

I can’t make sense of it either. Here’s a picture:

Still not doing it for you? Me neither. Just think of an Uncle as a Bitcoin orphan; it was mined but is not a part of the main chain.

Unlike in Bitcoin, however, Uncles aren’t simply discarded and actually play a role in the blockchain. That will come up in the sections concerning blocks and consensus.

Blocks

Ethereum block headers look like this:

The first field, parentHash, is the hash of the parent block’s header. This is the same as in Bitcoin and enforces the immutability of the blockchain.

The next, ommersHash, has to do with Ethereum’s consensus protocol. Unlike Bitcoin’s orphans, Ethereum uncles actually play a role in the blockchain; this field is a hash of the headers of the uncles included in this block. I’ll touch on that in the consensus section.

Next, you’ll notice three “root” fields here: the stateRoot, transactionsRoot, and receiptsRoot. Those are all roots of Merkle-Patricia Trees, a type of ordered tree data structure (a “trie”) that Ethereum uses for its key-value stores. These three tries contain the following information:

  1. The state trie is the global tree containing the world state of the entire Ethereum blockchain; that is, it contains a mapping of accounts to account states.
  2. The transactions trie is unique to every block and is derived from the transactions contained within that block. It is a mapping of transaction indices within the block to transaction details.
  3. The receipts trie is also unique to every block and records the block’s receipts––that is, the outcomes of each transaction. Each receipt contains post-transaction state, gas used, logs created during the transaction, and the Bloom filter composed for those logs. It is a mapping of transaction indices within the block (just like in Bitcoin, transactions in an Ethereum block are indexed) to transaction receipts.

It should also be noted that the contract data contained within accounts lives in its own tree, known as the storage trie. Every account in the state trie has its own storage trie.

The logsBloom is a Bloom filter––a lightweight data structure used to test whether an element is a member of a set––containing information from the log entries of each receipt in the block. Recall that logs are generated when a transaction fires events on the blockchain.
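For readers who haven't met Bloom filters before, here is a minimal generic one. It is a sketch of the data structure, not Ethereum's exact logsBloom construction: it uses SHA-256 from the standard library where Ethereum uses Keccak-256, and the class and method names are made up. The 2048-bit size and three bit positions per item are chosen to resemble the logsBloom layout.

```python
import hashlib


class BloomFilter:
    """A set-membership test that may give false positives, never false negatives."""

    def __init__(self, size_bits: int = 2048, num_hashes: int = 3):
        self.size = size_bits
        self.k = num_hashes
        self.bits = 0  # the bit array, stored as one big integer

    def _positions(self, item: bytes):
        # Derive k bit positions by hashing the item with k different prefixes.
        for i in range(self.k):
            digest = hashlib.sha256(bytes([i]) + item).digest()
            yield int.from_bytes(digest[:4], "big") % self.size

    def add(self, item: bytes) -> None:
        for pos in self._positions(item):
            self.bits |= 1 << pos

    def might_contain(self, item: bytes) -> bool:
        # True means "possibly present"; False means "definitely absent".
        return all((self.bits >> pos) & 1 for pos in self._positions(item))


bloom = BloomFilter()
bloom.add(b"Transfer")  # hypothetical log topic
print(bloom.might_contain(b"Transfer"))
```

The point of putting such a filter in the header is that a client can rule out most blocks that contain no matching logs without looking at their receipts at all.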

The nonce and mixHash fields are, as specified above, used in Ethereum’s proof-of-work algorithm for mining blocks.
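For reference, the header fields can be written out as a plain record. The field list below follows the yellow paper's block header definition as I understand it; the Python class, the snake_case names, and the chosen types are only an illustration.

```python
from dataclasses import dataclass, fields


@dataclass
class BlockHeader:
    """Illustrative block header; comments give the yellow paper field names."""
    parent_hash: bytes        # parentHash: hash of the parent block's header
    ommers_hash: bytes        # ommersHash: hash of this block's uncle headers
    beneficiary: bytes        # beneficiary: address that collects the fees
    state_root: bytes         # stateRoot: root of the state trie
    transactions_root: bytes  # transactionsRoot: root of the transactions trie
    receipts_root: bytes      # receiptsRoot: root of the receipts trie
    logs_bloom: bytes         # logsBloom: Bloom filter over the block's logs
    difficulty: int           # difficulty of this block
    number: int               # number of ancestor blocks
    gas_limit: int            # gasLimit: gas limit for the whole block
    gas_used: int             # gasUsed: gas used by all transactions in the block
    timestamp: int            # Unix time at the block's inception
    extra_data: bytes         # extraData: arbitrary data set by the miner
    mix_hash: bytes           # mixHash: used with the nonce for proof-of-work
    nonce: bytes              # nonce: used with mixHash for proof-of-work


print([f.name for f in fields(BlockHeader)])
```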

Mining

Fundamentally, Ethereum mining is the same as Bitcoin mining: a hash of the block header is repeatedly computed, with different nonces, until a valid block is found, that is, one with a hash below a certain threshold (as determined by the difficulty). When that happens, a reward is paid out to the miner of the block for their contributions to the blockchain.

However, in place of Bitcoin’s SHA256 mining task, Ethereum uses Ethash as its proof-of-work algorithm. Ethash is designed to be memory-intensive in order to achieve ASIC-resistance.
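The shape of the mining loop is the same whichever hash function is used. The toy sketch below uses SHA-256 from the standard library, so it resembles Bitcoin's task more than Ethash; the function names, the header bytes, and the way the threshold is derived from the difficulty (2^256 divided by the difficulty) are assumptions made for this example.

```python
import hashlib


def pow_hash(header: bytes, nonce: int) -> int:
    """Hash the header together with a nonce and read the digest as an integer."""
    digest = hashlib.sha256(header + nonce.to_bytes(8, "big")).digest()
    return int.from_bytes(digest, "big")


def mine(header: bytes, difficulty: int, max_nonce: int = 2**32):
    """Try nonces in order until the hash falls below the threshold."""
    target = 2**256 // difficulty  # higher difficulty -> lower threshold
    for nonce in range(max_nonce):
        if pow_hash(header, nonce) < target:
            return nonce
    return None  # no valid nonce found in the search range


def verify(header: bytes, nonce: int, difficulty: int) -> bool:
    """Checking a solution takes a single hash, however long the search took."""
    return pow_hash(header, nonce) < 2**256 // difficulty


nonce = mine(b"toy block header", difficulty=1000)
print(nonce, verify(b"toy block header", nonce, 1000))
```

On average, a difficulty of 1000 takes about a thousand attempts to solve here, while verification is always one hash. That asymmetry is what makes proof-of-work useful.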

It’s also the job of the miner to validate previously discovered uncles from up to 6 blocks back in the blockchain. The miners of validated uncles receive a portion of the block reward. The incentive to do this will become more clear in the “consensus” section; essentially, including uncles in a block increases the chances of the block being included in the main chain.

Ethash

I won’t be getting into the fine details of Ethash, but here is a quick overview of the algorithm.

For every block, a seed can be computed from the previous block headers. That seed can then be used to create a pseudorandom cache (that is, a cache of pseudorandomly generated bits) 16 MB long. That cache, in turn, is used to generate a 1 GB dataset that consists of 64-byte items, each of which depends on a small number of items in the cache (this makes block verification very easy, as anyone with the cache can easily reproduce specific items in the dataset). Random slices of the dataset are then hashed together; the resulting hash (and therefore the block) is valid if it is below the desired target.
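The seed → cache → dataset → hash pipeline can be mimicked at toy scale. To be clear, this is not Ethash: it uses SHA-256 where Ethash uses Keccak, 32-byte items where Ethash uses 64-byte ones, and a cache of a few items rather than megabytes. All function names and parameters are invented for this sketch. What it does show is why verification is cheap: any dataset item can be rebuilt on demand from the cache.

```python
import hashlib


def h(*parts: bytes) -> bytes:
    return hashlib.sha256(b"".join(parts)).digest()


def make_cache(seed: bytes, n_items: int) -> list:
    """Build a small pseudorandom cache by repeatedly hashing the seed."""
    cache = [h(seed)]
    for _ in range(n_items - 1):
        cache.append(h(cache[-1]))
    return cache


def dataset_item(cache: list, index: int, parents: int = 4) -> bytes:
    """Compute one dataset item from a few pseudorandomly chosen cache items."""
    mix = h(cache[index % len(cache)], index.to_bytes(8, "big"))
    for _ in range(parents):
        pick = int.from_bytes(mix[:4], "big") % len(cache)
        mix = h(mix, cache[pick])
    return mix


def toy_pow_hash(cache, header: bytes, nonce: int, dataset_size: int, accesses: int = 8) -> bytes:
    """Hash together pseudorandom slices of the dataset, rebuilt from the cache."""
    mix = h(header, nonce.to_bytes(8, "big"))
    for _ in range(accesses):
        idx = int.from_bytes(mix[:4], "big") % dataset_size
        mix = h(mix, dataset_item(cache, idx))
    return mix


cache = make_cache(b"toy seed", n_items=16)
# A miner would precompute the whole dataset; a verifier needs only the cache.
dataset = [dataset_item(cache, i) for i in range(64)]
print(dataset[5] == dataset_item(cache, 5))
```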

Consensus

To achieve consensus given different versions of the blockchain, Ethereum uses GHOST: Greedy Heaviest Observed Subtree.

The key thing to note about this protocol, and the way it differs from Nakamoto consensus, is that uncles are taken into account. Using the blocks’ ommersHash, nodes will get the number of uncles mined for the last 7 blocks in each subtree. That number is, in addition to the number of blocks in that subtree, used to calculate the tree’s weight; the heaviest tree is then said to be the “correct” one.
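The weighting described above can be sketched as follows. This is only an illustration of that description, not the fork-choice code of any real client; the function names and the dictionary layout of blocks and forks are made up.

```python
def subtree_weight(blocks: list, uncle_window: int = 7) -> int:
    """Weight = number of blocks + uncles referenced by the last few blocks."""
    recent = blocks[-uncle_window:]
    uncles = sum(len(block["uncles"]) for block in recent)
    return len(blocks) + uncles


def heaviest(forks: dict) -> str:
    """Pick the name of the fork with the greatest weight."""
    return max(forks, key=lambda name: subtree_weight(forks[name]))


# Hypothetical competing forks since their common ancestor
forks = {
    "A": [{"uncles": []}, {"uncles": []}, {"uncles": []}],   # 3 blocks, 0 uncles
    "B": [{"uncles": ["u1"]}, {"uncles": ["u2"]}],           # 2 blocks, 2 uncles
}
print(heaviest(forks))
```

In this example fork B wins with a weight of 4 against 3, even though fork A is longer. A longest-chain rule would have picked A, which is where the two approaches differ.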

Other Concepts

Light Clients

Another term that frequently comes up during discussions about Ethereum is the light client. As the name suggests, a light client is a type of client that doesn’t download the entire Ethereum blockchain (in contrast to a full node, which does indeed have a copy of the entire blockchain); in other words, light clients are lightweight in both capacity and cost. By downloading only block headers, which contain the trie roots, they are still able to verify blocks and transactions.
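The trick that makes this possible is the Merkle proof: given only a root hash, a client can check that an item belongs to the tree by hashing its way up a short path of sibling hashes. The sketch below uses a plain binary Merkle tree with SHA-256 rather than Ethereum's Merkle-Patricia tries and Keccak-256, so it illustrates the idea, not the real data structure. All names are made up.

```python
import hashlib


def h(data: bytes) -> bytes:
    return hashlib.sha256(data).digest()


def _next_level(level: list) -> list:
    """Hash adjacent pairs; an odd node out is paired with itself."""
    if len(level) % 2:
        level = level + [level[-1]]
    return [h(level[i] + level[i + 1]) for i in range(0, len(level), 2)]


def merkle_root(leaves: list) -> bytes:
    level = [h(leaf) for leaf in leaves]
    while len(level) > 1:
        level = _next_level(level)
    return level[0]


def merkle_proof(leaves: list, index: int) -> list:
    """Collect (sibling_hash, sibling_is_left) pairs from the leaf to the root."""
    level, proof = [h(leaf) for leaf in leaves], []
    while len(level) > 1:
        if len(level) % 2:
            level = level + [level[-1]]
        sibling = index ^ 1
        proof.append((level[sibling], sibling < index))
        level = _next_level(level)
        index //= 2
    return proof


def verify_proof(leaf: bytes, proof: list, root: bytes) -> bool:
    """What a light client does: recompute the root from the leaf and siblings."""
    node = h(leaf)
    for sibling, sibling_is_left in proof:
        node = h(sibling + node) if sibling_is_left else h(node + sibling)
    return node == root


txs = [b"tx0", b"tx1", b"tx2"]       # hypothetical transactions
root = merkle_root(txs)               # what the block header would commit to
proof = merkle_proof(txs, 2)          # supplied by a full node
print(verify_proof(b"tx2", proof, root))
```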

zk-SNARKs

zk-SNARK stands for “zero-knowledge succinct non-interactive argument of knowledge.” zk-SNARKs are used by Ethereum (as well as other blockchains such as ZCash) to allow for complete privacy in transactions.

It is a zero-knowledge proof, meaning that the prover can “prove” to the verifier that she knows something without actually telling the verifier what it is; it is succinct, meaning the proof is small and can be verified in a short amount of time (on the order of milliseconds); and it is “non-interactive,” meaning the only interaction required is for the prover to send a single message to the verifier.

The “zero-knowledge proof” aspect of zk-SNARKs is what allows it to enable confidentiality in transactions. In the context of Ethereum, imagine Vincent needs to complete some task in order to receive funds from Pop; however those tasks are top-secret and cannot be shared with anyone (especially not Pop). In order to prove that he has indeed completed those tasks, but without revealing any information about what those tasks are, he can use a zk-SNARK.

Proof-of-Stake

Here’s a high-level explanation of proof-of-stake. I won’t go in depth; it’s a hefty topic.

Proof-of-stake is a system where a set of known validators pay a deposit to have the ability to forge (what “mining” is called in proof-of-stake) blocks on the blockchain. When the network wants a new block mined, it will select from known validators. Because validators are not competing (at least, not as much) to forge the blocks, the amount of energy expended drastically decreases.

In addition to being more energy-efficient, proof-of-stake changes the incentives of mining/forging; rather than miners being incentivized by block rewards, validators are incentivized by penalties––specifically, losing their deposit if they act maliciously (of course, there is still a reward––albeit a much smaller one––for forging. Otherwise, why put up the deposit in the first place?). This causes the protocol to be more secure, because validators will necessarily have money invested into the blockchain that would be lost if they acted maliciously.

Here’s Vitalik Buterin, one of Ethereum’s creators, making it sound pretty metal:

Imagine 100 people sitting around a circular table. One person has a bundle of papers, each with a different transaction history. The first participant picks up a pen and signs one, then passes it onto the next person, who makes a similar choice. Each participant only gets $1 if they sign the transaction history that most of the participants sign in the end. If you sign one page and later sign a different page, your house burns down.
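The "house burns down" rule can be sketched in a few lines. This is a toy model of the penalty idea only, not Casper or any real protocol: the `Validator` class, the `sign` function, and the rule of wiping out the whole deposit are assumptions made for illustration.

```python
class Validator:
    """A validator with a deposit at stake and a record of what it has signed."""

    def __init__(self, name: str, deposit: int):
        self.name = name
        self.deposit = deposit
        self.signed = {}  # block height -> hash of the block signed at that height


def sign(validator: Validator, height: int, block_hash: str) -> bool:
    """Record a signature; signing two different blocks at one height is punished."""
    previous = validator.signed.get(height)
    if previous is not None and previous != block_hash:
        validator.deposit = 0  # the deposit is lost for signing conflicting histories
        return False
    validator.signed[height] = block_hash
    return True


alice = Validator("alice", deposit=32)  # hypothetical deposit
sign(alice, height=1, block_hash="0xaaa")                   # fine
accepted = sign(alice, height=1, block_hash="0xbbb")        # conflicting signature
print(accepted, alice.deposit)
```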

Frontier, Homestead, Metropolis, Serenity

Ethereum’s releases were planned to have 4 phases: Frontier, Homestead, Metropolis, and Serenity. This was not a precise roadmap for the project; just like any other collaborative software project, Ethereum has undergone its own myriad of developmental changes, delays, and setbacks. Here is a brief overview of the 4:

Frontier (July 2015)

Frontier was the initial barebones release of Ethereum. It included pre-mined coins for the crowdsale (to fund the project) as well as the basic functionality of the blockchain, that is, a way to mine, trade, and develop applications on the blockchain.

Homestead (March 2016)

Homestead fixed many of the issues that (inevitably) arose with the Frontier release, and essentially marked Ethereum’s transition from a primitive, risky blockchain to a real-deal piece of tech. It resolved security issues, improved various parts of the protocol, and set the foundation for future improvements on transaction speed or scalability.

Metropolis (Byzantium & Constantinople)

Metropolis was split into 2 different releases: Byzantium in October 2017 (the current release) and Constantinople (not yet released). Its goal is to make Ethereum much more approachable by giving it the security, speed, and ease of use required for it to be used with confidence by anyone.

So, in the spirit of approachability, Metropolis is introducing more account abstraction by allowing externally owned accounts (which, if you recall, are controlled by users) to contain contract code. This enables developers to get creative with how Ethereum accounts on their platforms operate; they can give them more security and automated functionality, and make them generally more accessible.

A major security/privacy change coming with Metropolis is the introduction of zk-SNARKs and ring signatures (which I will cover if I make an article on Monero), both of which are used for transaction anonymity.

Metropolis is also meant to set the stage for the eventual transition to proof-of-stake in 2 ways:

  1. Every hundredth block will be mined using Casper rather than proof-of-work
  2. The mining reward will decrease from 5 to 3 ether to disincentivize mining

Serenity

Serenity is the final phase of Ethereum and is centered around a complete transition to proof-of-stake. Other protocol changes, as well as the specific conditions of this transition, are anyone’s guess at this point.
