Py-EVM Part 2: Opcodes

In part 1 of this series we covered why Py-EVM was created.

In the next few posts we’ll take a look at some of the architecture of Py-EVM. In this post we’ll start at the lowest level of the EVM and look at how opcodes are implemented.

Opcode Primer

The code that the EVM executes, often referred to as bytecode, is made up of many individual opcodes. An opcode is the smallest unit of computation. Opcodes are normally referenced by their mnemonic name but are canonically represented by a hexadecimal number.

  • ADD: 0x01
  • PUSH32: 0x7f
  • DELEGATECALL: 0xf4

EVM execution involves iterating over some bytecode, executing each opcode until execution is complete. The specifics of how each opcode executes are what define the baseline functionality of the EVM. Higher level languages like Solidity compile down to EVM bytecode, translating the code you’ve written into a list of opcodes that represent the same application logic.
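
To make the idea of iterating over bytecode concrete, here is a minimal sketch of an execution loop. It is not Py-EVM or PyEthereum code: the `run` function is hypothetical, it supports only STOP (0x00), ADD (0x01) and PUSH1 (0x60), and it ignores gas entirely.

```python
# Hypothetical toy interpreter for illustration only; not Py-EVM code.
STOP, ADD, PUSH1 = 0x00, 0x01, 0x60
UINT_256_MAX = 2**256 - 1


def run(bytecode: bytes) -> list:
    """Execute a tiny subset of EVM bytecode and return the final stack."""
    stack = []
    pc = 0  # program counter: index of the next byte to read
    while pc < len(bytecode):
        op = bytecode[pc]
        pc += 1
        if op == STOP:
            break
        elif op == ADD:
            # Addition wraps around at 256 bits.
            stack.append((stack.pop() + stack.pop()) & UINT_256_MAX)
        elif op == PUSH1:
            # PUSH1 is followed by one byte of immediate data.
            stack.append(bytecode[pc])
            pc += 1
        else:
            raise ValueError(f"unsupported opcode: {op:#04x}")
    return stack


# PUSH1 0x02, PUSH1 0x03, ADD
final_stack = run(bytes.fromhex("6002600301"))
```

After this runs, `final_stack` holds the single value 5.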

The Ethereum yellow paper defines approximately 130 opcodes.

Opcodes in PyEthereum

One of the main departures from the PyEthereum architecture is how Py-EVM handles opcodes. In PyEthereum, the opcode logic is defined within the vm.py module (found here). Each opcode is a branch within an extensive if/elif/else statement that is a bit over 400 lines long and contains one branch for each of the roughly 130 EVM opcodes. While this design is pragmatic and efficient, it is by no means modular or extensible. Here is a small excerpt.

if op == 'STOP':
    return peaceful_exit('STOP', compustate.gas, [])
elif op == 'ADD':
    stk.append((stk.pop() + stk.pop()) & TT256M1)
elif op == 'SUB':
    stk.append((stk.pop() - stk.pop()) & TT256M1)
elif op == 'MUL':
    stk.append((stk.pop() * stk.pop()) & TT256M1)
elif op == 'DIV':
    s0, s1 = stk.pop(), stk.pop()
    stk.append(0 if s1 == 0 else s0 // s1)
...

In order to add new opcodes, a new if/else clause must be added to this body of code. There are also additional if/else statements for each hard fork protocol change. The result is a very complex module that does not lend itself well to extension, modification or experimentation.

Opcodes in Py-EVM

In Py-EVM, each opcode is a single function. Here is what the function for the ADD opcode looks like.

def add_op(computation):
    computation.gas_meter.consume_gas(3, reason='ADD')
    left, right = computation.stack.pop(
        num_items=2,
        type_hint=constants.UINT256,
    )
    result = (left + right) & constants.UINT_256_MAX
    computation.stack.push(result)

Opcode functions take a single argument, the computation object, which exposes APIs for all actions that opcodes may need to perform, such as stack manipulation, reading account state or consuming gas.
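
One consequence of this design is that an opcode can be exercised in isolation by handing it any object with the right shape. The sketch below assumes stand-in `GasMeter`, `Stack` and `Computation` classes and a stand-in `constants` namespace. All four are hypothetical and only mimic the calls used in the function above; they are not Py-EVM's real classes.

```python
from types import SimpleNamespace

# Stand-in for the `constants` module referenced by add_op (assumption).
constants = SimpleNamespace(UINT256="uint256", UINT_256_MAX=2**256 - 1)


class GasMeter:
    """Hypothetical gas meter that tracks remaining gas."""

    def __init__(self, start_gas):
        self.gas_remaining = start_gas

    def consume_gas(self, amount, reason):
        if amount > self.gas_remaining:
            raise RuntimeError(f"out of gas: {reason}")
        self.gas_remaining -= amount


class Stack:
    """Hypothetical stack; `type_hint` is accepted but ignored here."""

    def __init__(self):
        self.values = []

    def push(self, value):
        self.values.append(value)

    def pop(self, num_items, type_hint):
        return tuple(self.values.pop() for _ in range(num_items))


class Computation:
    """Hypothetical computation object exposing a gas meter and a stack."""

    def __init__(self, start_gas):
        self.gas_meter = GasMeter(start_gas)
        self.stack = Stack()


def add_op(computation):
    computation.gas_meter.consume_gas(3, reason='ADD')
    left, right = computation.stack.pop(
        num_items=2,
        type_hint=constants.UINT256,
    )
    result = (left + right) & constants.UINT_256_MAX
    computation.stack.push(result)


computation = Computation(start_gas=10)
computation.stack.push(2)
computation.stack.push(constants.UINT_256_MAX)
add_op(computation)
```

Because the sum is masked to 256 bits, adding 2 to the maximum value wraps around and leaves 1 on the stack, with 7 gas remaining.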

Constructing a VM

Now let's look at how opcodes are composed together into a VM.

from evm import VM

ExampleVM = VM.configure(
    opcodes={
        0x01: add_op,
    },
    ...
)

This example creates a VM with a single opcode. The VM opcodes are specified as a mapping from opcode number to the function containing the opcode logic.
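
A mapping like this lets execution become a dictionary lookup rather than a chain of if/elif branches. The following is a rough sketch of that dispatch pattern, not Py-EVM's actual execution loop: the `execute` function is hypothetical, the computation object is reduced to a bare stack, and the opcode functions are simplified versions that skip gas accounting.

```python
from types import SimpleNamespace

UINT_256_MAX = 2**256 - 1


def add_op(computation):
    stack = computation.stack
    stack.append((stack.pop() + stack.pop()) & UINT_256_MAX)


def mul_op(computation):
    stack = computation.stack
    stack.append((stack.pop() * stack.pop()) & UINT_256_MAX)


# Opcode number -> function containing the opcode logic.
OPCODES = {
    0x01: add_op,
    0x02: mul_op,
}


def execute(opcodes, code, initial_stack):
    """Hypothetical dispatch loop: look up each opcode and call it."""
    computation = SimpleNamespace(stack=list(initial_stack))
    for opcode in code:
        try:
            opcode_fn = opcodes[opcode]
        except KeyError:
            raise ValueError(f"invalid opcode: {opcode:#04x}") from None
        opcode_fn(computation)
    return computation.stack


# ADD then MUL on a stack of [4, 2, 3]: (3 + 2) * 4
final_stack = execute(OPCODES, bytes([0x01, 0x02]), [4, 2, 3])
```

Here `final_stack` ends up as `[20]`. Adding an opcode means adding one entry to the dictionary; the loop itself never changes.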

What’s in a VM?

The term VM in the Py-EVM context is used to refer to a single set of rules for a given period of the blockchain. For example, at the time of writing this post, the public Ethereum mainnet has four rule sets:

  • The initial Frontier rules
  • The Homestead fork rules
  • The DAO fork rules
  • The Anti-DOS fork rules

The opcodes-as-functions pattern makes adjusting the protocol rules easy and elegant. Opcodes can be modified, added, and removed using first class APIs, eliminating the slow build up of if/else statements to handle various hard fork protocol rule changes. In fact, most of the existing protocol changes, including the addition of the DELEGATECALL opcode for Homestead as well as all of the gas cost increases in the Anti-DOS fork, can be implemented using this API.
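
As a rough illustration of why this is convenient, each rule set can be expressed as the previous rule set's opcode mapping plus a few overrides. The sketch below uses plain dictionaries rather than Py-EVM's configuration API. The `make_gas_only_op` helper is hypothetical, the opcode functions only charge gas, and the gas figures are chosen for illustration.

```python
from types import SimpleNamespace


def make_gas_only_op(name, gas_cost):
    """Build a placeholder opcode function that only charges gas."""
    def opcode_fn(computation):
        computation.gas_used += gas_cost
    opcode_fn.__name__ = name
    return opcode_fn


# Initial rules: SLOAD (0x54) exists, DELEGATECALL (0xf4) does not.
FRONTIER_OPCODES = {
    0x54: make_gas_only_op("sload", 50),
}

# Homestead adds DELEGATECALL without touching existing entries.
HOMESTEAD_OPCODES = {
    **FRONTIER_OPCODES,
    0xf4: make_gas_only_op("delegatecall", 40),
}

# The Anti-DOS fork replaces an entry with a more expensive version.
ANTI_DOS_OPCODES = {
    **HOMESTEAD_OPCODES,
    0x54: make_gas_only_op("sload", 200),
}

before = SimpleNamespace(gas_used=0)
FRONTIER_OPCODES[0x54](before)

after = SimpleNamespace(gas_used=0)
ANTI_DOS_OPCODES[0x54](after)
```

Each later rule set reuses everything it does not override, and the earlier mappings are left untouched, so historical blocks can still be processed with the rules that applied at the time.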


Py-EVM is still under heavy development, but as it stands, the core API allows for extreme flexibility for alternative EVM implementations as well as rapid prototyping of new features. A researcher can experiment with adding new opcodes or modifying existing ones without modification to the core library.

This type of flexibility is being designed and implemented at every level of the protocol and we’re working hard to get it into your hands as soon as possible. Come join us in developing a new Python implementation of the EVM.