Develop EVM assembly opcode logic for Fibonacci

Published in

Coinmonks

8 min readMay 8, 2023

Recently I encountered an interesting capture-the-flag challenge on the topic of the Ethereum Virtual Machine (EVM). Capture the flag challenges are cyber security challenges that exist in any subdomain of computer science. There are online academies like QuillCTF where challenges like these can be found in the field of blockchain, covering topics like vulnerability, security, bytecode optimisation, etcetera.

Even more relevant than the competitive part is the educational effect of such competitions; it’s an outstanding way to learn. Even with the emerging number of Layer 2 and Layer 3 solutions, it is still key to understand how smart contracts are executed at the most granular level.

Having spend a significant amount of time on it, I decided to dedicate my first Medium article to it, and walk you through. It’s a challenge on Fibonacci:

What is the shortest runtime bytecode you can write for a contract that satisfies both the following:

Accepts calldata of length 32 bytes representing one uint256 (no function selector)
Returns, as one uint256, the Fibonacci number at the index of the input (the sequence can start at either 0 or 1)

There exist valid solutions shorter than 50 bytes.

Cliffhanger: we end up with a valid solution of 39 bytes, so stay tuned.

EVM & opcode resources

Before diving into the challenge itself, a few resources need to be mentioned. First off, evm.codes is the place to get familiar with opcodes. Opcodes are the fundamental operations that are executed within a stack machine like the EVM, and are in fact, a fundamental aspect in computer science in general. The playground of mentioned resource enables you to get a feeling for how opcodes work and what their effect is on the stack, memory and storage.

Although there is no plethora of resources out there that dive really deep into the topic, one can definitely find interesting reads that provide you with quite a decent start on Solidity bytecode and opcode basics:

Once more familiar, this article on developing EVM logic yourself uses a similar approach to tackle a low-level EVM challenge like the one discussed here.

Fibonacci

The Fibonacci sequence is defined as the sequence in which each number is the sum of the two preceding ones:

So basically what is being asked for is, given some input n, return the Fibonacci number that belongs to that index. Let’s take decimal value 20 as an example. In Solidity the first four bytes of the calldata indicate the function selector, i.e. they identify the function that is being called. Since they are only four bytes, this is also the reason one could encounter collisions: two methods with a different name can have the same identifier, but let’s save that for another time.

In case of above-mentioned example value and no function selector — so without four leading function selector bytes — the calldata, representing one uint256, in hexadecimal format padded to 32 bytes, would be

0x00000000000000000000000000000000000000000000000000000000000014

which, for a valid solution of the challenge, would need to return

0x00000000000000000000000000000000000000000000000000000000001a6d

which is 6765 in decimal format, i.e. the Fibonacci value at index 20.

Solidity algorithm

There are several algorithms to solve Fibonacci. In semi-pseudo Solidity a basic function that would provide a valid solution would look something like

contract Fibonacci {

    fallback() payable external returns (bytes memory k) {
        initialize j = 0
        initialize k = 1
        for {i = 2, i <= n, i++}:
            m = k + j
            j = k
            k = m
       return k
    }

With a time complexity of O(n) this is a fairly simple algorithm that would not require us to use storage, which is an important consideration in the context of gas costs. By splitting this piece of code in separate blocks and develop EVM logic for those blocks, we are able to derive a solution.

Opcodes

The above algorithm can be split up in the following separate tasks.

Load calldata n to the stack.
Initialize variables j and k.
Loop condition: determine if we need to execute the loop body. If n ∈ [0,1] jump to return anchor. This is handled by instantiating i = 2.
Loop body.
a. Increment counter i.
b. Execute body: Fibonacci logic.
Always jump back to loop condition.
Return k to the user.

Now for each separate block the opcodes that perform the corresponding task can be constructed. For clarity the contents of the stack are displayed after each step. For example a stack content of [j][k] indicates j on top of the stack, and k at the bottom, under j.

The separate blocks exclude jump destinations that allow us to navigate through the bytecode. They are added at the end to the loop-condition block and the return block. [0/1] indicates the result of an evaluation, i.e. 0 or 1.

1 Load calldata to the top of stack

Loading calldata to the top of the stack is as simple as pushing the offset to the stack, and then loading the calldata with that offset. Paste this in the EVM playground and see for yourself.

PUSH1 0x00      // stack after operation: [0]
CALLDATALOAD    // stack after operation: [n]

2 Initialize variables for loop

Initializing variables is achieved by simply pushing them to the stack. So depending on their initial variables, these are the variables that you push.

PUSH1 0x01      // Instanstiate k = 1: [k]
PUSH1 0x00      // Instanstiate j = 0: [j][k]

3 Loop condition

Right before the actual loop body, a condition is needed to check if the loop body actually needs to be executed. Basically this boils down to evaluating whether i <= n, or i > n. For this the loop variables are duplicated, since they are not persistent during the evaluation. Starting with stack [i][n] (loop variables), duplicate and check the condition:

DUP2        // stack:           [n][i][n]
DUP2        // stack:           [i][n][i][n]   
GT          // evaluate i <= n: [0/1][i][n]
PUSH1 0xYY  // add jump destination of return block. Stack: [returnblock][0/1][i][n]
JUMPI

Note: [returnblock] here is the byte offset of the destination of the start of the return block, i.e. the location in the bytecode where the return block (see 6) starts . This byte offset is known after compilation in the playground and depends on the length of the full set of opcodes that make up the entire solution.

4a Loop body: increment counter

Incrementing a number which resides on top of the stack, in this case counter i, is as simple as adding 1 to the top of the stack, and subsequently adding the two by executing the ADD opcode.

PUSH1 0x01      // stack: [1][i]
ADD             // stack: [i+1]

4b Loop body: Fibonacci logic

When closely inspecting the Fibonacci logic of the sum operation, it can be noted that

can be written as

and thus that we only need to keep track of two variables, not three, when we implement the opcodes for the Fibonacci logic.

DUP1    // duplicate k:     [k][k][j]
SWAP2   // swap k and j:    [j][k][k]
ADD     // add j and k:     [k*][k]

5 Jump to loop condition

When the loop counter has been incremented and the summation has been done, always jump to the top of the for loop and evaluate again whether it needs to be entered.

PUSH1 0xYY  // jump destination of loop condition, to be determined during compilation
JUMP

6 Return block

A simple return block to return a value on the top of the stack is achieved by first storing the offset in memory with MSTORE, and then push the number of bytes of the to be returned value to the top of the stack. After that, return data to the user. Again, paste the this block of opcodes — except for the JUMPDEST — in the EVM playground, and see how the stack machine behaves depending on which values use.

JUMPDEST        // Destination that you jump into if you want to return to user
PUSH1 0x00      // MSTORE offset: [0][k*][k]
MSTORE
PUSH1 0x20
PUSH1 0x00
RETURN

In this case, we push 0 to the stack, then we store the value which resides at stack[2] (i.e. the value under 0) in memory with offset 0. This means the first 32 bytes of the memory equal the value that needs to be returned. Finally we push 0x20(32 in decimal) to the stack, then offset 0, such that the first 32 bytes are returned, which is the desired output value of the algorithm in hexadecimal format.

Final solution

With some trial and error, the above opcode blocks can now be put together to get a valid solution. To keep overview, I highly recommended to list the operations in a Google Sheet during the process and put a separate column containing the stack content next to it.

So the final solution in mnemonics is

PUSH1 0x00
CALLDATALOAD
PUSH1 0x01
PUSH1 0x00
SWAP2
PUSH1 0x02
JUMPDEST
DUP2
DUP2
GT
PUSH1 0x1c
JUMPI
PUSH1 0x01
ADD
SWAP2
DUP1
SWAP4
ADD
SWAP2
PUSH1 0x0a
JUMP
JUMPDEST
POP
POP
PUSH1 0x00
MSTORE
PUSH1 0x20
PUSH1 0x00
RETURN

The runtime bytecode representation of this solution is simply

600035600160009160025b818111601c576001019180930191600a565b505060005260206000f3

which is 39 bytes in length. You can copy paste this in the EVM playground and see that this is a valid solution for returning the Fibonacci number at the index of the input F(n), starting at sequence 1.

Note that n = 0 returns 1, which is not valid. This can be solved and optimized by changing how the loop variables are initialized, which could even potentially result in a shorter runtime bytecode solution.

The solution has been tested for n ∈ [1, 20] and n = 100 , providing valid solutions with approximate gas usage of 21K gas per call.

Improvements

There are definitely potential improvements that can be made to the derived solution. For example by swapping the order of certain steps or by rethinking the initialization of the loop variables the solution can be shortened by a few bytes already.

However, as mentioned before, the educational intention is more relevant. Thank you for reading this article, hopefully it was a useful and enjoyable resource!