How formal verification can ensure flawless smart contracts
Bernhard Mueller, the creator of the Mythril smart contract analyzer, lays out how to eradicate errors in the Ethereum bytecode.
Smart contract security is serious business. Previously, I wrote about detecting common security issues in Ethereum bytecode, but generic tests like these only scratch the surface. Ideally, we want to be sure that our contract behaves correctly 100% of the time. Formal verification allows us to prove conclusively that certain error states can never occur.
Note: This article is outdated and the code won’t work with recent versions of Mythril. You can now reproduce the same examples more easily with MythX.
There is a considerable amount of academic research into semantic modeling of the Ethereum Virtual Machine (EVM) and formal verification of smart contracts using various frameworks (links below). In this article, I’ll describe an approach based on symbolic execution and the Z3 theorem prover, and demonstrate that this method is capable of detecting subtle bugs, such as those resulting from Solidity memory addressing mechanics.
Our specimen of choice is Ownable, a commonly used base contract that defines the owner
state variable as well as a constructor, a modifier, and a function for transferring ownership.
Computing the state space
The first step is to symbolically execute the code and produce a representation of all possible program states. Each state consists of a set of concrete and/or symbolic values associated with the contract account (storage, balance,… ) and virtual machine environment (program counter, call data, etc.), as well as a set of path constraints, i.e. the requirements that must be fulfilled to reach that particular state.
Mythril’s symbolic execution engine computes the state space in a fully automated way. To visualize the possible paths through the program, we can render a control flow graph with the myth
command line tool.
$ myth -g Ownable.html Ownable.sol
The result is a graph that provides a high-level overview of the program flow. Each node in the graph represents a basic block of code, while the edges denote the path conditions.
Fully mapping the state space is not always that trivial: Constructs such as unbounded loops and recursive inter-contract calls can lead to an exponential increase in the number of states. Thanks to Ethereum’s gas concept however, we can be sure that execution always halts. I will write more on this topic in subsequent posts — for now, we’ll stick with our simplistic example.
Specification
Given the compiled bytecode of the Ownable
contract, we’ll attempt to prove that Ownable
behaves correctly with respect to the following safety property:
“Ownership can be transferred by the contract owner only.”
We will formalize this property using the language of the Ethereum yellow paper: The global state is denoted as σ, the machine state as μ, et cetera. Some of the symbols are explained in the image below, but if you’re not familiar with them I recommend studying the paper first.
Following Solidity compiler conventions, the variable owner
is represented by the value stored in storage slot 0 (this can be confirmed by inspecting the bytecode). We are therefore interested in the subset of state transitions σ => σ′ in which the contents of storage slot 0 are altered:
According to the EVM language definition only the SSTORE
instruction modifies global storage. We can therefore more easily obtain the set of relevant initial states σ by selecting from all pairs of global state and machine state (σ, μ) where:
An error state exists if it is possible to satisfy the logical formula:
In plain English, we attempt to prove that it is impossible to reach a point in the program where the variable owner is changed, but msg.sender
does not match the current owner.
Translating the specification into a Python program
The program below performs an exhaustive search for a solution to the above formula. If the formula can be satisfied within the state space of Ownable
, we obtain a set of concrete values (for state and input variables) that violate the specification. Conversely, if the formula is proven unsatisfiable, we conclude that the contract is safe with respect to the specification.
Running the analysis on Ownable
returns the following result:
$ ./analysis.py Ownable.sol
Analysis completed.
Since no counterexample has been found, we can conclude that Ownable
is safe with respect to modifications of the owner
state variable, with a few caveats:
- We assume that the Ethereum virtual machine always behaves correctly;
- The result only applies to the
Ownable
contract in isolation, not necessarily to contracts derived from it.
Detecting a not-so-obvious bug
The above result is hardly surprising: Looking at the Solidity code, it is pretty obvious that the specification always holds. Interestingly, however, our analysis tool will detect any kind of violation, including subtle bugs that might not be immediately noticeable to an auditor. Consider Pwnable.sol
, a modified Ownable.sol
that adds a dynamic array:
At a quick glance, this doesn’t look so bad: After all, transferOwnership()
is still the only function that writes to the owner variable, and it is protected by the onlyOwner()
modifier. This time however, the analysis tool complains:
$ python analysis.py Pwnable.solViolation found! Function: _function_0x7ca5dcce, address: 410
storage_1 = 0x4ef1d2ad89edf8c4d91132028e8195cdf30bb4b5053d4f8cd260341d48080000
caller = 0x0
storage_0 = 0xffffffffffffffffffffffffffffffffffffffff00
calldata__0 = 0x7ca5dcce00000000000000000000000000000000000000000000000000000000
calldatasize_ = 0x4
calldata__4 = 0x4ef1d2ad89edf8c4d91132028e8195cdf30bb4b5053d4f8cd260341d4805f30a
callvalue = 0x0Analysis completed.
It appears that certain inputs result in overwriting the owner variable. As it turns out, this is due to the way dynamically sized arrays are laid out in storage:
- The length of the array is stored at storage slot 1;
- The data is stored at the storage address
keccak(1) + offset
.
When offset
is set to (MAX_UINT — keccak(1))
, this adds up to zero and we’re addressing storage slot 0. Note that this only works because a1.length
is intentionally underflowed in the constructor. However, similar issues do occasionally occur in practice: In Solidity, the array length variable is often managed explicitly, and array.length--
is a common pattern (to learn more about this type of issue, check out Doug Hoyte’s USCC submission and watch my upcoming HITB talk).
Exploitation
To trigger the issue, all we need to do is to create a function call using the output of the analysis tool.
calldata_0
: Function signature hash offun(uint256,address)
calldata_4
: Offset found by the solver, should match2**256 — keccak256(1))
The offset value produced by the solver can be used as is. For clarity however, here is how it can be calculated in Python.
>>> from ethereum import utils
>>> hash = utils.sha3("\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x01")
>>> hash_num = int.from_bytes(hash, byteorder="big")
>>> 2**256 — hash_num
35707666377435648211887908874984608119992236509074197713628505308453184860938
>>> hex(2**256-hash_num)
'0x4ef1d2ad89edf8c4d91132028e8195cdf30bb4b5053d4f8cd260341d4805f30a'
We end up with the following function call:
fun(35707666377435648211887908874984608119992236509074197713628505308453184860938,”0x11223344556677889900112233445566deadbeef”)
To confirm that this actually works, you can deploy Pwnable.sol
on remix and perform the function call through the UI (note that the Integer argument must be put in quotes). After performing the call, reading the contents of owner
should return the new address.
TL;DR
In this article, I showed how to formally verify the correctness of a smart contract with respect to well-defined properties using a symbolic execution engine and SMT solver. This type of analysis can prove certain properties of the code and is capable of detecting subtle bugs.
Further Reading
As mentioned above, a lot of awesome research is available in this field. Here are a few resources that deal with formal analysis of EVM code as well as program verification in general:
- Smashing Ethereum smart contracts for fun and real profit
- Yoichi Hira’s Formal Verification Repository on Github
- Sidney Amani et.al, Towards Verifying Ethereum Smart Contract Bytecode in Isabelle/HOL
- Karthikeyan Bhargavan et. al., Formal Analysis of Smart Contracts
- Nikolaj Bjørner and Leonardo de Moura - Applications of SMT solvers to Program Verification
About Mythril and MythX
Mythril is a free and open-source smart contract security analyzer. It uses symbolic execution to detect a variety of security vulnerabilities.
MythX is a cloud-based smart contract security service that seamlessly integrates into smart contract development environments and build pipelines. It bundles multiple bleeding-edge security analysis processes into an easy-to-use API that allows anyone to create purpose-built smart contract security tools. MythX is compatible with Ethereum, Tron, Vechain, Quorum, Roostock and other EVM-based platforms.
Disclaimer: The views expressed by the author above do not necessarily represent the views of Consensys AG. ConsenSys is a decentralized community with ConsenSys Media being a platform for members to freely express their diverse ideas and perspectives. To learn more about ConsenSys and Ethereum, please visit our website.
Disclaimer: The views expressed by the author above do not necessarily represent the views of ConsenSys AG. ConsenSys is a decentralized community with ConsenSys Media being a platform for members to freely express their diverse ideas and perspectives. To learn more about ConsenSys and Ethereum, please visit our website.