Introducing Mythril: A framework for bug hunting on the Ethereum blockchain
Note: This is an article from ancient times. For an up-to-date introduction to Mythril, read this article instead.
Unless you’ve been living under a rock for the past three years, you have surely taken notice of an industry buzzword that has been giving “machine learning” a run for its money: Blockchain.
Ethereum is one of the most successful implementations of the concept. In contrast to Bitcoin, which offers limited scripting capabilities, Ethereum provides a Turing-complete virtual machine. State transitions in the network (such as a change in the balance of a particular token) are regulated by code running in the virtual machine, a.k.a. “smart contracts”.
An ancient security saying goes: “With great flexibility comes great potential for vulnerabilities”. It doesn’t help that the semantics of Solidity, Ethereum’s most popular high-level programming language, are often counter-intuitive, creating many opportunities for developers to mess up. A great example of this is the Parity multisig wallet bug, which allowed an unknown attacker to withdraw 153,037 Ether (worth more than USD 30 million).
The Parity debacle shows that implementation errors can remain undetected for months, even when the contract is deployed on the mainnet and its source code is openly available. One can only speculate what kind of vulnerabilities might be hidden in the thousands of contracts deployed on the chain, many of which are black boxes (in the sense that their source code isn’t published on Etherscan).
Not surprisingly, such a rich source of potential vulnerabilities with a monetary payout doesn’t escape the attention of security folks of both the “white-hat” and “black-hat” varieties. It’s smashing the stack* for fun and profit all over again, only this time there is real profit (*note that the EVM not only has a stack, it also has no registers, so almost every instruction uses the stack).
When I started looking into Ethereum a few weeks ago, I found quite a few useful tools for analyzing contracts on the mainnet. Etherscan and Remix allow researchers to conveniently browse, disassemble and debug contracts in the web browser. The Porosity decompiler can (to a certain extent) restore source code from bytecode. Truffle and testrpc make it easy to compile and debug Solidity code.
Originally, I was hoping to run PyEthApp and directly access the state in its LevelDB. Unfortunately, PyEthApp seems to have suffered a lack of maintenance and development for quite some time and doesn’t sync with the Ethereum mainnet. Mythril therefore needs RPC access to a fully connected go-ethereum node. Install go-ethereum and start your node as follows:
$ geth --rpc --rpcapi eth,debug --syncmode fast console 2>/dev/null
Note that Mythril uses non-standard go-ethereum debug APIs, so while it should work with other Ethereum clients, some functions won’t be available.
Mythril itself can be installed via PyPI:
$ pip install mythril
This will install both the Python modules and the myth command line tool.
Mythril enables search operations like those described in the legendary “Mitch Brenner” blog post in minutes instead of days. To achieve this, it creates a snapshot of the contracts deployed on the mainnet. Run the following command to initialize the database:
$ myth --init-db
The whole process takes some time (to be honest, it’s not very efficient; I hope to provide a better implementation at some point). If you don’t want to sync the whole chain right away, you can hit ctrl+c at any point, and syncing will auto-resume the next time you run the database initialization command.
Command Line Usage
Once you have some contracts in your database, you can run search commands to look for function signatures and opcode sequences. The expression syntax is as follows:
- func#[function signature]#
- code#[opcode sequence]#
For example, the command below will output all contracts that have a function named changeMultisig(address):
$ myth --search "func#changeMultisig(address)#"
Matched contract with code hash 2bfa6e34330ac57501bd0f6c84d50fcd
Address: 0x3665f2bf19ee5e207645f3e635bf0f4961d661c0, balance: 4999600000000000000
Matched contract with code hash 98623854d849f0d97c55b98e0238eb7b
Address: 0x2d36cb89a977209703c1d6304f23198c22b7a498, balance: 63686800960937000000
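The func# term is matched against a function signature, yet compiled bytecode only contains 4-byte selectors. For Solidity contracts, the selector is the first four bytes of the Keccak-256 hash of the canonical signature, and the dispatcher usually compares it against the calldata using a PUSH4 instruction (opcode 0x63). The following sketch shows one plausible way to perform such a match; it is not Mythril’s implementation. Python’s hashlib.sha3_256 applies the NIST SHA-3 padding and therefore yields different digests than Ethereum’s Keccak-256, so the sketch carries a small pure-Python Keccak-256 to stay dependency-free. The names keccak256, selector and has_function are hypothetical, and the plain substring test can report false positives when the same bytes happen to appear elsewhere in the code.

```python
# Sketch: match a function signature against EVM bytecode via its 4-byte selector.
MASK64 = (1 << 64) - 1


def _rol(value: int, shift: int) -> int:
    """Rotate a 64-bit lane left by `shift` bits."""
    shift %= 64
    return ((value << shift) | (value >> (64 - shift))) & MASK64


def _round_constants() -> list:
    """Derive the 24 Keccak-f[1600] round constants from the spec's LFSR."""
    constants, lfsr = [], 1
    for _ in range(24):
        rc = 0
        for j in range(7):
            if lfsr & 1:
                rc |= 1 << ((1 << j) - 1)
            # LFSR with polynomial x^8 + x^6 + x^5 + x^4 + 1, kept to 8 bits
            lfsr = ((lfsr << 1) ^ 0x71) & 0xFF if lfsr & 0x80 else lfsr << 1
        constants.append(rc)
    return constants


def _rotation_offsets() -> list:
    """Per-lane rotation amounts for the rho step, indexed as offsets[x][y]."""
    offsets = [[0] * 5 for _ in range(5)]
    x, y = 1, 0
    for t in range(24):
        offsets[x][y] = ((t + 1) * (t + 2) // 2) % 64
        x, y = y, (2 * x + 3 * y) % 5
    return offsets


_RC = _round_constants()
_ROT = _rotation_offsets()


def _keccak_f(lanes: list) -> list:
    """Apply the 24-round Keccak-f[1600] permutation to a 5x5 array of lanes."""
    for rc in _RC:
        # theta: fold each column's parity into its neighbours
        c = [lanes[x][0] ^ lanes[x][1] ^ lanes[x][2] ^ lanes[x][3] ^ lanes[x][4] for x in range(5)]
        d = [c[(x - 1) % 5] ^ _rol(c[(x + 1) % 5], 1) for x in range(5)]
        lanes = [[lanes[x][y] ^ d[x] for y in range(5)] for x in range(5)]
        # rho and pi: rotate every lane and move it to a new position
        b = [[0] * 5 for _ in range(5)]
        for x in range(5):
            for y in range(5):
                b[y][(2 * x + 3 * y) % 5] = _rol(lanes[x][y], _ROT[x][y])
        # chi: the non-linear step along each row
        lanes = [[b[x][y] ^ (~b[(x + 1) % 5][y] & b[(x + 2) % 5][y]) for y in range(5)]
                 for x in range(5)]
        # iota: mix in the round constant
        lanes[0][0] ^= rc
    return lanes


def keccak256(data: bytes) -> bytes:
    """Ethereum's Keccak-256 (original Keccak padding, not NIST SHA3-256)."""
    rate = 136  # bytes absorbed per permutation for a 256-bit digest
    padded = bytearray(data) + b"\x01"
    padded += b"\x00" * (-len(padded) % rate)
    padded[-1] |= 0x80
    lanes = [[0] * 5 for _ in range(5)]
    for start in range(0, len(padded), rate):
        block = padded[start:start + rate]
        for i in range(rate // 8):
            lanes[i % 5][i // 5] ^= int.from_bytes(block[8 * i:8 * i + 8], "little")
        lanes = _keccak_f(lanes)
    return b"".join(lanes[i][0].to_bytes(8, "little") for i in range(4))


def selector(signature: str) -> str:
    """First four bytes of keccak256(signature) as lowercase hex."""
    return keccak256(signature.encode()).hex()[:8]


def has_function(signature: str, bytecode_hex: str) -> bool:
    """Heuristic: does the bytecode contain PUSH4 <selector>? May yield false positives."""
    code = bytecode_hex.lower().removeprefix("0x")
    return ("63" + selector(signature)) in code
```

For example, selector("transfer(address,uint256)") evaluates to a9059cbb, the well-known ERC-20 transfer selector.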
The search feature supports simple boolean expressions. The following command prints all contracts that contain both a function named changeMultisig(address) and the opcode sequence PUSH1 0x50, POP:
$ myth --search "func#changeMultisig(address)# and code#PUSH1 0x50,POP#"
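To make the expression syntax concrete, here is a minimal evaluator for func#…# and code#…# terms combined with and, or and not. It is a sketch, not Mythril’s parser: it assumes the contract has already been reduced to a set of known function signatures and a list of disassembled instruction strings (supplied by hand here), and the name expression_matches is hypothetical. Each term is replaced by True or False, and the remaining string is only evaluated after checking that nothing but boolean literals, operators and parentheses is left.

```python
import re

# A search term is func#<signature># or code#<comma-separated opcode sequence>#.
TERM = re.compile(r"(func|code)#([^#]*)#")


def _normalize(instruction: str) -> str:
    """Collapse whitespace and ignore case so 'push1  0x50' equals 'PUSH1 0x50'."""
    return " ".join(instruction.split()).upper()


def _code_matches(instructions: list, pattern: str) -> bool:
    """True if the comma-separated opcode sequence occurs contiguously."""
    wanted = [_normalize(part) for part in pattern.split(",")]
    have = [_normalize(ins) for ins in instructions]
    n = len(wanted)
    return any(have[i:i + n] == wanted for i in range(len(have) - n + 1))


def expression_matches(expression: str, signatures: set, instructions: list) -> bool:
    """Evaluate a search expression against one contract's signatures and code."""
    def substitute(match):
        kind, body = match.group(1), match.group(2).strip()
        if kind == "func":
            return str(body in signatures)
        return str(_code_matches(instructions, body))

    boolean_expr = TERM.sub(substitute, expression)
    # Only boolean literals, operators and parentheses may remain before eval().
    if not re.fullmatch(r"(\s|\(|\)|True|False|and|or|not)*", boolean_expr):
        raise ValueError(f"unsupported expression: {expression!r}")
    return bool(eval(boolean_expr, {"__builtins__": {}}, {}))
```

With signatures = {"changeMultisig(address)"} and instructions = ["PUSH1 0x60", "PUSH1 0x50", "POP", "STOP"], the expression used in the command above evaluates to True.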
The disassembler is invoked with the -d flag. It accepts either a bytecode string via the -c argument or a contract address via -a ADDRESS (this will download the contract code from your Ethereum node).
Mythril tries to resolve function names using a built-in signatures file originally obtained from the Ethereum Signature Database. If you end up using Mythril, you are very welcome to commit updates to that file.
$ myth -d -a 0x2d36cb89a977209703c1d6304f23198c22b7a498
0 PUSH1 0x60
2 PUSH1 0x40
212 — FUNCTION changeMultisig(address) -
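The listing above starts with PUSH1 0x60, PUSH1 0x40, the usual preamble of Solidity contracts from that era. At its core, a disassembler walks the bytecode byte by byte and, for PUSH1 to PUSH32 (opcodes 0x60 to 0x7f), consumes the 1 to 32 immediate bytes that follow. The sketch below illustrates that loop; it is not Mythril’s disassembler, and its opcode table is a deliberately small subset, so any byte missing from it is printed as UNKNOWN_0x.. instead of its real mnemonic.

```python
# Small subset of EVM opcodes; a real disassembler needs the full table.
OPCODES = {
    0x00: "STOP", 0x01: "ADD", 0x02: "MUL", 0x03: "SUB", 0x04: "DIV",
    0x10: "LT", 0x11: "GT", 0x14: "EQ", 0x15: "ISZERO", 0x16: "AND",
    0x19: "NOT", 0x20: "SHA3", 0x33: "CALLER", 0x34: "CALLVALUE",
    0x35: "CALLDATALOAD", 0x36: "CALLDATASIZE", 0x50: "POP", 0x51: "MLOAD",
    0x52: "MSTORE", 0x54: "SLOAD", 0x55: "SSTORE", 0x56: "JUMP",
    0x57: "JUMPI", 0x5b: "JUMPDEST", 0xf1: "CALL", 0xf3: "RETURN",
    0xf4: "DELEGATECALL", 0xfd: "REVERT", 0xfe: "INVALID", 0xff: "SELFDESTRUCT",
}


def disassemble(bytecode_hex: str) -> list:
    """Return (offset, text) pairs, one per instruction."""
    code = bytes.fromhex(bytecode_hex.lower().removeprefix("0x"))
    result, pc = [], 0
    while pc < len(code):
        op = code[pc]
        if 0x60 <= op <= 0x7f:  # PUSH1..PUSH32 carry 1..32 immediate bytes
            size = op - 0x5f
            arg = code[pc + 1:pc + 1 + size]
            result.append((pc, f"PUSH{size} 0x{arg.hex()}"))
            pc += 1 + size
            continue
        if 0x80 <= op <= 0x8f:      # DUP1..DUP16
            text = f"DUP{op - 0x7f}"
        elif 0x90 <= op <= 0x9f:    # SWAP1..SWAP16
            text = f"SWAP{op - 0x8f}"
        else:
            text = OPCODES.get(op, f"UNKNOWN_0x{op:02x}")
        result.append((pc, text))
        pc += 1
    return result
```

Printing each pair of disassemble("0x6060604052") yields 0 PUSH1 0x60, 2 PUSH1 0x40 and 4 MSTORE, mirroring the first lines of the myth -d output above.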
One of Mythril’s “killer features” is the call graph generator. Adding the -g OUTPUT_FILE argument will cause Mythril to save a graph in HTML format:
$ myth -g ~/Desktop/graph.html -a 0x2d36cb89a977209703c1d6304f23198c22b7a498
Open the resulting file in the web browser to view the graph. Usually, you can get a pretty good overview of available execution paths (fortunately, smart contracts aren’t all that complex).
Using the call graph together with execution tracing to gradually reverse engineer a contract has been working well for me, although it would be nice to have a GUI-based SVG editor for annotating the graph (if you know one, please let me know in the comments).
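Mythril builds this graph for you, but the underlying idea can be sketched statically: cut the instruction list into basic blocks (a new block starts at every JUMPDEST and after every JUMP, JUMPI, STOP, RETURN, REVERT, INVALID or SELFDESTRUCT), add an edge wherever a jump target is pushed as a constant immediately before the jump, and add a fall-through edge wherever execution can continue into the next block. The sketch below emits Graphviz DOT text. It is an approximation under stated assumptions, not how Mythril generates its HTML graph: the (offset, opcode, argument) tuple format is invented for the example, and jumps whose target is computed at runtime simply get no edge.

```python
TERMINATORS = {"JUMP", "JUMPI", "STOP", "RETURN", "REVERT", "INVALID", "SELFDESTRUCT"}
NO_FALLTHROUGH = TERMINATORS - {"JUMPI"}  # JUMPI may continue to the next block


def basic_blocks(instructions):
    """Split (offset, opcode, arg) tuples into lists forming basic blocks."""
    blocks, current = [], []
    for ins in instructions:
        if ins[1] == "JUMPDEST" and current:
            blocks.append(current)
            current = []
        current.append(ins)
        if ins[1] in TERMINATORS:
            blocks.append(current)
            current = []
    if current:
        blocks.append(current)
    return blocks


def control_flow_edges(instructions):
    """Return sorted (from_block_start, to_block_start) edges."""
    blocks = basic_blocks(instructions)
    starts = {block[0][0] for block in blocks}
    edges = set()
    for i, block in enumerate(blocks):
        last = block[-1]
        # Static jump target: PUSHn <dest> immediately before JUMP/JUMPI.
        if last[1] in ("JUMP", "JUMPI") and len(block) > 1 and block[-2][1].startswith("PUSH"):
            dest = int(block[-2][2], 16)
            if dest in starts:
                edges.add((block[0][0], dest))
        # Fall-through edge unless control flow definitely leaves here.
        if last[1] not in NO_FALLTHROUGH and i + 1 < len(blocks):
            edges.add((block[0][0], blocks[i + 1][0][0]))
    return sorted(edges)


def to_dot(edges):
    """Render the edges as a Graphviz digraph."""
    lines = ["digraph cfg {"]
    lines += [f"  n{a} -> n{b};" for a, b in edges]
    lines.append("}")
    return "\n".join(lines)
```

For a conditional jump such as CALLVALUE, PUSH1 0x07, JUMPI, the first block gets two edges: one to the JUMPDEST at offset 7 and one falling through to the next instruction.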
It is often useful to identify other contracts referenced by a particular contract. Let’s assume you want to search for contracts that use the DELEGATECALL instruction in their fallback function, as was the case in the Parity bug. You can do this using dynamic analysis: simply run every contract in the PyEthereum VM without any inputs, and check whether the DELEGATECALL instruction is executed. The Mythril repo contains an example script showing how to do this. It should output something like the following:
$ python examples/find-fallback-dcl.py
DELEGATECALL in fallback function: Contract 0x07459966443977122e639cbf7804c446
DELEGATECALL in fallback function: Contract 0x17c9e5b7f2bfd8307d628f2d9fcc9352
DELEGATECALL in fallback function: Contract 0x17f9db8b6ffa854335b319d01f09ba39(…)
As the name implies, the DELEGATECALL instruction delegates execution to a different contract, so naturally you’ll be interested in which contract is called. You can print the addresses of referenced contracts with the --xrefs flag:
$ myth --xrefs 0x07459966443977122e639cbf7804c446
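One plausible static way to find referenced contracts is to collect every 20-byte constant pushed with PUSH20 (opcode 0x73), since hard-coded addresses are usually embedded that way; the all-ones constant is skipped because Solidity also pushes it as a bit mask for address values. This sketch is an assumption about how such a lookup can work, not a description of how --xrefs is implemented, and it will miss addresses that are read from storage or calldata at runtime. The function name find_xrefs is hypothetical.

```python
ALL_ONES = "ff" * 20  # PUSH20 0xff..ff is an address mask, not a reference


def find_xrefs(bytecode_hex: str) -> list:
    """Collect distinct PUSH20 constants that look like contract addresses."""
    code = bytes.fromhex(bytecode_hex.lower().removeprefix("0x"))
    xrefs, pc = [], 0
    while pc < len(code):
        op = code[pc]
        if 0x60 <= op <= 0x7f:  # PUSH1..PUSH32: inspect, then skip immediate data
            size = op - 0x5f
            arg = code[pc + 1:pc + 1 + size].hex()
            if size == 20 and len(arg) == 40 and arg != ALL_ONES:
                address = "0x" + arg
                if address not in xrefs:
                    xrefs.append(address)
            pc += 1 + size
        else:
            pc += 1
    return xrefs
```

Walking instruction by instruction (rather than searching the hex string for 73) avoids mistaking PUSH data for a PUSH20 opcode.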
Instead of using the command line tool, you can also follow the cross-references programmatically and run further analysis on the referenced contracts (find-fallback-dcl.py contains an example for this as well).
The command line tool is neat, but you can only unlock the full power of Mythril with custom code. In addition to the contract database, disassembler and EVM tracing modules, Mythril also includes a modified version of ethjsonrpc, allowing you to deploy and trace code on a testrpc node. By combining all of this, you can piece together some decent static and dynamic analysis.
To open the contract database from a Python program, use the get_persistent_storage function. This will return a ContractStorage object (by default, the database lives in [your-home]/.mythril, but you can override this in the constructor). Call the method search(expression, callback) to start a search:
from mythril.ether.contractstorage import get_persistent_storage

contract_storage = get_persistent_storage()
contract_storage.search("func#getOwner()#", myCallback)
The callback function passed in the second argument will be called for every search result. It receives the following arguments:
- The hash key identifying the contract in Mythril’s database
- An ETHContract object containing the current contract
- A list of addresses at which the contract lives in the blockchain
- A list of the balances of each of the deployed contracts
def myCallback(contract_hash, contract, addresses, balances):
    # Do something…
    pass
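As a slightly more useful callback, the sketch below keeps only deployments that still hold Ether. It follows the four-argument signature described above; the module-level results list and the MIN_BALANCE threshold are illustrative assumptions, and balances are assumed to be in wei, as the search output earlier suggests.

```python
# Hypothetical threshold: only report deployments holding at least 1 Ether.
MIN_BALANCE = 10**18  # balances assumed to be in wei
results = []


def funded_callback(contract_hash, contract, addresses, balances):
    """Collect (code hash, address, balance) for deployments above MIN_BALANCE."""
    for address, balance in zip(addresses, balances):
        if int(balance) >= MIN_BALANCE:
            results.append((contract_hash, address, int(balance)))
```

Passing funded_callback as the second argument to search() would fill results as matches come in; the contract argument is unused here but is where further analysis of the ETHContract would go.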
A useful pattern is searching for a particular type of contract and then performing a set of analysis tasks on each result. Let’s have a look at a second example that does just that.
Let’s assume you want to scan the contract database for conditions akin to the Parity bug, but in a generic way. One idea is to look for any function that, when passed either no argument, an address, or a list of addresses, ends up writing your address to storage with the SSTORE instruction. Of course, this doesn’t necessarily mean that you’re overwriting an important state variable such as owners, but it’s definitely the kind of behavior you want to investigate further.
In the prior example, we saw how code can be traced in the PyEthereum VM. For a more advanced analysis that also incorporates state (such as available accounts, contract storage, calling the constructor, etc.) it is better to deploy the contract on testrpc. In my test environment, I have geth running on port 8545 and a testrpc instance on port 8546, which allows me to move contracts from the real network to testrpc instantly. To run the example code, start testrpc as follows:
$ testrpc --port 8546 --gasLimit 0xFFFFFFF --account \
0x0b6f3fd29ca0e570faf9d0bb8945858b9c337cd2a2ff89d65013eec412a4a811,500000000000000000000 --account \
We want to look at all contracts in the database, so we can either use a search term that matches every contract, or simply iterate over the contracts:
contract_keys = list(contract_storage.contracts.keys())

for k in contract_keys:
    contract = contract_storage.contracts[k]
This will return ETHContract objects that store both the code of the contract (contract.code) and the code of the transaction that created the contract (contract.creation_code).
To re-create the contract in your own private chain or on testrpc, replay the contract creation transaction using Mythril’s JSON RPC client:
from mythril.rpc.client import EthJsonRpc

testrpc = EthJsonRpc("localhost", 8546)

# Deploy on testrpc
creator_addr = "0xadc2f8617191ff60a36c3c136170cc69c03e64cd"
ret = testrpc.eth_sendTransaction(from_address=creator_addr, gas=5000000, value=0, data=contract.creation_code)
receipt = testrpc.eth_getTransactionReceipt(ret)
contract_addr = receipt['contractAddress']
This should return a transaction receipt containing the contract address. Note that testrpc “mines” a new block whenever it receives a transaction, so your contract is deployed instantaneously.
The Disassembly class lets you access the list of instructions, formatted easm code, cross-references and functions of a contract. It takes a single constructor argument, the contract bytecode:
disas = Disassembly(contract.code)
The Disassembly object has two dictionaries, func_to_addr and addr_to_func, that contain mappings between function names and addresses. You can iterate over func_to_addr to get the signature of each function (note that unidentified functions are labeled as “UNK_[address]”).
for function_selector in disas.func_to_addr:
    # do something with the function signature, e.g. print it
    print(function_selector)
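For Solidity-compiled contracts, function entry points can usually be read straight off the dispatcher, which compares the selector taken from calldata against constants in a PUSH4, EQ, PUSH2, JUMPI pattern. The sketch below builds a name-to-address mapping in the spirit of func_to_addr from that pattern. It is not Mythril’s implementation: the (offset, opcode, argument) tuple format, the known_signatures dictionary (selector to name) and the UNK_<selector> fallback label are assumptions, and dispatchers produced by other compiler versions or optimizer settings may look different.

```python
def dispatcher_entries(instructions, known_signatures):
    """Map function names to entry offsets found in PUSH4/EQ/PUSH2/JUMPI runs."""
    func_to_addr = {}
    for i in range(len(instructions) - 3):
        window = instructions[i:i + 4]
        if [ins[1] for ins in window] == ["PUSH4", "EQ", "PUSH2", "JUMPI"]:
            sel = window[0][2].lower()                 # e.g. "0xa9059cbb"
            name = known_signatures.get(sel, "UNK_" + sel)
            func_to_addr[name] = int(window[2][2], 16)  # jump destination
    return func_to_addr
```

The reverse mapping, in the spirit of addr_to_func, is then just {addr: name for name, addr in func_to_addr.items()}.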
In the example script, every available function is called multiple times with various arguments (e.g. no argument, an address, a list of addresses). I won’t explain all of that in detail here; please have a look at the code to see how to encode the call data and send the transaction.
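The call data for these three argument shapes follows the standard Solidity ABI: the 4-byte selector, then, for a single address, the address left-padded to a 32-byte word, or, for a single address[], a 32-byte offset to the array data (0x20), the array length and one padded word per element. The sketch below is not the example script’s code; encode_call is a hypothetical name, and the selector is passed in as hex so that no Keccak implementation is needed.

```python
def _word(value: int) -> str:
    """One 32-byte ABI word as 64 hex digits."""
    return format(value, "064x")


def encode_call(selector_hex: str, arg=None) -> str:
    """Encode call data for f(), f(address) or f(address[])."""
    data = selector_hex.lower().removeprefix("0x")
    if isinstance(arg, str):                 # f(address): one padded word
        data += _word(int(arg, 16))
    elif arg is not None:                    # f(address[]): offset, length, elements
        data += _word(32) + _word(len(arg))
        data += "".join(_word(int(a, 16)) for a in arg)
    return "0x" + data                       # f(): selector only
```

For instance, encode_call("0x12345678", my_address), with a placeholder selector and your own account as my_address, produces 4 + 32 bytes of call data that can be passed as data to eth_sendTransaction.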
Finally, to trace execution of a function call, use the traceTransaction RPC method:
tx = testrpc.eth_sendTransaction(to_address=contract_addr, from_address=addr_schnupper, gas=5000000, value=0, data=data)
trace = testrpc.traceTransaction(tx)
This will return a dictionary containing every instruction executed, along with the stack at each point of execution. We are only interested in SSTORE instructions that have our target address in the second-to-top position on the stack (i.e. the “attacker’s” address is written to storage). We can search the instruction list as follows:
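for t in trace['structLogs']:
    if t['op'] == 'SSTORE':
        if addr_schnupper[2:] in t['stack'][-2]:
            # our address is being written to storage: flag this step for review
            print("Target address written to storage:", t)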
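SSTORE pops the storage slot from the top of the stack and the value to store from the position beneath it; since the code above treats the last element of the stack list as the top, stack[-1] is the slot and stack[-2] the value. The substring test above works, but it can also fire on values that merely contain the address digits somewhere; comparing integers is stricter. The sketch below does that on a hand-made trace. The entries mimic the 'op', 'pc' and 'stack' fields of a structLogs entry, but the concrete values and the helper name sstores_of are assumptions for illustration.

```python
def sstores_of(trace: dict, address: str) -> list:
    """Return (pc, slot) for each SSTORE that writes `address` to storage."""
    target = int(address, 16)
    hits = []
    for step in trace["structLogs"]:
        if step["op"] != "SSTORE" or len(step["stack"]) < 2:
            continue
        slot, value = step["stack"][-1], step["stack"][-2]  # top of stack is last
        if int(value, 16) == target:
            hits.append((step["pc"], int(slot, 16)))
    return hits
```

Knowing the slot as well as the program counter makes it easier to check afterwards whether the overwritten slot holds something like an owner variable.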
Possible next steps could include running further static and dynamic analysis to determine the effects of the overwritten address, or dumping a callgraph for manual analysis.
The usage scenarios detailed here are only the tip of the iceberg: you can build almost arbitrarily complex blockchain scanners on top of Mythril’s APIs. However, note that many of Mythril’s components, such as contract storage, search expressions and others, still have a lot of room for improvement. You are welcome to contribute better implementations and additional analysis scripts on the GitHub repository.
About Mythril and MythX
Mythril is a free and open-source smart contract security analyzer. It uses symbolic execution to detect a variety of security vulnerabilities.
MythX is a cloud-based smart contract security service that seamlessly integrates into smart contract development environments and build pipelines. It bundles multiple bleeding-edge security analysis processes into an easy-to-use API that allows anyone to create purpose-built smart contract security tools. MythX is compatible with Ethereum, Tron, Vechain, Quorum, Rootstock and other EVM-based platforms.