Smashing the EVM for Fun and…Extensibility

Note: This is a crosspost that was submitted to BlockChannel with the author’s permission; this post originally appeared here.

I began this blog with the intent of speaking about the recent Parity Wallet event. After dissecting the code fully, and reading what else had been written by others, I felt there was little more to say. Short of brute-forcing the address of the deleted code, we are not likely to see those funds recovered without a fork.

Instead, I decided to write about the first thought I had while considering the Parity Wallet hack. As I was trying to think of possible solutions, I asked myself the question, “What if I could make a contract that would allow me to run any code, any time, without the need for a library?”

Many contracts allow a user to interact with arbitrary code through the use of a proxy. This can be seen below:

function () payable {someAddress.delegatecall(msg.data);}

This function allows a user to call external code, but what I wanted was to simply call a contract and execute the code I passed in as an argument. Essentially, shellcode for the EVM.

This is obviously NOT a good idea in most places. DAOs, token contracts, or any shared-state contract would have no security with this code in place. That said, it was fun to build.

Disclaimer: This code was thrown together over a few hours, and is in no way optimized or even fully tested. This is a simple proof of concept. Do not use or deploy this code into any contract, for any reason.

The first step to building EVM shellcode was to build a model. I settled on creating a stack machine inside of memory, using Solidity inline assembly. Once I had a model, I had to build a loader. The loader I built can be seen on lines 5 through 9.

These lines of code set up an instruction pointer, an argument pointer, and finally the stack pointer. This setup assumes that the first bytes32 element of the passed-in array is a number specifying the total number of arguments, so the arguments can be located separately from the instructions. The instructions start at index 1 of code and are read one bytes32 at a time. Granted, this is a huge waste, since each opcode is only a single byte (two hex characters), but it was easier.

1. contract meta{
2. function run(bytes32[] code){
3. require(msg.sender == owner);
4. assembly{
5. mstore(add(calldatasize,60),160) // instruction pointer: code[1] sits at memory offset 160
6. mstore(add(calldatasize,92), // argument pointer: counts back 32 bytes per argument
7. add(sub(sub(calldatasize,4),
8. mul(byte(0,mload(128)),32)),64))
9. mstore(add(calldatasize,124), add(calldatasize,1148)) // stack pointer
10. loop:
11. switch byte(0,mload(mload(add(calldatasize,60)))) // opcode = byte 0 of the current instruction word
12. case 0x00{ // stop
13. stop
14. }
15. case 0x01{ // add
16. mstore(add(mload(add(calldatasize,124)),32),
17. add(mload(mload(add(calldatasize,124))),
18. mload(add(mload(add(calldatasize,124)),32))))
19. mstore(add(calldatasize,124),
20. add(mload(add(calldatasize,124)), 32) )
21. }
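
The layout the loader relies on can be modeled off-chain. The following is a rough Python sketch, not part of the contract. The names build_payload, loader_pointers, opcode_word, and value_word are hypothetical. The calldata size assumes standard ABI encoding of a single bytes32[] argument (a 4-byte selector, a 32-byte offset, a 32-byte length, then the elements), and ARRAY_DATA assumes code[0] sits at memory offset 128, as the mload(128) in the listing implies.

```python
WORD = 32         # bytes per EVM word
ARRAY_DATA = 128  # assumed memory offset of code[0], implied by mload(128)


def opcode_word(op):
    """32-byte word with `op` in byte 0, the byte that byte(0, ...) reads."""
    return bytes([op]) + bytes(WORD - 1)


def value_word(value):
    """32-byte big-endian encoding of an argument value."""
    return value.to_bytes(WORD, "big")


def build_payload(opcodes, args):
    """Lay out [argument count, instructions..., arguments...] as bytes32 words."""
    header = opcode_word(len(args))
    return [header] + [opcode_word(op) for op in opcodes] + [value_word(a) for a in args]


def loader_pointers(payload):
    """Recompute the three pointers that the loader sets up."""
    # Assumed ABI layout: selector (4) + offset (32) + length (32) + elements.
    calldatasize = 4 + 32 + 32 + WORD * len(payload)
    arg_count = payload[0][0]  # mirrors byte(0, mload(128))
    instruction_ptr = 160      # memory offset of code[1]
    argument_ptr = (calldatasize - 4) - arg_count * WORD + 64
    stack_ptr = calldatasize + 1148
    return instruction_ptr, argument_ptr, stack_ptr


payload = build_payload([0x60, 0x60, 0x01, 0x00], [2, 3])
ip, ap, sp = loader_pointers(payload)
# The argument pointer lands on the first of the trailing argument words.
assert ap == ARRAY_DATA + WORD * (len(payload) - 2)
```

With four instructions and two arguments, the argument pointer resolves to offset 288, which is where the first trailing argument word would sit under these layout assumptions.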

Once a loader was built, and the stack machine created, I needed a way to get instructions to be executed based on the code I was passing in. The easiest way to facilitate this was to build a big ugly switch statement inside of a loop. The switch reads the opcode of the next instruction through the instruction pointer. It then finds the matching case and executes the specified opcode, using the memory-based stack machine.

Each case is keyed by the actual EVM opcode value and emulates the operation on the memory-based stack. This can be seen on lines 16, 17, and 18, where an addition is performed on the top two stack slots. Lines 19 and 20 then increment the stack pointer, which pops one operand and leaves the sum on top.
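
The memory effect of the addition case can be traced with a small Python model. This is only a sketch: memory is a plain dict keyed by byte offset, the stack pointer is passed in and returned as a Python value rather than kept in a memory slot as the contract does, and the name emulated_add is hypothetical.

```python
MASK = 2**256 - 1  # EVM words are 256 bits wide, so ADD wraps on overflow


def emulated_add(memory, sp):
    """Mirror lines 16-20: sum the two top slots, keep the sum, pop one slot."""
    top = memory.get(sp, 0)
    below = memory.get(sp + 32, 0)
    memory[sp + 32] = (top + below) & MASK
    # The stack grows downward, so popping a slot raises the pointer by 32.
    return sp + 32


memory = {1000: 2, 1032: 3}  # top of stack at offset 1000
sp = emulated_add(memory, 1000)
assert sp == 1032 and memory[sp] == 5
```

The stale operand is left behind at the old offset. That is harmless, because the next push overwrites it.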

This seems like a nice trick, but it is very cumbersome unless you can pass in some variables. To solve this problem, I built a push instruction that puts arguments on the memory-based stack. This can be seen below under case 0x60:

case 0x60{ //load argument 0 
mstore(sub(mload(add(calldatasize,124)),32),
mload(mload(add(calldatasize,92))))
mstore(add(calldatasize,124),
sub(mload(add(calldatasize,124)), 32) )
}
.
.
.
mstore(add(calldatasize,60),add(mload(add(calldatasize,60)),32))
jump(loop)
}

Finally, I needed a way to increment the instruction pointer and jump back to the top of the loop to read the next instruction. The last two statements of the snippet above perform this task quite nicely.
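
Put together, the fetch, dispatch, and advance cycle can be approximated in Python as follows. This is a simplified model under stated assumptions, not the contract itself. The function run and its stack_base parameter are hypothetical, each list entry stands for the opcode byte of one 32-byte instruction word, only the three cases shown in this post are handled, and the top of the stack is returned purely for inspection (the real stop opcode returns nothing).

```python
WORD = 32
MASK = 2**256 - 1  # 256-bit word size


def run(opcodes, arg0, stack_base=4096):
    """Interpret a list of opcodes against a downward-growing stack in `memory`."""
    memory = {}
    ip = 0           # index of the next instruction word
    sp = stack_base  # an empty stack starts at the base
    while ip < len(opcodes):
        op = opcodes[ip]
        if op == 0x00:    # stop
            break
        elif op == 0x01:  # add: sum the two top slots, pop one
            memory[sp + WORD] = (memory[sp] + memory[sp + WORD]) & MASK
            sp += WORD
        elif op == 0x60:  # push argument 0
            sp -= WORD
            memory[sp] = arg0 & MASK
        else:
            raise ValueError("opcode 0x%02x is not modeled in this sketch" % op)
        # The contract adds 32 to a byte offset; here one list index is one word.
        ip += 1
    return memory[sp] if sp < stack_base else None


# Push argument 0 twice, add, then stop.
assert run([0x60, 0x60, 0x01, 0x00], 21) == 42
```

Because case 0x60 always loads argument 0, the example doubles a single argument rather than adding two different ones.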

After running a few tests, I proved to myself that an arbitrary user could call any opcode they desired using this code. Some of the instructions do not make sense when using a memory-based stack, but others translate just fine. For instance, selfdestruct could be called using this method.

Just in case you missed the point, using this code you could call selfdestruct on a contract that did not explicitly define a selfdestruct method. This may have been handy in a recent wallet hack. Then again, this code would probably be an even bigger security flaw than an unlocked instantiator.

The most exciting application of this hack, in my opinion, is the ability to load bytecode from storage. It may be possible to build a framework that allows for extensible code by modification of storage.

In the coming weeks, I will follow up with more code from this experiment and more details about the methods used. For now, I just wanted to get a few minds thinking about the possibilities of using memory, calldata, and storage to execute arbitrary bytecode. These methods may make for a system that allows code to be dynamically updated and recovered from malicious acts. This code may also cause terrible damage if used improperly.