Diving Into The Ethereum VM Part 5 — The Smart Contract Creation Process

zh
9 min readOct 24, 2017

--

In previous articles of this series, we’ve learned the basics of EVM assembly, as well as how ABI encoding allows the outside world to communicate with a contract. In this article, we’ll see how a contract is created from nothing.

Previous articles of this series (in order).

The EVM bytecode we’ve seen so far is straightforward, just instructions that the EVM executes from top to bottom, no magic up its sleeve. The contract creation process is more fun, in that it blurs the barrier between code and data.

In learning how a contract is created, we’ll see that sometimes data is code, and sometimes code is data.

Put on your favourite wizard hat 🎩

A Contract’s Birth Certificate

Let’s create a simple (and completely useless) contract:

pragma solidity ^0.4.11;contract C {
}

Compile it:

solc --bin --asm c.sol

And the bytecode is:

60606040523415600e57600080fd5b5b603680601c6000396000f30060606040525b600080fd00a165627a7a723058209747525da0f525f1132dde30c8276ec70c4786d4b08a798eda3c8314bf796cc30029

To create this contract, we’ll need to create a transaction by making an eth_sendtransaction RPC call to an Ethereum node. You could use Remix or Metamask to do this.

Whatever deployment tool you use, the parameters for the RPC call would be something similar to:

{
"from": "0xbd04d16f09506e80d1fd1fd8d0c79afa49bd9976",
"to": null,
"gas": "68653", // 30400,
"gasPrice": "1", // 10000000000000
"data": "0x60606040523415600e57600080fd5b603580601b6000396000f3006060604052600080fd00a165627a7a723058204bf1accefb2526a5077bcdfeaeb8020162814272245a9741cc2fddd89191af1c0029"
}

There is no special RPC call or transaction type to create a contract. The same transaction mechanism is used for other purposes as well:

  • Transferring Ether to an account or contract.
  • Calling a contract’s method with parameters.

Depending on what parameters you specified, the transaction is interpreted differently by Ethereum. To create a contract, the to address should be null (or left out).

I’ve created the example contract with this transaction:

https://rinkeby.etherscan.io/tx/0x58f36e779950a23591aaad9e4c3c3ac105547f942f221471bf6ffce1d40f8401

Opening Etherscan, you should see that the input data for this transaction is the bytecode produced by the Solidity compiler:

When processing this transaction, the EVM would have executed the input data as code. And voila, a contract was born.

What The Bytecode Is Doing

We can break the bytecode above into three separate chunks:

// Deploy code
60606040523415600e57600080fd5b5b603680601c6000396000f300
// Contract code
60606040525b600080fd00
// Auxdata
a165627a7a723058209747525da0f525f1132dde30c8276ec70c4786d4b08a798eda3c8314bf796cc30029
  • Deploy code runs when the contract is being created.
  • Contract code runs after the contract had been created, when its methods are called.
  • (optional) Auxdata is the cryptographic fingerprint of the source code, used for verification. This is just data, and never executed by the EVM.

The deploy code has two main purposes:

  1. Runs the constructor function, and sets up initial storage variables (like contract owner).
  2. Calculates the contract code, and returns it to the EVM.

The deploy code generated by the Solidity Compiler loads the bytes 60606040525b600080fd00 from bytecode into memory, then returns it as the contract code. In this case, the "calculation" is merely reading a chunk of data into memory. In principle, we could programatically generate the contract code.

Exactly what the constructor does is up to the language, but any EVM language would have to return the contract code at the end.

Contract Creation

So what happens after the deploy code had run and returned the contract code. How does Ethereum create a contract from the returned contract code?

Let’s dig into the source code together to learn the details.

I’ve found that the Go-Ethereum implementation is the easiest reference to find the information I need. We get proper variable names, static type info, and symbol cross-references. Try beating that, Yellow Paper!

The relevant method is evm.Create, read it on Sourcegraph (there’s type info when you hover over a variable, pretty sweet). Let’s skim the code, omitting some error checking and fussy details. From top to bottom:

  • Check whether caller has enough balance to make a transfer:
if !evm.CanTransfer(evm.StateDB, caller.Address(), value) {
return nil, common.Address{}, gas, ErrInsufficientBalance
}
  • Derive the new contract’s address from the caller’s address (passing in the creator account’s nonce):
contractAddr = crypto.CreateAddress(caller.Address(), nonce)
  • Create the new contract account using the derived contract address (changing the “world state” StateDB):
evm.StateDB.CreateAccount(contractAddr)
  • Transfer the initial Ether endowment from caller to the new contract:
evm.Transfer(evm.StateDB, caller.Address(), contractAddr, value)
  • Set input data as contract’s deploy code, then execute it with EVM. The ret variable is the returned contract code:
contract := NewContract(caller, AccountRef(contractAddr), value, gas)
contract.SetCallCode(&contractAddr, crypto.Keccak256Hash(code), code)
ret, err = run(evm, snapshot, contract, nil)
  • Check for error. Or if the contract code is too big, fail. Charge the user gas then set the contract code:
if err == nil && !maxCodeSizeExceeded {
createDataGas := uint64(len(ret)) * params.CreateDataGas
if contract.UseGas(createDataGas) {
evm.StateDB.SetCode(contractAddr, ret)
} else {
err = ErrCodeStoreOutOfGas
}
}

Code That Deploys Code

Let’s now dive into the nitty gritty assembly code to see how “deploy code” returns “contract code” when a contract is created. Again, we’ll analyze the example contract:

pragma solidity ^0.4.11;contract C {
}

The bytecode for this contract broken into separate chunks:

// Deploy code
60606040523415600e57600080fd5b5b603680601c6000396000f300
// Contract code
60606040525b600080fd00
// Auxdata
a165627a7a723058209747525da0f525f1132dde30c8276ec70c4786d4b08a798eda3c8314bf796cc30029

The assembly for the deploy code is:

// Reserve 0x60 bytes of memory for Solidity internal uses.
mstore(0x40, 0x60)
// Non-payable contract. Revert if caller sent ether.
jumpi(tag_1, iszero(callvalue))
0x0
dup1
revert
// Copy contract code into memory, and return.
tag_1:
tag_2:
dataSize(sub_0)
dup1
dataOffset(sub_0)
0x0
codecopy
0x0
return
stop

Tracing through the above assembly for returning the contract code:

// 60 36 (PUSH 0x36)
dataSize(sub_0)
stack: [0x36]
dup1
stack: [0x36 0x36]
// 60 1c == (PUSH 0x1c)
dataOffset(sub_0)
stack: [0x1c 0x36 0x36]
0x0
stack: [0x0 0x1c 0x36 0x36]
codecopy
// Consumes 3 arguments
// Copy `length` of data from `codeOffset` to `memoryOffset`
// memoryOffset = 0x0
// codeOffset = 0x1c
// length = 0x36
stack: [0x36]
0x0
stack: [0x0 0x36]
memory: [
0x0:0x36 => calldata[0x1c:0x36]
]
return
// Consumes 2 arguments
// Return `length` of data from `memoryOffset`
// memoryOffset = 0x0
// length = 0x36
stack: []
memory: [
0x0:0x36 => calldata[0x1c:0x36]
]

dataSize(sub_0) and dataOffset(sub_0) are not real instructions. They are in fact PUSH instructions that put constants onto the stack. The two constants 0x1C (28) and 0x36 (54) specifies a bytecode substring to return as contract code.

The deploy code assembly roughly corresponds to the following Python3 code:

memory = []
calldata = bytes.fromhex("60606040523415600e57600080fd5b5b603680601c6000396000f30060606040525b600080fd00a165627a7a72305820b5090d937cf89f134d30e54dba87af4247461dd3390acf19d4010d61bfdd983a0029")
size = 0x36 // dataSize(sub_0)
offset = 0x1c // dataOffset(sub_0)
// Copy substring of calldata to memory
memory[0:size] = calldata[offset:offset+size]
// Instead of return, print the memory content in hex
print(bytes(memory[0:size]).hex())

The resulting memory content is:

60606040525b600080fd00
a165627a7a72305820b5090d937cf89f134d30e54dba87af4247461dd3390acf19d4010d61bfdd983a0029

Which corresponds to the assembly (plus auxdata):

// 6060604052600080fd00
mstore(0x40, 0x60)
tag_1:
0x0
dup1
revert
auxdata: 0xa165627a7a723058209747525da0f525f1132dde30c8276ec70c4786d4b08a798eda3c8314bf796cc30029

Looking again on Etherscan, this is exactly what was deployed as the contract code:

CODECOPY

The deploy code uses codecopy to copy from transaction’s input data to memory.

The exact behaviour and arguments for the codecopy instruction is less obvious than other simpler instructions. If I were to look it up in the Yellow Paper, I’d probably get more confused. Instead, let’s refer to the go-ethereum source code to see what it’s doing.

See CODECOPY:

func opCodeCopy(pc *uint64, evm *EVM, contract *Contract, memory *Memory, stack *Stack) ([]byte, error) {
var (
memOffset = stack.pop()
codeOffset = stack.pop()
length = stack.pop()
)
codeCopy := getDataBig(contract.Code, codeOffset, length)
memory.Set(memOffset.Uint64(), length.Uint64(), codeCopy)
evm.interpreter.intPool.put(memOffset, codeOffset, length)
return nil, nil
}

No Greek letters!

The line evm.interpreter.intPool.put(memOffset, codeOffset, length) recyles objects (big integers) for later uses. It is just an efficiency optimization.

Constructor Argument

Aside from returning the contract code, the other purpose of the deploy code is to run the constructor to set things up. If there are constructor arguments, the deploy code needs to somehow load the arguments data from somewhere.

The Solidity convention for passing constructor arguments is by appending the ABI encoded parameter values at the end of the bytecode when calling eth_sendtransaction. The RPC call would pass in the bytecode and ABI encoded params together as input data, like this:

{
"from": "0xbd04d16f09506e80d1fd1fd8d0c79afa49bd9976"
"data": hexencode(compiledByteCode + encodedParams),
}

Let’s look at an example contract with one constructor argument:

pragma solidity ^0.4.11;contract C {
uint256 a;
function C(uint256 _a) {
a = _a;
}
}

I’ve created this contract, passing in the value 66. The transaction on Etherscan:

https://rinkeby.etherscan.io/tx/0x2f409d2e186883bd3319a8291a345ddbc1c0090f0d2e182a32c9e54b5e3fdbd8

The input data is:

0x60606040523415600e57600080fd5b6040516020806073833981016040528080519060200190919050508060008190555050603580603e6000396000f3006060604052600080fd00a165627a7a7230582062a4d50871818ee0922255f5848ba4c7e4edc9b13c555984b91e7447d3bb0e7400290000000000000000000000000000000000000000000000000000000000000042

We can see the constructor argument at the very end, which is the number 66, but ABI encoded as a 32 bytes number:

0000000000000000000000000000000000000000000000000000000000000042

To process the arguments in the constructor, the deploy code copies the ABI parameters from the end of the calldata into memory, then from memory onto the stack.

A Contract That Creates Contracts

The FooFactory contract can create new instances of Foo by calling makeNewFoo:

pragma solidity ^0.4.11;contract Foo {
}
contract FooFactory {
address fooInstance;
function makeNewFoo() {
fooInstance = new Foo();
}
}

The full assembly for this contract is In This Gist. The structure of the compiler output is more complicated, because there are two sets of “install time” and “run time” bytecode. It’s organized like this:

FooFactoryDeployCode
FooFactoryContractCode
FooDeployCode
FooContractCode
FooAUXData
FooFactoryAUXData

The FooFactoryContractCode essentially copies the bytecode for Foo in tag_8 then jump back to tag_7 to execute the create instruction.

The create instruction is like the eth_sendtransaction RPC call. It provides a way to create new contracts from within the EVM.

See opCreate for the go-ethereum source code. This instruction calls evm.Create to create a contract:

res, addr, returnGas, suberr := evm.Create(contract, input, gas, value)

We’ve seen evm.Create earlier, but this time the caller is an Smart Contract, not a human.

AUXDATA

If you absolutely must know all about what auxdata is, read Contract Metadata. The gist of it is that auxdata is a hash that you can use to fetch metadata about the deployed contract.

The format of auxdata is:

0xa1 0x65 'b' 'z' 'z' 'r' '0' 0x58 0x20 <32 bytes swarm hash> 0x00 0x29`

Deconstructing the auxdata bytesequence we’ve seen previously:

a1 65
// b z z r 0 (ASCII)
62 7a 7a 72 30
58 20
// 32 bytes hash
62a4d50871818ee0922255f5848ba4c7e4edc9b13c555984b91e7447d3bb0e74
00 29

There you go, one more Ethereum trivia for the party night.

Conclusion

The way contracts are created is similar to how a self-extracting software installer works. When the installer runs, it configures the system environment, then extracts the target program onto the system by reading from its program bundle.

  • There is enforced separation between “install time” and “run time”. No way to run the constructor twice.
  • Smart contracts can use the same process to create other smart contracts.
  • It is easy for a non-Solidity languages to implement.

At first, I found it confusing that different parts of the “smart contract installer” is packed together in the transaction as a data byte string:

{
"data": constructorCode + contractCode + auxdata + constructorData
}

How data should be encoded is not obvious from reading the documentation for eth_sendtransaction. I couldn't figure out how constructor arguments are passed into a transaction until a friend told me that they are ABI-encoded then appended to the end of the bytecode.

An alternative design that would’ve made it clearer is perhaps to send these parts as separate properties in a transaction:

{
// For "install time" bytecode
"constructorCode": ...,
// For "run time" bytecode
"constructorBody": ...,
// For encoding arguments
"data": ...,
}

On more thoughts, though, I think that it’s actually very powerful that the Transaction object is so simple. To a Transaction, data is just a byte string, and it doesn't dictate a language model for how the data should be interpreted. By keeping the Transaction object simple, language implementers have a blank canvas for design and experiments.

Indeed, data could even be interpreted by a different virtual machine in the future.

If you like furry animals, you should follow me on Twitter @hayeah.

In this article series about the EVM I write about:

If you’d like to learn more about the EVM, subscribe to my weekly tutorial:

--

--