Fun at work: compiling solidity in elm

I’ve been working on a venture recently which has an ethereum component. (I hope to return to some more fun writing when my workload decreases.) Dissatisfied with existing environments for working with ethereum contracts, I realized that to get things done I needed a better, faster TDD environment. After examining the options and finding https://github.com/solidityj/solidity-antlr4 , I decided to use a couple of train rides and about a week of late nights to try writing a fast compiler for solidity that could be interacted with easily, produce comprehensible, debuggable output, and interact directly with javascript, allowing mock contracts to be used.

I’m not yet able to release it, and it lacks some solidity features I’m not using, but I want to talk a bit about why I chose elm, how the end result turned out, and what I learned.

First, an overview:

383 solint/src/Codec.elm 
364 solint/src/Compiler.elm
35 solint/src/Main.elm
7 solint/src/Ports.elm
70 solint/src/Solidity.elm
22 solint/src/TestData.elm
263 solint/src/TypeInf.elm
145 solint/src/Types.elm
8 solint/src/Util.elm
1297 total

It turned out pretty small, which is nice.

The most interesting parts are Compiler.elm and TypeInf.elm. TypeInf determines the type of each solidity expression and synthesizes struct types that can be passably used with contract instances. Since this targets javascript and the ideal semantics of the solidity language, struct objects and objects of contract-derived types work similarly. Compiler synthesizes the resulting javascript in an extremely simple and dumb way. No optimizations are performed, which is one reason it’s fast compared to running “truffle compile”. The result is a single commonjs javascript module with an export for each contract encountered during compilation. This works well because my test rig can import it easily and run contract instances through a bunch of different usage scenarios without the heavyweight dependency of a testrpc instance and a complete blockchain implementation.

There is a simple, light, in-memory implementation of the contract-observable blockchain in the small runtime library that goes with this environment, just enough to examine block.number, block.timestamp and block.blockhash. “msg” is an assumed parameter to every contract method, and the outermost invocation scope causes simulated ether to be transferred when msg.value is nonzero on entry. Any time send is accessed on a contract object, or value on a method, the msg that will be passed down is changed so that ether will be transferred again if possible.
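As a rough illustration of the paragraph above, here is a minimal sketch of what such an in-memory chain and the ether transfer on entry might look like. Every name in it (SimChain, Message, enter) is hypothetical and not taken from the actual runtime library, and plain javascript numbers stand in for 256-bit words.

```javascript
// Hypothetical sketch; none of these names come from the real runtime library.
// Plain numbers stand in for uint256 words to keep the example short.
class SimChain {
    constructor() {
        this.number = 1;            // what block.number would report
        this.timestamp = 0;         // what block.timestamp would report, in seconds
        this.balances = new Map();  // address -> simulated ether
    }
    balanceOf(addr) {
        return this.balances.get(addr) || 0;
    }
    // Move ether between accounts; refuse if the sender can't cover it.
    transfer(from, to, amount) {
        if (this.balanceOf(from) < amount) {
            throw new Error('insufficient balance');
        }
        this.balances.set(from, this.balanceOf(from) - amount);
        this.balances.set(to, this.balanceOf(to) + amount);
    }
}

// The "msg" that every contract method receives.
class Message {
    constructor(chain, fields) {
        this.block = chain;
        this.sender = fields.sender;
        this.value = fields.value || 0;
    }
}

// What the outermost invocation scope does on entry:
// a nonzero msg.value is credited to the contract being called.
function enter(contractAddress, msg) {
    if (msg.value !== 0) {
        msg.block.transfer(msg.sender, contractAddress, msg.value);
    }
}

const chain = new SimChain();
chain.balances.set('alice', 10);
enter('contract', new Message(chain, { sender: 'alice', value: 3 }));
console.log(chain.balanceOf('alice'), chain.balanceOf('contract')); // 7 3
```

Keeping the chain as an ordinary object is what makes it cheap to construct a fresh one for every test scenario.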

Since solidity, like C and C++, is a language of pointers and storage, this implementation uses storage cell objects to store data, allowing storage references in solidity to bind to the same storage cell in javascript. In solidity, an initialization from a storage cell to a memory ref copies to fresh memory space, but an initialization from a storage cell to a storage ref copies the address. Also, because aliasing is a thing in solidity, there’s a set method on storage cell that deep copies structures if needed. Non-initializing assignments are always done via set so that aliasing is maintained.
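The storage cell idea can be sketched in a few lines. This is an assumption about the general shape, not the real runtime: StorageCell, toMemory and deepCopy are names invented for the example.

```javascript
// Hypothetical sketch of a storage cell; not the real runtime's API.
function deepCopy(v) {
    if (Array.isArray(v)) return v.map(deepCopy);
    if (v !== null && typeof v === 'object') {
        const out = {};
        for (const k of Object.keys(v)) out[k] = deepCopy(v[k]);
        return out;
    }
    return v;
}

class StorageCell {
    constructor(value) {
        this.value = value;
    }
    get() {
        return this.value;
    }
    // Non-initializing assignment: overwrite the contents in place, so
    // every reference bound to this cell observes the new value.
    set(value) {
        this.value = deepCopy(value);
    }
    // Initializing a memory variable copies into a fresh cell.
    toMemory() {
        return new StorageCell(deepCopy(this.value));
    }
}

const owner = new StorageCell({ name: 'alice', votes: 1 });
const storageRef = owner;             // storage ref: same cell
const memoryCopy = owner.toMemory();  // memory ref: independent copy

owner.set({ name: 'bob', votes: 2 });
console.log(storageRef.get().name); // bob
console.log(memoryCopy.get().name); // alice
```

Because set replaces the contents rather than the cell, a storage reference taken earlier keeps pointing at live data, which is the aliasing behaviour described above.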

After writing the initial data structure builder in java, I thought I might write the compilation part in frege, a haskell for the JVM. I’ve experimented with it a bit, and for some things it’s pretty good. After looking at the type class based json serializer in frege, and after having written my own json serialization facilities in F# based on elm’s decoder design, I realized how much more work it’d be to parse my weird data output in frege. I toyed with the idea of writing small wrapper classes and emitting those from compilation instead, but using a plain json structure felt right. It was nice to be able to examine exactly what the structure of the program looked like from the compiler’s perspective whenever there was a problem. I often use F# in javascript via the fable compiler when I want to bang out something I can run in node, but realized that I didn’t need any imperative features or IO at all, and probably wouldn’t be using any of F#’s polymorphic features. The compiler in this case is a one-input, one-output affair, building a single data structure and outputting a string, which is perfect for interacting with via a port.

A lesson here is that elm’s json decoding is just about as good as it gets when working with arbitrary json in a strict language:

Here’s a snippet of code that decodes an array declaration with no index, or with an index.

, JD.map3 (\_ base indx -> Array base (Just indx))
    (JD.field "type" (jdIsValue "array" JD.string))
    (JD.field "base" self)
    (JD.field "indx" expression)
, JD.map2 (\_ base -> Array base Nothing)
    (JD.field "type" (jdIsValue "array" JD.string))
    (JD.field "base" self)

Note that I’ve wrapped JD.string in jdIsValue:

jdIsValue : a -> JD.Decoder a -> JD.Decoder a
jdIsValue v d =
    d |> JD.andThen
        (\w ->
            if w == v then
                JD.succeed v
            else
                JD.fail "no match"
        )

Given a decoder, return a decoder that succeeds only if the value was the one we wanted. You can use this technique with JD.oneOf to decode one of a bunch of labeled structures:

, JD.map3 (\_ fromType toType -> Mapping { fromType = fromType, toType = toType })
    (JD.field "type" (jdIsValue "mapping" JD.string))
    (JD.field "from" (JD.string |> (JD.andThen (basicType >> jdResult)) |> JD.map Basic))
    (JD.field "to" self)
, JD.map3 (always Struct)
    (JD.field "type" (jdIsValue "struct" JD.string))
    (JD.field "name" JD.string)
    (JD.field "members" (JD.dict (JD.field "kind" self)))
]
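For readers who don’t use elm, the tag-checking technique above can be loosely mirrored in plain javascript. In this sketch a decoder is just a function that returns a value or throws. The helper names (string, field, isValue, oneOf) and the JSON shapes, including the "basic" node, are made up for illustration and are not elm’s Json.Decode API or the compiler’s real format.

```javascript
// A decoder here is a function from JSON to a value; it throws on mismatch.
// All names are illustrative; this is not elm's Json.Decode API.
const string = (j) => {
    if (typeof j !== 'string') throw new Error('expected a string');
    return j;
};

const field = (name, decoder) => (j) => {
    if (j === null || typeof j !== 'object' || !(name in j)) {
        throw new Error('missing field ' + name);
    }
    return decoder(j[name]);
};

// Like jdIsValue: succeed only when the decoded value equals `expected`.
const isValue = (expected, decoder) => (j) => {
    const got = decoder(j);
    if (got !== expected) throw new Error('no match');
    return got;
};

// Try each decoder in turn and keep the first that succeeds.
const oneOf = (decoders) => (j) => {
    for (const d of decoders) {
        try {
            return d(j);
        } catch (e) {
            // fall through to the next alternative
        }
    }
    throw new Error('no decoder matched');
};

// Two labeled structures, told apart by their "type" field.
// A function declaration lets the decoder refer to itself recursively.
function typeDecoder(j) {
    return oneOf([
        (x) => {
            field('type', isValue('array', string))(x);
            return { kind: 'Array', base: field('base', typeDecoder)(x) };
        },
        (x) => {
            field('type', isValue('basic', string))(x);
            return { kind: 'Basic', name: field('name', string)(x) };
        },
    ])(j);
}

const decoded = typeDecoder({ type: 'array', base: { type: 'basic', name: 'uint256' } });
console.log(JSON.stringify(decoded));
```

The elm version gets the same effect without exceptions, since a failed decoder is an ordinary value that JD.oneOf can skip past.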

Determining the type of each expression isn’t very complicated in solidity. Functions are kind of neat, because I hypothesized a “tuple” type for the type of grouped return values. The whole program’s parse is available at this point, and Program has a Dict for declarations currently in force. The Program that gets passed down to statements in a function, therefore, also knows which declarations are in scope as opposed to ones accessed on the ‘this’ object. Almost every function in TypeInf and Compiler takes the full program and the current contract as arguments, since the name of a contract can be used either as the name of an aggregate value or as a type for coercion. Types other than basic ones and the usual aggregates are modeled as TypeLookup instances. A lookupType function, given access to the contract and the program, looks up the definite type of these. Solidity is thankfully simple in this way.

typeOfExpression : Program -> Contract -> Expr -> RawType
typeOfExpression p c e =
    case e of
        -- ...
        FunCall e _ ->
            case typeOfExpression p c e of
                Function f ->
                    case f.returns of
                        [] -> Tuple []
                        hd :: [] -> Tuple.second hd
                        hd :: tl -> Tuple (List.map Tuple.second f.returns)
                Basic a -> Basic a -- basic type coercion
                _ -> Debug.crash "call of non-function"

I didn’t feel bad about using Debug.crash pretty liberally, but I will eventually switch to having these functions return Result values instead.

Collecting declarations is easy:

collectDeclarations p c declarations s =
    case s of
        Block (hd :: tl) ->
            let decl = collectDeclarations p c declarations hd in
            collectDeclarations p c decl (Block tl)
        Block [] ->
            declarations
        IfStmt cond thn (Just els) ->
            let afterThen = collectDeclarations p c declarations thn in
            collectDeclarations p c afterThen els
        -- ...
        Simple (InitDecl vals expr) ->
            let resultType = typeOfExpression p c expr in
            case (vals,resultType) of
                (vals,Tuple resultType) ->
                    let vt = List.map2 (,) vals resultType in
                    List.foldr
                        (\(n,d) decls -> Dict.insert n d decls)
                        declarations
                        vt
                ([val],resultType) ->
                    Dict.insert val resultType declarations
                (v,r) ->
                    Debug.crash ("Wrong association between " ++ (toString v) ++ " and " ++ (toString r))
        -- ...

Solidity allows multiple initializations from a tuple return and assigns each binding the type of the corresponding tuple slot, but there’s no tuple type in solidity and a tuple can’t be stored in a single variable.
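The binding rule for tuple initializations can be shown on its own. This is a sketch in javascript rather than elm, with invented type shapes and a hypothetical bindInitDecl function. Unlike the List.map2 version, which silently truncates to the shorter list, it rejects a mismatch between the number of names and the number of tuple slots.

```javascript
// Hypothetical sketch: bind declared names to types for `var (a, b) = f();`.
// Types are modeled as plain objects; the shapes are invented for the example.
function bindInitDecl(declarations, names, resultType) {
    const next = new Map(declarations); // leave the caller's scope untouched
    if (resultType.kind === 'tuple') {
        if (names.length !== resultType.slots.length) {
            throw new Error('wrong association between names and tuple slots');
        }
        // Each name takes the type of the tuple slot in the same position.
        names.forEach((name, i) => next.set(name, resultType.slots[i]));
    } else if (names.length === 1) {
        next.set(names[0], resultType);
    } else {
        throw new Error('wrong association between names and a single value');
    }
    return next;
}

const uint256 = { kind: 'basic', name: 'uint256' };
const address = { kind: 'basic', name: 'address' };
const scope = bindInitDecl(
    new Map(),
    ['num', 'who'],
    { kind: 'tuple', slots: [uint256, address] }
);
console.log(scope.get('num').name, scope.get('who').name); // uint256 address
```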

Supporting both tuples and named return values turned out to be interesting:

I named all unnamed returns like $$ret<n>:

BanyanAsset.prototype.currentBlock = function (msg) {
    /* Dict.fromList [("$$ret0",Basic (UIntTy 256)),("$$ret1",Basic (UIntTy 256))] */
    var $$self = this;

    msg = msg === undefined ? new message() : msg;
    block = msg.block;
    var $$ret0 = storageWord("0", "uint256");

    var $$ret1 = storageWord("0", "uint256");

    this.withScope(function () {
        $$self.addEther(msg);
        {
            return (function (t) {
                $$ret0 = (t.index(storageWord("0")));
                $$ret1 = (t.index(storageWord("1")));
            }($$self.storageTuple({
                type: "tuple",
                kinds: [ "uint256", "uint256" ]
            }, (block.member("number", $$self)), (block.member("timestamp", $$self)))));
        }
    });

    return $$self.storageTuple({
        type: "tuple",
        kinds: [ "uint256", "uint256" ]
    }, $$ret0, $$ret1);
}

And in that way, although clunky, this satisfies everything functions need to do with return values:

A return without a value copies the assigned values of named returns to their corresponding results. A return with a value copies the provided values to the return values, regardless of their names. Either way, the result appears as a single value if the function returns one value and as a tuple otherwise.

withScope is an interesting method, since a lot of magic happens there. This inner scope is needed because solidity on the blockchain requires that a function that throws an exception has all changes to state reverted. This I supported (with the help of the previously mentioned set method on storage cells) by having all active code happen in an inner scope. A revert list contains javascript object references and keys that must be changed back to re-create the previous state. When played back in reverse order, we can fully unwind any changes made to contract storage as well as balance changes :-), then throw the original exception from the first scope. The nice thing is that I can write tests like this:

var rejected = false;
try {
    ba.changeOwnerVote(sw('0'), bob, sw('100'), new message({sender:owner}));
} catch (e) {
    rejected = true;
    console.log("[ok] we couldn't change owners without two votes");
}
// Checked outside the try so the catch above can't swallow the failure.
if (!rejected) {
    throw new Error("We shouldn't be able to change ownership without second vote when there are two owners");
}
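The revert list behind withScope can be sketched as follows. This is a guess at the general mechanism under simplifying assumptions, not the actual runtime code: the Contract class and its write method are hypothetical, and writes go to plain objects instead of storage cells.

```javascript
// Hypothetical sketch of withScope; the real runtime may differ.
class Contract {
    constructor() {
        this.revertList = [];
    }
    // Record the old value before every write so it can be undone.
    write(target, key, value) {
        this.revertList.push({ target, key, old: target[key] });
        target[key] = value;
    }
    withScope(body) {
        const mark = this.revertList.length;
        try {
            const result = body();
            // The outermost scope succeeded: nothing left to undo.
            if (mark === 0) this.revertList.length = 0;
            return result;
        } catch (e) {
            // Undo writes made inside this scope, newest first.
            while (this.revertList.length > mark) {
                const { target, key, old } = this.revertList.pop();
                target[key] = old;
            }
            throw e; // rethrow the original exception
        }
    }
}

const c = new Contract();
const state = { owner: 'alice', balance: 5 };
try {
    c.withScope(() => {
        c.write(state, 'owner', 'bob');
        c.write(state, 'balance', 0);
        throw new Error('second vote missing');
    });
} catch (e) {
    console.log(e.message, state.owner, state.balance); // second vote missing alice 5
}
```

Playing the list back newest first matters when the same key is written twice in one call, since the oldest recorded value is the one that must win.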

Before this exercise, I was producing solidity code much more slowly than I should have been able to, due to the cost of simulating the lifetime of a contract over a lot of transactions and a lot of real time. Now I can hack time!

console.log('hacking time!');
var blockAndTime = ba.currentBlock();
var newTime = blockAndTime[1].add(days(5));
solint.hacktime(newTime);
var vetoed = true;
try {
    ba.voteAgainstBill(sw('0'), new message({sender: owner}));
} catch (e) {
    vetoed = false;
    console.log("[ok] we shouldn't be able to veto a locked-in bill");
}
// Checked outside the try so the catch above can't swallow the failure.
if (vetoed) {
    throw new Error("We voted against a locked-in bill");
}
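Hacking time needs very little machinery once the chain is just an object in memory. The sketch below is an assumption about how a helper like solint.hacktime could work, not its actual implementation. simBlock is a made-up name, plain numbers stand in for the big-number words the real days helper evidently produces, and advancing the block number is a choice made for the example.

```javascript
// Hypothetical sketch; the real solint.hacktime may work differently.
// Plain numbers stand in for uint256 words.
const SECONDS_PER_DAY = 24 * 60 * 60;
const days = (n) => n * SECONDS_PER_DAY;

const simBlock = { number: 1, timestamp: 1500000000 };

// Jump the simulated clock forward and start a new block to carry the new time.
function hacktime(newTimestamp) {
    if (newTimestamp < simBlock.timestamp) {
        throw new Error('time only moves forward');
    }
    simBlock.timestamp = newTimestamp;
    simBlock.number += 1;
}

const before = simBlock.timestamp;
hacktime(before + days(5));
console.log(simBlock.timestamp - before); // 432000
```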