Published in


Securing Warp: A formal specification of the Yul IR

by Julian Sutherland (JulekSU)

Thanks to Artem Yurchenko, Dominic Henderson and Greg Vardy for reviewing and editing.

As we have covered in previous posts by Greg, Nethermind has been partnering with Starkware to develop Warp, a transpiler from Solidity to Cairo. Using Warp, Ethereum smart contracts can be easily ported to run on StarkNet, allowing them to reap all of the associated benefits. Warp uses the Solidity compiler to generate Yul, an intermediate representation (IR) which it then compiles to Cairo.

In this blog post, we’ll briefly outline why formal verification is a vital part of secure smart contract development. We’ll then given an intuition as to how the structure of a correct Yul program can be encoded into the dependent type system of the Lean proof assistant and leveraged to produce the first mechanisation of the Yul IR’s semantics.

DeFi protocols have a combined $102 billion in TVL (as of Jan 4th, 2022). On top of that, they handle billions of dollars in value transfer a day, so it’s no stretch to say that security is consideration numero uno. Using Warp to transpile security-critical contracts might make some developers uneasy, so being certain that it is semantics preserving, i.e., a transpiled Cairo contract has the same semantics as the original Solidity contract, is extremely important. To increase our degree of certainty, we are currently working towards producing a formal specification of Warp’s compilation process in Lean and proving that it is semantics preserving. From this, we intend to extract a property tester to really put Warp through its paces.

Thanks to the team at Carnegie Mellon University and StarkWare, led by Jeremy Avigad, we already have a formalisation of Cairo in Lean. The next step to formalising Warp’s compilation process was to develop a formal semantics of Yul in Lean.

A small-step semantics for Yul

We produced a formal semantics for Yul in Lean based on the specification of the Yul language. This specification is parametric on the semantics of a set of primitive functions, the choice of which induces several Yul dialects: EVM 1.0, EVM 1.5, and Ewasm dialects are planned. However, as our primary target is the fragment of the EVM dialect of Yul produced by solc given our particular set of optimisations, we make a few simplifying assumptions:

  • All literals are of type uint256.
  • Our formalisation of Yul need not allow arbitrarily nested function definitions as they are all hoisted to the top-level by the “FunctionHoister” optimisation.

The Yul specification defines three primary Yul term types: statements, expressions, and blocks. These are mutually inductively defined. The inductive definition of Yul expressions is as follows:

Inductive definition of Yul expressions.

They can update the current machine state (through function calls) and evaluate to 0 or more literals. Note that both the literal and identifier cases evaluate to precisely one literal. We denote literals and identifiers as follows:

Next, the inductive definition of Yul statements is as follows:

Inductive definition of Yul statements.

They correspond to imperative statements that update the current machine state when executed and can bring new program variables into scope. All of these statements have the usual associated semantics. The second argument of the switch statement encodes an association between literals and blocks, with the third argument serving as an (optional) default.

Finally, a block term is a sequence of statements executed within a nested lexical scope.

The Yul language specification then formalises the operational semantics of Yul terms using big-step semantics. This big-step semantics describes the possible behaviours of Yul terms by relating pairs of initial states and Yul terms to the states resulting from fully evaluating the terms in the initial states. For expressions, the associated big-step semantics also produces a finite vector of literals representing the result of the expression’s evaluation. Note that we are eliding some details for the sake of simplicity; in the complete semantics, the evaluation of a Yul term also produces a mode which we use to implement the behaviour of statements that modify control flow such as break, continue and leave.

Machine states consists of a pair of:

  • A variable store, σ, a partial map from identifiers to literals.
  • An primitive state, μ, on which our primitive functions act. The set of which is denoted by M.

The big-step semantic is also parameterised by a function context, denoted γ, a partial map from identifiers to either a function definition or a primitive function definition.

A function definition is a triple consisting of a list of the identifiers of the function’s arguments, a list of the identifiers of the program variables storing the function’s return values, and a block representing the body of the function. A primitive function definition is triple consisting of two natural numbers, m and n, the arity of the function and its degree (the number of values it returns) respectively, as well as a map implementing the primitive. This implementation maps pairs of a vector of n literals, the functions arguments, and a primitive state to a pair of a vector of m literals, the function’s return values, and an updated primitive state. (The Σ binder here denotes a dependent product where the type of the last element of the tuple depends on the value of the first two, n and m in this case)

We can now define the big-step semantics of Yul expressions:

Big-step semantics of Yul expressions.

A similar definition can be given to Yul statements and blocks; in fact, the big-step semantics used in the non-primitive function rule to evaluate the function’s body is that of Yul blocks! However, any Turing complete language such as Yul will necessarily contain some non-terminating programs. In Yul, this can either be due to unbounded recursion or a for loop that does not terminate. For the sake of soundness, all definitions in Lean 3 must be well-founded, preventing it from directly encoding this big-step semantics.

To circumvent this problem, we express Yul’s operational semantics equivalently using small-step semantics. A small-step semantics relates a pair of a machine state and a Yul term to another, further evaluated, pair of machine state and Yul term. It describes the semantics of Yul terms by repeated evaluations that can either diverge or eventually reduce to some fully evaluated term. In particular, each individual step is guaranteed to terminate, so it can be encoded as a well-founded definition in Lean. A non-terminating Yul program will then be able to take an infinite number of these steps.

It is also the case that there are Yul terms that conform to the inductive definitions we have laid out but cannot be evaluated via the big-step semantics. I.e., a state where no further evaluation rule can apply is reached during any attempt at reduction. Such terms should be rejected by the Warp compiler as their semantics are undefined. For instance, an expression term cannot be evaluated if it references an identifier or function that has not already been declared. If this occurs, in the former case the identifier evaluation rule will not apply given that the program variable referenced, x, is not in the domain of the variable store, σ. In the latter, neither function call rule applies as the function identifier, f, is not in the domain of the function context, γ.

This also occurs in the big-step semantics of Yul statements. To avoid undefined semantics, a Yul statement must obey the following constraints:

  • An assignment or variable declaration cannot be evaluated if the number of literals produced by the associated expression is not the same as the number of variables bound.
  • A standalone expression must reduce to precisely 0 literals, or it cannot be evaluated.
  • A break or continue statement can only be nested within the body of a for loop statement.
  • A leave statement can only be nested within the body of a function.

We will refer to Yul terms satisfying these restrictions as semantically valid, and we formalise them using inductively defined dependent type families. To formalise this, we need to track a few things:

  • The degree of expression terms (the number of literals that an expression evaluates to once fully evaluated).
  • The variables and functions in scope in a given term.
  • The variables a statement brings into scope when executed.
  • Whether a statement is in a for loop.
  • Whether a statement is in a function.

To be able to infer the degree of an expression, we need to know the arity (the number of literals it takes as input) and degree (the number of literals it returns) of any function referenced within it; this is done through a function typing context, denoted Γ:

A function typing context is a partial map from a function name to a pair of its arity, denoted m, and its degree, denoted n. The domain of a function typing context also tracks the set of functions in scope. A set, vars, of identifiers is used to track the program variables in scope and, consequently, is a parameter of the new set of variable stores. Finally, the IsInFor and IsInFunc types are used to track whether a statement or block is in a for loop or in a function (although the details of this will not be presented in this post).

The new set of semantically valid Yul terms can then be formalised using type families parameterised with these arguments as appropriate. Let us consider the type family describing the set of semantically valid Yul expressions; the parameter n denotes the degree of the expression in question:

Inductive definition of valid Yul expressions.

Note that the degree of literal and identifier expressions is one and that any identifier must be in vars, the set of identifiers currently in scope. The expressions parameterising a function call must all evaluate to exactly one literal, and their number must agree with the arity of the function in question, recorded within the function typing context. Similarly, the degree of a function call expression depends on the degree of the function, once again, recorded in the function typing context.

We need to add two new expression constructors (pseudo-expressions, highlighted in red) to implement a small-step semantics for Yul: A scope representing a partially evaluated function call and a result representing a fully evaluated expression of degree n. The scope contains a variable store storing the value of the program variables vars’ required to evaluate the body of the function (which is of type Block Γ vars’, the type family of semantically valid Yul blocks, which we do not define in this post). The rets argument is a vector of identifiers, whose length is the same as the degree of the expression and whose values will be returned if the evaluation of the block terminates. In particular, all of the identifiers in rets must be in vars’, so that their associated values can be fetched from the nested variable store. From now on, we will implicitly allow vectors to be cast to a set (this set being the union of the singletons containing each element of the vector) in definitions when convenient.

Similar dependent type families can be used to define the set of valid Yul statements and blocks, respectively. The full details of the Lean implementations of these dependent inductive type families can be found here.

Like our big-step semantic, our small-step semantic is also parameterised with a function context, γ; however, we want to require that it agree with our function typing context, Γ, on the arity and degree of each function. For this purpose, we redefine our function context type, passing Γ as a parameter:

A function context then maps an identifier in the domain of Γ, our function typing context, to either a function definition or a primitive function definition that agrees with Γ on the arity and degree of the function.

We can now give the small-step semantics of valid Yul expressions:

Small-step semantics of Yul expressions.

Once again, we elide the modes to simplify our exposition. The type of the small-step semantics then restricts us to only being able to run a step of a Yul expression of type Expr Γ vars n with a variable store σ of type VarStore vars. Similarly, the function context, γ, must in FTContext Γ. Given these restrictions, every semantically valid Yul expression, excluding results, can always be further evaluated to a Yul expression of the same type: assuming the same variables and functions are in scope and preserving the same degree. Repeating this process either eventually yields a result or it can, due to a non-terminating function call, diverge.

The evaluation rules for literals and identifiers clearly preserve the degree of the expressions they apply to, as they both produce result vectors of length 1. Given the restriction that a program variable, x, must be in vars, the small-step semantics’ rule for evaluating identifiers must apply when given an appropriate σ in VarStore vars, as σ(x) is necessarily defined. Given that the small-step semantics’ rule for evaluating literals has no conditions, it can also always apply to a literal.

The following rule in this figure reduces the rightmost argument of a function that has not yet been fully evaluated to a result. Note that, given the type of the parameterising expressions of function calls, and the fact that we can inductively assume that our small-step semantics preserve the type of these expressions, including their degree, they can only fully reduce to result vectors of length 1. This rule will apply to any function call expression whose arguments are not all already fully evaluated.

Once this occurs, one of the following two rules apply depending on whether the function being applied, f, is primitive or not. In the latter case, the first of the two rules applies. From the restriction on function calls, we know that if Γ(f) = (n,m), the call must have m parameterising expressions, and the arity of the call must be n. Then, given that the parameterising function context, γ is in FTContext Γ, we know that γ(f) = ([x₀, …, xₘ], [r₀, …, rₙ], B), where B is a block of type Block Γ {x₀, …, xₘ,r₀, …, rₙ}. We can then run the block B in the context of the variable store σ’ defined over {x₀, …, xₘ,r₀, …, rₙ}, setting x₀, …, xₘ to the results of fully reducing the m parameterising expressions of the function call and setting the n return variables, r₀, …, rₙ, to the default value, 0, as required by the Yul specification. The Block B is then nested in a scope pseudo-expression along with σ’ and the list of return variables, r₀, …, rₙ, that will be needed if the evaluation of the body terminates, to fetch the appropriate return values.

For the second of these two rules, when the function applied is primitive, the same invariants apply; however, there is no need for the use of a scope as the result is immediately computed.

The second to last rule then fetches the return values r₀, …, rₙ from the inner variable store of a scope expression once the nested block is completely evaluated. It then stores them in a result vector of length n, preserving the degree of the expression as required. Here, empBlock, is a pseudo-block constructor, added to describe a fully evaluated block in the small-step semantics of Yul.

Finally, the last rule reduces a block nested in a scope expression with the Yul block small-step semantics, which we elide here for the sake of brevity. The steps of the Yul block semantics never changes the set of variables the block requires to be in scope and the domain of the resultant variable store also stays the same, allowing the small-step semantics of blocks to be repeatedly applied until it (potentially) terminates. This also guarantees that r₀, …, rₙ will be defined in any variable store resulting from a full evaluation of the nested block.

This gives us an encoding of the semantics of Yul expressions using dependent types. The Lean formalisation of this small-step semantics can be found here, implemented by the function evalCExpr, along with similar functions for valid Yul statements, evalCStatement, and valid Yul blocks, evalCBlock. There is also another Yul pseudo-term type, called a BlockList, introduced because of technical restrictions in Lean and the need to introduce a nested scope within a block. It also has an associated evalBlockList function specifying its small-step semantics.

Altogether, this gives us a complete formalisation in Lean of all possible behaviours of any semantically valid Yul term! (modulo the simplifying assumptions stated above)

That’s great! But what can I do with all of this?

Good question! Once an appropriate parameter is created, formalising the EVM/Ewasm primitives as appropriate, this formalisation of the Yul semantics has several uses:

  • It can be used to prove properties of specific Yul programs in Lean.
  • It can be used to formalise compilation processes to and from Yul in Lean. This process can then be proven to be semantics preserving.
  • It can be used, in conjunction with the slim check library, to property test Yul compilers given a formalisation of the compilation process. This would require an appropriate instance of the testable typeclass to be written for the YulTerm type family.

Altogether, this can be applied in many different ways to prove a wide array of different properties, bringing us closer to a provably secure Ethereum ecosystem!

For further adventures in formal verification at Nethermind, follow me here or on Twitter!

We’ve got a lot going on at Nethermind, and we’re always on the lookout for talent. If you’re interested in working at the bleeding edge of crypto, drop an email with your CV to us at or send me a DM on Twitter!



Nethermind is a blockchain research and software engineering company. We share technical tutorials and informative posts on #ethereum, #layer2 solutions, #StarkNet, #DeFi and #Web3

Get the Medium app

A button that says 'Download on the App Store', and if clicked it will lead you to the iOS App store
A button that says 'Get it on, Google Play', and if clicked it will lead you to the Google Play store