There are two different cases for Truebit execution:
- Running the program to get the result. In this case it doesn’t matter how the result is got as long as it is always deterministic.
- Getting an intermediate state of a computation for the verification game. The intermediate states have to be specified carefully.
Clearly it is important to be able to run the program efficiently, so that the computation can be finalized on the blockchain in a reasonable time. The case of verification game is different, because it should be rare, and in any case the task can be resubmitted so that it can be finalized faster. However, there are still advantages for faster execution: it would be cheaper to the participants of the verification game and it would also make testing easier, thus increasing the confidence in the security of the contracts that use Truebit.
The main overhead in interpreting Truebit VM code is that on each step, all the stuff like stack have to be updated in the memory. This overhead is quite high because WebAssembly instructions are low level and map well to processor architectures (unlike for example EVM). In fact WebAssembly is clearly designed to run on a JIT (just in time compiler). Simple benchmarks running a factorial loop show that JIT is 100x faster than WASM interpreters, but with more intensive computation the speed up might be higher. (Perhaps with memory intensive computation, interpreter will have lower overhead.)
Running Truebit tasks using JIT
Using JIT to just run tasks is easier than generating intermediate states, but there are still some issues:
- Floating point support and non-determinism.
- Metering: the execution time of the task must be limited for Truebit (see below).
- Stack size: also call stack size has to be limited.
For example if we would have a step limit in the interpreter, the result from the interpreter might be that it reached the step limit, but the JIT has no such concept, and will just output a result. This is the problem that metering solves.
Ethereum does not have floating point support, so the floating point instructions must be disabled during the verification game. In practice, the floating point instructions can just be converted into traps (error instructions that stop the execution immediately). Truebit can still use floating points with an “emulator”, but it converts them to integer operations in WASM.
There is an additional problem, the bit value of NaN might be non-deterministic. To handle this, it’s possible to canonize the value of NaN by inserting checks after each instruction that might generate NaN.
The JIT might have some limits on how large the stack can be.
Stack frames for function calls have the following parts:
- Return address: the point of code where the function returns.
- Arguments to the function.
- Local variables.
- Expression stack size. WebAssembly stack is statically typed, so it is easy to calculate maximum size of the expression stack for a function.
Consider the following WebAssembly function:
(local i64 i64)
(set_local 1 (get_local 0))
(set_local 2 (i64.mul (get_local 0) (get_local 1)))
The local variable 0 refers to the parameter, and local variables 1 and 2 are the actual locals. Just before i64.mul instruction, the stack frame of the function looks something like this:
By instrumenting the WASM code, we can limit the number of stack frames, and also the total size of all stack frames.
Instrumentation is a transformation on byte code, where we can for example insert extra instructions to the beginning of each function. In our case, each function will have a call at the beginning that tells it’s stack size. To be sure that the limit is not breached, the trap is executed if there is no space left for the maximum stack frame size. This ensures that we still have enough space to call any function.
The execution time for Truebit tasks has to be limited, because otherwise a task-giver can give a task that will take too long to execute, which would compromise the security, because verifiers would not have enough time to react to wrong results.
Metering is mostly handled by eWASM. With eWASM metering we can limit the execution time, and thus be sure that both the interpreter and the JIT will behave in the exact same way.
To use JIT for getting intermediate states, we define a set of breakpoints in the code, where the JIT is able to generate an intermediate state. The interpreter can take an intermediate state, and interpret until the next breakpoint. The breakpoints must be inserted so that the number of steps between breakpoints is bounded. We select the following breakpoints:
- Start of each loop round.
- Before each function call.
- After each function call.
Clearly then the number of steps between breakpoints is bounded by the program size.
When a breakpoint is executed, a counter is incremented. When the counter has been incremented to the desired step, the intermediate state should be printed.
The state has the following components:
- Linear memory (RAM). This can be easily accessed at each point of execution.
- Stack frames: The stack frames must be stored using instrumentation.
Global variables are not needed for Truebit, so they can be converted into linear memory accesses. Also the call tables (needed for indirect calls) stay constant during the execution.
Before a breakpoint, arguments and local variables are added to stack. If the breakpoint is a function call, the arguments to this call can be added to the stack at the beginning of the next function. The return address can be hard coded for each function call. After a function call, the return value can be added to stack.
A problem still remains, because the expression stack might have some elements that were not accounted for yet. For each of these hidden variables, we can calculate the instruction that generates this value. (If an if-then-else branch or other block has generated the value, the instruction will be the block end instruction. Perhaps it is better to say that an expression generates a value to the stack). We can add a new local variable for each such expression, and then copy the value of that expression into this new variable just after it was added to the expression stack.
Now we can associate each breakpoint with the list of local variables that correspond to the hidden variables at that breakpoint. The end result is that the hidden variables can be reconstructed from the local variables that are in the stack.
For example the following code will have a hidden variable during the first call of $f:
(call $f (call $g (get_local 0))
(call $f (get_local 0) (get_local 0)))
It can be converted to this:
(call $f (tee_local 1 (call $g (get_local 0)))
(call $f (get_local 0) (get_local 0)))
The hidden variable is copied into a new local variable (at index 1) using tee_local (note that tee_local doesn’t pop the value from stack). The associated list of local variables for the breakpoint before the first call of $f is then .
Handling breakpoints in the interpreter
The way the code is interpreted has to be changed so that the stack at breakpoints will be exactly the same as in the instrumented code. This means that before we call a function we have to remove the local variables and after the call we have to add them. In addition, at the loop start we need to remove and add the hidden variables to the expression stack. (Perhaps a more simple possibility is to add the hidden variables to the stack on the JIT side).
Each breakpoint will have associated entry PC (program counter). So for an intermediate state for a breakpoint, we first calculate the entry PC, and then interpret from there until we get to the next breakpoint. The result of this should be the same as the printed next intermediate state for JIT.
Perhaps it will be too slow to keep track of the stack all the time. This can be made faster by first calculating the critical path. This is the sequence of active function calls at the selected breakpoint. Using the critical path, we only have to add the needed elements to the stack.
Running tasks with JIT should be possible in the next release of Truebit. Generating the intermediate states is still work in progress, and hopefully soon there will be a prototype that is able to dump the stack.