Compile AVM-Bytecode-Variables

robbie wang

Published in NewEconoLabs, Nov 8, 2019

This article was written by Li Jianying (Light Li) in Chinese and translated into English by Robbie.

Reference source code: https://github.com/lightszero/neovmbook/tree/master/samples/compiler_il01

Now we discuss how to compile to AVM from another assembly-like language. In fact, it is not very different from compiling a high-level language.

Let’s take IL as an example. It is also a stack virtual machine, and there are many similarities between IL instructions and AVM instructions.

Unlike AVM, IL still retains a function-level modular structure. IL does not join everything into one large byte[]; instead, each function has its own byte[]. Calls in IL are still made at the function level, the instruction addresses of each function start from zero, and IL jump targets have already been converted to addresses.

Let us use the 1+2 example again.

We compiled the above source code into a C# DLL in DEBUG mode, and its IL code should look like this.

Let’s explain them one by one.

nop is an empty instruction (no operation)

ldc is the PUSH of NEOVM

stloc takes the value at the top of the stack and puts it in the variable list

ldloc takes a value from the variable list and pushes it onto the stack

add is the same as NEOVM’s ADD

br is the JMP of NEOVM

ret is the same as NEOVM’s RET

This br instruction is generated by DEBUG mode compilation. RELEASE mode compilation would optimize it away, and would also apply many other optimizations. To keep the explanation simple, let’s skip this br jump.

It is a meaningless jump to the very next instruction, so ignoring it has no side effects.

Well, let’s re-organize it.

PUSH 1

STLOC 0

PUSH 2

STLOC 1

LDLOC 0

LDLOC 1

ADD

STLOC 2

LDLOC 2

RET
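The re-organization above can be sketched in a few lines of Python. This is only an illustration, not the code from samples/compiler_il01: the IL listing, its addresses, and the function name `reorganize` are assumptions made for this example, and only the short opcode forms used in this article are handled.

```python
# Hypothetical DEBUG-mode IL for: int a = 1; int b = 2; return a + b;
# Each entry is (address, opcode, operand). The addresses are illustrative.
IL = [
    (0x00, "nop", None),
    (0x01, "ldc.i4.1", None),
    (0x02, "stloc.0", None),
    (0x03, "ldc.i4.2", None),
    (0x04, "stloc.1", None),
    (0x05, "ldloc.0", None),
    (0x06, "ldloc.1", None),
    (0x07, "add", None),
    (0x08, "stloc.2", None),
    (0x09, "br.s", 0x0B),   # jumps to the very next instruction
    (0x0B, "ldloc.2", None),
    (0x0C, "ret", None),
]


def reorganize(il):
    """Drop nop and no-op br, and rename the rest to the pseudo code names."""
    out = []
    for i, (addr, op, operand) in enumerate(il):
        if op == "nop":
            continue  # empty instruction, nothing to emit
        if op in ("br", "br.s"):
            next_addr = il[i + 1][0] if i + 1 < len(il) else None
            if operand == next_addr:
                continue  # jump to the next instruction has no effect
            raise NotImplementedError("real jumps need address remapping")
        if op.startswith("ldc.i4."):
            out.append(f"PUSH {int(op.rsplit('.', 1)[1])}")
        elif op.startswith("stloc."):
            out.append(f"STLOC {int(op.rsplit('.', 1)[1])}")
        elif op.startswith("ldloc."):
            out.append(f"LDLOC {int(op.rsplit('.', 1)[1])}")
        elif op in ("add", "ret"):
            out.append(op.upper())
        else:
            raise NotImplementedError(op)
    return out


for line in reorganize(IL):
    print(line)
```

A real br that jumps somewhere else would need its target address remapped, which this sketch deliberately refuses to handle.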

Recall the pseudo code that appeared in the previous article.

//int a=1

PUSH 1

STLOC 0

//int b=2

PUSH 2

STLOC 1

//return a+b

LDLOC 0

LDLOC 1

ADD

RET

Well, this is basically the same, isn’t it?

The final STLOC 2 and LDLOC 2 just create a temporary variable, which can be eliminated.

In DEBUG mode, the compiler is probably treating the code like this:

var c=a+b;

return c;
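Eliminating such a temporary can be sketched as a small peephole pass. The function name `drop_temp_locals` is hypothetical, and the sketch assumes straight-line code with no jumps: it removes a `STLOC n` that is immediately followed by `LDLOC n` only when slot n is not used anywhere else.

```python
def drop_temp_locals(ops):
    """Remove 'STLOC n' + 'LDLOC n' pairs whose slot n is used nowhere else.

    Assumes straight-line code (no jumps into the middle of a pair).
    """
    out = []
    i = 0
    while i < len(ops):
        cur = ops[i]
        nxt = ops[i + 1] if i + 1 < len(ops) else None
        if cur.startswith("STLOC ") and nxt == "LDLOC " + cur[len("STLOC "):]:
            slot = cur[len("STLOC "):]
            rest = ops[:i] + ops[i + 2:]
            used_elsewhere = any(
                o in (f"STLOC {slot}", f"LDLOC {slot}") for o in rest
            )
            if not used_elsewhere:
                i += 2  # the value simply stays on the stack
                continue
        out.append(cur)
        i += 1
    return out


debug_ops = [
    "PUSH 1", "STLOC 0", "PUSH 2", "STLOC 1",
    "LDLOC 0", "LDLOC 1", "ADD", "STLOC 2", "LDLOC 2", "RET",
]
print(drop_temp_locals(debug_ops))
```

Slots 0 and 1 survive because each store is followed by an unrelated instruction; only the store and reload of slot 2 are dropped.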

If you have already read and understood the previous article, you don’t need to continue reading, because the processing that follows is the same as the second half of the previous article.

IL applies the concept of a variable list directly to its local variables. In the previous article, we discussed how to compile variables by adding a variable list.

Since IL already has this concept, we can translate its code directly. In the previous article we had to count the number of temporary variables; this time that is not needed, because IL already carries this data.

IL variable table types and indexes are both here.

This time the code is in samples/compiler_il01

The translation work becomes very simple. In most cases, if you have already compiled a high-level language to AVM, then compiling the bytecode of a high-level language to AVM is similar to the second half of that work.

For most of the code, we just need to deal with them directly; the logic is the same as the previous compilation.

But STLOC causes a little trouble here.

The IL instructions are

//IL CODE

LDC.i4.1

STLOC.0

But we expect the result of the translation to be

//AVM

DUPFROMALTSTACK//array

PUSH 0//index

PUSH 1 // LDC.i4.1

SETITEM

Here the STLOC code has to wrap around the LDC code. You might think of simply reordering the instructions, but unfortunately that makes the problem more complicated.

STLOC means "put the value at the top of the evaluation stack into the variable list", not "put the result of the previous instruction into the variable".

LDC.i4.1

LDC.i4.4

ADD

STLOC.0

In this case, for example, the result computed by the three instructions above is put into the variable list, so we cannot reorder the code arbitrarily. How do we deal with it?

//AVM

PUSH 1 //LDC.i4.1

//STLOC.0 begin

DUPFROMALTSTACK//array

PUSH 0//index

PUSH 2

ROLL

SETITEM

//STLOC.0 end

We insert extra instructions and let NEOVM adjust the order of the data on the stack. The two instructions PUSH 2 and ROLL put the arguments on the stack into the required order.

For example, suppose the values on the stack, from bottom to top, are [1, varArray, 0 /*varindex*/].

ROLL 2 moves the item at index 2, counting from the top of the stack (the top is index 0), to the top of the stack.

After executing ROLL 2, the stack is [varArray, 0, 1], which is the order SETITEM expects.
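To check the stack reasoning, here is a minimal Python sketch that translates the IL sequence above and runs the result on a toy stack machine. It is not the code from samples/compiler_il01 and not the real NEOVM: the functions `translate` and `run` are hypothetical, only the handful of opcodes used in this article are modeled, and the variable array is assumed to be already sitting on the alt stack.

```python
def translate(il_ops):
    """Translate a few IL opcodes into AVM-style (opcode, operand) pairs."""
    avm = []
    for op in il_ops:
        name, _, arg = op.lower().rpartition(".")
        if name == "ldc.i4":
            avm.append(("PUSH", int(arg)))
        elif name == "stloc":
            avm += [
                ("DUPFROMALTSTACK", None),  # variable array
                ("PUSH", int(arg)),         # variable index
                ("PUSH", 2),
                ("ROLL", None),             # bring the value back to the top
                ("SETITEM", None),
            ]
        elif op.lower() == "add":
            avm.append(("ADD", None))
        else:
            raise NotImplementedError(op)
    return avm


def run(avm, var_count):
    """Toy interpreter; the top of the stack is the end of the list."""
    stack = []
    alt = [[0] * var_count]  # assumption: variable array already on alt stack
    for op, arg in avm:
        if op == "PUSH":
            stack.append(arg)
        elif op == "DUPFROMALTSTACK":
            stack.append(alt[-1])  # same list object, so SETITEM updates it
        elif op == "ROLL":
            n = stack.pop()
            stack.append(stack.pop(-1 - n))  # item n below the top moves up
        elif op == "ADD":
            b, a = stack.pop(), stack.pop()
            stack.append(a + b)
        elif op == "SETITEM":
            value, index, array = stack.pop(), stack.pop(), stack.pop()
            array[index] = value
        else:
            raise NotImplementedError(op)
    return stack, alt[-1]


stack, variables = run(
    translate(["LDC.i4.1", "LDC.i4.4", "ADD", "STLOC.0"]), var_count=1
)
print(stack, variables)
```

Because each STLOC is translated on its own, it works no matter how many instructions produced the value at the top of the stack, which is why no reordering of the IL is needed.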
