Where does AVM come from — process description
There are two ways to generate AVM:
1. Use an assembler helper class to synthesize AVM directly in code from assembly instructions
2. Use a compiler to produce AVM from a high-level language
Assembler
First, I need to talk about compilation.
This involves several concepts: assembly language, machine language, and the assembler.
Interestingly, you don’t need to learn assembly language for this purpose.
NOP
PUSH 1
PUSH 2
ADD
RET
Remember the code in the previous article?
This is what we call assembly language.
The input that NEOVM accepts is the machine language that NEOVM simulates, which is in byte[] format.
So the five instructions above are assembly language, and they can be turned into a byte[] that NEOVM recognizes.
The tool that does this is called the assembler.
Let’s make an assembler.
ScriptBuilder.cs in NEOVM does most of the work of the assembler, except for linking.
Linking is more complicated, and it is also a core task of an assembler. We need to know more about virtual machines such as NEOVM before we can discuss it, so for now let us focus on turning the five assembly instructions into byte[].
(NEO used to have an official assembler project, neoa, which has fallen into disrepair: https://github.com/neo-project/neo-compiler/tree/master/neoa
An assembler is also a prerequisite for studying the compiler, so I may maintain a new assembler project.)
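To make the assembler's job concrete, here is a minimal sketch in Python. It is not ScriptBuilder and does not use Neo.VM. The opcode values are an assumption inferred from the machine code shown in this article (0x6151529366), and the encoding of PUSH n as 0x50 + n is likewise an assumption for illustration.

```python
# Toy assembler: one instruction per line of text in, bytes out.
# Opcode values are inferred from the article's output 0x6151529366;
# treat them as assumptions, not as the official Neo.VM opcode table.
OPCODES = {"NOP": 0x61, "ADD": 0x93, "RET": 0x66}
PUSH_BASE = 0x50  # assumption: PUSH n is encoded as 0x50 + n for 1 <= n <= 16


def assemble(source: str) -> bytes:
    """Translate assembly text into machine code."""
    out = bytearray()
    for line in source.splitlines():
        parts = line.split()
        if not parts:
            continue  # skip blank lines
        mnemonic = parts[0].upper()
        if mnemonic == "PUSH" and len(parts) == 2:
            value = int(parts[1])
            if not 1 <= value <= 16:
                raise ValueError(f"this sketch only supports PUSH 1..16, got {value}")
            out.append(PUSH_BASE + value)
        elif mnemonic in OPCODES and len(parts) == 1:
            out.append(OPCODES[mnemonic])
        else:
            raise ValueError(f"cannot assemble line: {line!r}")
    return bytes(out)


program = """
NOP
PUSH 1
PUSH 2
ADD
RET
"""
machine_code = assemble(program)
print("machinecode=0x" + machine_code.hex())  # machinecode=0x6151529366
```

Each instruction becomes exactly one byte here, which is why no linking step is needed: there are no jump targets or addresses to resolve.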
Call ScriptBuilder to generate NEO machine code (AVM)
Let’s get to the code directly. This program is located in samples/neovm01
Note that the sample references the Neo3.0 NeoVM. This series of articles targets NeoVM3.0 only, but you don’t have to worry; in fact, NeoVM3.0 is not so different.
Add Neo.VM from NuGet
Then use ScriptBuilder to do the work of the assembler directly, and we get
machinecode=0x6151529366
Then we have NEOVM execute it
and we get retvalue=3
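What the virtual machine does with those bytes can be sketched with a tiny stack interpreter. This is a hypothetical stand-in written in Python, not the Neo.VM execution engine, and it only understands the five opcodes used in this article, with byte values assumed from the machine code above.

```python
def run(script: bytes) -> int:
    """Execute a script on a toy stack machine and return the top of the stack."""
    stack = []
    pc = 0  # index of the next byte to execute
    while pc < len(script):
        op = script[pc]
        pc += 1
        if op == 0x61:            # NOP: do nothing
            continue
        elif 0x51 <= op <= 0x60:  # PUSH1..PUSH16 (assumed encoding: 0x50 + n)
            stack.append(op - 0x50)
        elif op == 0x93:          # ADD: pop two values, push their sum
            right = stack.pop()
            left = stack.pop()
            stack.append(left + right)
        elif op == 0x66:          # RET: stop execution
            break
        else:
            raise ValueError(f"unknown opcode 0x{op:02x}")
    return stack.pop()


retvalue = run(bytes.fromhex("6151529366"))
print(f"retvalue={retvalue}")  # retvalue=3
```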
OK, now we know that .avm machine code is produced from assembly language by the assembler. We have not covered linking; that problem is more complicated, and we will discuss it in the future.
Where does the assembly language come from?
This raises a problem: you can’t always write the assembly by hand. This is where the compiler comes in.
We need a tool to translate
“1+2”
into
NOP
PUSH 1
PUSH 2
ADD
RET
This is the job of the compiler.
OK, next we focus on how to compile addition expressions into assembly language automatically.
Generate NEO machine code (AVM) with the compiler
Let’s get to the code directly. This program is located in samples/neovm02
There is an extremely simple compiler here.
It can only compile the addition of positive integers, such as
“1+2+4+5”
The source code of this simple compiler has two parts. The first part parses the source text into an abstract syntax tree (AST); this is the ParseSynatxNode function.
Then we get the abstract syntax tree of the expression “1+2+4+5”
The next step is to turn the abstract syntax tree into the code we actually want to execute.
This is also very simple: call the assembler while traversing the syntax tree depth-first, and we get the machine code
Then I execute this code with NEOVM and get the result 12
Process analysis
All the code is here, so let’s analyze it.
There are several stages along the way; together they are what we usually call a compiler.
Word Segmentation -> Create Abstract Syntax Tree -> Convert to Assembly Code -> Convert to Machine Code
1. Word segmentation
Word segmentation (tokenization) is the first job of the compiler.
For “1+2323+4”, the compiler should not analyze the input byte by byte. It first splits the string into words: “1”, “+”, “2323”, “+”, “4”
Because our test compiler is very simple, string.split is enough for this.
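A minimal Python sketch of this step is shown below. It uses re.split with a capturing group rather than a plain split so that the “+” tokens are kept in the output; the function name tokenize is hypothetical and not taken from the sample project.

```python
import re
from typing import List


def tokenize(source: str) -> List[str]:
    """Split an addition expression into number and '+' tokens."""
    # The capturing group makes re.split keep the "+" separators.
    parts = re.split(r"(\+)", source)
    # Drop surrounding whitespace and any empty strings.
    return [p.strip() for p in parts if p.strip()]


tokens = tokenize("1+2323+4")
print(tokens)  # ['1', '+', '2323', '+', '4']
```

Note that “2323” comes out as one token, which is exactly what byte-by-byte analysis would not give us.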
2. Establish an abstract syntax tree
Next comes parsing. The most common way to organize the result is an abstract syntax tree.
“1+2+4+5” is organized into a tree
The top node is an addition node, the left value is “1+2+4”, and the right value is 5
“1+2+4” is split into an addition node, the left value is “1+2”, and the right value is 4
“1+2” is split into an addition node, the left value is 1, and the right value is 2
Some scripting-language compilers stop here: once the abstract syntax tree is built, it can be interpreted and executed directly.
For example, a common algorithm for evaluating four-operation arithmetic expressions in a string is the prefix expression. A prefix expression is in effect an AST (abstract syntax tree), which is then evaluated by traversing the tree nodes depth-first.
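The tree described above, and the idea of interpreting it directly, can be sketched in Python as follows. The class names Num and Add and the functions parse and evaluate are hypothetical; this is not the ParseSynatxNode implementation from the sample, only an illustration under the assumption that the input contains nothing but positive integers and “+”.

```python
from dataclasses import dataclass
from typing import List, Union


@dataclass
class Num:
    value: int


@dataclass
class Add:
    left: "Node"
    right: "Node"


Node = Union[Num, Add]


def parse(tokens: List[str]) -> Node:
    """Build a left-leaning tree, so 1+2+4+5 becomes ((1+2)+4)+5."""
    node: Node = Num(int(tokens[0]))
    for i in range(1, len(tokens), 2):
        if tokens[i] != "+" or i + 1 >= len(tokens):
            raise ValueError("expected '+' followed by a number")
        # The tree built so far becomes the left value of a new addition node.
        node = Add(node, Num(int(tokens[i + 1])))
    return node


def evaluate(node: Node) -> int:
    """Interpret the tree directly by depth-first traversal."""
    if isinstance(node, Num):
        return node.value
    return evaluate(node.left) + evaluate(node.right)


tree = parse(["1", "+", "2", "+", "4", "+", "5"])
print(evaluate(tree))  # 12
```

The top node of tree is an Add whose right value is Num(5) and whose left value is the subtree for “1+2+4”, matching the description above.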
3. Convert to assembly code
We are still looking at the same tree.
We traverse the tree depth-first; the deepest nodes are 1 and 2
PUSH 1
PUSH 2
Then the addition of the upper layer
ADD
4 on the same level
PUSH 4
The addition of the upper layer
ADD
PUSH 5
The addition of the upper layer
ADD
Organize them
PUSH 1
PUSH 2
ADD
PUSH 4
ADD
PUSH 5
ADD
Compare this with the code of EmitCode.
There, we use the assembler directly to turn PUSH and ADD into machine code.
But what if we saved the instructions first?
We would get
PUSH 1
PUSH 2
ADD
PUSH 4
ADD
PUSH 5
ADD
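Saving the instructions first can be sketched as a post-order traversal that appends text instead of calling the assembler. In this Python sketch the tree is written as nested tuples, where an int is a leaf and a pair (left, right) is an addition node; this representation and the function name emit are assumptions made for brevity, not the sample's EmitCode.

```python
def emit(node, out):
    """Post-order traversal: emit both operands first, then the operator."""
    if isinstance(node, int):
        out.append(f"PUSH {node}")
    else:
        left, right = node
        emit(left, out)
        emit(right, out)
        out.append("ADD")


# ((1+2)+4)+5 as nested (left, right) pairs
tree = (((1, 2), 4), 5)
listing = []
emit(tree, listing)
print("\n".join(listing))
```

Because the left subtree is always emitted before the right value, the deepest nodes 1 and 2 come out first, and each ADD appears right after its two operands are on the stack.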
4. Convert to machine code
Strictly speaking, the output of step 3 should be assembly:
PUSH 1
PUSH 2
ADD
PUSH 4
ADD
PUSH 5
ADD
Then the assembler turns it into AVM.
But at the time of writing, our independent assembler project is not yet complete, so we only introduced the simple assembly helper, ScriptBuilder.
Usually, the assembler built into a compiler is not called an assembler but a linker, because its main task is address translation. Readers familiar with C++ will know that a C++ build is clearly divided into two stages: compile and link.
This article mainly explains how AVM is generated and does not go into the details of address translation.