Where does AVM come from — modularization and assembler

robbie wang
Published in NewEconoLabs
3 min read · Oct 30, 2019

Reference source code https://github.com/lightszero/neovmbook/tree/master/samples/neovm02

I have talked about the assembler before, but when describing the process we never seriously discussed the issue of modularization.

Modularization

We know that NEOVM is an implementation of a Turing machine. But a Turing machine is essentially a tape drive, and a standard tape drive is not modular.

Think about listening to music on an old-style tape player: can you jump to the next song with one click?

Organizing data in units of songs is modularization. A CD can do that, but a tape can’t.

In software engineering practice, however, modularity is the first important issue.

High-level languages are, of course, modular, and functions are the most popular modular unit. Later, with the popularity of OOP, classes were created.

But even before high-level languages became popular, software engineers worked in a modular way.

How to implement modularity in machine language

A machine has only one instruction area, so modules are simply regions of memory within that instruction area. If you have studied the oldest BASIC dialects, you will remember that there is only one code file and no function support. There we use

go to [linenum]

We achieve modularity in this way: different parts of the code implement different functions, and go to moves between them.

Let’s put this issue into NEOVM. We use the JMP instruction and the CALL instruction to realize the modularization of the code.

CALL instructions

Let’s consider a piece of AVM code:

    0x00 PUSH 1
    0x01 PUSH 2
    0x03 CALL +4
    0x06 RET
    0x07 ADD
    0x08 RET

In fact, it is divided into two modules: 0x00 to 0x06 is the main module, and 0x07 to 0x08 is the ADD module.

In the absence of modular tools, engineers must plan how the modules are divided in memory. This is a very tedious task. Software engineering is only possible with modularization.
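To make the control flow of the listing above concrete, here is a minimal sketch of a toy interpreter in Python. It is not the real NEOVM: it only knows the four instructions used in the listing, it keeps operands next to the mnemonic instead of encoding bytes, and names such as `run` and `return_stack` are made up for illustration. It assumes that the CALL offset is relative to the address of the CALL instruction itself, which is what makes 0x03 + 4 land on the ADD at 0x07.

```python
# Toy model of the listing: address -> (mnemonic, operand).
PROGRAM = {
    0x00: ("PUSH", 1),
    0x01: ("PUSH", 2),
    0x03: ("CALL", +4),   # assumed target: 0x03 + 4 = 0x07
    0x06: ("RET", None),
    0x07: ("ADD", None),
    0x08: ("RET", None),
}


def run(program, entry=0x00):
    addresses = sorted(program)
    # Address of the instruction that follows each instruction in memory.
    following = dict(zip(addresses, addresses[1:]))
    stack = []         # evaluation stack shared by both modules (simplification)
    return_stack = []  # return addresses saved by CALL
    trace = []         # addresses in the order they were executed
    pc = entry
    while True:
        op, arg = program[pc]
        trace.append(pc)
        if op == "PUSH":
            stack.append(arg)
            pc = following[pc]
        elif op == "ADD":
            right = stack.pop()
            left = stack.pop()
            stack.append(left + right)
            pc = following[pc]
        elif op == "CALL":
            return_stack.append(following[pc])  # where RET should resume
            pc = pc + arg                       # jump into the other module
        elif op == "RET":
            if not return_stack:                # RET of the main module: stop
                return stack, trace
            pc = return_stack.pop()
        else:
            raise ValueError(f"unknown instruction {op!r}")


stack, trace = run(PROGRAM)
print(stack)                     # [3]
print([hex(a) for a in trace])   # ['0x0', '0x1', '0x3', '0x7', '0x8', '0x6']
```

The trace shows execution leaving the main module at 0x03, running the ADD module at 0x07 and 0x08, and returning to the RET at 0x06.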

Linker

Since modularity is so important, it is natural to have a tool that helps with it.

Now we have an assembler project.

https://github.com/neo-project/neo-vm/tree/Branch_neoasm/src/neo-asm

If we use the ASML language that we defined to express the same code in a modular way, it looks like this:

    Main()
    {
        PUSH 1 // push the number 1
        PUSH 2
        CALL method1
        RET
    }

    method1()
    {
        ADD
        RET
    }

Engineers think and write code one module after another, instead of working out which memory block belongs to which module.

The work of translating between modules and addresses is usually called linking.

For example, the C++ language has a very clear and independent link step.

The CALL instruction is used for function-level modularity.

JMP instructions are used for modularization inside functions.

Our assembler also acts as a linker: it automatically connects the two modules, assigns them appropriate address ranges, and makes the CALL operands point where they are supposed to point.
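The linking step described above can be sketched in a few lines of Python. This is only an illustration of the idea, not the code of the neo-asm project: the `link` function, the `MODULES` layout, and the instruction sizes in `SIZES` (one byte per instruction, three bytes for CALL) are assumptions made for the example, so the absolute addresses it produces depend on those assumed sizes.

```python
# Assumed encoding sizes in bytes (illustrative, not taken from the NEOVM spec).
SIZES = {"PUSH": 1, "ADD": 1, "RET": 1, "CALL": 3}

# Modules as the engineer writes them: CALL names a module, not an address.
MODULES = {
    "Main": [("PUSH", 1), ("PUSH", 2), ("CALL", "method1"), ("RET", None)],
    "method1": [("ADD", None), ("RET", None)],
}


def link(modules):
    # Pass 1: lay the modules out back to back and record each start address.
    starts = {}
    address = 0
    for name, body in modules.items():
        starts[name] = address
        address += sum(SIZES[op] for op, _ in body)

    # Pass 2: emit every instruction, replacing module names in CALL
    # with an offset relative to the address of the CALL itself.
    listing = []
    address = 0
    for body in modules.values():
        for op, arg in body:
            if op == "CALL":
                arg = starts[arg] - address
            listing.append((address, op, arg))
            address += SIZES[op]
    return listing


for address, op, arg in link(MODULES):
    operand = "" if arg is None else f" {arg:+d}" if op == "CALL" else f" {arg}"
    print(f"0x{address:02x} {op}{operand}")
```

Under these assumptions, method1 is placed at 0x06 and the CALL at 0x02 is resolved to CALL +4, without the engineer ever writing an address.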

Here we are dealing with assembly; the next step is a high-level language, and the process is the same.

The final job of the compiler is the address translation, which involves assigning an address area to the module and providing the correct address to the CALL instruction to generate the final AVM byte[].

Because our assembler already handles modules and does the linker’s work, the compilation process can be described in two parts:

High-level language -> AVML -> byte[]

Or, starting from another virtual machine’s intermediate language such as IL: IL -> AVML -> byte[]

We will not go into how other compilers handle the linker’s work.

JMP Instruction

Having covered the CALL instruction, let me talk about the JMP instruction.

Think about code like this:

    int a = 1;
    if (a)
    {
        //aaa
    }
    else
    {
        //bbb
    }

aaa and bbb are two submodules inside the function.

Without a modular way of expressing this, we still have to deal with the addresses ourselves:

    0x00 PUSH 1
    0x01 JMPIF +3
    0x02 PUSH 1
    0x03 RET
    0x04 PUSH 8
    0x05 RET

If we use the modular ASML we defined to represent it:

    Main()
    {
        PUSH 1
        JMPIF label1
        PUSH 1
        RET
    label1:
        PUSH 8
        RET
    }

Now we don’t need to care about the address. Instead, a label is introduced to mark the jump location.
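Turning a label back into an offset is a small two-pass job. The following Python sketch shows one way it could be done; `resolve_labels` and `SOURCE` are hypothetical names, and the sketch assumes one address per instruction and an offset relative to the jump itself, as in the hand-written listing above. A real assembler would work with encoded byte sizes.

```python
# Body of Main() from the labeled version, one statement per line.
SOURCE = [
    "PUSH 1",
    "JMPIF label1",
    "PUSH 1",
    "RET",
    "label1:",
    "PUSH 8",
    "RET",
]

JUMPS = ("JMP", "JMPIF", "JMPIFNOT")


def resolve_labels(lines):
    # Pass 1: record the address each label refers to. A label takes no
    # space; it names the address of the next instruction.
    labels = {}
    instructions = []
    for line in lines:
        line = line.strip()
        if line.endswith(":"):
            labels[line[:-1]] = len(instructions)
        elif line:
            instructions.append(line.split())

    # Pass 2: replace the label operand of each jump with a relative offset.
    resolved = []
    for address, parts in enumerate(instructions):
        op = parts[0]
        if op in JUMPS:
            offset = labels[parts[1]] - address
            resolved.append(f"0x{address:02x} {op} {offset:+d}")
        else:
            resolved.append(f"0x{address:02x} " + " ".join(parts))
    return resolved


print("\n".join(resolve_labels(SOURCE)))
```

Run on the labeled source, this reproduces the address-based listing, including JMPIF +3.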

About 80% of the work of converting a high-level language to assembly language consists of turning the various loops and branches into JMP instructions.
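As an illustration of a loop becoming JMP, here is a small Python sketch that lowers a while loop into labels and jumps in the same label style as above. The function `lower_while`, its parameters, and the label naming scheme are invented for this example; it is one common lowering pattern, not a description of how any particular NEOVM compiler does it.

```python
def lower_while(condition, body, name="loop"):
    """Return assembly lines for: while (condition) { body }.

    `condition` is a list of instructions that leaves the test result on
    the stack; `body` is the list of instructions inside the loop.
    """
    start, end = f"{name}_start", f"{name}_end"
    return (
        [f"{start}:"]
        + list(condition)
        + [f"JMPIFNOT {end}"]   # leave the loop when the test is false
        + list(body)
        + [f"JMP {start}"]      # go back and test again
        + [f"{end}:"]
    )


# Hypothetical usage: the condition and body are placeholders.
for line in lower_while(["PUSH 1"], ["NOP"]):
    print(line)
```

The output still contains only labels, so the address translation can be left to the same label resolution step as any other jump.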

We can ignore the address translation for JMP and CALL instructions; that work is left to the linker. In the next article, we will discuss how high-level languages are compiled into NEOVM instructions.
