So You Want to Build a Language VM in Rust — Part 09

Assembler 2: Cruise Control

Fletcher Haynes
Iridium VM
5 min read · Aug 23, 2018


Megazord…ACTIVATE!!!

We’ve written basic parsers. Now we can take a step up on the abstraction ladder and create a parser that combines some of our smaller parsers. Right now, we can recognize one opcode, registers and integer operands. We can group these into an AssemblerInstruction.

In src/assembler/instruction_parsers.rs, put:

And now, the parser for the instruction itself…

See how we are using the parsers we defined? Collectively, they make up one AssemblerInstruction. We leave the operand fields as Option values, so an instruction can carry anywhere from zero to three operands.
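To make the shape concrete, here is a minimal std-only sketch of the idea: three small parsers combined into one instruction parser. It is not the nom-based code from this series. The Token and Opcode definitions, the field names, and the hand-rolled word-splitting are assumptions made for illustration.

```rust
// Illustrative sketch only: hand-rolled parsers standing in for the nom macros.
// Type and field names are assumptions, not copied from the Iridium source.
#[derive(Debug, Clone, Copy, PartialEq)]
pub enum Opcode {
    Load,
}

#[derive(Debug, PartialEq)]
pub enum Token {
    Op { code: Opcode },
    Register { reg_num: u8 },
    IntegerOperand { value: i32 },
}

#[derive(Debug, PartialEq)]
pub struct AssemblerInstruction {
    pub opcode: Token,
    pub operand1: Option<Token>,
    pub operand2: Option<Token>,
    pub operand3: Option<Token>,
}

// Recognizes the single opcode we know about so far (lowercase only).
fn parse_opcode(word: &str) -> Option<Token> {
    match word {
        "load" => Some(Token::Op { code: Opcode::Load }),
        _ => None,
    }
}

// Recognizes a register such as `$0`.
fn parse_register(word: &str) -> Option<Token> {
    let reg_num = word.strip_prefix('$')?.parse::<u8>().ok()?;
    Some(Token::Register { reg_num })
}

// Recognizes an integer operand such as `#100`.
fn parse_integer_operand(word: &str) -> Option<Token> {
    let value = word.strip_prefix('#')?.parse::<i32>().ok()?;
    Some(Token::IntegerOperand { value })
}

/// Combines the small parsers for the `<opcode> <register> <integer>` form.
pub fn instruction_one(line: &str) -> Option<AssemblerInstruction> {
    let mut words = line.split_whitespace();
    let opcode = parse_opcode(words.next()?)?;
    let register = parse_register(words.next()?)?;
    let integer = parse_integer_operand(words.next()?)?;
    // Anything left over means the line is not of this form.
    if words.next().is_some() {
        return None;
    }
    Some(AssemblerInstruction {
        opcode,
        operand1: Some(register),
        operand2: Some(integer),
        operand3: None,
    })
}
```

Calling instruction_one("load $0 #100") in this sketch fills operand1 and operand2 and leaves operand3 as None, which is the flexibility the Option fields buy us.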

Note:

You may be wondering about the pub in front of the parser name, such as: pub instruction_one. This makes the function generated by the nom macro public so we can access it from other modules. Our Program parser will need to access the instruction_one parser from its module.

Now a test. Put this at the bottom of src/assembler/instruction_parsers.rs:

And now our final parser, the Program parser. A Program consists of Instructions. Make src/assembler/program_parsers.rs, and in it put:

We now have a struct that contains a vector of assembler instructions. Next step is to give AssemblerInstructions the ability to write themselves out as Vec<u8>s. Then we just have to iterate through the instructions vec and done!

But first…

ANOTHER TEST!

The instructions field of p (which is a Program struct) is private. I’m not sure if it is better to make it public, add an accessor function, or something else. Let’s revisit this later.

Getting at the Bits

We need each AssemblerInstruction to have a function we can call to get a Vec<u8>. Let’s head over to instruction_parsers.rs and add one.

This is where implementing From<u8> for Opcode over in src/instruction.rs pays off. If you derive Copy and Clone on the Opcode enum, then we can convert any opcode into its integer with code as u8. All this function does is write the opcode byte to a vector, then use a helper function to extract the bytes for each operand field that is not None.

That helper function also goes in impl AssemblerInstruction and looks like:

I thought for sure borrowck was going to chastise me, but passing the results vector around worked like I thought it would.

What extract_operand does is check the operand type, convert the operand to bytes, and then stuff those bytes into the results vector.
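A rough sketch of how these two functions could fit together is below. The type definitions, the numeric value given to the opcode, and the use of Result for error reporting are all assumptions for illustration. Treat it as one possible shape rather than the code from the repository.

```rust
// Illustrative sketch; names and the opcode's numeric value are assumptions.
#[derive(Debug, Clone, Copy, PartialEq)]
pub enum Opcode {
    Load = 0, // illustrative discriminant
}

#[derive(Debug, PartialEq)]
pub enum Token {
    Op { code: Opcode },
    Register { reg_num: u8 },
    IntegerOperand { value: i32 },
}

pub struct AssemblerInstruction {
    pub opcode: Token,
    pub operand1: Option<Token>,
    pub operand2: Option<Token>,
    pub operand3: Option<Token>,
}

impl AssemblerInstruction {
    /// Writes the opcode byte, then the bytes of every operand that is present.
    pub fn to_bytes(&self) -> Result<Vec<u8>, String> {
        let mut results = Vec::new();
        match self.opcode {
            // Opcode is Copy, so `code as u8` gives us its integer value.
            Token::Op { code } => results.push(code as u8),
            _ => return Err("non-opcode found in opcode field".to_string()),
        }
        for operand in [&self.operand1, &self.operand2, &self.operand3] {
            if let Some(token) = operand {
                Self::extract_operand(token, &mut results)?;
            }
        }
        Ok(results)
    }

    /// Appends the bytes for a single operand to the shared results vector.
    fn extract_operand(token: &Token, results: &mut Vec<u8>) -> Result<(), String> {
        match token {
            Token::Register { reg_num } => results.push(*reg_num),
            Token::IntegerOperand { value } => {
                let converted = *value as u16;
                let byte1 = converted; // low byte once truncated to u8
                let byte2 = converted >> 8; // high byte
                results.push(byte2 as u8);
                results.push(byte1 as u8);
            }
            Token::Op { .. } => return Err("opcode found in operand field".to_string()),
        }
        Ok(())
    }
}
```

Passing results as &mut Vec<u8> means only one mutable borrow is alive at a time, which is why the borrow checker has nothing to complain about here.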

Note:
You may wonder why we order them this way:
results.push(byte2 as u8);
results.push(byte1 as u8);
and not:
results.push(byte1 as u8);
results.push(byte2 as u8);

This is because the two bytes have to land in the vector in the same order the VM reads them back, according to our big endian/little endian rule.
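Here is a tiny sketch of why the order matters. It assumes byte2 is the high byte (value >> 8) and that the VM rebuilds a 16-bit operand by shifting the first byte left and OR-ing in the second. Both function names are made up for this example.

```rust
/// Splits a 16-bit operand into two bytes, high byte first.
/// Assumption: this mirrors the push order shown in the note above.
pub fn operand_bytes(value: u16) -> [u8; 2] {
    let byte1 = value; // low byte once truncated
    let byte2 = value >> 8; // high byte
    [byte2 as u8, byte1 as u8]
}

/// Rebuilds the operand the way a reader expecting high-byte-first would.
pub fn read_operand(bytes: [u8; 2]) -> u16 {
    (u16::from(bytes[0]) << 8) | u16::from(bytes[1])
}
```

With this ordering, 500 becomes the bytes 1 and 244, and reading them back gives 500 again. Swap the two pushes and the same reader would see 62465 instead.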

Back to the Program

Let’s go back to program_parsers.rs and add a function to convert the entire vector of AssemblerInstruction to bytes:
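As a sketch of the idea, the conversion can be a simple flatten over the instructions. To keep the example self-contained, AssemblerInstruction here is a stand-in that just carries pre-encoded bytes, and the new constructor on Program is hypothetical.

```rust
/// Stand-in for the real AssemblerInstruction: it only holds encoded bytes.
pub struct AssemblerInstruction {
    pub encoded: Vec<u8>,
}

impl AssemblerInstruction {
    pub fn to_bytes(&self) -> Vec<u8> {
        self.encoded.clone()
    }
}

pub struct Program {
    instructions: Vec<AssemblerInstruction>,
}

impl Program {
    /// Hypothetical constructor, added only so the sketch can be exercised.
    pub fn new(instructions: Vec<AssemblerInstruction>) -> Program {
        Program { instructions }
    }

    /// Concatenates the bytes of every instruction, in program order.
    pub fn to_bytes(&self) -> Vec<u8> {
        self.instructions
            .iter()
            .flat_map(|instruction| instruction.to_bytes())
            .collect()
    }
}
```

A loop that appends each instruction's bytes to a single Vec<u8> would work just as well; flat_map only saves the explicit mutable accumulator.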

And a test…

Modifying the REPL

Almost done! Right now, our REPL still speaks hex. Head over to src/repl/mod.rs and in the catch-all match arm of the function run, put:

Now, if you do cargo run and type in load $0 #100:

A Wild Bug Appears!

Try entering `LOAD $0 #100`. You should get:

Our assembler is case-sensitive! I’m going to leave it as an exercise for the reader to figure out how to fix it. If you get stuck, you can check out the code in https://gitlab.com/subnetzero/iridium

Hex code

At this point, we could delete the parse_hex function, or we can leave it in case someone’s idea of a good time on a Friday night is to code in hex. Some options on what to do with it are:

  1. The REPL could try both and go with whichever parser doesn’t return an Error
  2. The REPL could look for input prefaced with 0x and use parse_hex for that input
  3. We could add a command to our REPL to let it switch input modes. In one, it accepts hex. In the other, assembly code.
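For a feel of what option 2 might look like, here is a small sketch that only decides which parser a line should go to. The InputMode enum and classify_input function are hypothetical names. Handing the Hex payload to parse_hex and the Assembly payload to the program parser is left out.

```rust
/// Which parser a line of REPL input should be handed to (hypothetical type).
#[derive(Debug, PartialEq)]
pub enum InputMode<'a> {
    /// Input that started with `0x`; holds the text after the prefix.
    Hex(&'a str),
    /// Everything else is treated as assembly source.
    Assembly(&'a str),
}

/// Looks for a leading `0x` and routes the rest of the line accordingly.
pub fn classify_input(input: &str) -> InputMode<'_> {
    let trimmed = input.trim();
    match trimmed.strip_prefix("0x") {
        Some(rest) => InputMode::Hex(rest.trim_start()),
        None => InputMode::Assembly(trimmed),
    }
}
```

The nice part of this option is that the user never has to switch modes; the cost is that every hex line needs the prefix.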

End

Yay, we now have a basic, but functional, assembler. In the next part (https://blog.subnetzero.io/post/building-language-vm-part-10/), we’ll teach our assembler how to recognize more opcodes and instruction forms, and how to provide helpful hints to the user when they type something incorrectly. See you then!
