So You Want to Build a Language VM in Rust— Part 14

Symbol Tables

Fletcher Haynes
Iridium VM
Oct 11, 2018

Assembler Struct

Welcome back! When we last left our intrepid readers, we were about to write an assembler struct.

But why? What does it do?

What does it all mean?! Why don’t more people love interrobangs?!

So far, we’ve taken strings and handed them directly to the program parser. But now we’re talking about doing things that require keeping state, like making multiple passes. The Assembler struct will serve as our simple abstraction on top of all that. In src/assembler/mod.rs, add this:
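A minimal sketch of what that struct could look like. The names AssemblerPhase, First, Second, and the new constructor are assumptions made for illustration, and SymbolTable is an empty placeholder here so that the snippet compiles on its own.

```rust
/// Which pass the assembler is currently on (variant names are assumed).
#[derive(Debug, PartialEq)]
pub enum AssemblerPhase {
    First,
    Second,
}

/// Empty placeholder so this sketch compiles by itself;
/// the symbol table proper is covered later in this part.
#[derive(Debug, Default)]
pub struct SymbolTable;

#[derive(Debug)]
pub struct Assembler {
    /// Tracks which pass the assembler is on.
    pub phase: AssemblerPhase,
    /// Will hold the labels found during the first pass.
    pub symbols: SymbolTable,
}

impl Assembler {
    /// A fresh assembler starts in the first phase with no symbols.
    pub fn new() -> Assembler {
        Assembler {
            phase: AssemblerPhase::First,
            symbols: SymbolTable::default(),
        }
    }
}
```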

Don’t worry about the symbols field. We’ll take care of that soon.

Simple assembler assembled! Note how we created an enum to track its phase. We could have used a u8 or something as well, but this leverages the Rust type system for clarity. Our Assembler is going to take over the following duties:

  1. Passing the raw string to the parser
  2. Constructing the symbol table
  3. Outputting a Vec<u8> that is the final bytecode that the VM can read

Step 1: Parsing

For this, we’re going to add a few functions to our assembler:

Three new functions! What riches!

  1. The assemble function accepts a raw string reference
  2. The assembler gives the raw text to the program parser
  3. It uses a match statement to check that the program parsed correctly
  4. Assuming it did parse, we feed the program through each of the assembler phases
  5. The assembler phases are broken out into other functions to help keep it neat
  6. The first phase extracts all the labels and builds the symbol table
  7. It then switches the phase to second
  8. The second phase is then called, which just calls to_bytes on every AssemblerInstruction
  9. All the bytes are added to a Vec<u8> which contains the fully assembled bytecode
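The steps above can be sketched roughly as follows. To keep the example self-contained, parse_program is a hypothetical stand-in for the real program parser (it understands only load and hlt and ignores operands), the opcode numbers and the to_bytes encoding are made up, and a plain list of (name, offset) pairs stands in for the symbol table. The method names process_first_phase and process_second_phase are also assumptions.

```rust
#[derive(Debug, PartialEq)]
pub enum AssemblerPhase {
    First,
    Second,
}

/// Hypothetical stand-in for a parsed instruction.
pub struct AssemblerInstruction {
    pub label: Option<String>,
    pub opcode: u8,
}

impl AssemblerInstruction {
    /// Stand-in encoding: the opcode byte plus three zero bytes of padding.
    pub fn to_bytes(&self) -> Vec<u8> {
        vec![self.opcode, 0, 0, 0]
    }
}

pub struct Program {
    pub instructions: Vec<AssemblerInstruction>,
}

/// Hypothetical stand-in for the real program parser. It understands only
/// `load` and `hlt`, with an optional `name:` prefix, and ignores operands.
/// The opcode numbers are arbitrary values chosen for this sketch.
fn parse_program(raw: &str) -> Result<Program, String> {
    let mut instructions = Vec::new();
    for line in raw.lines().map(str::trim).filter(|l| !l.is_empty()) {
        let (label, rest) = match line.split_once(':') {
            Some((name, rest)) => (Some(name.trim().to_string()), rest.trim()),
            None => (None, line),
        };
        let opcode = match rest.split_whitespace().next() {
            Some("load") => 0,
            Some("hlt") => 5,
            _ => return Err(format!("could not parse line: {}", line)),
        };
        instructions.push(AssemblerInstruction { label, opcode });
    }
    Ok(Program { instructions })
}

pub struct Assembler {
    pub phase: AssemblerPhase,
    /// (label name, byte offset) pairs, standing in for the symbol table.
    pub labels: Vec<(String, u32)>,
}

impl Assembler {
    pub fn new() -> Assembler {
        Assembler { phase: AssemblerPhase::First, labels: Vec::new() }
    }

    /// Parses the raw text, then runs both phases if parsing succeeded.
    pub fn assemble(&mut self, raw: &str) -> Option<Vec<u8>> {
        match parse_program(raw) {
            Ok(program) => {
                self.process_first_phase(&program);
                Some(self.process_second_phase(&program))
            }
            Err(e) => {
                eprintln!("There was an error assembling the code: {}", e);
                None
            }
        }
    }

    /// First phase: collect the labels, then switch to the second phase.
    fn process_first_phase(&mut self, p: &Program) {
        self.extract_labels(p);
        self.phase = AssemblerPhase::Second;
    }

    /// Second phase: concatenate the bytes of every instruction.
    fn process_second_phase(&mut self, p: &Program) -> Vec<u8> {
        let mut bytes = Vec::new();
        for instruction in &p.instructions {
            bytes.extend(instruction.to_bytes());
        }
        bytes
    }

    /// Records each label with the byte offset of its instruction,
    /// assuming every instruction is 4 bytes wide.
    fn extract_labels(&mut self, p: &Program) {
        for (index, instruction) in p.instructions.iter().enumerate() {
            if let Some(name) = &instruction.label {
                self.labels.push((name.clone(), (index * 4) as u32));
            }
        }
    }
}
```

Returning Option<Vec<u8>> keeps the caller's job simple: None means the text did not parse, and Some carries the finished bytecode.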

Next up, let’s look at the extract_labels function:

This function goes through every instruction and looks for label declarations, that is, places where the user has typed some_name: <opcode> …. When it finds one, it adds the label to our symbol table, along with the byte offset at which the label was found.
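Here is a pared-down sketch of that walk, written as a free function so it can stand alone. It assumes a fixed instruction width of 4 bytes (INSTRUCTION_SIZE is a name invented here), uses a hypothetical AssemblerInstruction that carries only an optional label, and returns (name, offset) pairs instead of filling a real symbol table.

```rust
/// Hypothetical, pared-down instruction: only the optional label matters here.
pub struct AssemblerInstruction {
    pub label: Option<String>,
}

/// Assumed fixed instruction width in bytes.
const INSTRUCTION_SIZE: u32 = 4;

/// Walks the instructions in order and records every label declaration
/// together with the byte offset of the instruction it is attached to.
pub fn extract_labels(instructions: &[AssemblerInstruction]) -> Vec<(String, u32)> {
    let mut found = Vec::new();
    let mut offset = 0;
    for instruction in instructions {
        if let Some(name) = &instruction.label {
            found.push((name.clone(), offset));
        }
        // Every instruction advances the offset, labelled or not.
        offset += INSTRUCTION_SIZE;
    }
    found
}
```

For a four-instruction program in which only the third instruction carries a test: label, this yields a single entry pairing test with offset 8.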

Step 2: Symbols and Tables

We need to make three more data structures: the Symbol, the SymbolType and the SymbolTable. Put these in src/assembler/mod.rs. Symbol and SymbolType look like:
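One possible shape for these two types, as a sketch. The field names (name, offset, symbol_type) and the constructor signature are assumptions; the only variant the text commits to at this point is Label.

```rust
/// The kinds of symbol the assembler knows about. Only labels for now.
#[derive(Debug, Clone, PartialEq)]
pub enum SymbolType {
    Label,
}

/// A named location in the program (field names are assumed).
#[derive(Debug, Clone, PartialEq)]
pub struct Symbol {
    pub name: String,
    /// Byte offset at which the symbol was declared.
    pub offset: u32,
    pub symbol_type: SymbolType,
}

impl Symbol {
    pub fn new(name: String, symbol_type: SymbolType, offset: u32) -> Symbol {
        Symbol { name, symbol_type, offset }
    }
}
```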

Later on, we’ll have more SymbolTypes. For now, we start with Label. SymbolTable looks like:
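A minimal sketch of a list-backed table, assuming the two operations are called add_symbol and symbol_value (both names are illustrative). Symbol and SymbolType are repeated in compact form so the snippet compiles on its own.

```rust
#[derive(Debug, Clone, PartialEq)]
pub enum SymbolType {
    Label,
}

#[derive(Debug, Clone)]
pub struct Symbol {
    pub name: String,
    pub offset: u32,
    pub symbol_type: SymbolType,
}

/// A symbol table backed by a plain Vec.
#[derive(Debug, Default)]
pub struct SymbolTable {
    pub symbols: Vec<Symbol>,
}

impl SymbolTable {
    pub fn new() -> SymbolTable {
        SymbolTable { symbols: Vec::new() }
    }

    /// Appends a symbol; this sketch does not check for duplicates.
    pub fn add_symbol(&mut self, s: Symbol) {
        self.symbols.push(s);
    }

    /// Linear scan: returns the offset of the first symbol with this name.
    pub fn symbol_value(&self, name: &str) -> Option<u32> {
        self.symbols.iter().find(|s| s.name == name).map(|s| s.offset)
    }
}
```

Looking up a name that was never added returns None, so callers can tell an undefined label apart from a label at offset 0.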

This would be better implemented as a HashMap. We’ll change it later.

Right now we only need two basic functions (adding a symbol and getting a symbol’s value), but don’t worry, it will grow. =)

And Yet More Tests

Ha, you thought I’d forgotten, didn’t you? No such luck!

This will take care of the symbol-related tests:

And for the assembler:

We’ll call it good for this part. Wikipedia has a good article on symbol tables. Don’t worry if they are confusing at first. Next up, we’ll be using clap to make a nicer CLI for our VM.

Originally published at blog.subnetzero.io.
