Writing a Game Boy Assembler/Disassembler in Node.js

Syscall59 — Alan Vivona
Published in DailyJS · Aug 17, 2020
Photo by Ben on Unsplash

The Game Boy uses an 8-bit CPU similar to the Intel 8080. As its base opcode table has only 256 entries, it’s an interesting candidate for experiments such as writing an assembler from scratch, and that’s what this article is about.

Here I’ll describe the process I went through when I wrote this Game Boy assembler in Node.js.

If you want to try it, just run one of the following:

# option 1: using npm
npm install -g game-boy-assembler
# option 2: using docker
docker run -ti alanvivona/gameboy

Now let’s jump into the process of writing such a thing!

Photo by Spencer on Unsplash

Understanding the architecture

The first thing we need to do is analyze the instruction set and list every instruction. I extracted the instructions from this awesome manual and generated a file like the one you can see in the screenshot below.

The next step was to structure the instruction set data.

I chose the following JSON format so I could include some metadata about each instruction, which gives me some advantages if I try to make the assembler evolve into a disassembler in the future.

This opcodes.json file contains a JSON array with every possible instruction described in the format above. The format includes:

  • Opcode: Raw numeric value the opcode will have in the produced binary
  • Description
  • Mnemonic
  • Full mnemonic: An abstract representation of the syntax the opcode must comply with (e.g., how many operands it takes)
  • Operands: Array describing each operand’s type (number, address, register, etc.) and size.
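To make the list above concrete, here is a sketch of what one entry might look like, written as a JavaScript object so the values can be commented. The field names and the convention that registers have size 0 are assumptions for illustration, not the actual schema of opcodes.json. The opcode value 0x3E for loading an 8-bit immediate into register A does come from the Game Boy instruction set.

```javascript
// Hypothetical entry: field names are illustrative, not the real opcodes.json schema.
const ldAImmediate = {
  opcode: 0x3e, // raw value emitted in the binary
  description: "Load an 8-bit immediate value into register A",
  mnemonic: "LD",
  fullMnemonic: "LD A,d8", // abstract syntax: register A, then an 8-bit immediate
  operands: [
    { type: "register", name: "A", size: 0 }, // implied by the opcode, emits no bytes
    { type: "number", name: "d8", size: 1 }, // one byte follows the opcode
  ],
};

// Total encoded length: one opcode byte plus the size of every operand.
const instructionLength = (entry) =>
  1 + entry.operands.reduce((total, operand) => total + operand.size, 0);

console.log(instructionLength(ldAImmediate)); // 2
```

Keeping the operand sizes next to the opcode is what lets the same table drive both directions: an assembler knows how many bytes to emit, and a disassembler knows how many bytes to consume.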

I also created a second JSON file ( operands.json ) containing every possible operand an instruction can take. It was useful for running meaningful checks on the input, verifying that each operand supplied has a type the instruction accepts.
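As a rough idea of the kind of check such a table enables, the sketch below validates a concrete operand against an abstract operand kind. The table contents, the operandMatches and parseNumber helpers, and the accepted literal syntax (decimal or 0x-prefixed hex) are all assumptions made for this example; the real operands.json may be shaped differently.

```javascript
// Hypothetical subset of an operands table; names and shape are assumptions.
const operandKinds = {
  A: { type: "register", size: 0 },
  B: { type: "register", size: 0 },
  HL: { type: "register", size: 0 },
  d8: { type: "number", size: 1 },
  d16: { type: "number", size: 2 },
  a16: { type: "address", size: 2 },
};

// Returns true when `text` is an acceptable concrete value for the operand `kind`.
function operandMatches(kind, text) {
  const spec = operandKinds[kind];
  if (!spec) return false; // unknown operand kind
  if (spec.type === "register") return text.toUpperCase() === kind;
  const value = parseNumber(text);
  if (value === null) return false;
  return value < 2 ** (8 * spec.size); // must fit in the operand's byte size
}

// Accepts decimal ("16") or hex ("0x10") literals; returns null otherwise.
function parseNumber(text) {
  if (!/^(0x[0-9a-f]+|\d+)$/i.test(text)) return null;
  return Number(text);
}

console.log(operandMatches("d8", "0xff")); // true
console.log(operandMatches("d8", "256")); // false
```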

To help me understand the architecture, I wrote some analysis functions that extract relevant statistics from the complete instruction set. One example is the operandsAnalysis function, which shows stats like the ones below.

Great! Only 40 different ways to express operands.
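An analysis like this can be only a few lines long once the instruction set is structured data. The sketch below is not the project’s operandsAnalysis function, just a guess at the general idea: it counts how many instructions use each distinct operand form. The countOperandForms name and the five-entry instruction list are made up for the example.

```javascript
// Tiny hypothetical instruction list; the real opcodes.json has one entry per opcode.
const instructions = [
  { fullMnemonic: "NOP", operands: [] },
  { fullMnemonic: "LD A,d8", operands: ["A", "d8"] },
  { fullMnemonic: "LD B,d8", operands: ["B", "d8"] },
  { fullMnemonic: "JP a16", operands: ["a16"] },
  { fullMnemonic: "CALL a16", operands: ["a16"] },
];

// Counts how many times each distinct operand form appears across the list.
function countOperandForms(list) {
  const counts = new Map();
  for (const { operands } of list) {
    for (const operand of operands) {
      counts.set(operand, (counts.get(operand) || 0) + 1);
    }
  }
  return counts;
}

const stats = countOperandForms(instructions);
console.log(`${stats.size} distinct operand forms`); // 4 distinct operand forms
// Most frequently used forms first.
console.log([...stats.entries()].sort((a, b) => b[1] - a[1]));
```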

To make use of all this data, I wrote an opcode parser that checks the opcode syntax and operand types of each input. Once the standardized ISA data was in place, recognizing patterns in opcode syntax was easy, and writing the opcode parser was a smooth process.
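The first half of any such parser is splitting a line of source into its parts. Here is a minimal sketch of that step only; the parseLine name, the ";" comment character, and the comma-separated operand syntax are assumptions for the example and may not match what the package accepts.

```javascript
// Splits one line of assembly into a mnemonic and its operand strings.
// The comment character (";") and the comma separator are assumptions.
function parseLine(line) {
  const code = line.split(";")[0].trim(); // drop trailing comments
  if (code === "") return null; // blank or comment-only line
  const [mnemonic, ...rest] = code.split(/\s+/);
  const operands = rest
    .join(" ")
    .split(",")
    .map((operand) => operand.trim())
    .filter((operand) => operand !== "");
  return { mnemonic: mnemonic.toUpperCase(), operands };
}

console.log(parseLine("ld a, 0x42 ; load a constant"));
console.log(parseLine("NOP"));
console.log(parseLine("; just a comment"));
```

The operand strings are returned untouched on purpose: deciding whether "a" is a register or "0x42" fits in one byte is a separate check against the operand metadata.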

Photo by Elias Castillo on Unsplash

The assembler

With the whole instruction set mapped, it’s time to code the actual assembler. This part is straightforward, as the syntax checks were already written and the JSON format in which the instructions were stored had all the metadata needed to generate the output in a simple manner.

The basic logic behind the assembler is the following:

  • Take the input
  • Extract the mnemonic from it
  • Check the mnemonic’s validity and its syntax, and run checks on the operands as well
  • If everything is correct, generate the raw instruction value by concatenating the opcode value for the mnemonic and the value of each operand.
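The steps above can be sketched in a few dozen lines. This is not the package’s actual code: the four-entry table, the helper names (encodeImmediate, tryEncode, assembleLine), and the input syntax are assumptions for illustration. The opcode values are real Game Boy opcodes, and multi-byte operands are emitted in little-endian order, as the Game Boy CPU expects.

```javascript
// Hypothetical four-entry instruction table; opcode values are real Game Boy opcodes.
const table = [
  { opcode: 0x00, mnemonic: "NOP", operands: [] },
  { opcode: 0x3e, mnemonic: "LD", operands: ["A", "d8"] },
  { opcode: 0x06, mnemonic: "LD", operands: ["B", "d8"] },
  { opcode: 0xc3, mnemonic: "JP", operands: ["a16"] },
];
const immediateSizes = { d8: 1, a16: 2 }; // bytes emitted per immediate operand kind

// Encodes a numeric literal as `size` little-endian bytes, or returns null
// when the text is not a decimal/hex literal or does not fit.
function encodeImmediate(text, size) {
  if (!/^(0x[0-9a-f]+|\d+)$/i.test(text)) return null;
  const value = Number(text);
  if (value >= 2 ** (8 * size)) return null;
  return Array.from({ length: size }, (_, i) => (value >> (8 * i)) & 0xff);
}

// Tries to encode the operands against one table entry; returns bytes or null.
function tryEncode(entry, operands) {
  if (entry.operands.length !== operands.length) return null;
  const bytes = [entry.opcode];
  for (let i = 0; i < operands.length; i++) {
    const kind = entry.operands[i];
    if (kind in immediateSizes) {
      const encoded = encodeImmediate(operands[i], immediateSizes[kind]);
      if (encoded === null) return null;
      bytes.push(...encoded);
    } else if (operands[i].toUpperCase() !== kind) {
      return null; // register name does not match this entry
    }
  }
  return bytes;
}

// Take the input, extract the mnemonic, check it, then emit opcode + operands.
function assembleLine(line) {
  const [mnemonic, ...rest] = line.trim().split(/\s+/);
  const operands = rest.join("").split(",").filter((s) => s !== "");
  const candidates = table.filter((e) => e.mnemonic === mnemonic.toUpperCase());
  if (candidates.length === 0) throw new Error(`Unknown mnemonic: ${mnemonic}`);
  for (const entry of candidates) {
    const bytes = tryEncode(entry, operands);
    if (bytes) return Buffer.from(bytes);
  }
  throw new Error(`Invalid operands for ${mnemonic}: ${operands.join(", ")}`);
}

console.log(assembleLine("LD A, 0x42")); // <Buffer 3e 42>
console.log(assembleLine("JP 0x0150")); // <Buffer c3 50 01>
```

Because one mnemonic such as LD maps to many opcodes, the sketch tries every table entry with that mnemonic and keeps the first one whose operand pattern matches.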

The disassembler

To implement a very naive disassembler, we can just do the inverse of what the assembler does:

  • Read 1 byte
  • Check if the value corresponds with a mnemonic
  • If it does, pick the remaining bytes (depending on mnemonic)
  • Interpret the remaining bytes as operands, taking into consideration the type.
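As a rough sketch of that loop (not the code in disasm.js), the example below decodes a byte buffer using a tiny lookup table keyed by opcode. The table shape, the "%" placeholder, and the disassemble name are assumptions for illustration; the opcodes themselves are real, and operand bytes are read as little-endian.

```javascript
// Hypothetical lookup table keyed by opcode byte.
// `size` is the number of operand bytes that follow; "%" marks where the operand goes.
const byOpcode = new Map([
  [0x00, { text: "NOP", size: 0 }],
  [0x3e, { text: "LD A, %", size: 1 }],
  [0xc3, { text: "JP %", size: 2 }],
  [0xcd, { text: "CALL %", size: 2 }],
]);

function disassemble(bytes) {
  const lines = [];
  let offset = 0;
  while (offset < bytes.length) {
    const entry = byOpcode.get(bytes[offset]); // read 1 byte, look for a match
    const operandEnd = offset + 1 + (entry ? entry.size : 0);
    if (!entry || operandEnd > bytes.length) {
      lines.push("UNK [!]"); // unrecognized byte or truncated operand
      offset += 1;
      continue;
    }
    // Operand bytes are little-endian: accumulate from the last byte down.
    let value = 0;
    for (let i = operandEnd - 1; i > offset; i--) value = (value << 8) | bytes[i];
    const hex = value.toString(16).padStart(entry.size * 2, "0");
    lines.push(entry.size > 0 ? entry.text.replace("%", `0x${hex}`) : entry.text);
    offset = operandEnd;
  }
  return lines;
}

console.log(disassemble(Buffer.from([0x00, 0xc3, 0x50, 0x01, 0xff])));
// [ 'NOP', 'JP 0x0150', 'UNK [!]' ]
```

Skipping a single byte after a failed lookup keeps the loop moving, but it also means everything after a data section may be decoded out of alignment.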

My implementation of this logic can be seen at disasm.js.

The following is a screenshot of the output of the first version of the disassembler, reading a raw Pokemon game image from its very start.

You can see there are some unknown instructions (“UNK [!]”). Those are bytes that the disassembler failed to recognize. This is probably because it tries to interpret every single byte as code, but there are data sections in the game file as well.

As this is not intended to be a production-grade disassembler, I have left out a feature to differentiate code sections from data for now.

Photo by Dimitri Houtteman on Unsplash

The interface and the npm package

The next steps were to implement a command-line interface, so the package could be tested by comparing its output against something like IDA or radare2, and to build an npm package to publish the project.

I will skip this part here as there are plenty of good tutorials out there on how to create a Node.js CLI and publish an npm package. There’s also nothing special about the CLI of the game-boy-assembler package itself.

Photo by Dan Counsell on Unsplash

If you liked this article, don’t hesitate to share it, and remember you can follow me on Medium, Twitter, and GitHub for more!

Twitter: https://twitter.com/syscall59
Medium: https://medium.com/@syscall59
GitHub: https://github.com/alanvivona

Until next time!
