Building a Z80 Disassembler in Elixir

Pattern Matching All The Way Down

billperegoy
im-becoming-functional
Nov 21, 2017


I’ve been working on code to emulate a TRS-80 computer. The TRS-80 was my first computer platform, and I think it would be fun and informative to play with its internals. This is possible because the original ROMs that acted as its “operating system” are now available online. The TRS-80 had a 4K (Level I BASIC) or 12K (Level II BASIC) ROM holding the Z80 machine code that booted the computer to a prompt where you could write BASIC code.

Processing Binary Files in Elixir

To use these ROMs, I need to be able to process the binary ROM files. I decided to do this in Elixir, since the language has good support for binaries and it’s an area of Elixir I’ve been wanting to learn.

First, let’s read a binary file.

iex> {:ok, data} = File.read("level1.rom")
{:ok, <<243, 33, 255, 0, 195, ...>>}

This reads the binary file and binds its contents to data.

We can confirm that this is the correct size as follows.

iex> byte_size(data)
4096

This returns 4096, which is the expected size of the Level I ROM. Next, I’d like to extract the bytes and disassemble them into the Z80 instructions they represent. To do this, we can recursively pattern match on a binary. As a starting point, let’s extract the first byte from the data binary.

iex> <<byte>> <> rest = data
<<243, 33, 255, 0, 195, ... >>
iex> byte
243
iex> rest
<<33, 255, 0, 195, ... >>

So just like we can recursively break down a list with pattern matching, we can do the same with a binary. We can take any binary and return the first byte along with the rest of the bytes. Given this knowledge, we can write a recursive function to print the entire contents.

defmodule Rom do
  def fetch(data) do
    # Split off the first byte (as an integer) and keep the remaining bytes
    <<byte>> <> rest = data
    IO.puts byte
    if byte_size(rest) > 0 do
      fetch(rest)
    end
  end
end

The conditional at the end makes the code a bit harder to read, so we can simplify it by using pattern matching to end the recursion.

defmodule Rom do
  # Stop recursing once the binary is empty
  def fetch(<<>>), do: nil

  def fetch(data) do
    <<byte>> <> rest = data
    IO.puts byte
    fetch(rest)
  end
end

Here, we simply match on the empty binary and return nil. For non-empty binaries, the recursive clause runs.
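A small variation, sketched here as an assumption rather than the code used in this project, moves the match into the function head with <<byte, rest::binary>> and returns the bytes as a list instead of printing them, which makes the function easy to test. The module RomBytes and function bytes/1 are hypothetical names.

```elixir
defmodule RomBytes do
  # Hypothetical helper: empty binary ends the recursion
  def bytes(<<>>), do: []

  # Match the first byte and the remaining binary directly in the function head
  def bytes(<<byte, rest::binary>>) do
    [byte | bytes(rest)]
  end
end

RomBytes.bytes(<<243, 33, 255, 0, 195>>)
# => [243, 33, 255, 0, 195]
```

Erlang’s :binary.bin_to_list/1 produces the same list in one call; the hand-written version is only here to show the recursive pattern.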

Pattern Matching on Instructions

Now that we have the individual bytes extracted, we can do even more pattern matching on these bytes to decode Z80 instructions.

Here is an example of a Z80 instruction we’d like to decode. A load instruction moves an 8-bit value from one internal register to another. The Z80 specification writes it as LD r, r' and encodes it as 01 in the upper two bits, followed by two 3-bit fields: bits 5-3 select the destination register r and bits 2-0 select the source register r'.
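As a worked example (the byte 0x78 is picked for illustration, not taken from the ROM), a single Elixir binary pattern splits a byte into exactly these three fields. With the Z80 register codes A = 111 and B = 000, 0x78 (binary 01111000) decodes to LD A,B.

```elixir
# Split one byte into the LD r, r' fields: 2-bit prefix, 3-bit dest, 3-bit src
<<prefix::size(2), dest::size(3), src::size(3)>> = <<0x78>>

# Prints {1, 7, 0}: prefix 01 marks LD r, r', dest 7 is A, src 0 is B
IO.inspect({prefix, dest, src})
```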

To pattern match on the individual bits, we need each byte as a binary. Matching <<byte>> binds byte to an integer, so we rewrap that integer as an 8-bit binary as follows.

iex> byte = 243
243
iex> binary_byte = <<byte::8>>
<<243>>

Given this knowledge, we can create a new function to process this byte as a binary.

defmodule Rom do
  def extract(<<>>), do: nil

  def extract(data) do
    <<byte>> <> rest = data
    # Rewrap the integer as a one-byte binary so its bits can be matched
    process_byte(<<byte::8>>)
    extract(rest)
  end

  def process_byte(byte) do
    decode_instruction(byte)
  end

  # LD r, r': top two bits 01, then 3-bit destination and source fields
  def decode_instruction(<<1::size(2), _dest::size(3), _src::size(3)>>) do
    IO.puts "Got LD instruction"
  end

  # Any other byte: print it as-is
  def decode_instruction(byte) do
    IO.inspect byte
  end
end

We created a new function named decode_instruction that pattern matches on the bit layout from the Z80 spec. With this code in place, we either print information about an instruction we recognize or, for any other byte, print the raw byte.
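An alternative to the manual recursion, sketched here as an assumption rather than the approach used above: Elixir’s for comprehension accepts a bitstring generator, so <<byte::binary-size(1) <- data>> walks the binary one byte at a time and yields one-byte binaries directly, skipping the integer-to-binary step. The LdScan module and its classify/1 function are hypothetical. Like extract/1, it treats every byte as an opcode, so operand bytes of multi-byte instructions get classified too.

```elixir
defmodule LdScan do
  # Hypothetical: tag each byte as :ld or :other using a bitstring generator
  def classify(data) do
    for <<byte::binary-size(1) <- data>>, do: kind(byte)
  end

  # Same bit-level pattern as the LD r, r' clause: top two bits are 01
  defp kind(<<1::size(2), _dest::size(3), _src::size(3)>>), do: :ld
  defp kind(_other), do: :other
end

LdScan.classify(<<243, 0x78, 33, 0x41>>)
# => [:other, :ld, :other, :ld]
```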

Pattern Matching on the Register Fields

We can further pattern match on the 3-bit register value as follows:

  def decode_register(<<7::size(3)>>), do: "A"
  def decode_register(<<0::size(3)>>), do: "B"
  def decode_register(<<1::size(3)>>), do: "C"
  def decode_register(<<2::size(3)>>), do: "D"
  def decode_register(<<3::size(3)>>), do: "E"
  def decode_register(<<4::size(3)>>), do: "H"
  def decode_register(<<5::size(3)>>), do: "L"
  def decode_register(_), do: raise "Invalid register encountered"

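This function takes a 3-bit register value and returns a string naming the register. The code 110 (6) has no clause because it does not name a register: in the LD r, r' encoding it refers to the memory byte addressed by the HL register pair, written (HL), so this version raises on it.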
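Because the eight codes are dense, a different design is also possible. This sketch (RegTable is a hypothetical module, not from the original) indexes a tuple with the integer value of the 3-bit field and maps 110 to "(HL)" instead of raising.

```elixir
defmodule RegTable do
  # Hypothetical alternative: names indexed by their 3-bit code (0..7).
  # Code 6 (110) is not a register; in LD r, r' it means the byte at (HL).
  @names {"B", "C", "D", "E", "H", "L", "(HL)", "A"}

  def decode_register(<<code::size(3)>>), do: elem(@names, code)
end

IO.puts RegTable.decode_register(<<7::size(3)>>)
# prints A
```

The clause-per-register version reads closer to the table in the spec; the tuple trades that for a single clause and needs no fallback, since a 3-bit pattern can only bind values 0 through 7.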

Putting it All Together

With this function in place, we can refine the decode_instruction function to decode the specified registers.

  # LD r, r': 01 ddd sss
  def decode_instruction(<<1::size(2), dest::size(3), src::size(3)>>) do
    src_reg = decode_register(<<src::size(3)>>)
    dest_reg = decode_register(<<dest::size(3)>>)
    IO.puts "LD #{dest_reg},#{src_reg}"
  end
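Run over real ROM data, this clause needs two refinements, sketched below with a hypothetical Z80Ld module that returns strings instead of printing. First, byte 0x76 (01 110 110) fits the LD bit pattern but is HALT in the Z80 instruction set, so its clause must come first. Second, a source or destination of 110 means (HL), which decode_register above rejects by raising. The "DB" fallback for other bytes is an assumption borrowed from common assembler syntax.

```elixir
defmodule Z80Ld do
  # HALT (0x76 = 01 110 110) also fits the LD r, r' bit pattern, so it must be
  # matched first: Elixir tries function clauses from top to bottom.
  def decode_instruction(<<0x76>>), do: "HALT"

  # LD r, r': 01 ddd sss
  def decode_instruction(<<1::size(2), dest::size(3), src::size(3)>>) do
    "LD #{reg(dest)},#{reg(src)}"
  end

  # Anything else is shown as a raw data byte (hypothetical fallback)
  def decode_instruction(<<byte>>), do: "DB #{byte}"

  # 3-bit code to name; 6 is the memory operand (HL), not a register
  defp reg(code), do: elem({"B", "C", "D", "E", "H", "L", "(HL)", "A"}, code)
end
```

For example, Z80Ld.decode_instruction(<<0x7E>>) returns "LD A,(HL)", and Z80Ld.decode_instruction(<<0x76>>) returns "HALT".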

This details the approach to decoding a single instruction. For our complete disassembler, we’d create a new function clause for each class of instruction. This would allow a fairly straightforward and modular approach to disassembly without the multitude of conditionals we’d have with most other approaches.
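One complication for a complete disassembler is that many Z80 instructions carry operand bytes, so decoding one byte at a time is not enough. A sketch of one way to extend the clause-per-instruction idea (not the author’s code; the Disasm module, the "DB" fallback and the 00FFH hex format are assumptions for illustration) matches each opcode together with its operands at the head of the remaining binary. The opcodes used are DI (0xF3), LD HL,nn (0x21) and JP nn (0xC3), whose 16-bit operands are stored low byte first. These happen to be the first bytes of the ROM dump shown earlier; the JP operand there is elided, so only the first four bytes are decoded below.

```elixir
defmodule Disasm do
  # Hypothetical sketch: each clause matches an opcode plus its operand bytes,
  # then recurses on whatever is left of the binary.
  def disassemble(<<>>), do: []

  # DI (0xF3): a single byte
  def disassemble(<<0xF3, rest::binary>>), do: ["DI" | disassemble(rest)]

  # LD HL,nn (0x21): opcode followed by a 16-bit little-endian immediate
  def disassemble(<<0x21, nn::little-size(16), rest::binary>>) do
    ["LD HL,#{hex16(nn)}" | disassemble(rest)]
  end

  # JP nn (0xC3): opcode followed by a 16-bit little-endian address
  def disassemble(<<0xC3, nn::little-size(16), rest::binary>>) do
    ["JP #{hex16(nn)}" | disassemble(rest)]
  end

  # Not decoded yet (or truncated operands): emit a raw data byte
  def disassemble(<<byte, rest::binary>>), do: ["DB #{byte}" | disassemble(rest)]

  # Format a 16-bit value as four hex digits with an H suffix, e.g. 00FFH
  defp hex16(n), do: String.pad_leading(Integer.to_string(n, 16), 4, "0") <> "H"
end

Disasm.disassemble(<<243, 33, 255, 0>>)
# => ["DI", "LD HL,00FFH"]
```

If the binary ends in the middle of an instruction, the multi-byte clause simply fails to match and the remaining bytes fall through to the DB clause instead of crashing.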

Summary

Normally, when processing binary data, I’d reach for a traditional systems programming language like C. But it turns out that Elixir’s powerful pattern matching makes it easy to write concise code that breaks binaries down into their component parts. In the end, I find the pattern matching approach much cleaner and easier to understand than the nest of conditionals I’d need for the same work in a more traditional language.

Polyglot programmer exploring the possibilities of functional programming.