Rust adventure to develop a Game Boy emulator — Part 2: CPU Registers & Macros

Published in CodeX · 13 min read · Sep 17, 2024

Hey all, little devs wolfie🐺! In the last chapter, I gave you a first look at how basic memory for a Game Boy emulator should work, reading ‘n writing bytes. Today we need to implement the heart of any CPU: its registers, which hold and manipulate data.

Game Boy with the DMG CPU on its LCD screen

A closer look at the Game Boy CPU

The CPU we’re going to emulate is hardware based on a variant of the great champion Z80, a Zilog processor that is part of history. The name on the chip we refer to is “DMG” (“Dot Matrix Game”, as far as I could find), sometimes called GBZ80, and it runs at a frequency of 4.194304 MHz (about 4.19 MHz). Before continuing with its characteristics, it’s better to get a general schema of it:

Official Game Boy DMG CPU schema, from the official documentation¹

Mmh… this could be a bit too much for a first approach. Let’s focus instead on the CPU core and its registers, which we’ll implement today:

CPU core schema: 16-bit and 8-bit registers

In our core, we can find six 16-bit registers, four of which can be divided into two 8-bit registers, each named after one half of the pair’s name. For example, the BC 16-bit register is composed of the two 8-bit registers B and C.
PC, SP, and F are special registers, but we’ll dig into them in a moment.

The CPU Registers

Let’s talk about our registers in a list-like view:

PC: Program Counter. This 16-bit register contains the address of the next program instruction to fetch & decode.
SP: Stack Pointer. This 16-bit register holds the address of the current top of the stack in memory. Through the special instructions PUSH & POP it follows LIFO order, like any standard stack structure². LIFO means “Last In, First Out”; think of a stack of dishes: you can’t remove the bottom one without first removing all the dishes above it.
AF, BC, DE, HL: General Purpose registers. These 16-bit registers are used for general purposes, and some assembly instructions are optimized for operations with the HL register. Each of them is composed of two sub-registers. To remember which is the high byte and which is the low one, look at HL: H means “High” (byte), and the same holds for the other registers in its column of the image (A for AF, B for BC, D for DE); L means “Low” (byte), and the same holds for the other sub-registers in its column (F, C, E).
A: Accumulator. Although it can be used as a general-purpose 8-bit register, it’s usually used as the accumulator for operation results.
F: Flags. This is a special 8-bit register that contains the operation flags. The Zero, Negative, Half-Carry, and Carry flags, represented by the four higher bits of the byte, are set here automatically based on the result of operations.
The lower four bits are always “0” (or simply reset, in Zilog notation) and can’t be changed.
B, C, D, E, H, L: General purpose 8-bit registers.
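
To make the layout of F a bit more concrete, here is a minimal sketch of the four flag masks. The bit positions (Zero in bit 7, Negative in bit 6, Half-Carry in bit 5, Carry in bit 4) follow the DMG flag layout; the constant and function names are hypothetical and exist only for this sketch, they are not part of the emulator code.

```rust
// Bit masks for the four flags stored in the high nibble of F.
// All names here are illustrative assumptions, not the emulator's API.
pub const FLAG_ZERO: u8 = 0b1000_0000; // Z, bit 7
pub const FLAG_NEGATIVE: u8 = 0b0100_0000; // N, bit 6
pub const FLAG_HALF_CARRY: u8 = 0b0010_0000; // H, bit 5
pub const FLAG_CARRY: u8 = 0b0001_0000; // C, bit 4

/// Returns true when the flag selected by `mask` is set in `f`.
pub fn flag_is_set(f: u8, mask: u8) -> bool {
    (f & mask) != 0
}

/// Returns `f` with the selected flag set or cleared.
/// The final mask keeps the low nibble at 0, as on the real register.
pub fn with_flag(f: u8, mask: u8, on: bool) -> u8 {
    let updated = if on { f | mask } else { f & !mask };
    updated & 0xF0
}
```

For example, `with_flag(0x00, FLAG_HALF_CARRY, true)` gives `0x20`, and `flag_is_set(0x20, FLAG_CARRY)` is `false`.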

Bit counting of the Game Boy CPU registers³

CPU Module and Structure

It’s time to implement a CPU in our code, let’s go!

use crate::GB::registers; // Import of registers submodule

pub struct CPU {
    pub registers: registers::Registers,
    pub ime: bool,  // Interrupt Master Enable - True if you want to enable and intercept interrupts
    pub opcode: u8,  // Running Instruction Opcode
    pub cycles: u64,  // Total Cycles Count
}

As you can see, we import the registers module, which I’ll show later as a way to also introduce macros in Rust. We define our CPU structure with four public attributes:

registers: of type Registers. This represents the internal state and behavior of the CPU registers.
ime: Boolean type (bool in Rust). This is a special CPU flag called “Interrupt Master Enable”; we’ll need it later to correctly manage Game Boy hardware interrupts.
opcode: u8 type. The Game Boy CPU works with 8-bit operation codes (abbr. opcodes). We store the opcode currently in use in a byte to remember which operation the hardware should perform.
cycles: u64 type. This keeps the total count of CPU cycles executed so far. What we count are M-cycles (machine cycles), a larger unit that groups the CPU’s internal T-states. There’s a useful external read on this in reference⁴ of this article.
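
As a quick sketch of the two cycle units: on the Game Boy one M-cycle corresponds to four T-cycles (clock ticks), so converting between them is a simple multiplication. The constant and function names below are hypothetical and only illustrate the arithmetic.

```rust
/// Number of T-cycles (clock ticks) in one M-cycle on the Game Boy CPU.
pub const T_CYCLES_PER_M_CYCLE: u64 = 4;

/// Master clock frequency in Hz (4.194304 MHz).
pub const CLOCK_HZ: u64 = 4_194_304;

/// Converts a count of M-cycles into T-cycles.
pub fn m_to_t_cycles(m_cycles: u64) -> u64 {
    m_cycles * T_CYCLES_PER_M_CYCLE
}

/// How many M-cycles fit in one second of emulated time.
pub fn m_cycles_per_second() -> u64 {
    CLOCK_HZ / T_CYCLES_PER_M_CYCLE
}
```

With these numbers, `m_cycles_per_second()` is 1,048,576, which is why counting M-cycles in a `u64` leaves plenty of room.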

CPU Impl functions headers

Now we need to implement the functions associated with the structure. Let’s review what I have in mind through the function headers:

impl CPU {
    // Function to create a new instance
    pub fn new() -> Self {
        todo!() // To implement
    }

    // Function to retrieve the next instruction byte addressed by PC
    pub fn fetch_next(&mut self) -> u8 {
        todo!() // To implement
    }

    // Function to decode an opcode
    pub fn decode(opcode: u8, cb_opcode: bool) -> instructions::Instruction {
        todo!() // To implement
    }

    // Function to execute the decoded opcode
    pub fn execute_next(&mut self) -> u64 {
        todo!() // To implement
    }
}

new: This will allow us to create a new instance in a consistent default way.
fetch_next: This function will use the content of the PC register as the address of the next program byte in memory.
decode: This will convert an opcode byte into the corresponding instruction. Thanks to the gbdev.io Pan Docs work, instructions are available in table form, and they’ll be abstracted with their own structure and module.
execute_next: This will use the previous functions to run through the execution of the next opcode.
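
To give an idea of the fetch step before the real implementation arrives, here is a minimal sketch under simplified assumptions. `MiniCpu`, its flat `memory` vector, and the 0xFF value returned for out-of-range reads are all hypothetical choices for this example and are not the emulator’s actual design.

```rust
/// Toy stand-in for the CPU: only a program counter and a flat memory.
/// `MiniCpu` and its fields are hypothetical names for this sketch.
pub struct MiniCpu {
    pub pc: u16,
    pub memory: Vec<u8>,
}

impl MiniCpu {
    /// Reads the byte addressed by PC, then advances PC by one.
    /// Reads outside `memory` return 0xFF, an arbitrary choice for the sketch.
    pub fn fetch_next(&mut self) -> u8 {
        let byte = self.memory.get(self.pc as usize).copied().unwrap_or(0xFF);
        // wrapping_add avoids a panic if PC rolls over past 0xFFFF.
        self.pc = self.pc.wrapping_add(1);
        byte
    }
}
```

Each call returns one byte and moves PC forward, so calling it repeatedly walks through the program one byte at a time.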

Registers module

Before continuing to implement the CPU functions we must create and correctly manage its core registers, implemented in the registers module.

pub struct Registers {
    a: u8,
    b: u8,
    c: u8,
    d: u8,
    e: u8,
    f: u8,
    h: u8,
    l: u8,
    sp: u16,
    pc: u16,
}

As we discussed earlier, here we implement all the registers contained in the CPU core: u8 for the 8-bit registers and u16 for the 16-bit ones. The 16-bit registers composed of two 8-bit sub-registers are an abstraction that will be transparent to the user. Now we need to implement functions to read and write them, one by one or in pairs for the composed registers.

impl Registers {
    fn get_a(&self) -> u8 {
        self.a
    }

    // Setters need a mutable reference to change the field
    fn set_a(&mut self, val: u8) {
        self.a = val;
    }

    fn get_b(&self) -> u8 {
        self.b
    }

    fn set_b(&mut self, val: u8) {
        self.b = val;
    }

    fn get_c(&self) -> u8 {
        self.c
    }

    fn set_c(&mut self, val: u8) {
        self.c = val;
    }

    // And so on with same code and different attributes...
}

Ok, this code is, honestly, a bit ugly: the same boring identical lines of code for seven 8-bit registers (I exclude F, as it has its own behavior). Luckily, we can reduce these repetitions with Rust macros.

Rust Declarative Macros⁵ - Don’t Repeat Yourself

Macros are a form of meta-code: the Rust compiler replaces each macro call with real code, generated by following the rules we wrote in the macro.

Surely the most famous built-in macros in Rust are println! and vec!, but let’s introduce my macro for Registers:

macro_rules! get_set {
    ($reg:ident, $get_name:ident, $set_name:ident, $size:ty) => {
        pub fn $get_name(&self) -> $size {
            self.$reg
        }

        pub fn $set_name(&mut self, val: $size) {
            self.$reg = val;
        }
    };
}

This type of macro is called a “declarative macro”, the simplest and maybe the most used type of Rust macro. Starting from the syntax, I’ll analyze how a declarative macro is composed: a macro is defined with macro_rules!, followed by the macro name (get_set for us) and the content between curly braces.

// The macro header: "macro_rules!" + "macro_name" + "{"
macro_rules! get_set {
    // ... macro definition content ...
}

That was the simple part. Now let’s review the token header definition ($reg:ident, $get_name:ident, $set_name:ident, $size:ty). The Rust compiler uses it to match the tokens passed to the macro and to bind them to names starting with the dollar symbol ($), which are then replaced during code generation. Each token is composed of a name after the dollar sign and a type (a “fragment specifier” in Rust terms), separated from the name by a colon (:).
The tokens I defined for this macro are $reg, $get_name, and $set_name, which are identifier tokens (ident), i.e. names like a variable or function name, and $size, which expects a type (ty): u8, u16, i8, i32, f32, &str, or any valid type you want to pass as a macro token.

$reg: This will be the register attribute name of the register structure we want to work with.
$get_name: The name of the function that we want to generate to read the wanted register.
$set_name: The name of the function we want to generate to edit the content of the wanted register.
$size: Size type of our register to specify if it is a u8 register or a u16 one.

For a simpler first view of declarative macros, you can look at the header as something similar to the parameter list of a function, but remember that it is really pattern matching, closer to a regex pattern. The comparison isn’t 100% accurate, but it’s a good starting point.
The token pattern is followed by an arrow (=>) and curly braces, which contain the actual macro definition. But what’s in our macro?

// macro declaration headers
        pub fn $get_name(&self) -> $size {
            self.$reg
        }

        pub fn $set_name(&mut self, val: $size) {
            self.$reg = val;
        }
// ...

The content we defined in the macro is the template that Rust uses to replace each macro call with real code, substituting every token reference with the token we passed in the macro call.
Let’s see how I call the written macro in my registers code:

struct Registers {
    // We've already seen this, check a few paragraphs above
}

impl Registers {
    // ... other functions ...

    // ... and now we implement registers functions repeating codes with the previous macro
    get_set!(a, get_a, set_a, u8);
    get_set!(b, get_b, set_b, u8);
    get_set!(c, get_c, set_c, u8);
    get_set!(d, get_d, set_d, u8);
    get_set!(e, get_e, set_e, u8);
    get_set!(h, get_h, set_h, u8);
    get_set!(l, get_l, set_l, u8);
    get_set!(sp, get_sp, set_sp, u16);
    get_set!(pc, get_pc, set_pc, u16);
}

A single macro can have multiple implementations (rules), each with its own token input header.
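
As a small illustration of a macro with more than one rule, here is a hedged sketch. The `describe!` macro and the `describe_examples` function are hypothetical and unrelated to the emulator; they only show how the compiler picks a rule based on the shape of the call.

```rust
// A hypothetical macro with two rules: the compiler tries them top to bottom
// and expands the first one whose pattern matches the call.
macro_rules! describe {
    // Rule 1: a single identifier.
    ($reg:ident) => {
        format!("8-bit register {}", stringify!($reg))
    };
    // Rule 2: two identifiers separated by a comma.
    ($high:ident, $low:ident) => {
        format!("16-bit pair {}{}", stringify!($high), stringify!($low))
    };
}

/// Calls the macro once per rule and returns both generated strings.
pub fn describe_examples() -> (String, String) {
    (describe!(a), describe!(b, c))
}
```

Here `describe!(a)` matches the first rule, while `describe!(b, c)` does not fit it and falls through to the second one.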

In-depth Example

I’ll use the first get_set macro call as an example to better understand what we expect from it. During compilation, Rust will find the following line:

get_set!(a, get_a, set_a, u8);

It understands that we’re trying to call a macro and looks for one with that name that is in scope (here, the one defined earlier in the same module).

macro_rules! get_set {
    ($reg:ident, $get_name:ident, $set_name:ident, $size:ty) => {
        pub fn $get_name(&self) -> $size {
            self.$reg
        }

        pub fn $set_name(&mut self, val: $size) {
            self.$reg = val;
        }
    };
}

It now checks whether the call (a, get_a, set_a, u8) matches one of the macro’s headers, and it finds one: ($reg:ident, $get_name:ident, $set_name:ident, $size:ty). Then it replaces the general tokens in the macro body we wrote:

pub fn $get_name(&self) -> $size {
    self.$reg
}

pub fn $set_name(&mut self, val: $size) {
    self.$reg = val;
}

With:

pub fn get_a(&self) -> u8 {
    self.a
}

pub fn set_a(&mut self, val: u8) {
    self.a = val;
}

And thanks to meta-programming we can save many repeated lines of code 😁.
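
To see the whole thing compile in one place, here is a reduced sketch that puts the macro, a structure, and the macro calls together. `MiniRegisters` is a hypothetical two-field stand-in for the full Registers structure, used only to keep the example short.

```rust
// The macro has to be defined before the impl block that uses it.
macro_rules! get_set {
    ($reg:ident, $get_name:ident, $set_name:ident, $size:ty) => {
        pub fn $get_name(&self) -> $size {
            self.$reg
        }

        pub fn $set_name(&mut self, val: $size) {
            self.$reg = val;
        }
    };
}

/// Reduced register file with just two fields, enough to exercise the macro.
#[derive(Default)]
pub struct MiniRegisters {
    a: u8,
    pc: u16,
}

impl MiniRegisters {
    // Each call expands to one getter and one setter.
    get_set!(a, get_a, set_a, u8);
    get_set!(pc, get_pc, set_pc, u16);
}
```

After `set_a(0x42)`, a call to `get_a()` returns `0x42`, exactly as if the two functions had been written by hand.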

Macro for composed 16-bit register functions

We implemented a macro and functions for the single registers; now we also need functions for the composed registers (e.g. BC), managing the single 8-bit parts and composing them as required. I’ll name this macro with the not-so-fancy name get_set_dual. 🥔

macro_rules! get_set_dual {
    ($reg1:ident, $reg2:ident, $get_name:ident, $set_name:ident) => {
        pub fn $get_name(&self) -> u16 {
            (self.$reg1 as u16) << 8 | self.$reg2 as u16
        }

        pub fn $set_name(&mut self, val: u16) {
            self.$reg1 = (val >> 8) as u8;
            self.$reg2 = val as u8;
        }
    };
}

This macro allows us to manage two 8-bit registers as one 16-bit register, using bit shifting and OR operations:

The getter combines the two 8-bit registers with a left shift⁶ (<<) and a bitwise OR (|).
The setter splits the 16-bit value into its high and low parts using a right shift⁷ (>>) for the high byte and a truncating cast (as u8) for the low byte.

GET of composed 16-bit register

To explain in depth the operations I used, I’ll use the BC register version of the code:

pub fn get_bc(&self) -> u16 {
    (self.b as u16) << 8 | self.c as u16
}

pub fn set_bc(&mut self, val: u16) {
    self.b = (val >> 8) as u8;
    self.c = val as u8;
}

We can start from the get function: it is a single line of code with several operations in one return expression. Let me rewrite it here as a step-by-step version to better understand which operations are involved.

// Rewrite method step-by-step (remember "self" is a registers structure)
pub fn get_bc(&self) -> u16 {
    let mut b = self.b as u16;  // Convert to a 16-bit integer
    let c = self.c as u16;  // Convert to a 16-bit integer
    
    // We must shift the 16-bit version of 'b' by 8 bits as B is the higher part of the register
    // 0s will replace moved bit on their original positions
    b = b << 8;

    // Now we compose (and return) the low C and the upper B parts with OR bit operations to compose the  16-bit register value
    b | c
}

First of all, we cast the B and C registers (u8) to their 16-bit version (u16), as we need a 16-bit value to represent the composed register.
Now we have a B variable that contains the correct bits but in the wrong position, because casting from u8 to u16 adds eight higher bits that are automatically set to 0.
We use the left shift (<<) operator to fix the position of the cast B bits. As a visible example, let’s say our original u8 b attribute is, in bits, 01110001. When we cast it to u16 the new value will be 00000000_01110001 (note that the underscore is just a divider to make it more readable, and it’s valid in Rust syntax⁸!). To move the lower byte into the higher byte position we shift left by 8 bits: with b << 8 we obtain 01110001_00000000.
To complete our 16-bit register content we must fuse the high B register with the low C register, and the bitwise OR operator (|) is the right weapon!
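
The steps above can be checked with a tiny standalone function. `combine` is a hypothetical helper name for this sketch; it performs the same cast, shift, and OR as the getter, just outside of the Registers structure.

```rust
/// Combines a high and a low byte into one 16-bit value.
/// `combine` is a hypothetical helper, not part of the emulator code.
pub fn combine(high: u8, low: u8) -> u16 {
    let widened = high as u16; // the 8 new upper bits are all 0
    let shifted = widened << 8; // the byte now sits in the upper half
    shifted | low as u16 // the low byte fills the empty lower half
}
```

With the example bits from the text, `combine(0b0111_0001, 0b1010_1010)` gives `0b0111_0001_1010_1010`, which is `0x71AA` in hexadecimal.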

I think you already know what an OR operation is, but to KIS (Keep It Simple): given two numbers of the same bit length, it returns a value whose bits are equal to 1 if at least one of the bits in the same position of the operands is 1, and 0 otherwise.

It’s important not to confuse the bitwise OR⁹ operator (|) with the logical OR¹¹ operator (||). The logical one returns a boolean true/false value based on boolean logic.
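
A two-function sketch may help keep the two operators apart. The function names are hypothetical and only wrap the operators so they can be compared side by side.

```rust
/// Bitwise OR: works on every bit position of two integers.
pub fn bitwise_or(x: u8, y: u8) -> u8 {
    x | y
}

/// Logical OR: works on two booleans and returns a boolean.
/// In general `||` short-circuits; with plain values as here
/// that makes no visible difference.
pub fn logical_or(x: bool, y: bool) -> bool {
    x || y
}
```

For instance, `bitwise_or(0b1010, 0b0110)` is `0b1110`, while `logical_or(true, false)` is simply `true`.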

SET of composed 16-bit register

In the set function we must do the reverse of what we did in the get function. Let’s take a look at the step-by-step version of the function:

pub fn set_bc(&mut self, val: u16) {
    let b = val >> 8; // We shift the higher byte of u16 on the lower position

    self.b = b as u8; // Casting to u8 truncate the u16 to the lower 8 bits
    self.c = val as u8; // The same here but we store the original lower 8 bits
}

We shift the value we want to store by 8 bits to the right (val >> 8). This puts the higher byte of the u16 value in the lower byte position.
We cast the shifted value to a u8 type; this truncates the higher bits and keeps the lower ones. Then we assign the value to the B register.
We cast the original value to a u8 type, which truncates it to its lower 8 bits in the same way. Then we assign the new cast value to the C register.
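
Here is the same split as a standalone sketch, so the values can be checked by hand. `split` is a hypothetical helper name; it returns the two bytes as a tuple instead of storing them in registers.

```rust
/// Splits a 16-bit value into its (high, low) bytes.
/// `split` is a hypothetical helper, not part of the emulator code.
pub fn split(val: u16) -> (u8, u8) {
    let high = (val >> 8) as u8; // high byte moved down, then truncated to 8 bits
    let low = val as u8; // truncation keeps only the low 8 bits
    (high, low)
}
```

For example, `split(0x1234)` returns `(0x12, 0x34)`. The standard library method `u16::to_be_bytes` produces the same two bytes as an array, so `0x1234u16.to_be_bytes()` is `[0x12, 0x34]`; the manual shift and cast are kept here because they mirror the explanation.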

Completing with F register functions

Our last macro lets us write the remaining functions as follows. You’ll notice that the F register never appears in our macro calls:

impl Registers {
    // Base Registers
    get_set!(a, get_a, set_a, u8);
    get_set!(b, get_b, set_b, u8);
    get_set!(c, get_c, set_c, u8);
    get_set!(d, get_d, set_d, u8);
    get_set!(e, get_e, set_e, u8);
    get_set!(h, get_h, set_h, u8);
    get_set!(l, get_l, set_l, u8);
    get_set!(sp, get_sp, set_sp, u16);
    get_set!(pc, get_pc, set_pc, u16);
    
    // Composed 16-bit Registers
    get_set_dual!(b, c, get_bc, set_bc);
    get_set_dual!(d, e, get_de, set_de);
    get_set_dual!(h, l, get_hl, set_hl);
}

This is because of the special behavior of F that I described earlier when talking about the Game Boy CPU flags register. Now I’ll show you my emulator code for the F register:

pub fn get_f(&self) -> u8 {
    self.f
}

pub fn set_f(&mut self, val: u8) {
    self.f = val & 0xF0
}

The getter is as simple as the base one, but the setter, as you noticed, uses a bitwise AND (&) operation. Why, you ask? If you remember, only the higher 4 bits of the F register can change, while the lower ones are ALWAYS equal to 0. To be sure of that, whenever I set a value in F, the AND operation with the hexadecimal mask F0 (1111_0000 in binary) resets the four lower bits to 0. This works because the bitwise AND operator sets a bit of the result to 1 only if the bits in the same position of both operands are 1, and to 0 otherwise.

As for the OR operator, we must not confuse the bit AND¹⁰ operator (&) with the logical AND¹¹ operator (&&).
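
To see the mask at work on concrete values, here is a minimal sketch. `mask_flags` is a hypothetical helper that applies the same `& 0xF0` as the F setter, without needing the whole Registers structure.

```rust
/// Keeps only the four high bits of `val`, like the F setter does.
/// `mask_flags` is a hypothetical name used only in this sketch.
pub fn mask_flags(val: u8) -> u8 {
    // 0xF0 is 1111_0000: high bits pass through, low bits are forced to 0.
    val & 0xF0
}
```

So `mask_flags(0xFF)` gives `0xF0`, and `mask_flags(0b1011_0110)` gives `0b1011_0000`: the high nibble is untouched and the low nibble is cleared.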

The “new” function to create an instance

To complete the structure implementation I added a new function that returns a default instance of the Registers structure:

impl Registers {
    pub fn new() -> Registers {
        Registers {
            a: 0,
            b: 0,
            c: 0,
            d: 0,
            e: 0,
            f: 0,
            h: 0,
            l: 0,
            sp: 0,
            pc: 0,
        }
    }
    // ... the others ...
}

What now?

Well, I may have written a looong article. It’s probably better to stop here and let your mind relax and absorb all the concepts I introduced: CPU basics and its core registers, the structure and functions of all these registers, and the macros that helped us reduce repetitive lines of code. That’s a lot of meat on the fire!
Next time we’ll complete the empty shell of our CPU and add some bonus skills to simplify repeated functionality of specific registers (psss, I’m talking about the default behavior of the PC register).
Relax, enjoy the read, and leave a clap or a comment if you want! See you in a week! 🐺

1. Game Boy CPU Schema - Game Boy Programming Manual V1.1 [p. 20] - https://ia903208.us.archive.org/9/items/GameBoyProgManVer1.1/GameBoyProgManVer1.1.pdf
2. Stack Data Structure - https://en.wikipedia.org/wiki/Stack_(abstract_data_type)
3. DMG Registers Bits Schema - Game Boy Programming Manual V1.1 [Ch. 4.1, p. 94] - https://ia903208.us.archive.org/9/items/GameBoyProgManVer1.1/GameBoyProgManVer1.1.pdf
4. Z80 Instructions Timing, T-States & M-States - https://floooh.github.io/2021/12/06/z80-instruction-timing.html#general-instruction-timing
5. Rust Declarative Macros - https://doc.rust-lang.org/book/ch19-06-macros.html#declarative-macros-with-macro_rules-for-general-metaprogramming
6. Left Shift Operator & Trait - https://doc.rust-lang.org/std/ops/trait.Shl.html
7. Right Shift Operator & Trait - https://doc.rust-lang.org/std/ops/trait.Shr.html
8. Number divider in Rust integers - https://doc.rust-lang.org/book/ch03-02-data-types.html#integer-types
9. Bitwise OR Operation - https://en.wikipedia.org/wiki/Bitwise_operation#OR
10. Bitwise AND Operation - https://en.wikipedia.org/wiki/Bitwise_operation#AND
11. Logical Truth Table Logical Operators - https://en.wikipedia.org/wiki/Bitwise_operation#Truth_table_for_all_binary_logical_operators