NES Emulator - Writing a Disassembler and Memory Viewer

Published in Hard Mode, Apr 25, 2023

Over the previous weekend (April 14–16, 2023) the Handmade Network held a programming jam called Visibility Jam. The goal was to spend a weekend building something that helps bring visibility to things that are obscured or hidden in computing. My previous personal project was a NES emulator, and I thought it would be cool to use the Visibility Jam to program some features that provide visibility into the system. I was able to complete a few of the ideas I originally had:

Visualizing the pattern table memory on a cartridge (basically showing the sprites used in the game)
Writing a disassembler and displaying 6502 assembly you can step through as the game runs
Memory viewer

I had a lot of fun and I would like to share what I worked on.

If you’re the type that just wants to read the code, you can find it here.

Visualizing the Pattern Table Memory

I want to first briefly describe how graphics are represented in the NES. The graphics data for a particular game are stored in a chip on the cartridge called CHR memory. The image below is my copy of Double Dragon III; you can see the CHR memory is the large chip on the right.

The chip responsible for drawing these graphics to the screen — or more precisely for generating a composite video signal — is called the picture processing unit (PPU). The PPU fetches graphics data from the cartridge using an 8 KB region of the PPU address space from 0x0000–0x1FFF known as the pattern table. The pattern table is divided into two sections of 256 tiles each. These sections are often referred to as pattern table 0 and pattern table 1.

Each tile is 8x8 pixels and is represented by 16 bytes that are split into two bit-planes. For each pixel in the tile, one bit from each plane is combined to form a 2-bit palette index for that pixel. Which palette to use and the colors in each palette can be changed dynamically to change the colors of tiles.

That should give you a rough understanding of how graphics are represented, but if you are interested in learning more specific details I recommend checking out the fantastic NESdev Wiki, where I learned basically everything I know about the NES.

Okay, so to refocus on the task at hand, in order to visualize these pattern tables I needed to read through the pattern table region of the PPU address space (0x0000–0x1FFF), decode a palette index for each pixel, and get the corresponding color for the index in the currently selected palette. Each of these colors is then stored in a pixel buffer that is 128x128 pixels. To make this process more concrete, here is the code from my emulator:

// for each pattern table
for (int table=0; table<2; ++table)
// for each tile in the pattern table
for (int tile_row=0; tile_row<16; ++tile_row)
for (int tile_col=0; tile_col<16; ++tile_col) {
    int pixel_start = (tile_row*pitch+tile_col)*8;
    uint16_t tile_start = table*0x1000 + tile_row*0x100 + tile_col*0x10;
    // for each row in the tile
    for (int tile_y=0; tile_y<8; ++tile_y) {
        uint8_t plane0 = ppu_bus_read(tile_start+tile_y);
        uint8_t plane1 = ppu_bus_read(tile_start+tile_y+8);
        // for each col in the tile
        for (int tile_x=0; tile_x<8; ++tile_x) {
            int i = pixel_start + tile_y*pitch + (7-tile_x);
            int pal_index = (plane0 & 1) | ((plane1 & 1) << 1);
            uint32_t color = get_color_from_palette(selected_palette, pal_index);
            pattern_tables[table].pixels[i] = color;
            plane0 >>= 1; plane1 >>= 1;
        }
    }
}

Now for the fun part! We can pop in some game cartridges and see what the pattern tables look like!

(BTW: “popping in a game cartridge” is way more satisfying on a real NES: grabbing that chunky plastic cartridge, sliding it into the slot, pressing it in and feeling the teeth at the back grip the cartridge, and pressing the whole slot down vertically, locking it into place with a satisfying click!)

Here are some screenshots of pattern tables from some of my favorite games as a kid. Oh the nostalgia! The pattern table on the left is what the PPU reads from 0x0000–0x0FFF, and the right one is from 0x1000–0x1FFF.

Disassembler

My next goal with the debug window was to display the CPU state and the game’s code, and let the user step through the code as the game runs. To understand how this works requires a little bit more explanation about the CPU and game cartridges.

The CPU used in the NES was an 8-bit microprocessor designed by MOS Technology called the 6502. This was the same CPU used in some of the influential 8-bit home computers like the Apple II and Commodore 64. The actual chip used in the NES was a custom 6502 manufactured by Ricoh that had both the 6502 CPU and the audio processing unit (APU) on the same chip. There is some curious lore surrounding this chip, how Ricoh cloned the 6502, and whether it was licensed by MOS Technology or not. If you are interested in investigating further you can start here.

Similar to the PPU discussed in the previous section, the CPU communicates with other devices on the NES over a bus by simply reading from or writing to addresses. It has a 64 KB address space in which the game cartridge is accessible from 0x4020–0xFFFF. There is a special register inside the CPU called the program counter or instruction pointer that always holds the address of the next instruction for the CPU to execute. So to fetch an instruction from the cartridge the CPU looks at the address in the program counter, reads an instruction from that address, and advances the program counter.

Remember that the goal is to visualize this process in the debug window, so we need to display the instruction corresponding to the CPU’s program counter and ideally some instructions surrounding it for context. In order to display an instruction in human readable assembly form, we need to decode the raw bytes — aka machine code — stored in memory on the cartridge. This process is called disassembling.

For example, if we are given the byte stream AD 10 30, we need to disassemble that into the instruction LDA $3010 (which means load register A with the value at memory address 0x3010). Note that the address is stored low byte first, so the bytes 10 30 encode 0x3010. The 6502 has variable length instructions; they can be either one, two, or three bytes long. The length of the instruction depends on the addressing mode. In the example LDA $3010, the addressing mode is called absolute addressing and requires two additional bytes after the opcode to encode the memory address. To disassemble an arbitrary instruction, we start by reading the first byte of the instruction, which is the opcode. The opcode encodes both the kind of operation (add, subtract, load, store, etc.) and the addressing mode. The strategy I used was to create a table of instructions indexed by opcode. You can look up the instruction in the table to determine the addressing mode, which tells you how many more bytes to read for this instruction. After reading all the necessary instruction bytes, we perform a straightforward translation of the opcode to its assembly mnemonic and the addressing mode into assembly syntax.

As you can see, the procedure for disassembling an instruction is relatively simple. The bulk of the work involved in facilitating disassembly is just creating the instruction table. There are a number of 6502 datasheets that were published, from MOS and Rockwell for example, that contain instruction tables that can more or less be directly translated. My favorite resource for referencing the 6502 instruction set, however, is this wonderful website. In my case, I had already written the instruction table since I had to in order to emulate the 6502 CPU in the first place.
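To make the table lookup concrete, here is a minimal sketch of the idea. It is not the code from my emulator: the names `AddrMode`, `Instruction`, `ins_table`, `ins_length`, and `disassemble` are made up for illustration, the table only covers six opcodes, and only four addressing modes are handled.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>

// Addressing modes covered by this sketch (the real 6502 has more).
typedef enum { IMPLIED, IMMEDIATE, ZERO_PAGE, ABSOLUTE } AddrMode;

typedef struct {
    const char *mnemonic; // NULL marks an opcode this sketch does not know
    AddrMode    mode;
} Instruction;

// Hypothetical, deliberately partial instruction table indexed by opcode.
const Instruction ins_table[256] = {
    [0xA9] = {"LDA", IMMEDIATE},
    [0xA5] = {"LDA", ZERO_PAGE},
    [0xAD] = {"LDA", ABSOLUTE},
    [0x8D] = {"STA", ABSOLUTE},
    [0x4C] = {"JMP", ABSOLUTE},
    [0xEA] = {"NOP", IMPLIED},
};

// Total instruction length in bytes (opcode included) for each mode.
int ins_length(AddrMode mode) {
    switch (mode) {
    case IMPLIED:   return 1;
    case IMMEDIATE: return 2;
    case ZERO_PAGE: return 2;
    case ABSOLUTE:  return 3;
    }
    return 1;
}

// Disassembles the instruction starting at code[0] into out and returns
// the instruction length so the caller knows where the next one begins.
int disassemble(const uint8_t *code, char *out, size_t out_size) {
    assert(code != NULL && out != NULL);
    Instruction ins = ins_table[code[0]];
    if (ins.mnemonic == NULL) {
        // Unknown opcode: show it as a raw data byte.
        snprintf(out, out_size, ".byte $%02X", (unsigned)code[0]);
        return 1;
    }
    switch (ins.mode) {
    case IMPLIED:
        snprintf(out, out_size, "%s", ins.mnemonic);
        break;
    case IMMEDIATE:
        snprintf(out, out_size, "%s #$%02X", ins.mnemonic, (unsigned)code[1]);
        break;
    case ZERO_PAGE:
        snprintf(out, out_size, "%s $%02X", ins.mnemonic, (unsigned)code[1]);
        break;
    case ABSOLUTE:
        // The 16-bit operand is little-endian: low byte first.
        snprintf(out, out_size, "%s $%04X", ins.mnemonic,
                 (unsigned)(code[1] | (code[2] << 8)));
        break;
    }
    return ins_length(ins.mode);
}
```

Calling `disassemble` on the bytes AD 10 30 writes `LDA $3010` into the output buffer and returns 3. A full table would fill in all the opcodes and the remaining addressing modes in the same way.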

Okay, so now that we understand how to disassemble instructions, your first thought might be to just disassemble all the code on the cartridge a single time when the emulator boots up, starting at 0x4020 and stopping at 0xFFFF. However, there is an important piece of the puzzle which we have not discussed yet that makes this approach unworkable: mappers.

Mappers

The address space that the CPU uses to access the cartridge is fixed to the range 0x4020–0xFFFF. If the code on a cartridge is directly mapped into this address space then the maximum amount of code that particular game can contain is just under 48 KB. In fact there are many NES games that do just that. However, if a game wants to use more than 48 KB of code then it has to map different chunks of its code into and out of the CPU’s address space continually as the game runs. This remapping of addresses is what a mapper does. If we revisit the image of the Double Dragon III board, you can see the mapper chip in the bottom left.

Mappers can extend not only the code capacity of a cartridge, but also the graphics capacity, since a mapper could also map different chunks of graphics data into and out of the PPU address space. This leads to the larger point that game developers could hypothetically manufacture mapper chips that do any arbitrary thing they want. This is part of what makes the job of a NES emulator developer difficult: you have to emulate the mapper chip for every game that you want your emulator to support. Fortunately, there are some common ones that a lot of games used, and you can get a lot of mileage out of emulating just a handful of mappers.

Back to the problem of disassembling: the reason you cannot just disassemble all the code in a game once at startup is that any address can potentially be remapped to a different location by the mapper as the game is running. This suggests that some form of dynamic disassembly must be employed. As the game runs, we need to continually disassemble however many instructions we want to display. You can think of this as a moving window in the address space. I wanted to show the current instruction highlighted in the middle, a few of the previously executed instructions displayed above it, and a few of the following instructions displayed below it.

Disassembling previously executed instructions presents a problem: since the 6502 has variable length instructions, given the current address in the program counter you don’t know how far back to go to reach the start of a previous instruction. You might consider stepping back by one byte at a time and testing for opcodes, but it is entirely possible for a byte that represents a memory address to also be a valid opcode and there is no way to differentiate the two. My solution to this problem was to just cache a certain number of instruction addresses. The program counter could be cached before each instruction that the CPU executes, replacing the oldest address in cache. Here is the API for storing and retrieving cached instruction addresses in my emulator:

uint16_t cached_ins_addrs[MAX_CACHED_INS];
int cached_ins_index;

void cache_ins_addr(uint16_t addr) {
    cached_ins_addrs[cached_ins_index] = addr;
    cached_ins_index = (cached_ins_index+1) % MAX_CACHED_INS;
}

uint16_t get_cached_ins_addr_at(int num_prev_ins) {
    assert(num_prev_ins < MAX_CACHED_INS);
    int index = cached_ins_index - num_prev_ins;
    if (index < 0) index += MAX_CACHED_INS;
    return cached_ins_addrs[index];
}

cached_ins_addrs is a ring buffer of addresses. When you cache an address, a cursor is incremented and wraps around to zero once it reaches the end of the buffer. When you want to get the address of the instruction that was executed, say, five instructions ago, you call get_cached_ins_addr_at(5). This simple system enables disassembling n previously executed instructions, where n is the size of the ring buffer.

Disassembling future instructions is straightforward. Since an opcode tells you how many bytes an instruction contains, the address where the next instruction begins is unambiguous.

Now we have the full picture for disassembling. Say we want to display 14 instructions in the debug window: we use the cached instruction addresses to disassemble 6 previous instructions, then we disassemble the current instruction at the address in the program counter, and finally we disassemble the 7 instructions following that. If we also include a printout of the current state of all the CPU registers then we finally have our completed debug window!
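Here is a rough sketch of how the addresses for that 14-line window could be gathered. The ring buffer is repeated from above so the snippet stands alone; everything else is hypothetical: `memory` is a flat array standing in for the bus, `ins_length_at` only knows three opcodes, and `build_window` is a name I made up for this example. It also assumes the window is built between instructions, so that `get_cached_ins_addr_at(1)` is the most recently executed instruction and the program counter itself has not been cached yet.

```c
#include <assert.h>
#include <stdint.h>

#define MAX_CACHED_INS 16
#define NUM_PREV 6
#define NUM_NEXT 7
#define WINDOW_SIZE (NUM_PREV + 1 + NUM_NEXT)

// Ring buffer of executed instruction addresses, as in the article.
uint16_t cached_ins_addrs[MAX_CACHED_INS];
int cached_ins_index;

void cache_ins_addr(uint16_t addr) {
    cached_ins_addrs[cached_ins_index] = addr;
    cached_ins_index = (cached_ins_index + 1) % MAX_CACHED_INS;
}

uint16_t get_cached_ins_addr_at(int num_prev_ins) {
    assert(num_prev_ins < MAX_CACHED_INS);
    int index = cached_ins_index - num_prev_ins;
    if (index < 0) index += MAX_CACHED_INS;
    return cached_ins_addrs[index];
}

// Stand-in for the emulator's address space (assumption: a flat array
// instead of a real bus with a mapper behind it).
uint8_t memory[0x10000];

// Hypothetical helper: length in bytes of the instruction at addr.
// A real version would consult the full instruction table.
int ins_length_at(uint16_t addr) {
    switch (memory[addr]) {
    case 0xEA: return 1; // NOP
    case 0xA9: return 2; // LDA immediate
    case 0xAD: return 3; // LDA absolute
    default:   return 1; // unknown: treat as a single byte
    }
}

// Fills window with instruction addresses: previous instructions from the
// cache (oldest first), then the program counter, then following ones.
void build_window(uint16_t pc, uint16_t window[WINDOW_SIZE]) {
    int n = 0;
    for (int prev = NUM_PREV; prev >= 1; --prev)
        window[n++] = get_cached_ins_addr_at(prev);
    window[n++] = pc;
    uint16_t addr = pc;
    for (int next = 0; next < NUM_NEXT; ++next) {
        // The opcode tells us the length, so the next start is unambiguous.
        addr = (uint16_t)(addr + ins_length_at(addr));
        window[n++] = addr;
    }
}
```

Each address in the window can then be handed to the disassembler to produce one line of text, with the entry at index `NUM_PREV` being the highlighted current instruction.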

Visualizing CPU state and stepping through disassembled instructions

You can see the current instruction is highlighted pink. As you step through instructions, the CPU executes each one and you can see the registers update accordingly.

The last piece that is interesting to talk about with respect to disassembling is the memory management strategy. Since we are continually disassembling different chunks of memory, we have to generate new assembly strings for each frame that the emulator renders. The naïve strategy in C would be to just malloc and free all of the strings we generate. Another more efficient but less flexible strategy could be to just have a hardcoded number of static buffers and write the assembly strings into these. I found that I could get the best of both worlds by using an arena allocator. A big chunk of memory is allocated at program startup that acts as a stack. In the main loop for each frame to be rendered, all the strings are pushed onto the stack when they are generated and then rendered to the screen. At the end of each frame the position in the memory stack is restored to its position before the frame, essentially popping all the memory from the stack. If this is your first time encountering an arena allocator and you want to learn more, I can point you to this video by Per Vognsen (where I learned) and this article by Ryan Fleury.
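If you have not seen one before, a bare-bones arena can look something like the sketch below. This is a simplified illustration rather than the allocator from my emulator: the `Arena` struct and the `arena_*` and `render_frame` names are invented for this example, and the strings written in `render_frame` are placeholders for real disassembly output.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>
#include <stdlib.h>

typedef struct {
    unsigned char *base;     // one big block allocated at startup
    size_t         capacity; // total size of the block in bytes
    size_t         used;     // current top of the stack
} Arena;

// Allocates the backing block once. Returns 1 on success, 0 on failure.
int arena_init(Arena *arena, size_t capacity) {
    arena->base = malloc(capacity);
    arena->capacity = arena->base ? capacity : 0;
    arena->used = 0;
    return arena->base != NULL;
}

// Pushes size bytes onto the stack, rounded up to a multiple of 16 so
// every allocation stays aligned. Returns NULL if the arena is full.
void *arena_push(Arena *arena, size_t size) {
    size_t aligned = (size + 15) & ~(size_t)15;
    if (aligned < size || aligned > arena->capacity - arena->used)
        return NULL;
    void *ptr = arena->base + arena->used;
    arena->used += aligned;
    return ptr;
}

// Remembers the current top of the stack.
size_t arena_mark(const Arena *arena) {
    return arena->used;
}

// Pops everything pushed since the matching arena_mark call.
void arena_reset_to(Arena *arena, size_t mark) {
    assert(mark <= arena->used);
    arena->used = mark;
}

// One frame: push strings as they are generated, then release them all
// at once by restoring the saved position.
void render_frame(Arena *arena) {
    size_t mark = arena_mark(arena);
    for (int i = 0; i < 14; ++i) {
        char *line = arena_push(arena, 32);
        if (line == NULL) break;
        snprintf(line, 32, "$%04X: NOP", 0x8000u + (unsigned)i);
        // A real emulator would draw the line to the screen here.
    }
    arena_reset_to(arena, mark);
}
```

There is no per-string free: resetting `used` releases every string from the frame in one step, which is what makes this cheaper than calling malloc and free for each string.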

Memory Viewer

The final feature that I was able to implement during the jam was a memory viewer. I started by just displaying a small fixed range of the address space. This is very straightforward: we iterate over the address range, use bus_read(addr) to read each byte, and render them in hexadecimal format. bus_read(addr) just dispatches to whichever device occupies the address space that addr falls in. In the case of the cartridge, the mapper can perform any mapping logic that it needs to, but from an outside perspective we don’t have to think about any of that; we just get a byte returned. Each display line starts with a memory address followed by sixteen bytes that represent the data in memory starting at that address. I also display the ASCII representation of bytes that land within the ASCII range. This is not particularly useful for a NES emulator, but it is interesting and taught me that some games store their name (like CONTRA and ZELDA) in a few bytes at around 0xFFEA.
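To give a feel for what "dispatching" means here, the sketch below routes a read according to the standard NES CPU memory map. Only `bus_read` is a name from my emulator; its body here is a simplified guess at the shape, and `ppu_register_read`, `apu_io_read`, and `cartridge_read` are hypothetical stubs that return placeholder values so the example compiles on its own.

```c
#include <assert.h>
#include <stdint.h>

// 2 KB of internal RAM.
uint8_t ram[0x800];

// Hypothetical device hooks, stubbed out for this sketch.
uint8_t ppu_register_read(uint16_t reg) { (void)reg;  return 0; }
uint8_t apu_io_read(uint16_t addr)      { (void)addr; return 0; }

// Placeholder for the cartridge: a real emulator would run the mapper
// logic here. This stub just returns the high byte of the address.
uint8_t cartridge_read(uint16_t addr) {
    assert(addr >= 0x4020);
    return (uint8_t)(addr >> 8);
}

// Routes a CPU read to whichever device owns the address.
uint8_t bus_read(uint16_t addr) {
    if (addr < 0x2000)
        return ram[addr & 0x07FF];          // RAM, mirrored every 2 KB
    if (addr < 0x4000)
        return ppu_register_read(addr & 7); // 8 PPU registers, mirrored
    if (addr < 0x4020)
        return apu_io_read(addr);           // APU and I/O registers
    return cartridge_read(addr);            // 0x4020-0xFFFF: cartridge
}
```

One thing to keep in mind with a viewer built on this: on real hardware some register reads have side effects, so a memory viewer may want a separate side-effect-free read path for those ranges.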

The next step was to allow the full 64 KB memory space to be explored. This required two things: the ability to scroll around the memory space and “clipping” the address space that is read, formatted, and rendered to only what will be visible in the memory window. To start, I introduced a scroll_y variable that is incremented or decremented depending on the direction that the user scrolls the mouse wheel. The starting address of the memory range visible in the viewer is then calculated based on the scroll_y value. The ending address is calculated based on the start address and the number of display lines visible on screen, which in turn depends on both the font height and the window height. Some care must be taken to ensure that the start and end addresses are 16 byte aligned and also to prevent scrolling below 0 or above 0xFFFF. That can be difficult to mentally put together from a description in prose, so here is the code from my emulator that renders the memory contents:

void render_memory(void) {
    char buffer[MAX_LINE_LEN];
    int num_visible_lines = (memory_window.height / TEXT_LINE_HEIGHT) - 1;
    int line_num          = 0;
    uint16_t start        = 16 * (uint16_t)(memory_window.scroll_y / TEXT_LINE_HEIGHT);
    start                 = MIN(start, 0xFFF0 - 16*num_visible_lines);
    uint16_t end          = start + 16*num_visible_lines;

    for (uint16_t addr = start; addr >= start && addr <= end; addr += 16) {
        uint8_t bytes[16];
        char ascii[17];
        for (int i=0; i<16; ++i) {
            bytes[i] = bus_read(addr+i);
            ascii[i] = isprint(bytes[i]) ? bytes[i] : '.';
        }
        ascii[16] = 0;

        snprintf(buffer, MAX_LINE_LEN, 
            "%04X: %02X %02X %02X %02X %02X %02X %02X %02X %02X %02X %02X %02X %02X %02X %02X %02X |%s|",
            addr, bytes[0], bytes[1], bytes[2], bytes[3],
            bytes[4], bytes[5], bytes[6], bytes[7],
            bytes[8], bytes[9], bytes[10], bytes[11],
            bytes[12], bytes[13], bytes[14], bytes[15],
            ascii);
            
        int pad = 15;
        render_text(
            &memory_window,                                // window
            buffer,                                        // text
            pad,                                           // x position
            line_num*TEXT_LINE_HEIGHT + TEXT_LINE_HEIGHT); // y position
        ++line_num;
    }
}

And here is what scrolling the memory viewer with the mouse wheel looks like in action:

memory viewer scrolling with mouse wheel

Goto Address Command Window

With a 64 KB address space, it could take a painfully long time to manually scroll to a specific address. So I added a little popup command window that allows the user to type in an address and jump directly to it in the memory viewer.

The algorithm is essentially:

If the user presses the g key, open the goto address command window
If the command window is open, record user input into a buffer only allowing valid hex digits
If the user presses enter, calculate the scroll offset needed to jump to the desired address and set scroll_y

The scroll offset calculation is pretty simple: we align the user supplied address down to 16 bytes to get the address at the start of the line, divide by 16 to get the line number, and multiply by the height of a text line to get the scroll offset. Now we can easily jump around to any location in the address space!
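In code, the input filtering and the offset calculation could look roughly like this. It is a sketch rather than the code from my emulator: `GotoInput`, `goto_input_char`, and `goto_scroll_offset` are names invented for this example, and the `TEXT_LINE_HEIGHT` value of 16 pixels is an assumption.

```c
#include <assert.h>
#include <ctype.h>
#include <stdint.h>
#include <stdlib.h>

#define TEXT_LINE_HEIGHT 16 // assumed pixel height of one text line
#define MAX_ADDR_DIGITS  4  // a 16-bit address needs at most 4 hex digits

typedef struct {
    char digits[MAX_ADDR_DIGITS + 1]; // always NUL-terminated
    int  count;
} GotoInput;

// Records a typed character only if it is a valid hex digit and there is
// still room in the buffer; anything else is ignored.
void goto_input_char(GotoInput *in, char c) {
    assert(in != NULL);
    if (in->count < MAX_ADDR_DIGITS && isxdigit((unsigned char)c)) {
        in->digits[in->count++] = c;
        in->digits[in->count] = '\0';
    }
}

// Converts the typed address into a scroll offset in pixels.
int goto_scroll_offset(const GotoInput *in) {
    unsigned long addr       = strtoul(in->digits, NULL, 16);
    unsigned long line_start = addr & ~0xFul;  // align down to 16 bytes
    unsigned long line_num   = line_start / 16; // 16 bytes per line
    return (int)(line_num * TEXT_LINE_HEIGHT);
}
```

For example, typing FFEA gives a line start of 0xFFE0, which is line 4094, so with a 16 pixel line height the scroll offset is 65504. The buffer should start zeroed, for instance with `GotoInput in = {0};`.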

memory viewer goto address command window

Demonstration

With these three new features, we have gained a much better view into the system and can visualize what is going on in the NES. The pattern table viewer shows us how some games swap palettes to achieve blinking lights or swap tiles to achieve more interesting character animations. The disassembler lets us see exactly what instructions the CPU is executing and view the state of all the CPU registers. The memory viewer lets us explore the entire memory space, and combined with the debug window allows us to see how the CPU interacts with all the devices connected to the bus. Here is a video demonstrating the use of all of the features implemented during the jam.

Closing Thoughts

I had a really fun weekend hacking on my emulator, and I think I was able to achieve some interesting results. Jams are really effective at showing you how much you can get done when you consciously set aside a chunk of time to intensely focus and work hard. It was also cool to see people posting updates, screenshots, and clips of their projects in the Handmade Network Discord. Here are just a few of my favorite submitted projects that you should also check out: REDE, y no server, and PNG Chunk Explorer. See all the Visibility Jam projects here.

Thanks for reading!

Shout Outs

I copied the visual layout for the debug window from javidx9’s emulator. Check out his YouTube channel; he is great.

Reading through the source code of bisqwit’s emulator gave me some ideas for my own.

I am a huge fan of Sean Barrett’s single-header libraries, and he has one for TrueType font rasterizing, stb_truetype. I had never used it before the jam but it got me up and running with text rendering in no time. Thanks Sean.

I highly recommend Rodrigo Copetti’s article as an overview of the NES architecture.

The NESdev Wiki is absolutely indispensable for learning about the NES and how to emulate it. I am so grateful to the amazing community that put in an enormous amount of work to reverse engineer and document the system. In particular, a huge thank you to Blargg and Disch (RIP). I found your resources and forum posts monumentally helpful and I would not have been able to write my emulator without them.