Interfacing C With Rust To Build a Debugger

By Pm. Published in Rustaceans. 7 min read. Jan 1, 2024

Introduction

I decided to write a debugger for the sim65 6502 emulator that's part of the cc65 tool suite (cc65, a freeware C compiler for 6502-based systems).

The cc65 toolset is written in C, but I was not going to even attempt to code a debugger in pure C (I might have been tempted to use C++ or even C# in my pre-Rust days), so I wrote it in Rust. This was my first exposure to calling C code from Rust. I was pleasantly surprised how easy it was.

This article goes into details of what calls I needed to interface with and how it was done.

Sim65 structure

This is a diagram of the structure of the sim65 code. The core (6502.c) exposes just two API calls to main:

  • ExecuteInsn, executes a single instruction
  • Reset, resets the 6502

The 6502 core invokes services provided by two files:

  • memory.c provides the emulated 64k RAM.
  • paravirt.c provides system calls, mainly IO. See later.

There is also one global data item exposed by 6502.c: CPU, an enum that specifies whether the emulated CPU is a 6502 or a 65C02.

Main reads an executable image, loads it into the emulated RAM, calls Reset and then enters an endless loop calling ExecuteInsn. It's pretty simple.

I decided that I would only use 6502.c, and that I would make no changes to it if at all possible.

The paravirt module provides system services to the 6502 emulator. All calls executed by the 6502 code are passed to the PV system which looks at the target address and maps:

  • 0xfff4 -> open file
  • 0xfff5 -> close file
  • 0xfff6 -> read file
  • 0xfff7 -> write file
  • 0xfff8 -> get argc and argv
  • 0xfff9 -> exit

paravirt.c reads the arguments off the 6502 call stack, calls the relevant host verb and returns the result back to the 6502.
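
The address mapping above can be sketched in Rust as a simple lookup. This is only an illustration of the table, not the code from paravirt.c or db65; the names PvCall and pv_call_for are made up for this example.

```rust
/// Host services that the paravirt layer exposes to 6502 code.
/// (Hypothetical enum, named for this sketch only.)
#[derive(Debug, PartialEq, Eq, Clone, Copy)]
pub enum PvCall {
    Open,
    Close,
    Read,
    Write,
    Args,
    Exit,
}

/// Maps the target address of a 6502 call to a host service.
/// Returns None when the address is an ordinary subroutine.
pub fn pv_call_for(addr: u16) -> Option<PvCall> {
    match addr {
        0xFFF4 => Some(PvCall::Open),
        0xFFF5 => Some(PvCall::Close),
        0xFFF6 => Some(PvCall::Read),
        0xFFF7 => Some(PvCall::Write),
        0xFFF8 => Some(PvCall::Args),
        0xFFF9 => Some(PvCall::Exit),
        _ => None,
    }
}
```

A caller would check pv_call_for(target) before letting the emulator treat the call as normal 6502 code.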

All these calls need to be either called or provided by the Rust code.

FFI overview

Interfacing to C code is provided by the Foreign Function Interface (FFI), detailed in the FFI chapter of The Rustonomicon (rust-lang.org).

Steps

  • configure the build to compile the C code (assuming you don't already have a compiled library)
  • add declarations for outbound calls from Rust to C (the main to 6502 calls in my case)
  • add definitions for inbound calls from C to Rust (the 6502 to memory and paravirt modules in my case)

Setup build

You need to have a C compiler installed. I was developing on both Windows (when at home) and Mac (on the road). I have Visual Studio on Windows and clang on Mac.

Use the ‘cc’ crate (documented on docs.rs) to let the build find and drive the C compiler. It needed no configuration: it found the compilers on both my systems and it just worked!

cargo add --build cc

Now add a build.rs file at the root of the project (same dir as the Cargo.toml file):

fn main() {
    // Rebuild the C code only when the source file changes.
    println!("cargo:rerun-if-changed=sim65/6502.c");
    cc::Build::new()
        .file("sim65/6502.c")
        .define("DB65", "1")
        .compile("sim65");
}

The define of DB65 is needed because I did end up having to make a couple of minor tweaks to the source code.

Now when the Rust project is built, the C code gets compiled too (if needed). The object code gets placed in the correct place (I don't know where that is, cc and cargo took care of it) so that the linker finds it.

Declare the rust to c calls

The C calls are declared in 6502.h:

void Reset (void);
/* Generate a CPU RESET */

unsigned ExecuteInsn (void);
/* Execute one CPU instruction. Return the number of clock cycles for the
** executed instruction.
*/

I also added one call because I needed access to the register block:

#ifdef DB65
CPURegs *ReadRegisters(void)
{
    return &Regs;
}
#endif

The register block looks like this (6502.h):

struct CPURegs {
    unsigned AC; /* Accumulator */
    unsigned XR; /* X register */
    unsigned YR; /* Y register */
    unsigned ZR; /* Z register */
    unsigned SR; /* Status register */
    unsigned SP; /* Stackpointer */
    unsigned PC; /* Program counter */
};

So now add some Rust declarations.

The API calls are pretty simple.

extern "C" {
    pub fn ExecuteInsn() -> u32;
}
extern "C" {
    pub fn Reset();
}
extern "C" {
    pub fn ReadRegisters() -> *mut CPURegs;
}

We also need to declare the CPURegs struct:

#[repr(C)]
pub struct CPURegs {
    pub ac: u32, /* Accumulator */
    pub xr: u32, /* X register */
    pub yr: u32, /* Y register */
    pub zr: u32, /* Z register */
    pub sr: u32, /* Status register */
    pub sp: u32, /* Stackpointer */
    pub pc: u32, /* Program counter */
}

Note the ‘#[repr(C)]’ (with a capital C). It tells Rust to lay the struct out according to C's layout rules, not Rust's.
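
The declarations here map C's unsigned to u32, which holds on the Windows and Mac compilers mentioned earlier. A slightly more portable variant, sketched below, spells the fields with std::os::raw::c_uint so the Rust side follows whatever width the C compiler uses. The struct name CPURegsPortable and the helper layout_matches are made up for this example; this is not how db65 declares the struct.

```rust
use std::mem::size_of;
use std::os::raw::c_uint;

/// Same seven fields as the C struct, typed with the C-compatible alias.
/// (Hypothetical name, for illustration only.)
#[repr(C)]
pub struct CPURegsPortable {
    pub ac: c_uint, // Accumulator
    pub xr: c_uint, // X register
    pub yr: c_uint, // Y register
    pub zr: c_uint, // Z register
    pub sr: c_uint, // Status register
    pub sp: c_uint, // Stackpointer
    pub pc: c_uint, // Program counter
}

/// Sanity check: seven equally sized fields with no padding between them.
pub fn layout_matches() -> bool {
    size_of::<CPURegsPortable>() == 7 * size_of::<c_uint>()
}
```

A check like layout_matches() can go in a unit test so that a mismatch between the two declarations is caught early.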

We can now call the various functions:

    pub fn execute_insn() -> u32 {
        unsafe { ExecuteInsn() }
    }

Note the ‘unsafe’. Rust treats every call into C code as unsafe, because the compiler cannot check what the C side does, so you have to state explicitly that you are calling unsafe code.

The register location is read once at startup: we get a pointer to the CPURegs block inside 6502.c. There is a field inside my Cpu struct (see later)

    regs: *mut CPURegs,        // a pointer to the register block

that gets loaded like this

    pub fn reset() {
        unsafe {
            THECPU.regs = ReadRegisters();
            ...
            Reset();
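
Once the pointer is stored, reading a register means dereferencing it, which is again unsafe. A minimal sketch of a safe accessor follows. The function read_pc is hypothetical (it is not taken from db65), and it assumes the pointer either is null or came from ReadRegisters and that nothing else is writing the registers at the same time.

```rust
#[repr(C)]
pub struct CPURegs {
    pub ac: u32,
    pub xr: u32,
    pub yr: u32,
    pub zr: u32,
    pub sr: u32,
    pub sp: u32,
    pub pc: u32,
}

/// Reads the program counter through the raw pointer.
/// Returns None if the pointer has not been set yet.
pub fn read_pc(regs: *const CPURegs) -> Option<u16> {
    if regs.is_null() {
        return None;
    }
    // SAFETY (assumption): the pointer refers to a live CPURegs block,
    // for example the one returned by ReadRegisters(), and the emulator
    // is not modifying it on another thread.
    let pc = unsafe { (*regs).pc };
    // The 6502 program counter is 16 bits wide, so the upper bits are dropped.
    Some(pc as u16)
}
```

Keeping the null check and the unsafe block inside one small function means the rest of the debugger never touches the raw pointer directly.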

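There remains just the CPU global variable. The C code:

/* Current CPU */
CPUType CPU;

And the Rust:

extern "C" {
    // CPUType is a C enum, which is int-sized with the usual compilers,
    // so a 32-bit type matches its size.
    static mut CPU: u32;
}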
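
On the Rust side it is more pleasant to work with a Rust enum than with a raw integer. The sketch below converts the raw value into one. It assumes the C enum lists the 6502 first and the 65C02 second, so their values are 0 and 1; check CPUType in 6502.h before relying on that. The names CpuType and cpu_type_from_raw are made up for this example.

```rust
/// Rust-side view of the emulated CPU type. (Hypothetical name.)
#[derive(Debug, PartialEq, Eq, Clone, Copy)]
pub enum CpuType {
    Cpu6502,
    Cpu65C02,
}

/// Converts the raw value read from the C global into a Rust enum.
/// ASSUMPTION: 0 means 6502 and 1 means 65C02, matching the C enum order.
pub fn cpu_type_from_raw(raw: u32) -> Option<CpuType> {
    match raw {
        0 => Some(CpuType::Cpu6502),
        1 => Some(CpuType::Cpu65C02),
        _ => None, // unknown value: let the caller decide what to do
    }
}
```

The read of the C global itself still has to happen inside an unsafe block; only the conversion is safe code.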

The callbacks

The code we have so far will not link, though, because the callbacks to memory and paravirt are missing. The linker can see that 6502.c is calling things but it can't find them. These are the memory calls we have to supply, declared in the C code (memory.h):

void MemWriteByte(unsigned Addr, unsigned char Val);
unsigned char MemReadByte(unsigned Addr);
unsigned MemReadWord(unsigned Addr);
unsigned MemReadZPWord(unsigned char Addr);
void MemWriteWord(unsigned Addr, unsigned Val);

The Rust definitions (plus the one paravirt callback):

#[no_mangle]
extern "C" fn MemWriteByte(addr: u32, val: u8) {
    // code goes here
}
#[no_mangle]
extern "C" fn MemReadWord(addr: u32) -> u32 {
    // code goes here
}
#[no_mangle]
extern "C" fn MemReadByte(addr: u32) -> u8 {
    // code goes here
}
#[no_mangle]
extern "C" fn MemReadZPWord(mut addr: u8) -> u32 {
    // code goes here (returns u32 because the C prototype returns unsigned)
}
#[no_mangle]
extern "C" fn ParaVirtHooks(_regs: *mut CPURegs) {
    // code goes here
}

Note the ‘#[no_mangle]’ attribute. This makes Rust expose those functions under the declared names rather than under Rust's mangled names (just like ‘extern "C"’ in C++).
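
As a rough idea of what goes inside the word-read callbacks: the 6502 is little-endian, so a word is the low byte followed by the high byte, and a zero-page word read wraps around inside page zero (which is why the address parameter is a u8). The helpers below are a sketch over a plain byte slice, not the db65 implementation. The names read_word and read_zp_word are made up, the slice is assumed to be 65536 bytes long, and the wrap from 0xFFFF to 0x0000 in read_word is my own choice.

```rust
/// Reads a little-endian 16-bit word from a 64K RAM image.
/// ASSUMPTION: `ram` holds 65536 bytes; the address wraps at the top of memory.
pub fn read_word(ram: &[u8], addr: u16) -> u16 {
    let lo = ram[addr as usize] as u16;
    let hi = ram[addr.wrapping_add(1) as usize] as u16;
    lo | (hi << 8)
}

/// Reads a little-endian word from page zero.
/// The second byte comes from (addr + 1) modulo 256, so 0xFF wraps to 0x00.
pub fn read_zp_word(ram: &[u8], addr: u8) -> u16 {
    let lo = ram[addr as usize] as u16;
    let hi = ram[addr.wrapping_add(1) as usize] as u16;
    lo | (hi << 8)
}
```

The extern "C" callbacks can then be thin wrappers that forward to helpers like these and widen the result to the type in the C prototype.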

Managing state

In the C code, the RAM and all other state is managed in global variables. So the RAM I/O calls, for example, just say ‘read a byte from this address’. When these calls reach the Rust world I need to direct them to an instance of a Cpu object that maintains the state. This is the Cpu object:

pub struct Cpu {
    ram: [u8; 65536],          // the actual 6502 ram
    shadow: [u8; 65536],       // a shadow of the ram, used for memcheck
    regs: *mut CPURegs,        // a pointer to the register block
    exit: bool,                // set to true when the 6502 wants to exit
    exit_code: u8,             // the exit code
    sp65_addr: u8,             // the location of the cc65 'stack' pointer
    memcheck: Option<u16>,     // the address of the last memcheck failure
    arg_array: Vec<String>,    // the command line arguments
    memhits: [(bool, u16); 6], // used for data watches
    memhitcount: u8,           // entry count in hit array for this instruction
    pub paracall: bool,        // we just did a pv call
}

There's a lot of stuff there, but you can see the RAM and the pointer to the registers inside 6502.c. I have to create a global instance:

static mut THECPU: Cpu = Cpu {
    ram: [0; 65536],
    shadow: [0; 65536],
    regs: std::ptr::null_mut(),
    sp65_addr: 0,
    exit: false,
    exit_code: 0,
    memcheck: None,
    arg_array: Vec::new(),
    memhits: [(false, 0); 6],
    memhitcount: 0,
    paracall: false,
};

Note that it's called THECPU: Rust is very insistent that globals be given all-caps names (the compiler warns otherwise).

The inbound calls look like this, taking MemWriteByte as an example:

#[no_mangle]
extern "C" fn MemWriteByte(addr: u32, val: u8) {
    unsafe {
        THECPU.inner_write_byte(addr as u16, val);
        THECPU.shadow[addr as usize] = 1;
        THECPU.memhits[THECPU.memhitcount as usize] = (true, addr as u16);
        THECPU.memhitcount += 1;
    }
}

Note that this code is declared unsafe too. That's because Rust considers all operations on mutable globals (static mut) unsafe.
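
For comparison, here is a sketch of a different way to hold the global state: a static protected by std::sync::Mutex, which needs no unsafe to access. This is an alternative technique, not what db65 does. The names MemState, MEM, mem_write_byte_sketch, peek and hit_count are all made up, and the bounds check on the hit array is my own addition.

```rust
use std::sync::Mutex;

/// A cut-down stand-in for the Cpu struct: just RAM and the hit list.
pub struct MemState {
    ram: [u8; 65536],
    hits: [(bool, u16); 6],
    hit_count: usize,
}

/// Global state behind a mutex instead of `static mut`.
static MEM: Mutex<MemState> = Mutex::new(MemState {
    ram: [0; 65536],
    hits: [(false, 0); 6],
    hit_count: 0,
});

/// What an inbound write callback could look like with this approach.
/// (Not marked #[no_mangle] here so the sketch cannot clash with real symbols.)
pub extern "C" fn mem_write_byte_sketch(addr: u32, val: u8) {
    // Recover the guard even if the lock was poisoned, so that no panic
    // crosses the C boundary.
    let mut mem = MEM.lock().unwrap_or_else(|poisoned| poisoned.into_inner());
    let addr = addr as u16; // keep the address inside the 64K space
    mem.ram[addr as usize] = val;
    let n = mem.hit_count;
    if n < mem.hits.len() {
        // Record the write for data watches, ignoring overflow of the list.
        mem.hits[n] = (true, addr);
        mem.hit_count = n + 1;
    }
}

/// Reads a byte back, for the debugger side.
pub fn peek(addr: u16) -> u8 {
    MEM.lock().unwrap_or_else(|p| p.into_inner()).ram[addr as usize]
}

/// Number of writes recorded so far.
pub fn hit_count() -> usize {
    MEM.lock().unwrap_or_else(|p| p.into_inner()).hit_count
}
```

The trade-off is that the lock must not be held while calling ExecuteInsn, because the callbacks would then try to take it again and deadlock. With a single-threaded emulator, the static mut approach avoids that concern at the cost of unsafe blocks.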

Putting it all together

Here is the associated function that resets the CPU state (both the db65 internal state and the 6502.c state):

impl Cpu {
    ....
    pub fn reset() {
        unsafe {
            THECPU.regs = ReadRegisters();
            THECPU.exit = false;
            THECPU.memhitcount = 0;
            THECPU.arg_array.clear();
            Reset();
            THECPU.memcheck = None;
            THECPU.paracall = false;
        }
    }
}

The messy unsafe and ALLCAPS are confined to one file (cpu.rs); the rest of the code calls these associated functions.

Here is an example: the core of the debugger starting to run code.

    pub fn run(&mut self, cmd_args: Vec<&String>) -> Result<StopReason> {
        Cpu::write_word(0xFFFC, self.loader_start);
        Cpu::reset();
        Cpu::push_arg(&self.load_name);
        for arg in &cmd_args {
            Cpu::push_arg(arg);
        }
        self.stack_frames.clear();
        self.execute(0) // 0 = forever
    }

Every Cpu:: call is either modifying the Rust object or calling the 6502 engine itself (reset, for example).

Conclusion

It turned out to be much less messy than I expected. I am always amazed by the quality of the Rust toolchain and the ecosystem of crates around it. Things just work.

The full code for this project is here: https://github.com/pm100/db65
