A debugger from scratch — part 2

Mapping between source code and machine code

In part 1 I looked at using the ptrace system call to set a breakpoint in a target executable at the address of the machine code instruction where we want it to stop. If you haven’t read that yet, you might want to take a quick look.

To get the debugger to stop at a particular line in the source code, we need a way to identify the corresponding machine code instruction.

This raises an important question: how do we know where to set the breakpoint?

As a human, I’d like to be able to set a breakpoint at a particular line in the source code, but ptrace gives us the mechanism for setting the breakpoint at a machine code instruction address in memory. We need to map between source code and the corresponding instructions in memory. We can do this using Go’s debug/elf and debug/gosym packages.

Elf? As in magical creatures?

Before we dive into the Go code let’s find out a little bit about ELF, the executable file format used on Linux. There is a tool called readelf which we can use to inspect ELF files, and its command-line options let us look at different parts of the file.

First let’s look at the ELF headers of my hello executable.

$ readelf -h hello
ELF Header:
  Magic:   7f 45 4c 46 02 01 01 00 00 00 00 00 00 00 00 00
  Class:                             ELF64
  Data:                              2's complement, little endian
  Version:                           1 (current)
  OS/ABI:                            UNIX - System V
  ABI Version:                       0
  Type:                              EXEC (Executable file)
  Machine:                           Advanced Micro Devices X86-64
  Version:                           0x1
  Entry point address:               0x456420
  Start of program headers:          64 (bytes into file)
  Start of section headers:          456 (bytes into file)
  Flags:                             0x0
  Size of this header:               64 (bytes)
  Size of program headers:           56 (bytes)
  Number of program headers:         7
  Size of section headers:           64 (bytes)
  Number of section headers:         23
  Section header string table index: 3

From the Type and Entry point address lines we can see that

  • hello is an executable file
  • the first instruction to execute will be at address 0x456420 when this file is loaded into memory

We can also see that there are some “program headers” and “section headers”. Let’s use readelf to look at the section headers (most omitted for clarity):

$ readelf -S hello/hello
There are 23 section headers, starting at offset 0x1c8:
Section Headers:
[Nr] Name        Type      Address           Offset    Size              EntSize           Flags  Link  Info  Align
[ 0]             NULL      0000000000000000  00000000  0000000000000000  0000000000000000         0     0     0

[ 6] .gosymtab   PROGBITS  00000000004deae8  000deae8  0000000000000000  0000000000000000  A      0     0     1
[ 7] .gopclntab  PROGBITS  00000000004deb00  000deb00  000000000005324b  0000000000000000  A      0     0     32

We’re particularly interested in the two sections whose names start with .go: they are the symbol table (section 6) and the line table (section 7).

Elves but not dwarves, today

As you can see, section 6, named .gosymtab, has a size of zero. Very early versions of Go used .gosymtab to hold symbol table information, but this was dropped in Go 1.3 (presumably in favour of DWARF debugging information, though I haven’t verified that this is what happened).


Section 7, called .gopclntab, is more interesting. This contains the mapping between Program Counter address and source code lines.

Let’s read this information from the hello executable file into the debugger we’re writing in Go. The aforementioned debug/elf package makes it easy to read in the contents of that .gopclntab section. For brevity I have omitted error handling.

// Error handling omitted for brevity.
exe, _ := elf.Open("hello")
lineTableData, _ := exe.Section(".gopclntab").Data()
addr := exe.Section(".text").Addr
lineTable := gosym.NewLineTable(lineTableData, addr)
// NewTable returns the table and an error.
symTable, _ := gosym.NewTable([]byte{}, lineTable)

The NewTable() function takes the symbol table data as its first parameter, but we can just pass an empty byte slice since we know the executable’s .gosymtab section has a size of 0.

There are some helpful methods on symTable to map between a line in a source code file and the corresponding machine code instruction.

Look up information about a particular named function

For example, we can look up the main function in the main package of the executable we’re debugging:

fn := symTable.LookupFunc("main.main")
fmt.Printf("function %s starts at %X\n", fn.Name, fn.Entry)

We get a Func structure with information about that function, including the address of the first machine code instruction in its code. When the function is called, the Program Counter gets set to that address so that execution will continue from there.

Find the source code corresponding to a machine code address

If pc is the machine code address:

file, line, fn := symTable.PCToLine(pc)
fmt.Printf("function %s at line %d in file %s\n", fn.Name, line, file)

Find the machine code address corresponding to a line in source code

pc, fn, _ := symTable.LineToPC(file, line)
fmt.Printf("line %d in file %s is at address %X in function %s\n", line, file, pc, fn.Name)

So if we want to set a breakpoint at a particular line in source code, we can look up the corresponding machine code instruction address with LineToPC(). This gives the address where we can write the breakpoint instruction as described in Part 1.

Once we have stopped at a breakpoint, the next step is to display the stack at that point. We’ll look at that in part 3, or you can find out straight away in the video below or in the accompanying git repo.

This series of posts is based on a talk that I first did at dotGo Paris, and I recently did an extended version at GopherCon UK — here’s the video (but my name is really Liz, not Luiz!).