Ghidra Decompiler wireformat

Remco Verhoef

The NSA has made its software reverse engineering suite, Ghidra, publicly available. You can download the software here. The suite contains several tools, including a decompiler. The decompiler works quite well, and it would be very interesting to have it available within other reverse engineering tools such as Binary Ninja, Radare2, x64dbg and Hopper. The decompiler analyses the disassembled code and shows pseudo-C code that reflects the control flow.

In the folder ./Ghidra/Features/Decompiler/ you’ll find the decompile executable (decompile.exe on Windows) for each operating system (osx64, linux64 and windows64). This executable is started when a binary has been loaded, and the main Ghidra process communicates with the decompiler over stdin and stdout.
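Starting the decompiler yourself could look roughly like the sketch below. It is a minimal example under assumptions: the helper names are made up for illustration, and the `os/` subdirectory between the Decompiler folder and the platform folder is an assumption about the install layout, so check your own installation.

```python
import subprocess
import sys
from pathlib import Path

# Platform folder names as listed in this post, keyed by sys.platform prefix.
PLATFORM_DIRS = {"linux": "linux64", "darwin": "osx64", "win32": "windows64"}


def decompiler_path(ghidra_root, platform=sys.platform):
    """Build the path to the decompile executable (hypothetical helper)."""
    key = next((k for k in PLATFORM_DIRS if platform.startswith(k)), None)
    if key is None:
        raise ValueError(f"unsupported platform: {platform}")
    name = "decompile.exe" if key == "win32" else "decompile"
    # Assumption: the per-platform folders live under an "os" directory.
    return (Path(ghidra_root) / "Ghidra" / "Features" / "Decompiler"
            / "os" / PLATFORM_DIRS[key] / name)


def start_decompiler(ghidra_root):
    """Start the decompiler with pipes attached to its stdin and stdout."""
    return subprocess.Popen(
        [str(decompiler_path(ghidra_root))],
        stdin=subprocess.PIPE,
        stdout=subprocess.PIPE,
    )
```

The returned process object exposes `stdin` and `stdout`, which is all that is needed to exchange messages with the decompiler.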

The Java source code for the Ghidra side of the decompiler is available at ./Ghidra/Features/Decompiler/lib/Decompiler-src.zip. It contains all the code Ghidra uses to communicate with the decompiler. The most interesting source file in the zip is /ghidra/app/decompiler/DecompileProcess.java.

To ease debugging I’ve created a log file containing the system calls (and the communication) between both processes. You’ll find the log here: https://gist.github.com/nl5887/9832122fe8df06f1187c6c766a7d840a#file-debug-24010-log

Program Registration
Ghidra starts by registering the program. It sends the registerProgram command, which includes the processor specification (pspecxml), the compiler specification (cspecxml), the translator specification (tspecxml) and the core types (coretypes).

Both the pspecxml and cspecxml specifications are available for many processors and compilers at Ghidra/Processors/{processor}/data/languages/{processor}.pspec and Ghidra/Processors/{processor}/data/languages/{processor}.cspec.
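As a small illustration, the two path templates above can be filled in like this. The function name and arguments are hypothetical, and the sketch follows the templates literally, so the actual file names in your install may differ per language or compiler variant.

```python
from pathlib import Path


def spec_paths(ghidra_root, processor):
    """Return (pspec, cspec) paths built from the templates in this post."""
    base = (Path(ghidra_root) / "Ghidra" / "Processors"
            / processor / "data" / "languages")
    return base / f"{processor}.pspec", base / f"{processor}.cspec"


def load_specs(ghidra_root, processor):
    """Read both specification files as text, ready to send along."""
    pspec, cspec = spec_paths(ghidra_root, processor)
    return pspec.read_text(), cspec.read_text()
```

For example, `spec_paths("/opt/ghidra", "x86")` yields paths ending in `x86.pspec` and `x86.cspec` inside the x86 languages folder.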

Next, Ghidra calls readResponse, which handles the interaction with the decompiler. The decompiler asks the main process for the names of user-defined operations (getUserOpName), and each request must be answered with the friendly operation name. It then asks for the available registers (getRegister). Ghidra sends the options (setOptions) that configure commenting and other settings, followed by the decompile request (decompileAt), which includes the address where decompilation should start.

Next, the decompiler retrieves the mapped symbols (getMappedSymbolsXML), asking for information about the symbols at a given address.
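One way to structure the answering side of this exchange is a small dispatch table keyed on the query name. The sketch below only models the dispatch, not the framing of messages on stdin and stdout. The handler names, the placeholder data and the shape of the returned XML are assumptions for illustration, not what Ghidra actually sends.

```python
# Placeholder answers; a real client would derive these from the loaded program.
USER_OP_NAMES = {0: "example_user_op"}  # hypothetical index -> friendly name
REGISTER_XML = {  # hypothetical register name -> location description
    "EAX": '<addr space="register" offset="0x0" size="4"/>',
}


def get_user_op_name(index):
    """Answer getUserOpName with the friendly operation name, or empty."""
    return USER_OP_NAMES.get(int(index), "")


def get_register(name):
    """Answer getRegister with a description of the named register."""
    return REGISTER_XML.get(name, "")


def get_mapped_symbols_xml(address_xml):
    """Answer getMappedSymbolsXML; this sketch knows no symbols."""
    return ""


HANDLERS = {
    "getUserOpName": get_user_op_name,
    "getRegister": get_register,
    "getMappedSymbolsXML": get_mapped_symbols_xml,
}


def answer_query(name, params, handlers=HANDLERS):
    """Look up the handler for a decompiler query and return its answer."""
    handler = handlers.get(name)
    if handler is None:
        raise NotImplementedError(f"unhandled decompiler query: {name}")
    return handler(*params)
```

Raising on unknown query names makes it obvious which requests still need a handler while you work through a debug log.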

The hex code to be decompiled is retrieved with the getPacked command for a specific address. The getTrackedRegisters query returns the following XML:

<tracked_pointset space="ram" offset="0x413c8c">
<set space="register" offset="0x20a" size="1" val="0x0"/>
</tracked_pointset>
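A response like this is easy to pick apart with Python's standard library. The sketch below parses the sample into plain dictionaries. The function name and the output shape are my own choices, and it assumes the attribute values are written with straight quotes.

```python
import xml.etree.ElementTree as ET

SAMPLE = """<tracked_pointset space="ram" offset="0x413c8c">
<set space="register" offset="0x20a" size="1" val="0x0"/>
</tracked_pointset>"""


def parse_tracked_pointset(xml_text):
    """Parse a tracked_pointset document into a dictionary."""
    root = ET.fromstring(xml_text)
    if root.tag != "tracked_pointset":
        raise ValueError(f"unexpected root element: {root.tag}")
    return {
        "space": root.get("space"),
        # int(..., 0) accepts both "0x..." and plain decimal strings.
        "offset": int(root.get("offset"), 0),
        "sets": [
            {
                "space": item.get("space"),
                "offset": int(item.get("offset"), 0),
                "size": int(item.get("size"), 0),
                "val": int(item.get("val"), 0),
            }
            for item in root.findall("set")
        ],
    }


if __name__ == "__main__":
    print(parse_tracked_pointset(SAMPLE))
```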

The decompiled C code is returned as an XML document containing the functions, the comments and the code itself.

When ready, a flushNative command will be sent, indicating that the decompiler is ready.

Program Deregistration
When the program is finished, Ghidra sends the deregisterProgram command to the decompiler, shutting it down.

This post is a work in progress and will be updated over time. In the gist you’ll also find a Python script that can communicate with the decompiler binary.

Have fun,

Remco

Written by Remco Verhoef

Founder @ DutchSec // Linthub.io // Transfer.sh // SlackArchive // Dutchcoders // OSC(P|E) // C|EH // GIAC // Security // DevOps // Pythonista // Gopher.
