G-3PO: A Protocol Droid for Ghidra
(A Script that Solicits GPT-3 for Comments on Decompiled Code)
TL;DR
In this post, I introduce a new Ghidra script that elicits high-level explanatory comments for decompiled function code from the GPT-3 large language model. This script is called G-3PO. In the first few sections of the post, I discuss the motivation and rationale for building such a tool, in the context of existing automated tooling for software reverse engineering. I look at what many of our tools — disassemblers, decompilers, and so on — have in common, insofar as they can be thought of as automatic paraphrase or translation tools. I spend a bit of time looking at how well (or poorly) GPT-3 handles these various tasks, and then sketch out the design of this new tool.
If you want to just skip the discussion and get yourself set up with the tool, feel free to scroll down to the last section, and then work backwards from there if you like.
The GitHub repository for G-3PO can be found HERE.
On the Use of Automation in Reverse Engineering
At the present state of things, the domain of reverse engineering seems like a fertile site for applying machine learning techniques. ML tends to excel, after all, at problems where getting the gist of things counts, where the emphasis is on picking out patterns that might otherwise go unnoticed, and where error is either tolerable or can be corrected by other means. This kind of loose and conjectural pattern recognition is where reverse engineering begins. We start by trying to get a feel for a system, a sense of how it hangs together, and then try to tunnel down. Impressions can be deceptive, of course, but this is a field where they’re easily tested, and where suitable abstractions are both sought and mistrusted.
The goal, after all, is to understand (some part of) a system better than its developers do, to piece together its specification and where the specification breaks down.
At many stages along the way, you could say that what the reverse engineer is doing is searching for ways to paraphrase what they’re looking at, or translate it from one language into another.
We might begin, for example, with an opaque binary “blob” (to use a semi-technical term for unanalyzed data) that we dumped off a router’s NAND storage. The first step might be to tease out its file format, and through a process of educated guesses and experiments, find a way to parse it. Maybe it turns out to contain a squashfs file system, containing the router’s firmware. We have various tools, like Binwalk, to help with this stage of things, which we know can’t be trusted entirely but which might provide useful hints, or even get us to the next stage.
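To give a flavour of what signature-based carving involves, here is a minimal sketch in Python that scans a blob for the magic bytes (`hsqs`) that open a little-endian SquashFS image. The function name and the sample blob are illustrative assumptions of mine, and a real tool like Binwalk does far more validation than this; a bare magic-byte match is exactly the sort of hint that can't be trusted entirely.

```python
SQUASHFS_MAGIC_LE = b"hsqs"  # first four bytes of a little-endian SquashFS image


def find_magic_offsets(blob, magic=SQUASHFS_MAGIC_LE):
    """Return every offset in `blob` at which `magic` occurs."""
    offsets = []
    start = blob.find(magic)
    while start != -1:
        offsets.append(start)
        start = blob.find(magic, start + 1)
    return offsets


# Hypothetical blob: some padding, a fake SquashFS header, more padding.
blob = b"\xff" * 16 + b"hsqs" + b"\x00" * 28
print(find_magic_offsets(blob))
```

Each offset returned is only a candidate: the next step would be to parse the superblock at that offset and see whether its fields make sense.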
Suppose we then unpack the firmware, mount it as a filesystem, and then explore the contents. Maybe we find an interesting-looking application binary, called something like telnetd_startup. Instead of reading it as an opaque blob of bits, we look for a way to make sense of it, usually beginning by parsing its file structure (let’s say it’s an ELF) and disassembling it, that is, translating the binary file into a sequence, or better, a directed graph of assembly instructions. For this step we might lean on tools like objdump, rizin, IDA Pro (if we have an expense account), or, my personal favourite, Ghidra. There’s room for error here as well, and sometimes even the best tools we have will get off on the wrong foot and parse data as code, or misjudge the offset of a series of instructions and produce a garbled listing. But you get to recognize the kinds of errors that these sorts of tools are prone to, especially when dealing with unknown file formats. You learn various heuristics and rules of thumb to minimize and correct those errors. Tools that can automate the translation of a binary blob into readable assembly are nevertheless essential, to the extent that if you were faced with a binary that used an unknown instruction set, your first priority as a reverse engineer may very well be to figure out how to write at least a flawed and incomplete disassembler for it.
The disassembly listing of a binary gives us a fine-grained picture of its application logic, and sometimes that’s the furthest that automated tools can take us. But it’s still a far cry from the code that its developer may have been working with: very few programs are written in assembly these days, and it’s easy to get lost in the weeds without a higher-level vantage point. This might be where the reverser begins the patient manual work of discovering interesting components of the binary (components where it’s handling user input, for example) by stepping through the binary with a debugger like GDB (perhaps with the help of an emulator, like QEMU), and then annotating the disassembly listing with comments. In doing so the reverser tries to produce a high-level paraphrase of the program.
Nowadays, however, we often have access to another set of tools called decompilers, which can at least approximately translate the disassembly listing into something that looks like source code, typically something like C (but extended with a few pseudo types, like Ghidra’s undefined and undefined* to indicate missing information). (Other tools, static analysis frameworks like BAP or angr (or, internally, Ghidra or Binary Ninja), for example, might be used to “lift” or translate the binary to an intermediate representation more amenable to further automated analysis, but we’ll leave those aside for now.) Decompilation is a heuristically-driven and inexact art, to a significantly greater extent than disassembly. When source code (in C, for example) is compiled down to x86 or ARM machine code, there’s an irreversible loss of information, and moving back in the other direction involves a bit of guesswork, guided by contextual clues and constraints. When reverse engineers work with decompilers, we take it for granted that the decompiler is probably getting at least a few things wrong. But I doubt anyone would say that they’re unhelpful. We can, and often must, go back to the disassembly listing whenever needed after all. And when something seems fishy there, we can go back to the binary’s file format, and see if something’s been parsed incorrectly.
In my day to day work, this is usually where automated analysis stops and where manual annotation and paraphrase begins. I slowly read through the decompiler’s output and try to figure out, in ordinary language, what the code is “supposed” to be doing, and what it’s actually doing. It’s a long process of conjecture and refutation, often involving the use of debuggers, emulators, and tracers to test interpretations of the code. I might probe the running or emulated binary with various inputs and observe the effects. I might even try to do this in a brute force way, at scale, “fuzzing” the binary and looking for anomalous behaviour. But a considerable amount of time is spent just adding comments to the binary in Ghidra, correcting misleading type information and coming up with informative names for the functions and variables in play (especially if the binary’s been stripped and symbols are missing). Let’s call this the process of annotation.
We might notice that many of the automated stages in the reverse engineer’s job — parsing and unpacking the firmware blob, disassembling binary executables, and then decompiling them — can at least loosely be described as processes of translation or paraphrase. And the same can be said for annotation.
This brings us back to machine learning.
Using Large Language Models as Paraphrasing Engines, in the Context of Reverse Engineering
If there’s one thing that large language models, like OpenAI’s GPT-3, have shown themselves to be especially good at, it’s paraphrase — whether it’s a matter of translating between one language and another, summarising an existing knowledge base, or rewriting a text in the style of a particular author. Once you notice this, as I did last week while flitting back and forth between a project I was working on in Ghidra and a browser tab opened to ChatGPT, it might seem natural to see how an LLM handles the kinds of “paraphrasing” involved in a typical software reverse engineering workflow.
The example I’ll be working with here, unless otherwise noted, is a function carved from a firmware binary I dumped from a Canon ImageClass MF743Cdw printer.
GPT-3 Makes a Poor Disassembler
Let’s begin with disassembly:
Disassembly seems to fall squarely outside of ChatGPT’s scope, which isn’t surprising. It was trained on “natural language” in the broad sense, after all, and not on binary dumps.
The GPT-3 text-davinci-003 model does no better:
This, again, would be great, if it weren’t entirely wrong. Here’s what capstone (correctly) returns for the same input:
0x44b2d4b0: cmp r2, #3
0x44b2d4b4: bls #0x44b2d564
0x44b2d4b8: ands ip, r0, #3
0x44b2d4bc: beq #0x44b2d4e4
0x44b2d4c0: ldrb r3, [r1], #1
0x44b2d4c4: cmp ip, #2
0x44b2d4c8: add r2, r2, ip
0x44b2d4cc: ldrbls ip, [r1], #1
0x44b2d4d0: strb r3, [r0], #1
0x44b2d4d4: ldrblo r3, [r1], #1
0x44b2d4d8: strbls ip, [r0], #1
0x44b2d4dc: sub r2, r2, #4
0x44b2d4e0: strblo r3, [r0], #1
0x44b2d4e4: ands r3, r1, #3
0x44b2d4e8: beq #0x44b36318
0x44b2d4ec: subs r2, r2, #4
0x44b2d4f0: blo #0x44b2d564
0x44b2d4f4: ldr ip, [r1, -r3]!
0x44b2d4f8: cmp r3, #2
0x44b2d4fc: beq #0x44b2d524
0x44b2d500: bhi #0x44b2d544
0x44b2d504: lsr r3, ip, #8
0x44b2d508: ldr ip, [r1, #4]!
0x44b2d50c: subs r2, r2, #4
0x44b2d510: orr r3, r3, ip, lsl #24
0x44b2d514: str r3, [r0], #4
0x44b2d518: bhs #0x44b2d504
0x44b2d51c: add r1, r1, #1
0x44b2d520: lsr r3, ip, #0x10
0x44b2d524: ldr ip, [r1, #4]!
0x44b2d528: subs r2, r2, #4
0x44b2d52c: orr r3, r3, ip, lsl #16
0x44b2d530: str r3, [r0], #4
0x44b2d534: bhs #0x44b2d520
0x44b2d538: add r1, r1, #2
0x44b2d53c: lsr r3, ip, #0x18
0x44b2d540: ldr ip, [r1, #4]!
0x44b2d544: subs r2, r2, #4
0x44b2d548: orr r3, r3, ip, lsl #8
0x44b2d54c: str r3, [r0], #4
0x44b2d550: bhs #0x44b2d53c
0x44b2d554: add r1, r1, #3
0x44b2d558: lsls r2, r2, #0x1f
0x44b2d55c: ldrbhs r3, [r1], #1
0x44b2d560: ldrbhs ip, [r1], #1
0x44b2d564: ldrbmi r2, [r1], #1
0x44b2d568: strbhs r3, [r0], #1
0x44b2d56c: strbhs ip, [r0], #1
0x44b2d570: strbmi r2, [r0], #1
0x44b2d574: bx lr
0x44b2d578: subs r2, r2, #0x20
0x44b2d57c: blo #0x44b2d5b0
0x44b2d580: cmp r2, #0x80
0x44b2d584: blo #0x44b2d5a0
0x44b2d588: pld [r1, #0x80]
0x44b2d58c: ldm r1!, {r3, r4, r5, r6, r7, r8, ip, lr}
0x44b2d590: sub r2, r2, #0x20
0x44b2d594: stm r0!, {r3, r4, r5, r6, r7, r8, ip, lr}
0x44b2d598: ldm r1!, {r3, r4, r5, r6, r7, r8, ip, lr}
0x44b2d59c: subs r2, r2, #0x20
0x44b2d5a0: stm r0!, {r3, r4, r5, r6, r7, r8, ip, lr}
0x44b2d5a4: bhs #0x44b2d598
0x44b2d5a8: lsls ip, r2, #0x1c
0x44b2d5ac: ldmhs r1!, {r3, r4, ip, lr}
0x44b2d5b0: stmhs r0!, {r3, r4, ip, lr}
0x44b2d5b4: ldmmi r1!, {r3, r4}
0x44b2d5b8: stmmi r0!, {r3, r4}
0x44b2d5bc: lsls ip, r2, #0x1e
0x44b2d5c0: ldrhs r3, [r1], #4
0x44b2d5c4: strhs r3, [r0], #4
0x44b2d5c8: bxeq lr
0x44b2d5cc: lsls r2, r2, #0x1f
0x44b2d5d0: ldrhhs r3, [r1], #2
0x44b2d5d4: ldrbmi r2, [r1], #1
0x44b2d5d8: strhhs r3, [r0], #2
0x44b2d5dc: strbmi r2, [r0], #1
0x44b2d5e0: bx lr
GPT-3 Makes a Pretty Good Decompiler
Things look a lot brighter when we turn to decompilation.
If I first disassemble the binary using Capstone and then ask the LLM to decompile the result, I start to get surprisingly accurate and informative answers.
>>> prompt_openai_model(f"Here is some ARM assembly code:\n\n```\n{s}\n```\n\nIt can be decompiled into C code as follows:\n\n```\n",
model="text-davinci-003")
This gives us the following:
if (r2 == 3) {
if ((r0 & 3) == 0) {
while (r2 > 0) {
r3 = *r1++;
*r0++ = r3;
r2--;
}
} else {
while (r2 > 0) {
if ((r0 & 3) == 2) {
r3 = *r1++;
*r0++ = r3;
r2--;
} else {
ip = *r1++;
r2 -= 4;
if (r3 == 2) {
*r0++ = (ip << 24) | (*r1++);
} else if (r3 < 2) {
*r0++ = (ip << 16) | (*r1++);
r1++;
} else {
*r0++ = (ip << 8) | (*r1++);
r1 += 3;
}
}
}
}
} else {
while (r2 > 0) {
if (r2 >= 32) {
if (r2 >= 128) {
for (int i = 0; i < 4; i++) {
*r0++ = *r1++;
*r0++ = *r1++;
*r0++ = *r1++;
*r0++ = *r1++;
r2 -= 32;
}
} else {
*r0++ = *r1++;
*r0++ = *r1++;
*r0++ = *r1++;
*r0++ = *r1++;
r2 -= 32;
}
} else {
if (r2 >= 16) {
*r0++ = *r1++;
*r0++ = *r1++;
*r0++ = *r1++;
*r0++ = *r1++;
r2 -= 16;
} else {
if (r2 >= 8) {
*r0++ = *r1++;
*r0++ = *r1++;
r2 -= 8;
} else {
if (r2 >= 4) {
*r0++ = *r1++;
r2 -= 4;
} else {
if (r2 >= 2) {
*r0++ = *r1++;
r2 -= 2;
} else {
*r0++ = *r1++;
r2--;
}
}
}
}
}
}
}
Compare this with what Ghidra gives us:
undefined8 FUN_44b2d4b0(uint *param_1,uint *param_2,uint param_3,uint param_4)
{
uint *puVar1;
uint *puVar2;
byte *pbVar3;
uint *puVar4;
byte bVar5;
undefined2 uVar6;
uint uVar7;
uint uVar8;
uint uVar9;
uint uVar10;
uint uVar11;
uint uVar12;
byte bVar13;
uint in_r12;
uint uVar14;
uint uVar15;
uint uVar16;
bool bVar17;
bool bVar18;
if (3 < param_3) {
uVar14 = param_1 & 3;
in_r12 = uVar14;
if (uVar14 != 0) {
bVar5 = *param_2;
puVar2 = param_2 + 1;
if (uVar14 < 3) {
puVar2 = param_2 + 2;
in_r12 = *(param_2 + 1);
}
*param_1 = bVar5;
param_2 = puVar2;
if (uVar14 < 2) {
param_2 = puVar2 + 1;
bVar5 = *puVar2;
}
puVar2 = param_1 + 1;
if (uVar14 < 3) {
puVar2 = param_1 + 2;
*(param_1 + 1) = in_r12;
}
param_3 = (param_3 + uVar14) - 4;
param_1 = puVar2;
if (uVar14 < 2) {
param_1 = puVar2 + 1;
*puVar2 = bVar5;
}
}
param_4 = param_2 & 3;
if (param_4 == 0) {
uVar14 = param_3 - 0x20;
if (0x1f < param_3) {
for (; 0x7f < uVar14; uVar14 = uVar14 - 0x20) {
HintPreloadData(param_2 + 0x20);
uVar7 = *param_2;
uVar8 = param_2[1];
uVar9 = param_2[2];
uVar10 = param_2[3];
uVar11 = param_2[4];
uVar12 = param_2[5];
uVar15 = param_2[6];
uVar16 = param_2[7];
param_2 = param_2 + 8;
*param_1 = uVar7;
param_1[1] = uVar8;
param_1[2] = uVar9;
param_1[3] = uVar10;
param_1[4] = uVar11;
param_1[5] = uVar12;
param_1[6] = uVar15;
param_1[7] = uVar16;
param_1 = param_1 + 8;
}
do {
param_4 = *param_2;
uVar7 = param_2[1];
uVar8 = param_2[2];
uVar9 = param_2[3];
uVar10 = param_2[4];
uVar11 = param_2[5];
uVar12 = param_2[6];
uVar15 = param_2[7];
param_2 = param_2 + 8;
bVar17 = 0x1f < uVar14;
uVar14 = uVar14 - 0x20;
*param_1 = param_4;
param_1[1] = uVar7;
param_1[2] = uVar8;
param_1[3] = uVar9;
param_1[4] = uVar10;
param_1[5] = uVar11;
param_1[6] = uVar12;
param_1[7] = uVar15;
param_1 = param_1 + 8;
} while (bVar17);
}
if (uVar14 >> 4 & 1) {
param_4 = *param_2;
uVar7 = param_2[1];
uVar8 = param_2[2];
uVar9 = param_2[3];
param_2 = param_2 + 4;
*param_1 = param_4;
param_1[1] = uVar7;
param_1[2] = uVar8;
param_1[3] = uVar9;
param_1 = param_1 + 4;
}
if (uVar14 << 0x1c < 0) {
param_4 = *param_2;
uVar7 = param_2[1];
param_2 = param_2 + 2;
*param_1 = param_4;
param_1[1] = uVar7;
param_1 = param_1 + 2;
}
puVar1 = param_1;
puVar2 = param_2;
if (uVar14 >> 2 & 1) {
puVar2 = param_2 + 1;
param_4 = *param_2;
puVar1 = param_1 + 1;
*param_1 = param_4;
}
uVar6 = param_4;
if ((uVar14 & 3) != 0) {
bVar18 = uVar14 >> 1 & 1;
uVar14 = uVar14 << 0x1f;
bVar17 = uVar14 < 0;
puVar4 = puVar2;
if (bVar18) {
puVar4 = puVar2 + 2;
uVar6 = *puVar2;
}
puVar2 = puVar4;
if (bVar17) {
puVar2 = puVar4 + 1;
uVar14 = *puVar4;
}
puVar4 = puVar1;
if (bVar18) {
puVar4 = puVar1 + 2;
*puVar1 = uVar6;
}
puVar1 = puVar4;
if (bVar17) {
puVar1 = puVar4 + 1;
*puVar4 = uVar14;
}
return CONCAT44(puVar2,puVar1);
}
return CONCAT44(puVar2,puVar1);
}
bVar17 = 3 < param_3;
param_3 = param_3 - 4;
if (bVar17) {
param_2 = param_2 - param_4;
in_r12 = *param_2;
puVar2 = param_1;
if (param_4 == 2) {
do {
puVar1 = param_2;
param_4 = in_r12 >> 0x10;
param_2 = puVar1 + 1;
in_r12 = *param_2;
bVar17 = 3 < param_3;
param_3 = param_3 - 4;
param_4 = param_4 | in_r12 << 0x10;
param_1 = puVar2 + 1;
*puVar2 = param_4;
puVar2 = param_1;
} while (bVar17);
param_2 = puVar1 + 6;
}
else if (param_4 < 3) {
do {
puVar1 = param_2;
param_4 = in_r12 >> 8;
param_2 = puVar1 + 1;
in_r12 = *param_2;
bVar17 = 3 < param_3;
param_3 = param_3 - 4;
param_4 = param_4 | in_r12 << 0x18;
param_1 = puVar2 + 1;
*puVar2 = param_4;
puVar2 = param_1;
} while (bVar17);
param_2 = puVar1 + 5;
}
else {
do {
puVar1 = param_2;
param_4 = in_r12 >> 0x18;
param_2 = puVar1 + 1;
in_r12 = *param_2;
bVar17 = 3 < param_3;
param_3 = param_3 - 4;
param_4 = param_4 | in_r12 << 8;
param_1 = puVar2 + 1;
*puVar2 = param_4;
puVar2 = param_1;
} while (bVar17);
param_2 = puVar1 + 7;
}
}
}
bVar13 = in_r12;
bVar5 = param_4;
bVar18 = param_3 >> 1 & 1;
param_3 = param_3 << 0x1f;
bVar17 = param_3 < 0;
if (bVar18) {
pbVar3 = param_2 + 1;
bVar5 = *param_2;
param_2 = param_2 + 2;
bVar13 = *pbVar3;
}
puVar2 = param_2;
if (bVar17) {
puVar2 = param_2 + 1;
param_3 = *param_2;
}
if (bVar18) {
pbVar3 = param_1 + 1;
*param_1 = bVar5;
param_1 = param_1 + 2;
*pbVar3 = bVar13;
}
puVar1 = param_1;
if (bVar17) {
puVar1 = param_1 + 1;
*param_1 = param_3;
}
return CONCAT44(puVar2,puVar1);
}
These look, at first blush, pretty close to one another. In both cases what this function looks like is something like a compiler-optimized memcpy, implemented in such a way as to exploit whatever common alignment the source and destination pointer might have.
Now, as far as machine code goes, Ghidra’s decompiler is already quite good, and there’s no real need to put a rather opaque and heuristic LLM in its place. Where LLM-driven approximate decompilations can be quite useful is when dealing with a bytecode for which a good decompiler isn’t immediately available. Another researcher on the Tenable Zero Day team, Jimi Sebree, was able to coax ChatGPT into producing reasonably useful (if imperfect) decompilations of Lua bytecode while reversing a certain router’s LuCI front-end. This took us from something like this:
To something like this:
module("luci.controller.admin.access_control", package.seeall)
local uci = require("luci.model.uci")
local controller = require("luci.model.controller")
local sys = require("luci.sys")
local form = require("luci.tools.form")
local debug = require("luci.tools.debug")
local client_mgmt = require("luci.model.client_mgmt")
local checktypes = require("luci.model.checktypes")
local cur = uci.cursor()
form = form.Form(cur, {"mac"})
local json = require("luci.json")
local http = require("luci.http")
local util = require("luci.util")
local ctl = controller.Controller()
local mgmt = client_mgmt.Client_mgmt()
function index()
entry({"admin", "access_control"}, call("action_index"), _("Access Control"), 3)
entry({"admin", "access_control", "details"}, call("action_details"), nil)
entry({"admin", "access_control", "details", "add"}, call("action_details_add"), nil)
entry({"admin", "access_control", "details", "add_from_pool"}, call("action_details_add_from_pool"), nil)
entry({"admin", "access_control", "details", "delete"}, call("action_details_delete"), nil)
end
function action_index()
local mac_list = mgmt:get_mac_list()
http.prepare_content("application/json")
http.write_json(mac_list)
end
function action_details()
local mac = http.formvalue("mac")
if not mac then
return
end
http.prepare_content("application/json")
http.write_json(mgmt:get_mac_details(mac))
end
As far as machine code binaries go, however, it seems that Ghidra already has decompilation well in hand. What the Ghidra framework could use, it seems, is a way of providing high-level explanations of what a particular decompiled function is doing. And this is just the sort of paraphrasing at which large language models excel.
GPT-3’s Not Bad at Commentary
So now here comes the interesting part. Let’s go back to that function we dumped from the Canon ImageClass printer at address 0x44b2d4b0, take the decompiled code that Ghidra gave us, and ask the LLM to provide it with a high-level gloss, explaining in “natural language” just what the code is doing.
>>> print(prompt_openai_model(f"""Here is some decompiled C code:```
{d}
```
Give me a high-level explanation of what this code is doing, and suggest a name for the function in question."""))
This code is performing a memory copy operation. It is copying data from one memory location to another. The code is looping through the source memory location, and copying each byte to the destination memory location. It is also taking into account any alignment issues that may arise. A suitable name for this function could be 'memory_copy'.
This is a pretty astute bird’s-eye view. The function is, indeed, a memory copying function, optimized in such a way as to exploit any alignment the memory pointers share.
To recap, we’ve observed how the workflow of a software reverse engineer involves (among other things) a series of tasks that can be loosely grouped together as operations of translation or paraphrase. These include, but are not limited to,
- data carving and the parsing of filesystems and files
- disassembly
- decompilation
- annotation and commentary
The modern-day reverse engineer is equipped with tools that can automate the first three tasks — albeit never in a foolproof fashion, and the reverser who relies entirely on their automated toolbox is no reverser at all. That the abstractions we deal in deceive us is something reverse engineers take for granted, after all, and this goes for the abstractions our tools employ no less than the abstractions our targets use.
Introducing G-3PO
What these quick and dirty experiments with an LLM suggest is that the fourth process listed here, the paraphrase of disassembled or decompiled code into high-level commentary, can be assisted by automated tooling as well.
And this is just what the G-3PO Ghidra script does.
The output of such a tool, of course, would have to be carefully checked. Taking its soundness for granted would be a mistake, just as it would be a mistake to put too much faith in the decompiler. We should trust such a tool, backed as it is by an opaque LLM, far less than we trust decompilers, in fact. Fortunately reverse engineering is the sort of domain where we don’t need to trust much at all. It’s an essentially skeptical craft. The reverser’s well aware that every non-trivial abstraction leaks, and that complex hardware and software systems rarely behave as expected. The same healthy skepticism should always extend to our tools.
Developing the G-3PO Ghidra Script
Developing the G-3PO Ghidra script was surprisingly easy. The lion’s share of the work was just a matter of looking up various APIs and fiddling with a somewhat awkward development environment.
One of the weaknesses in Ghidra’s Python scripting support is that it’s restricted to the obsolete and unmaintained “Jython” engine, a Python 2.7 interpreter that runs on the Java Virtual Machine. One option would have been to make use of the Ghidra to Python Bridge, a supplementary Ghidra script that lets you interact with Ghidra’s Jython interpreter from the Python 3 environment of your choice, over a local socket, but since my needs were pretty spare, I didn’t want to overburden the project with extra dependencies. All I really needed from the OpenAI Python module, after all, was an easy way to serialise, send, receive and parse HTTP requests that conform to the OpenAI API. Ghidra’s Jython distribution doesn’t come with the requests module included, but it does provide httplib, which is almost as convenient (in an earlier draft, I overlooked httplib and resorted to calling curl via subprocess, a somewhat ugly and insecure solution):
import httplib  # Python 2.7 / Jython; this module is http.client in Python 3
import json
import logging
import os

# MODEL and MAXTOKENS are module-level settings defined elsewhere in the script.

def send_https_request(address, path, data, headers):
    """POST `data` as JSON to https://<address><path> and return the parsed reply."""
    try:
        conn = httplib.HTTPSConnection(address)
        json_req_data = json.dumps(data)
        conn.request("POST", path, json_req_data, headers)
        response = conn.getresponse()
        json_data = response.read()
        conn.close()
        try:
            data = json.loads(json_data)
            return data
        except ValueError:
            logging.error("Could not parse JSON response from OpenAI!")
            logging.debug(json_data)
            return None
    except Exception as e:
        logging.error("Error sending HTTPS request: {e}".format(e=e))
        return None

def openai_request(prompt, temperature=0.19, max_tokens=MAXTOKENS, model=MODEL):
    data = {
        # Use the `model` argument, so that callers can override the default.
        "model": model,
        "prompt": prompt,
        "max_tokens": max_tokens,
        "temperature": temperature
    }
    # The URL is "https://api.openai.com/v1/completions"
    host = "api.openai.com"
    path = "/v1/completions"
    headers = {
        "Content-Type": "application/json",
        "Authorization": "Bearer {openai_api_key}".format(openai_api_key=os.getenv("OPENAI_API_KEY")),
    }
    data = send_https_request(host, path, data, headers)
    if data is None:
        logging.error("OpenAI request failed!")
        return None
    logging.info("OpenAI request succeeded!")
    logging.info("Response: {data}".format(data=data))
    return data
This is good enough to avoid any dependency on the Python openai library.
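Once the parsed JSON comes back, the generated text still has to be dug out of it. The completions endpoint returns its results in a `choices` list, with the generated text under the `text` key of each entry. Here is a minimal sketch of that step; the helper name `extract_completion_text` and the sample response are my own illustrations, not taken from the G-3PO source.

```python
def extract_completion_text(response):
    """Return the first completion's text, or None if the response is unusable."""
    if not isinstance(response, dict):
        return None
    choices = response.get("choices")
    if not choices:
        # Error replies carry an "error" object and no "choices" list.
        return None
    text = choices[0].get("text")
    if text is None:
        return None
    return text.strip()


# Hypothetical response, trimmed down to the fields used here.
sample = {"choices": [{"text": "\n\nThis code copies memory.", "index": 0}]}
print(extract_completion_text(sample))
```

Returning None rather than raising keeps the caller's error handling in line with the request functions above, which also signal failure with None.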
The prompt that G-3PO sends to the LLM is pretty basic, and there’s certainly room to tweak it a little in search of better results. What I’m currently using looks like this:
prompt = """
Below is some C code that Ghidra decompiled from a binary that I'm trying to
reverse engineer.
```
{c_code}
```
Please provide a detailed explanation of what this code does, in {style},
that might be useful to a reverse engineer. Explain your reasoning as much
as possible. {extra}
Finally, suggest suitable names for this function and its parameters.
""".format(c_code=c_code, style=STYLE, extra=EXTRA)
The c_code interpolated into the prompt is taken from the output of the Ghidra decompiler, for the function the user is currently inspecting. Quite usefully, this output includes any comments, variable names, or type annotations that the user has already added to the code listing, allowing the LLM to build on the user’s work. The exception is the plate comment positioned at the beginning of the function. This is where G-3PO will place its own comments, and if there’s already a comment in that slot it will refuse to clobber it, unless that comment begins with the phrase OpenAI GPT-3 generated comment.
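The no-clobber rule boils down to a small predicate. The sketch below is a plain-Python illustration of that rule rather than the script's actual code: the function name is hypothetical, and treating an empty or whitespace-only comment as "no comment" is an assumption on my part.

```python
G3PO_COMMENT_PREFIX = "OpenAI GPT-3 generated comment"


def may_overwrite_plate_comment(existing_comment):
    """Decide whether an existing plate comment may be replaced.

    A missing comment can always be written. An existing one is only
    replaced if it was generated by an earlier run, which we recognise
    by its opening phrase.
    """
    # Assumption: an empty or whitespace-only comment counts as missing.
    if existing_comment is None or existing_comment.strip() == "":
        return True
    return existing_comment.lstrip().startswith(G3PO_COMMENT_PREFIX)


print(may_overwrite_plate_comment("Hand-written notes on this function"))
print(may_overwrite_plate_comment("OpenAI GPT-3 generated comment, take with a grain of salt:"))
```

Keying on the opening phrase means that a user who edits a generated comment, but leaves its first line intact, will still see it replaced on the next run.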
G-3PO in Action
Here’s how the script looks in action, commenting on the same function we looked at in the previous section, the infelicitously named FUN_44b2d4b0(), to which G-3PO suggests we should give the name CopyMemory():
Here’s what the comment says, so that you don’t have to squint:
OpenAI GPT-3 generated comment, take with a grain of salt:
This code is a function that copies data from one memory location to another. It takes four parameters: two pointers to memory locations (param_1 and param_2), and two integers (param_3 and param_4).
The function first checks if the value of param_3 is greater than 3. If it is, it checks if the first pointer (param_1) is aligned to a 4-byte boundary. If it is not, it copies the first byte from the second pointer (param_2) to the first pointer (param_1), and then checks if the second pointer (param_2) is aligned to a 4-byte boundary. If it is not, it copies the next byte from the second pointer (param_2) to the first pointer (param_1). It then subtracts the number of bytes copied from param_3 and adjusts the pointers accordingly.
The function then enters a loop, where it copies 8 bytes from the second pointer (param_2) to the first pointer (param_1) and adjusts the pointers accordingly. This loop is repeated until the value of param_3 is less than 0x20.
If the value of param_3 is less than 0x20, the function checks if the value of param_3 is greater than 3. If it is, it copies 4 bytes from the second pointer (param_2) to the first pointer (param_1) and adjusts the pointers accordingly. It then checks if the value of param_3 is greater than 1. If it is, it copies 2 bytes from the second pointer (param_2) to the first pointer (param_1) and adjusts the pointers accordingly. Finally, it checks if the value of param_3 is greater than 0. If it is, it copies 1 byte from the second pointer (param_2) to the first pointer (param_1) and adjusts the pointers accordingly.
The function then returns the adjusted pointers.
A suitable name for this function is “CopyMemory”. The variables bearing
default names can be renamed as follows:
- param_1 -> destPtr
- param_2 -> srcPtr
- param_3 -> numBytes
- param_4 -> alignment
Model: text-davinci-003, Temperature: 0.19
A close look at the code does indeed reveal something like an optimized memcpy function. It differs from a naïve memory copying loop, say
for (i=0; i<numBytes; i++) {
    destPtr[i] = srcPtr[i];
}
only in seeking to copy the largest aligned chunks in each iteration, up to a chunk size of 32 bytes. In this fashion, the processor can take advantage of the ARM ldmia instruction, which lets it read a block of memory into several registers in sequence, and the stmia instruction, which lets it write from a sequence of registers into a block of memory. Here’s what the assembly code looks like, for example, where the function is copying 32-byte (i.e., 8-word) blocks of memory at once:
LAB_44b3632c XREF[1]: 44b36340(j)
44b3632c 80 f0 d1 f5 pld [r1,#0x80]
44b36330 f8 51 b1 e8 ldmia r1!,{r3,r4,r5,r6,r7,r8,r12,lr}=>LAB_454918fc
44b36334 20 20 42 e2 sub r2,r2,#0x20
44b36338 80 00 52 e3 cmp r2,#0x80
44b3633c f8 51 a0 e8 stmia r0!,{r3,r4,r5,r6,r7,r8,r12,lr}=>DAT_e1a04005
44b36340 f9 ff ff 2a bcs LAB_44b3632c
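The largest-chunk-first strategy is easier to see in a toy model. The Python sketch below copies whole 32-byte blocks for as long as it can, and then lets each set bit of the remaining count select one smaller chunk (16, 8, 4, 2, 1), which is roughly what the flag-testing lsls instructions at the tail of the disassembly do. It is a simplification of my own, not a translation of the firmware routine: it leaves out the alignment prologue and the shifted copies used when the source pointer is misaligned, and the function name is hypothetical.

```python
def chunked_copy(dest, src, num_bytes):
    """Copy num_bytes from src into dest, largest chunks first.

    Assumes both buffers hold at least num_bytes. Returns the list of
    chunk sizes used, to make the strategy visible.
    """
    chunks = []
    offset = 0
    remaining = num_bytes
    # Bulk phase: whole 32-byte blocks (the ldmia/stmia loop in the listing).
    while remaining >= 32:
        dest[offset:offset + 32] = src[offset:offset + 32]
        chunks.append(32)
        offset += 32
        remaining -= 32
    # Tail phase: each set bit of the remainder selects one smaller chunk.
    for size in (16, 8, 4, 2, 1):
        if remaining & size:
            dest[offset:offset + size] = src[offset:offset + size]
            chunks.append(size)
            offset += size
            remaining -= size
    return chunks


src = bytes(bytearray(range(47)))
dest = bytearray(47)
print(chunked_copy(dest, src, 47))
```

For 47 bytes, this takes one 32-byte block and then the chunks picked out by the bits of 15, instead of 47 single-byte iterations.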
A Backdoor Lockpick, Revisited
Earlier this year, I posted a detailed analysis of a vulnerable backdoor that I found on every Phicomm router I could get my hands on. At the heart of that analysis was a fine-grained reversing of the telnetd_startup binary that I found in these routers’ firmware, beginning with the version discovered on the Phicomm K2GA router, firmware version 22.6.3.20. Let’s see what G-3PO has to say about a few of the important functions in that binary, unaided by my own comments, corrections, and annotations to the decompiler output (starting from scratch, that is, with the output Ghidra provides).
Here’s a function that the backdoor server calls to check to see if the client has supplied the magic handshake, ABCDEF1234, when the state machine is in its initial state, or if the client has requested to restart the state machine.
This is what G-3PO had to say about this fairly simple function:
This is certainly correct, though we should note that what’s significant about this function, for the reverse engineer, has more to do with the role it plays in the state machine implementing the backdoor protocol as a whole.
In certain cases, we see that G-3PO makes an incorrect but reasonable guess, such as when it mistakes the function that generates the ephemeral keys for temporarily and permanently unlocking the backdoor for a function that sets up file system paths. The difference, however, is entirely a matter of context, and we can steer our droid (which can’t, after all, see anything outside this single block of code) in the right direction by naming a few of the other functions that this function calls.
/* /--------------------------------------------------------------------------------\
|OpenAI GPT-3 generated comment, take with a grain of salt: |
| |
| This code appears to be setting up two different paths, one for permanent |
|storage and one for temporary storage. The function is first setting up a buffer|
|of 88 bytes (auStack_60) and then using sprintf to create two strings, |
|acStack_460 and acStack_260, which are the paths for the permanent and temporary|
|storage, respectively. The function then calls FUN_00401f30 with the buffer as |
|an argument, followed by a call to FUN_00402b28 with the buffer, the permanent |
|path string, and the length of the permanent path string as arguments. This is |
|followed by a call to FUN_00402c28 with the buffer and a constant string as |
|arguments. The function then repeats the same steps for the temporary path |
|string. |
| This function could be named setupPaths(). The variable auStack_60 could be |
|named buffer, acStack_460 could be named permPath, and acStack_260 could be |
|named tempPath. |
| |
|Model: text-davinci-003, Temperature: 0.19 |
\--------------------------------------------------------------------------------/
/
/~\
|oo )
_\=/_
/ \
//|/.\|\\
|| \_/ ||
|| |\ /| ||
# \_ _/ #
| | |
| | |
[]|[]
| | |
/_]_[_\
*/
/* DISPLAY WARNING: Type casts are NOT being printed */
undefined4 FUN_0040182c(void)
{
  size_t sVar1;
  char acStack_460 [512];
  char acStack_260 [512];
  undefined auStack_60 [88];
  memset(auStack_60,0,0x58);
  sprintf(acStack_460,"%s+PERM",&DAT_00414b80);
  sprintf(acStack_260,"%s+TEMP",&DAT_00414b80);
  FUN_00401f30(auStack_60);
  sVar1 = strlen(acStack_460);
  FUN_00402b28(auStack_60,acStack_460,sVar1);
  FUN_00402c28(auStack_60,&DAT_00414c20);
  FUN_00401f30(auStack_60);
  sVar1 = strlen(acStack_260);
  FUN_00402b28(auStack_60,acStack_260,sVar1);
  FUN_00402c28(auStack_60,&DAT_00414c30);
  return 0;
}
/* /--------------------------------------------------------------------------------\
|OpenAI GPT-3 generated comment, take with a grain of salt: |
| |
| This code is responsible for generating two MD5 hashes from two strings. The|
|first string is a concatenation of the XORED_MSG_0014b80 variable and the string|
|"+PERM". The second string is a concatenation of the XORED_MSG_0014b80 variable |
|and the string "+TEMP". |
| The function begins by setting an array of 88 bytes to 0. This array is used|
|to store the MD5 context. It then uses sprintf to create two strings, |
|acStack_460 and acStack_260, which are the two strings mentioned above. |
| The function then calls md5_init with the MD5 context array as an argument. |
|This initializes the MD5 context. It then calls strlen to get the length of the |
|first string, acStack_460, and passes it to md5_add along with the MD5 context |
|array and the first string. This adds the first string to the MD5 context. |
| The function then calls md5_digest with the MD5 context array and the |
|address of the DAT_00414c20 variable as arguments. This generates an MD5 hash |
|from the first string and stores it in the DAT_00414c20 variable. |
| The function then calls md5_init again with the MD5 context array as an |
|argument. It then calls strlen to get the length of the second string, |
|acStack_260, and passes it to md5_add along with the MD5 context array and the |
|second string. This adds the second string to the MD5 context. |
| The function then calls md5_digest with the MD5 context array and the |
|address of the DAT_00414c30 variable as arguments. This generates an MD5 hash |
|from the second string and stores it in the DAT_00414c30 variable. |
| This function could be named generateMD5HashesFromStrings. The variables |
|could be renamed as follows: |
| - acStack_460 -> permString |
| - acStack_260 -> tempString |
| - auStack_60 -> md5Context |
| - DAT_00414c20 -> permHash |
| - DAT_00414c30 -> tempHash |
| |
|Model: text-davinci-003, Temperature: 0.19 |
\--------------------------------------------------------------------------------/
/
/~\
|oo )
_\=/_
/ \
//|/.\|\\
|| \_/ ||
|| |\ /| ||
# \_ _/ #
| | |
| | |
[]|[]
| | |
/_]_[_\
*/
/* DISPLAY WARNING: Type casts are NOT being printed */
undefined4 FUN_0040182c(void)
{
  size_t sVar1;
  char acStack_460 [512];
  char acStack_260 [512];
  undefined auStack_60 [88];
  memset(auStack_60,0,0x58);
  sprintf(acStack_460,"%s+PERM",&XORED_MSG_0014b80);
  sprintf(acStack_260,"%s+TEMP",&XORED_MSG_0014b80);
  md5_init(auStack_60);
  sVar1 = strlen(acStack_460);
  md5_add(auStack_60,acStack_460,sVar1);
  md5_digest(auStack_60,&DAT_00414c20);
  md5_init(auStack_60);
  sVar1 = strlen(acStack_260);
  md5_add(auStack_60,acStack_260,sVar1);
  md5_digest(auStack_60,&DAT_00414c30);
  return 0;
}
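For reference, the renamed listing above boils down to hashing two suffixed copies of a single base string. Here is a minimal Python sketch of that logic, assuming the md5_* helpers implement standard MD5. The function name derive_unlock_hashes is my own, and the base string is a placeholder, since the contents of XORED_MSG_0014b80 are not shown here.

```python
import hashlib


def derive_unlock_hashes(base):
    """Return (perm_hash, temp_hash) as raw 16-byte MD5 digests.

    Mirrors the decompiled function: md5(base + "+PERM") and
    md5(base + "+TEMP"). `base` must be bytes without embedded NULs,
    since the original builds its strings with sprintf("%s...").
    """
    perm = hashlib.md5(base + b"+PERM").digest()
    temp = hashlib.md5(base + b"+TEMP").digest()
    return perm, temp


if __name__ == "__main__":
    # Placeholder only: the real value lives at XORED_MSG_0014b80 in the binary.
    perm, temp = derive_unlock_hashes(b"PLACEHOLDER")
    print(perm.hex())
    print(temp.hex())
```

Each digest is 16 bytes, which is what md5_digest would write to DAT_00414c20 and DAT_00414c30 if this reading of the function is right.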
But how do we know that these auxiliary functions are setting up an MD5 hash? Well, we could figure it out the old-fashioned way, as I did when I wrote the report on the backdoor earlier this year, by recognizing the constants used in initializing the MD5 struct, or we could just let our droid have a go at it. The droid, it turns out, does a pretty good job:
/* /--------------------------------------------------------------------------------\
|OpenAI GPT-3 generated comment, take with a grain of salt: |
| |
| This code is initializing a data structure used for the MD5 hashing |
|algorithm. The data structure is an array of 5 32-bit unsigned integers, which |
|are stored in the memory pointed to by the parameter 'param_1'. The first |
|element of the array is set to 0, the second element is set to 0x67452301, the |
|third element is set to 0, the fourth element is set to 0xefcdab89, the fifth |
|element is set to 0x98badcfe, and the sixth element is set to 0x10325476. |
| A suitable name for this function could be 'init_md5_state'. The parameter |
|'param_1' could be renamed to 'md5_state', since it is a pointer to the array of|
|integers used to store the state of the MD5 hashing algorithm. |
| |
|Model: text-davinci-003, Temperature: 0.19 |
\--------------------------------------------------------------------------------/
/
/~\
|oo )
_\=/_
/ \
//|/.\|\\
|| \_/ ||
|| |\ /| ||
# \_ _/ #
| | |
| | |
[]|[]
| | |
/_]_[_\
*/
/* DISPLAY WARNING: Type casts are NOT being printed */
void FUN_00401f30(undefined4 *param_1)
{
  *param_1 = 0;
  param_1[2] = 0x67452301;
  param_1[1] = 0;
  param_1[3] = 0xefcdab89;
  param_1[4] = 0x98badcfe;
  param_1[5] = 0x10325476;
  return;
}
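The old-fashioned approach, recognizing the initialization constants, is easy enough to automate in a rough way. The sketch below is my own illustration (the names guess_hash_family, MD5_IV, and SHA1_EXTRA are hypothetical, not part of G-3PO). Note that MD4 uses the same four initial state words as MD5, and SHA-1 uses those four plus 0xc3d2e1f0, so a match narrows down the candidates rather than proving anything.

```python
# The four initial state words shared by MD4 and MD5.
MD5_IV = (0x67452301, 0xEFCDAB89, 0x98BADCFE, 0x10325476)
# SHA-1 uses the same four words plus this fifth one.
SHA1_EXTRA = 0xC3D2E1F0


def guess_hash_family(constants):
    """Guess a hash family from constants stored by an init routine.

    Returns "MD4/MD5", "SHA-1", or None if the four shared words
    are not all present.
    """
    seen = set(c & 0xFFFFFFFF for c in constants)
    if not all(word in seen for word in MD5_IV):
        return None
    if SHA1_EXTRA in seen:
        return "SHA-1"
    return "MD4/MD5"


if __name__ == "__main__":
    # The values FUN_00401f30 writes through param_1 in the listing above.
    stored = [0, 0x67452301, 0, 0xEFCDAB89, 0x98BADCFE, 0x10325476]
    print(guess_hash_family(stored))
```

Telling MD5 from MD4 would still require a look at the transform function, which is where the droid's guess (or a human's) needs checking.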
The droid provides a reasonable description of the main server loop in the backdoor binary, too:
Installing and Using G-3PO
So, G-3PO is now ready for use. The only catch is that it does require an OpenAI API key, and the text completion service is unfree (as in beer, and also insofar as the model’s a black box). It is, however, reasonably cheap, and even with heavy use I haven’t spent more than the price of a cup of coffee while developing, debugging, and toying around with this tool.
To run the script:
- get yourself an OpenAI API key
- add the key as an environment variable by putting export OPENAI_API_KEY=whateveryourkeyhappenstobe in your ~/.profile file, or any other file that will be sourced before you launch Ghidra
- copy or symlink g3po.py to your Ghidra scripts directory
- add that directory in the Script Manager window
- visit the decompiler window for a function you’d like some assistance interpreting
- and then either run the script from the Script Manager window by selecting it and hitting the ▶️ icon, or bind it to a hotkey and strike when needed
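To show why the environment variable matters, here is a minimal sketch of how a script might pick up the key and assemble a text completion request. This is not lifted from g3po.py. The helper name build_completion_request is hypothetical, and the endpoint URL and JSON field names are as I understand OpenAI's text completion API, so check them against the current API reference before relying on them.

```python
import json
import os

# Assumed endpoint for OpenAI's text completion service.
API_URL = "https://api.openai.com/v1/completions"


def build_completion_request(prompt, model="text-davinci-003",
                             temperature=0.19, max_tokens=512,
                             environ=None):
    """Return (url, headers, body) for a completion request.

    Reads the key from OPENAI_API_KEY. Nothing is sent here; the
    caller decides how to perform the HTTP POST.
    """
    environ = os.environ if environ is None else environ
    api_key = environ.get("OPENAI_API_KEY")
    if not api_key:
        raise RuntimeError(
            "OPENAI_API_KEY is not set; export it before launching Ghidra")
    headers = {
        "Content-Type": "application/json",
        "Authorization": "Bearer " + api_key,
    }
    body = json.dumps({
        "model": model,
        "prompt": prompt,
        "temperature": temperature,
        "max_tokens": max_tokens,
    })
    return API_URL, headers, body
```

If the key is missing, the sketch fails early with a readable message. That is the error you would want to see if Ghidra was launched from a shell that never sourced your ~/.profile.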
Ideally, I’d like to provide a way for the user to twiddle the various parameters used to solicit a response from the model, such as the “temperature” in the request (high temperatures, approaching 2.0, solicit a more adventurous response, while low temperatures instruct the model to respond conservatively), all from within Ghidra. There’s bound to be a way to do this, but neither the Ghidra API documentation, Google, nor even ChatGPT is offering me much help in that regard, so for now you can adjust the settings by editing the global variables declared near the beginning of the g3po.py source file:
##########################################################################################
# Script Configuration
##########################################################################################
MODEL = "text-davinci-003" # Choose which large language model we query
TEMPERATURE = 0.19 # Set higher for more adventurous comments, lower for more conservative
TIMEOUT = 600 # How many seconds should we wait for a response from OpenAI?
MAXTOKENS = 512 # The maximum number of tokens to request from OpenAI
C3POSAY = True # True if you want the cute C-3PO ASCII art, False otherwise
LANGUAGE = "English" # This can also be used as a style parameter.
EXTRA = "" # Extra text appended to the prompt.
LOGLEVEL = INFO # Adjust for more or less line noise in the console.
COMMENTWIDTH = 80 # How wide the comment, inside the little speech balloon, should be.
C3POASCII = r"""
/~\
|oo )
_\=/_
/ \
//|/.\|\\
|| \_/ ||
|| |\ /| ||
# \_ _/ #
| | |
| | |
[]|[]
| | |
/_]_[_\
"""
##########################################################################################
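One possible route to adjusting the temperature from within Ghidra, sketched here as an untested assumption: to the best of my knowledge, GhidraScript exposes ask* helpers such as askDouble(title, message), which pop up an input dialog. The sketch takes the asking function as a parameter so that it runs outside Ghidra too. The names clamp_temperature and choose_temperature are hypothetical, and the 0.0 to 2.0 range follows the description of temperature above.

```python
def clamp_temperature(value, low=0.0, high=2.0):
    """Force a user-supplied temperature into the accepted range."""
    return max(low, min(high, float(value)))


def choose_temperature(ask_double, default=0.19):
    """Ask the user for a temperature, falling back to `default`.

    `ask_double` is any callable taking (title, message) and returning
    a number. Inside a Ghidra script, the assumption is that askDouble
    could be passed in directly.
    """
    try:
        answer = ask_double("G-3PO", "Temperature (0.0 to 2.0):")
    except Exception:
        # Any failure in the dialog (how a cancel surfaces in Ghidra is
        # not covered here) keeps the configured default.
        return default
    return clamp_temperature(answer)


if __name__ == "__main__":
    # Stand-in for the Ghidra dialog, so the sketch runs anywhere.
    print(choose_temperature(lambda title, message: 0.7))
```

Inside Ghidra this would be called as TEMPERATURE = choose_temperature(askDouble, TEMPERATURE), assuming askDouble behaves as described.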
The LANGUAGE and EXTRA parameters provide the user with an easy way to play with the form of the LLM’s commentary. Setting style to "in the form of a sonnet", for example, gives us results like this:
These are by no means good sonnets, but you can’t have everything.
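As for how LANGUAGE and EXTRA could shape the commentary, here is a minimal sketch of folding them into a prompt. The wording of the prompt and the name build_prompt are hypothetical. The actual prompt that g3po.py sends is not reproduced here.

```python
def build_prompt(code, language="English", extra=""):
    """Assemble a prompt asking for commentary on decompiled code.

    `language` doubles as a style parameter, and `extra` is appended
    to the instructions, as the configuration comments describe.
    """
    parts = [
        "Below is some C code that Ghidra decompiled from a binary.",
        "Explain what it does, in {0}, and suggest better names for "
        "the function and its variables.".format(language),
    ]
    if extra:
        parts.append(extra)
    parts.append("")  # blank line between instructions and code
    parts.append(code)
    return "\n".join(parts)


if __name__ == "__main__":
    print(build_prompt("void f(void) { return; }",
                       extra="Respond in the form of a sonnet."))
```

Because the style is just more text in the prompt, anything the model can be talked into (sonnets included) is available without touching the rest of the script.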
G-3PO is open sourced and released under an MIT license. You can find the script in Tenable’s public GitHub repository HERE.
Happy holidays and happy hacking!