Everyday Ghidra: Ghidra Data Types — Creating Custom GDTs From Windows Headers — Part 2
Ghidra, developed by the NSA, is a powerful reverse engineering tool known for its versatility. One standout feature is its ability to import data types from various sources, including Windows headers, into Ghidra Data Type (GDT) files. This guide will walk you through the process of creating GDT files from Windows headers.
In an ideal world for reverse engineering, every function would have a name, and every variable would be correctly typed. However, we don’t live in this ideal world. While Ghidra strives to recognize data types within a binary and assign appropriate types through import heuristics, auto analysis, and known function prototypes, there are instances where it falls short. It’s in these moments that the reverse engineer must step in.
In part one, we clearly defined GDTs and the importance of data types for reverse engineering.
We walked through the process of reverse engineering CVE-2024-38063 and found something disappointing: Ghidra’s decompilation of Windows’ tcpip.sys fell short. This highlighted the need to build a Ghidra GDT.
In this post, we will learn how to define custom data types. We will define NDIS data types using readily available Windows NDIS headers from the Windows Driver Kit and use them to build a custom GDT. Using these headers, we can transform the decompilation into something respectable.
Ghidra’s C Header Parser
Ghidra has a native ability to preprocess headers to extract and create new data types. Ghidra already provides a default Windows GDT with around 90,000 types, which we reviewed in part 1, but if you want to build your own default Windows GDT, you can do it with Ghidra’s C-Parser. Within the CodeBrowser, click File → Parse C Source. This will bring up the “Parse C Source” dialog window.
From this dialog box, Ghidra can parse header files directly into the currently open program or create a .gdt file containing all discovered data types using the “Parse to File…” option. Numerous default configuration options are available, and as we will explore later, the process can become complex. Each recommended parse configuration populates the three input sections with its own defaults for the include paths, parse options, and program architecture. Ghidra uses these include paths and definitions to preprocess the headers and extract type information.
Building Ghidra’s Default Visual Studio 2022 Windows GDT
The default parser configuration shown above is what Ghidra uses to build the default GDT for Windows. However, on my host machine, several entries show up in red, meaning those header files cannot be found.
This can be fixed. On my Windows host, I have the Windows SDK 10.0.22621.0 installed (not the default 10.0.19041.0). This is fine; you can use whichever version you have installed. All you need to do is add the corresponding include directories for that version.
After adding the existing include directories, the header files are no longer red.
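Before opening the dialog, the same check can be done outside Ghidra. The following is a minimal sketch, not part of Ghidra itself: the function name find_missing_headers is hypothetical, and the sketch simply assumes that a header counts as "found" if it exists as a file directly under one of the include directories.

```python
from pathlib import Path


def find_missing_headers(headers, include_dirs):
    """Return the headers that are not present in any include directory.

    headers: header names as they appear in the parse configuration.
    include_dirs: candidate include directories to search (assumed layout).
    """
    missing = []
    for header in headers:
        found = any((Path(directory) / header).is_file() for directory in include_dirs)
        if not found:
            missing.append(header)
    return missing
```

Calling it with the header list from the parse configuration and the include directories of your installed SDK version returns the names that would still show up in red.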
Once Ghidra has visibility of all the headers, you can parse the headers to the current open program, or save to a new .gdt file.
Click on Parse to File…
C-Parser Woes
After preprocessing most of the files, the parser ran into an issue.
This is a common issue. Ghidra’s C-Parser is very particular about the order in which header files are included and will often struggle with headers containing unusual or undefined directives. It usually takes a few iterations to get it right.
Ghidra even acknowledges that parsing new headers can be challenging, but assures that the effort is worth it. It’s far better than manually adding hundreds of new data types.
To troubleshoot a parsing error, open the <outputfile>_CParser.out file and check the reported line number (in this case, 5454). This file is generated in the same directory where you chose to save the output GDT.
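Since the .out file can be large, it helps to print only the lines around the reported line number. This is a small sketch under my own assumptions: context_around is a hypothetical helper, and it treats the .out file as plain text with 1-indexed line numbers.

```python
def context_around(path, line_number, radius=3):
    """Return (number, text) pairs for the lines near a 1-indexed line number."""
    with open(path, encoding="utf-8", errors="replace") as handle:
        lines = handle.read().splitlines()
    # Clamp the window so it never runs past the start or end of the file.
    start = max(line_number - radius, 1)
    end = min(line_number + radius, len(lines))
    return [(number, lines[number - 1]) for number in range(start, end + 1)]
```

For the error above, you would pass the path of your own _CParser.out file and 5454 as the line number, then look for the first identifier the parser could not resolve.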
The issue was that I had forgotten one of the necessary include directories: the one containing the header that defines _CRT_BEGIN_C_HEADER. After fixing that, the parsing worked. If you want a copy of the GDT, download it here.
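When the parser stops on an unknown macro such as _CRT_BEGIN_C_HEADER, one way to find the include directory you forgot is to search the installed headers for its definition. The sketch below is an assumption of mine rather than anything Ghidra provides: find_macro_definitions is a hypothetical name, and it only looks for simple `#define NAME` lines in .h files.

```python
import re
from pathlib import Path


def find_macro_definitions(include_dirs, macro):
    """Return (header path, line number) pairs where the macro is #defined."""
    pattern = re.compile(r"^\s*#\s*define\s+" + re.escape(macro) + r"\b")
    hits = []
    for directory in include_dirs:
        # Walk every .h file below the directory, in a stable order.
        for header in sorted(Path(directory).rglob("*.h")):
            with open(header, encoding="utf-8", errors="replace") as handle:
                for number, line in enumerate(handle, start=1):
                    if pattern.match(line):
                        hits.append((str(header), number))
    return hits
```

The directory of any header it reports is a candidate to add to the include paths in the parse configuration.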
OK, that was a softball. Parsing headers with Ghidra’s default parse configuration works pretty well. What about parsing a new header, like the NDIS data types we need?
Building a Custom GDT from Windows Headers
“Say we wanted to add all the data types from a large library like the Network Driver Interface Specification (NDIS) library? It would take quite some time to manually create data types for hundreds or thousands of structures, enums, and functions.” (from When to Create Custom GDTs in part 1)
Most of the data types we were missing in our previous analysis came from NDIS. The NDIS header is available from the Windows Driver Kit.
As this file is already in the list of include directories that we defined earlier, all we need to do is use the same parse configuration and simply add the base NDIS header (ndis.h) from the Windows Driver Kit.
I attempted this and came up with another parsing error.
Depending on the complexity of the headers added, this process can take some time. Iteration might be required: defining the unknowns, modifying the header, and attempting the parse again. After several iterations, you may succeed, fail, or simply give up in frustration.
Luckily, we can work around this seemingly endless process.
Pre-Pre-Processing C Headers
Instead of trying to create the perfect parser configuration and providing all the headers in the correct order, we can take another approach. Compiler toolchains like MinGW or MSVC from Visual Studio can preprocess the headers before we hand them to Ghidra’s C-Parser. A real compiler is simply better at macro substitution, conditional compilation directives, and file inclusion than Ghidra’s native C-Parser preprocessing.
Using your compiler of choice, create a simple source file that includes all the headers you would like to preprocess, such as ndis-headers.c:
// add all headers you want to preprocess
#include <ndis.h>
Then direct the compiler to preprocess the source to a file.
cl /P /I '.\Program Files (x86)\Windows Kits\10\Include\10.0.22621.0\km' ndis-headers.c /D _WIN64 /D _AMD64_ /D AMD64 /showIncludes
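If you rebuild the preprocessed header often, the command above can be wrapped in a short script. This is a sketch, not a required part of the workflow: build_cl_command and preprocess are hypothetical helper names, only the /P, /I, and /D flags from the command above are used, and it assumes cl is on your PATH (for example inside a Visual Studio developer command prompt).

```python
import subprocess


def build_cl_command(source, include_dirs, defines):
    """Build the argument list for 'cl /P' (preprocess the source to a .i file)."""
    command = ["cl", "/P"]
    for directory in include_dirs:
        command += ["/I", directory]
    for name in defines:
        command += ["/D", name]
    command.append(source)
    return command


def preprocess(source, include_dirs, defines):
    """Run the preprocessor; raises CalledProcessError if cl reports failure."""
    subprocess.run(build_cl_command(source, include_dirs, defines), check=True)
```

Passing the arguments as a list avoids the shell quoting problems that come with the spaces and parentheses in the Windows Kits path.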
I learned this technique from two GitHub repositories created by fellow reverse engineers.
The first repository uses MinGW, a native Windows port of GCC, and its available headers to create both 32-bit and 64-bit preprocessed headers for building several powerful GDTs. You can find it here:
The second repository demonstrates how to achieve the same result using Visual Studio. This repository takes all the headers from Process Hacker (now known as System Informer) and creates a preprocessed GDT using the well-defined headers from Process Hacker.
Here is the Visual Studio technique, step by step, as described in that repository:
An alternative to parsing Windows headers directly is to deal with preprocessed files:
1. Create a project using Visual Studio (e.g. VS2019).
2. Create a source file which includes all of the headers you’re interested in capturing definitions for.
3. Set the compile options for “Preprocess to a File (/P)” and “Preprocess Suppress Line Numbers (/EP)”.
4. Compile your project. Linking will fail, but you’ll end up with a single .i file e.g. main.i which will have all of the preprocessed structs, enums, function prototypes, etc.
5. Rename this to main.h or something appropriate for your project.
6. Clean up this file. See Cleaning up Preprocessed Output for an overview of this process.
7. Follow the steps in Generating Ghidra Data Type (GDT) Archives to create a .gdt file.
Creating Preprocessed Files in Visual Studio 2022
Now, we will follow the steps to create a preprocessed header in Visual Studio 2022. We start by creating a new project.
First, find a template C++ project that deals with NDIS (Why NDIS? See part 1). This will help Visual Studio set up a project with the correct include directories for kernel versus user mode headers.
Next, create a new project named ndis-headers. (Step 1)
Add a new source file ndis-headers.c and add desired headers. (Step 2):
Then, modify the project properties to configure the compiler with the /P option to preprocess source files and the /EP option to suppress line numbers (Step 3).
Compile the header to force the creation of a preprocessed ndis-headers.i (Step 4):
In Visual Studio, this right-click -> Compile translates to the following long command:
cl /c /I.. /I. /I.. /I. /IX64\DEBUG\ /Zi /nologo /W4 /WX /diagnostics:column /Od /Ob2 /Oi /Oy- /D _WIN64 /D _AMD64_ /D AMD64 /D DEPRECATE_DDK_FUNCTIONS=1 /D MSC_NOOPT /D _WIN32_WINNT=0x0A00 /D WINVER=0x0A00 /D WINNT=1 /D NTDDI_VERSION=0xA00000C /D DBG=1 /D NDIS_WDM=1 /D NDIS630=1 /D NDIS_WDM=1 /P /EP /GF /Gm- /Zp8 /GS /guard:cf /Gy /fp:precise /Qspectre /Zc:wchar_t- /Zc:forScope /Zc:inline /GR- /Yc"precomp.h" /Fp"X64\DEBUG\NDIS-HEADERS.PCH" /Fo"X64\DEBUG\\" /Fd"X64\DEBUG\VC143.PDB" /external:W4 /Gz /wd4748 /wd4603 /wd4627 /wd4986 /wd4987 /wd4201 /wd4214 /FI"C:\PROGRAM FILES (X86)\WINDOWS KITS\10\INCLUDE\10.0.22621.0\SHARED\WARNING.H" /FC /kernel -cbstring -d2epilogunwind /d1import_no_registry /d2AllowCompatibleILVersions /d2Zi+ C:\USERS\USER\SOURCE\NDIS-HEADERS\NDIS-HEADERS.C
But the only functionality we really needed was the preprocessing flag /P and the include flag /I pointing to the directory with the driver headers:
cl /P /I '.\Program Files (x86)\Windows Kits\10\Include\10.0.22621.0\km' /D _WIN64 /D _AMD64_ /D AMD64 ndis-headers.c
This generates the ndis-headers.i file:
The direct output looks like:
We could attempt to throw the generated preprocessed file directly into Ghidra’s C-Parser, but first we need to complete Step 6:
6. Clean up this file. See Cleaning up Preprocessed Output for an overview of this process.
This cleanup can be accomplished in many ways using regular expressions. For simplicity, and so that you can repeat the work, I created this CyberChef recipe.
Just copy/paste your preprocessed header or upload the file into the input:
The resulting clean output has inlined function bodies and other statements that might break the Ghidra C-Parser removed. After we have our clean header, we can now use it to finally create an NDIS GDT. (Step 7)
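If you would rather script the cleanup, the sketch below shows the general idea. It is not the CyberChef recipe and makes its own assumptions: it drops #line and #pragma lines other than #pragma pack (which affects structure layout), replaces top-level function bodies with a semicolon, and collapses runs of blank lines. It assumes the input was preprocessed as C with comments already removed, and it can misfire on braces inside string literals or on an anonymous struct whose opening brace directly follows a closing parenthesis.

```python
import re

# Lines to drop: #line directives and any #pragma other than '#pragma pack'.
DROP_LINE = re.compile(r"#\s*(line\b|pragma\b(?!\s+pack\b))")


def strip_function_bodies(text):
    """Replace each top-level '{...}' that directly follows ')' with ';'."""
    out = []
    depth = 0       # brace depth of scopes we keep (struct, union, enum)
    last_sig = ""   # last non-whitespace character kept so far
    i = 0
    while i < len(text):
        ch = text[i]
        if ch == "{" and depth == 0 and last_sig == ")":
            level = 0
            while i < len(text):  # skip ahead to the matching closing brace
                if text[i] == "{":
                    level += 1
                elif text[i] == "}":
                    level -= 1
                    if level == 0:
                        break
                i += 1
            i += 1  # step past the closing brace
            while out and out[-1].isspace():
                out.pop()
            out.append(";")
            last_sig = ";"
            continue
        if ch == "{":
            depth += 1
        elif ch == "}":
            depth = max(depth - 1, 0)
        out.append(ch)
        if not ch.isspace():
            last_sig = ch
        i += 1
    return "".join(out)


def clean_preprocessed(text):
    """Drop unwanted directives, strip function bodies, collapse blank lines."""
    kept = [line for line in text.splitlines() if not DROP_LINE.match(line.strip())]
    result = []
    previous_blank = False
    for line in strip_function_bodies("\n".join(kept)).splitlines():
        blank = not line.strip()
        if blank and previous_blank:
            continue
        previous_blank = blank
        result.append(line.rstrip())
    return "\n".join(result) + "\n"
```

Read ndis-headers.i, pass its contents through clean_preprocessed, and write the result to a .h file. Expect to adjust the rules for your own headers whenever Ghidra's C-Parser still stops on a construct it does not understand.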
Generate the GDT File
I left all the #defines from the original parser configuration but removed all other source files and include directories. Then, I added the clean NDIS header.
I selected “Don’t Use Open Archives” because I didn’t want my GDT to depend on other source GDTs, and I wanted it to contain only NDIS-related data types.
In the end, it worked!
Clear Lenses — Using the GDT File
To use the GDT file in a project, open the Data Type Manager and click the drop-down arrow.
Select Open File Archive and choose your GDT file.
The data types from the GDT file will now be available in your project.
To apply the data types, right-click ndis_64 -> Apply Data Types:
Let’s see how much cleaner our decompilation is now.
We are now seeing structures, named variables, and data types we weren’t even aware of in the analysis from part 1!
Check out the difference before and after the GDT application.
This is a direct result of training Ghidra to use all the available NDIS data types by leveraging our newly generated GDT!
Go ahead and download this NDIS GDT to improve your analysis or check out the Visual Studio solution and build your own. Let me know if you run into issues.
Summary
By following these steps, you can create Ghidra GDT files from Windows headers, making it easier to analyze binaries that use these data types. Preprocessing headers using a compiler ensures compatibility with Ghidra’s parser, allowing for a more accurate and efficient analysis of binaries. This method enhances your ability to identify and utilize critical data types, significantly improving your decompilation results.
Feel free to ask if you have any questions or run into issues trying to build your own preprocessed header.
Reach out to clearbluejar on X or Mastodon.
Going Deeper
For a deeper dive into my research and long form writing, head over to clearbluejar.github.io.
If you’re looking for more hands-on guidance with reverse engineering or enjoy tackling practical RE challenges, check out my upcoming in-person and virtual courses @clearseclabs.