A Basic Guide to AFL QEMU

Craig Young
Apr 28, 2024

Over the years that I’ve been teaching Ghidra at Black Hat and other events, there is one question which inevitably comes up.

How do you find vulnerabilities in the programs you analyze with Ghidra?

That is why I am doing something a little different for 2024. Instead of focusing on the mechanics of using Ghidra, this year’s class will focus on finding vulnerabilities in the programs we analyze. This means both finding vulnerabilities directly within our Ghidra analysis as well as integrating other tools using data and insights gleaned from analysis. You can see more details on the Black Hat listing for A Basic Guide to Bug Hunting with Ghidra.

Fuzzing Compiled Code

Today’s post takes a simplified example from the class and explains a process for fuzzing a parser function identified within a compiled program. For the sake of brevity, we’ll be jumping ahead to the point at which we’ve identified an interesting function and have inferred a function signature. For context, the program listens for an HTTP POST request on port 8080 and it passes the request body data to an internal parsing function. The data is parsed as a proprietary game save format. The HTTP response is a summary of what the parser decoded.

The example program and a sample input are available for download through GitHub:

Files are available from: https://github.com/cy1337/afl_qemu_target
A build of the Dockerfile is also available at: https://hub.docker.com/r/cy1337demos/simple_target_server

· simple_target_server
· post_data.bin
· Dockerfile

The simple_target_server was compiled on Ubuntu 20.04 but will likely run on a few other platforms as well. If you have problems, you can pull a pre-loaded image from Docker Hub:

docker pull cy1337demos/simple_target_server

You can then start the target server with:

docker run -it cy1337demos/simple_target_server:latest bash
./simple_target_server

Demo Program

With simple_target_server running, you can use a separate terminal session to send the post_data.bin payload using curl as follows:

$ curl --data-binary @post_data.bin localhost:8080
Level: 5
Health: 300
Points: 1500
Item Count: 3
Items: 01 02 03
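
The same request can be sketched in Python, which is handy for scripting later tests. This is only an illustration: the URL path `/` and the helper names `build_request` and `send_save_file` are assumptions, not part of the demo repository. When no Content-Type header is given, urllib falls back to the same default that curl uses for `--data-binary`.

```python
import urllib.request

# Assumption: the server accepts POSTs at "/" on port 8080, as in the curl example.
TARGET_URL = "http://localhost:8080/"


def build_request(payload: bytes, url: str = TARGET_URL) -> urllib.request.Request:
    """Build a POST request carrying the raw bytes unmodified."""
    return urllib.request.Request(url, data=payload, method="POST")


def send_save_file(path: str, url: str = TARGET_URL, timeout: float = 5.0) -> str:
    """POST the file at `path` and return the response body as text."""
    with open(path, "rb") as f:
        payload = f.read()
    with urllib.request.urlopen(build_request(payload, url), timeout=timeout) as resp:
        return resp.read().decode("utf-8", errors="replace")
```

With the server running, `print(send_save_file("post_data.bin"))` should print the same summary that curl shows.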

As you may have guessed from the title, we are now going to use American Fuzzy Lop (AFL) to target the parsing logic for this custom save game format. I was introduced to AFL sometime around 2015 and it quickly became one of my favorite analysis tools due to its speed and effectiveness. The normal operation of AFL is to compile the program with instrumentation. The instrumentation logs the selection and order of basic blocks which execute as coverage feedback for the fuzzer. The fuzzer then iterates over a list of known inputs which make up the test corpus. During a fuzz, these input files are mutated and run through the code under test. Inputs which produce unique code coverage are added to the list of files for future mutation. In this way, AFL creates generations of test cases where each generation demonstrates an “interesting” modification compared to its parent.
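
To make the feedback loop concrete, here is a deliberately tiny sketch of coverage-guided fuzzing in Python. It is not how AFL is implemented: AFL tracks transitions between blocks in a shared bitmap and uses far smarter mutations. The `toy_target` function and its "SV" magic bytes are hypothetical stand-ins for an instrumented program.

```python
import random


def toy_target(data: bytes) -> frozenset:
    """Hypothetical instrumented program: returns the set of 'blocks' it executed."""
    blocks = {"entry"}
    if len(data) > 0 and data[0] == ord("S"):
        blocks.add("magic_1")
        if len(data) > 1 and data[1] == ord("V"):
            blocks.add("magic_2")
            if len(data) > 2 and data[2] > 0x7F:
                blocks.add("big_level")
    return frozenset(blocks)


def mutate(data: bytes, rng: random.Random) -> bytes:
    """Overwrite one random byte (or append one byte if the input is empty)."""
    buf = bytearray(data)
    if not buf:
        buf.append(rng.randrange(256))
    else:
        buf[rng.randrange(len(buf))] = rng.randrange(256)
    return bytes(buf)


def fuzz(seeds, iterations, rng):
    """Keep any mutated input whose coverage has not been seen before."""
    corpus = list(seeds)
    seen = {toy_target(s) for s in corpus}
    for _ in range(iterations):
        parent = rng.choice(corpus)
        child = mutate(parent, rng)
        coverage = toy_target(child)
        if coverage not in seen:
            seen.add(coverage)
            corpus.append(child)  # the child becomes a parent for later generations
    return corpus
```

Calling `fuzz([b"AAA"], 20000, random.Random())` tends to rediscover the magic bytes one at a time, because each partial match is kept as a new parent instead of being thrown away.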

Besides this traditional mode with source-instrumentation, AFL and the newer AFLplusplus fork can also integrate with various emulators to obtain coverage feedback. In this post, we will use AFL’s QEMU mode.

To begin, you will need to either load the linked container or install some dependencies:

Be sure to run qemu_mode/build_qemu_support.sh as shown in the Dockerfile.

Out of the box, AFL’s QEMU mode is set up to fuzz a program which receives data (either from a file or from stdin) and does some processing on that data. The simple_target_server example cannot be directly fuzzed with AFL QEMU because the data is consumed from a network socket. While there are a variety of solutions to this problem, the approach I am taking in this post is to turn our PIE compiled executable into a shared library with an exported symbol for the function we want to fuzz and then build a wrapper to fuzz it.

The function of interest in simple_target_server is found in Ghidra as FUN_001014d1 which, after analysis, was determined to have a function signature compatible with:

char * FUN_001014d1(char *param_1,size_t param_2)

From here, the high-level steps are as follows:

1. Export FUN_001014d1 in simple_target_server as fuzz_target to make libfuzz.so

2. Create and compile a harness in C to feed data from stdin to fuzz_target

3. (If needed) patch libfuzz.so to remove the PIE flag from its dynamic section and relink

4. Fuzz using afl-fuzz -Q with AFL_INST_LIBS=1 and LD_LIBRARY_PATH=$(pwd)

5. Test crashing payloads against original simple_target_server

Steps #1 and #3 can be achieved using the Library to Instrument Executable Formats (LIEF), which offers a helpful Python interface. For more background on how this works, I recommend looking at the LIEF Python module documentation.

Exporting the Symbol

With the LIEF Python module installed, it is pretty easy to add an export symbol at an arbitrary address:

>>> import lief
>>> x = lief.parse('simple_target_server')
>>> x.add_exported_function(0x14d1, "fuzz_target")
<lief.ELF.Symbol object at 0x7f2a23049ef0>
>>> x.write('libfuzz.so')

We want to export a symbol for Ghidra’s FUN_001014d1. As the name indicates, Ghidra places this function at 0x1014d1, but Ghidra loaded this position-independent binary at an image base of 0x100000, so the offset we want to export is actually 0x14d1 (0x1014d1 minus 0x100000).
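
The address arithmetic can be written down as a small helper. This is a sketch that assumes the image base shown in Ghidra is 0x100000, as it is for this binary; check the Memory Map in your own project before relying on it. The function names are made up for illustration.

```python
# Assumption: Ghidra loaded the PIE binary at this image base.
GHIDRA_IMAGE_BASE = 0x100000


def address_from_label(label: str) -> int:
    """Extract the address from an auto-generated Ghidra label such as FUN_001014d1."""
    _, _, hex_part = label.rpartition("_")
    return int(hex_part, 16)


def ghidra_to_offset(address: int, image_base: int = GHIDRA_IMAGE_BASE) -> int:
    """Convert an address shown in Ghidra to an offset relative to the image base."""
    if address < image_base:
        raise ValueError("address is below the image base")
    return address - image_base


offset = ghidra_to_offset(address_from_label("FUN_001014d1"))
print(hex(offset))  # 0x14d1
```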

Fuzz Harness

The next thing we need is to construct our fuzz harness. A fuzz harness is the tool which sets up and executes our interesting function. For this function we can do (also on GitHub):

#include <stdio.h>
#include <stdlib.h>
#include <string.h>

#define BUFFER_SIZE 1024
extern char * fuzz_target(char *param_1,size_t param_2);

int main() {
    char buffer[BUFFER_SIZE] = {0};

    // Read data from stdin
    size_t data_length = fread(buffer, 1, BUFFER_SIZE-1, stdin);
    char* output = fuzz_target(buffer, data_length);
    if (output) {
        printf("Processed output:\n%s\n", output);
        // The function allocated memory on the heap
        // Let's be polite and free it in case we do a persistent fuzz later
        free(output);
    } else {
        printf("Function returned NULL\n");
    }
    return EXIT_SUCCESS;
}

This can be compiled as follows from within the same directory as libfuzz.so:

gcc -o harness harness.c -L$(pwd) -lfuzz

On some platforms this will give an immediate error similar to:

/usr/bin/ld: cannot use executable file '/usr/src/app/libfuzz.so' as input to a link

On other platforms it may compile and link but running the program is likely going to generate an error:

$ LD_LIBRARY_PATH=$(pwd) ./harness
./harness: error while loading shared libraries: libfuzz.so: cannot dynamically load position-independent executable

Removing the PIE Flag

This outcome is referenced in the LIEF documentation as a compatibility issue with newer glibc. The steps listed for fixing this, however, don’t work for me on a current version of the Python module:

>>> import lief
>>> bin_ = lief.parse('libfuzz.so')
>>> bin_[lief.ELF.DynamicEntry.TAG.FLAGS_1].remove(lief.ELF.DynamicEntryFlags.FLAG.PIE)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
AttributeError: type object 'lief.ELF.DynamicEntry' has no attribute 'TAG'

Although this technique didn’t work, I was able to come up with another way. The following Python script shows how to remove the flag (it’s also on GitHub):

#!/usr/bin/env python3

import sys
import lief
import argparse

def remove_pie_flag(filename, output_filename=None):
    # Load the ELF binary
    binary = lief.parse(filename)
    if not isinstance(binary, lief.ELF.Binary):
        print("The file is not an ELF executable.")
        return

    # Attempt to find the DT_FLAGS_1 entry
    flags_1_entry = None
    for entry in binary.dynamic_entries:
        if entry.tag == lief.ELF.DYNAMIC_TAGS.FLAGS_1:
            flags_1_entry = entry
            break

    if flags_1_entry is None:
        print("DT_FLAGS_1 entry not found. This file may not have PIE set or lacks dynamic flags.")
        return

    # Check if the DF_1_PIE flag is set and remove it
    if flags_1_entry.value & lief.ELF.DYNAMIC_FLAGS_1.PIE:
        print("PIE flag is present. Removing...")
        # Remove the PIE flag by clearing the bit
        flags_1_entry.value ^= lief.ELF.DYNAMIC_FLAGS_1.PIE
    else:
        print("PIE flag is not present. No changes needed.")
        return

    # Save the modified binary
    if output_filename is None:
        output_filename = filename
    binary.write(output_filename)
    print(f"Modified file saved as {output_filename}")

def main():
    parser = argparse.ArgumentParser(description="Remove the PIE flag from an ELF binary.")
    parser.add_argument("filename", type=str, help="The filename of the ELF file to modify.")
    parser.add_argument("--output", type=str, help="Optional: The filename to save the modified ELF file. If not provided, it overwrites the original file.")

    args = parser.parse_args()

    remove_pie_flag(args.filename, args.output)

if __name__ == "__main__":
    main()

This tool can be applied to our libfuzz.so as follows:

$ python3 ./remove_pie.py libfuzz.so --output libfuzz.so
PIE flag is present. Removing...
Modified file saved as libfuzz.so
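
For readers who want to see the bit manipulation without LIEF in the way, the following sketch works on plain integers. The constants are the values defined in elf.h; the starting value of NOW plus PIE is typical for a PIE executable, but it is an assumption about this particular binary.

```python
# Flag values for the DT_FLAGS_1 dynamic entry, as defined in elf.h.
DF_1_NOW = 0x00000001
DF_1_PIE = 0x08000000


def clear_flag(value: int, flag: int) -> int:
    """Clear `flag` whether or not it is currently set."""
    return value & ~flag


def toggle_flag(value: int, flag: int) -> int:
    """Flip `flag`; this only removes the flag if it was set to begin with."""
    return value ^ flag


flags = DF_1_NOW | DF_1_PIE           # assumed starting value: 0x8000001
print(hex(clear_flag(flags, DF_1_PIE)))   # 0x1
print(hex(toggle_flag(flags, DF_1_PIE)))  # 0x1
print(hex(toggle_flag(DF_1_NOW, DF_1_PIE)))  # 0x8000001, the flag was turned back on
```

XOR toggles rather than clears, which is why it matters that the script only applies it after confirming the PIE bit is set.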

The harness program will work now (with LD_LIBRARY_PATH set appropriately). We can test it by feeding our sample post data as follows:

$ LD_LIBRARY_PATH=$(pwd) ./harness < post_data.bin
Processed output:
HTTP/1.1 200 OK
Content-Type: text/plain
Content-Length: 64
Level: 5
Health: 300
Points: 1500
Item Count: 3
Items: 01 02 03

Fuzz Time

With our fuzz harness now reading input from stdin and passing it directly to the target function, we can proceed with using afl-fuzz in QEMU (-Q) mode.

Start by creating a new directory, in, and place the provided post_data.bin file in this directory. When running AFL, we also need to remember to set the LD_LIBRARY_PATH to find our libfuzz.so and we need to have AFL_INST_LIBS=1 set so that AFL/QEMU will instrument blocks executing from external libraries (e.g. libfuzz.so).

$ AFL_INST_LIBS=1 LD_LIBRARY_PATH=$(pwd) afl-fuzz -Q -i in -o out -- ./harness

If everything has gone according to plan, you should see AFL’s interface log crashes almost immediately:

[Screenshot: AFL QEMU Fuzz]

The crashing testcases are stored in out/default/crashes/id*.
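
Before moving on to the real server, it can be useful to replay each saved crash through the harness and note how the process died. The sketch below assumes the directory layout shown above and a harness in the current directory; `list_crashes`, `describe_exit`, and `replay` are illustrative names, not part of AFL.

```python
import glob
import os
import signal
import subprocess


def list_crashes(crash_dir: str = "out/default/crashes") -> list:
    """Return the saved crash files, skipping the README that AFL also writes there."""
    return sorted(glob.glob(os.path.join(crash_dir, "id*")))


def describe_exit(returncode: int) -> str:
    """Translate a subprocess return code into a short description."""
    if returncode < 0:  # on POSIX, a negative code means the child was killed by a signal
        try:
            name = signal.Signals(-returncode).name
        except ValueError:
            name = f"signal {-returncode}"
        return f"killed by {name}"
    return f"exited with status {returncode}"


def replay(command: list, testcase_path: str, lib_dir: str = ".") -> str:
    """Feed one testcase to the command on stdin and describe how it exited."""
    env = dict(os.environ, LD_LIBRARY_PATH=os.path.abspath(lib_dir))
    with open(testcase_path, "rb") as f:
        data = f.read()
    result = subprocess.run(
        command, input=data, env=env, timeout=10,
        stdout=subprocess.DEVNULL, stderr=subprocess.DEVNULL,
    )
    return describe_exit(result.returncode)
```

A loop such as `for path in list_crashes(): print(path, replay(["./harness"], path))` then gives a quick summary of which signals the crashes raise when run natively rather than under QEMU.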

Testing the Payload

Fuzzing a function outside of the normal context can sometimes yield false positive results so it is important to test these payloads with our original simple_target_server. Run the server and then, from a separate terminal, send the data via curl:

$ curl --data-binary @out/default/crashes/id\:000000* localhost:8080

The server should produce output as follows:

$ ./simple_target_server
Listening on port 8080…
*** stack smashing detected ***: terminated
Aborted

Going Further

Although this blog post is done, the learning doesn’t have to stop here. In the classroom we will build on this foundation with techniques for working around or working through various complications such as dependencies on external data structures, integrity checked data streams, and more sophisticated runtime links. If you are interested in this or other techniques for identifying vulnerabilities in compiled code, please consider joining me this summer in Vegas for Black Hat. This course will be offered August 3–4 (Sat/Sun) and again August 5–6 (Mon/Tue). Hope to see you there!

Craig Young

I’m a 15-year veteran of the infosec industry with 200+ CVEs, two USENIX papers, a Pwnie award, and a bunch of bounties to my name. Currently teaching Ghidra.