Starting to Fuzz with WinAFL

Angelystor
Published in CSG @ GovTech
11 min read · Nov 3, 2020

What is Fuzzing?

Fuzzing, or fuzz testing, is an automated software testing technique that has been around for a long time. Its popularity has greatly increased recently thanks to the accessibility of computing power, the development of free, open-source, easy-to-use fuzzing frameworks such as AFL and libFuzzer, and the increasing complexity of software.

So, what exactly does fuzzing do? Fuzzing involves providing invalid, unexpected, or random data as input to a computer program, and it is a great way to test programs quickly, in an automated fashion, to find vulnerabilities in them. A quick search on the web shows many vulnerabilities (ahem, CVEs) that were discovered through fuzzing.

Figure 1. Wow! 50 CVEs in 50 days! Gotta get in on that action!

As a visual person, this is how I personally visualise fuzzing:

Figure 2. Poor program (Wikipedia)

In a fuzzing scenario, we’re not rooting for the program to pass its tests. On the contrary, we’re all hoping that it will fail in the form of a crash! A crash indicates that the program has failed to gracefully handle an error condition caused by a certain input, which is potentially exploitable or could, at the very least, cause a denial of service.

In this article, we’ll go through the basics of fuzzing and the process of fuzzing a closed source library from start to finish using WinAFL.

The Fuzzing Process

Figure 3. Fuzzing flowchart

Before we begin to dive into fuzzing a program, we need to have a basic understanding of what a fuzzer is. The concept is simple — generate any random input, throw it at a program and hope it crashes. This now leads to the most obvious question — where does the input come from? To answer that, allow me to present two different types of fuzzing approaches.

Blind vs Guided

Figure 4. Blind fuzzing vs Guided fuzzing

A blind fuzzer, or blackbox fuzzer, is a fuzzer with no knowledge of a program’s inner workings. It takes a set of test cases and throws them at the program. When it mutates test cases, it does so randomly, without any knowledge of how the program reacted to prior, similar test cases. While such a fuzzer is simple to write and deploy, its sheer randomness can result in wasted effort or insufficient code coverage. There are, however, many cases where blind fuzzing is useful, for example when the target program is sitting on a remote server and there is no way for the fuzzer to monitor how the program behaves. A good example of a blind fuzzer is Radamsa.

Radamsa is a blind fuzzer that mutates provided content. A string “aaa” sent to Radamsa could be mutated to “aab”. Repeated invocations of Radamsa with the same input generate different outputs that are not guided by any prior data.
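To make this concrete, here is a minimal sketch of the blind-mutation idea in Python. This is not Radamsa’s actual algorithm, just the essence of it: change something at random and move on, with no feedback from the target.

```python
import random

def mutate(data: bytes, rng: random.Random) -> bytes:
    """Blindly overwrite one random byte; no feedback from the target."""
    if not data:
        return data
    buf = bytearray(data)
    buf[rng.randrange(len(buf))] = rng.randrange(256)
    return bytes(buf)

rng = random.Random(1337)  # seeded only to make the demo repeatable
cases = [mutate(b"aaa", rng) for _ in range(3)]
```

Each call produces a different mutant of the same seed, and nothing about the program’s reaction ever feeds back into the next mutation.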

On the flip side of a blind fuzzer, we have an entirely different beast: the guided fuzzer. Guided fuzzers require knowledge of the program’s implementation in order to function, as they generate additional test cases designed to explore as many code paths as possible. This implies that the program needs to be instrumented in some way so the fuzzer knows which instructions are being executed, either by recompiling the program with instrumentation enabled or by running it under a runtime instrumentation framework such as Intel’s Pin or DynamoRIO. The guided fuzzer monitors the program’s execution, records any crashes or hangs, and uses that feedback to influence the generation of new test cases.
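The feedback loop behind guided fuzzing can be sketched as a toy in a few lines of Python. The `target` function here stands in for an instrumented program that reports which branches an input hit; inputs that discover new coverage are kept as parents for future mutations. All names and the branch layout are illustrative.

```python
import random

def target(data: bytes) -> set:
    """Toy stand-in for an instrumented program: returns the set of
    branch ids this input exercised."""
    cov = {"entry"}
    if len(data) > 0 and data[0] == ord("F"):
        cov.add("branch_F")
        if len(data) > 1 and data[1] == ord("U"):
            cov.add("branch_FU")
            if len(data) > 2 and data[2] == ord("Z"):
                cov.add("branch_FUZ")
    return cov

def mutate(data: bytes, rng: random.Random) -> bytes:
    buf = bytearray(data)
    buf[rng.randrange(len(buf))] = rng.randrange(256)
    return bytes(buf)

rng = random.Random(7)
corpus = [b"aaa"]          # initial test case
seen = set()               # coverage observed so far
for _ in range(200_000):
    child = mutate(rng.choice(corpus), rng)
    cov = target(child)
    if not cov <= seen:    # new coverage: keep this input as a parent
        seen |= cov
        corpus.append(child)
```

A blind fuzzer would almost never stumble onto the full three-byte prefix by chance, but this loop reaches the deepest branch reliably because each intermediate discovery is retained and mutated further.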

It’s Time to Fuzz!

Figure 5. American Fuzzy (source)

Let’s look at our fancy process chart, update it and get ready to fuzz a program!

Figure 6. More steps added to the process flow

Program Selection

The target program here is the 32-bit API DLL provided by SketchUp 2016: SketchUpAPI.dll. While WinAFL can fuzz both executables and DLLs, we’re going to fuzz a DLL as it’s easier to hook into its functions, and also because friendly API documentation is available.

Function Selection

Figure 7. WinAFL’s requirement for a target function

The target function needs to be a function that works on user input, either directly, e.g. createPNGFromFile(string filename), or indirectly, e.g. createTileFromTexture(Img texture), which depends on the user having supplied an image texture in a prior function call.

Since we’re all beginners, we’re opting for a straightforward function which directly takes in user input. In this exercise, we chose the SUImageCreateFromFile function, which takes in a file path, parses the file and returns an image representation in memory.

Figure 8. Sketchup API SUImageCreateFromFile

Now that we have picked out our target program and function, the next step is to get WinAFL to execute this function. We can do this by writing a harness for it.

Harness

In most fuzzing cases, a lightweight program is necessary to set up the required structure and complete any initialisation required by the target function. This program is typically known as a harness.

Figure 9. Time to add “Harness” into our flowchart

Before we launch Visual Studio, let’s find out what WinAFL requires of a harness.

Figure 10. Target program flow

From the list in Figure 10, we can see that WinAFL will execute our program normally until the target function is reached. As fuzzing takes a long time, our aim is to make our harness as efficient as possible by placing as few instructions as possible between the harness’ entry point and the target function.

Do also pay special attention to points 4 and 5 in Figure 10, as starting and tearing down a process is an expensive operation. To improve fuzzing throughput, WinAFL has a feature to re-execute the target function multiple times without restarting the entire program. As long as repeated invocations do not crash the harness unnecessarily (through memory exhaustion, for example) or change the target function’s behaviour, we should aim for as many iterations as possible. This also means that the target function should be stateless.

Now back to our target function SUImageCreateFromFile: we know from the points above that we need to write a minimal program to invoke it. Luckily for us, this function does not require any initialisation other than loading the DLL itself.

Figure 11. Code to invoke the target function

The function that invokes SUImageCreateFromFile is creatively named fuzzme. fuzzme is designed to do nothing except invoke the target function. As SUImageCreateFromFile’s address does not change within the same instance of the program, we can move the loading of the DLL outside the function so that it is loaded only once, at program start, keeping each iteration as cheap as possible.

When we run the fuzzer, we’ll instruct it to instrument our fuzzme function. This causes WinAFL to instrument everything that happens inside fuzzme, which invokes the function we want to test! However, before we do that, there are some attributes that need to be added:

  • dllexport allows our function fuzzme to be exported, and WinAFL will subsequently be able to find the function by name rather than by its offset address. We prefer to target the function this way as offset addresses may change whenever the harness code is updated. Note that this attribute prevents the compiler from discarding the function entirely, but, due to the vagaries of compiler optimisation, it can still lead to some interesting behaviour, as detailed in the subsequent section.
  • noinline tells the compiler not to inline the function — this is required when we compile an optimised build. Let’s take a short detour and delve a little more into compiler optimisation.
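Putting the pieces together, a harness along these lines would do. This is an untested sketch: the function-pointer typedef only approximates the real SUImageCreateFromFile signature, which you should take from the SketchUp C API headers.

```cpp
// Minimal WinAFL harness sketch for SUImageCreateFromFile.
// The typedef below is an assumption; check the SketchUp C API headers.
#include <windows.h>

typedef int (*SUImageCreateFromFile_t)(void* image, const char* file_path);

static SUImageCreateFromFile_t pCreate = nullptr;

// dllexport: lets WinAFL locate the target by name instead of by offset.
// noinline: stops the optimiser from folding fuzzme into main.
extern "C" __declspec(dllexport) __declspec(noinline)
void fuzzme(const char* path)
{
    void* image = nullptr;
    pCreate(&image, path);   // the call WinAFL will re-execute repeatedly
}

int main(int argc, char** argv)
{
    if (argc < 2) return 1;
    // Load the DLL once, outside fuzzme, so each iteration stays cheap.
    HMODULE dll = LoadLibraryA("SketchUpAPI.dll");
    if (!dll) return 1;
    pCreate = (SUImageCreateFromFile_t)GetProcAddress(dll, "SUImageCreateFromFile");
    if (!pCreate) return 1;
    fuzzme(argv[1]);
    return 0;
}
```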

Compiler Optimisation and All that Jazz

We now encounter the deep and dark topic of compiler optimisation. Let’s just briefly touch on function calling and inline optimisation.

Figure 12. Pseudocode demonstrating function calls

Even though they produce the same result, the main2 function in Figure 12 will execute faster than main in the absence of any compiler optimisation. This is due to the cost of main invoking the function add.

When calling a function, the processor looks up the function’s address and sets up the stack and any other required structures, e.g. saving registers. When the function returns, clean-up actions such as restoring the stack and registers are executed. Therefore, if the compiler is set to optimise code, which is a common setting for release builds, it will try to do away with unnecessary calls by inlining functions wherever it determines there is a net performance gain.

So what does this mean for our harness? Let’s take a look in IDA and find out for ourselves.

Figure 13. Relevant part of main that invokes SUImageCreateFromFile

“Zounds!” It looks like our call to fuzzme has vanished! The compiler has decided to inline our fuzzme function to save the cost of calling it. Since nothing else calls fuzzme, the compiler would normally have removed the function entirely. Thankfully, our dllexport attribute saved fuzzme.

Figure 14. fuzzme! You’re still here!

At this point, we might conclude that everything is still kosher and that our fuzzer is ready to go. However, astute readers will realise that this will not work: the fuzzer will never be able to instrument fuzzme in our harness because, while the function still exists, it is no longer invoked. This is why we need to add the noinline attribute to fuzzme, so that the compiler does not inline it and instead emits the instructions in Figure 15.

Figure 15. Our harness now calls fuzzme

Our harness is now complete and we’ll need to test and confirm if everything is truly kosher before we get fuzzing.

Testing the Harness

It’s always good practice to make sure that things are working before starting the expensive process of fuzzing. In this case, we invoke DynamoRIO to check that it can execute and instrument our target function, by running this command:

Figure 16. Fuzzer commands are always way too long… bat files to the rescue!

If everything works fine, drrun will output a log file stating that the function fuzzme has been executed.
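For reference, such a batch file would contain something along these lines. The install paths, file names and iteration counts are placeholders; the option names come from WinAFL’s documentation of its debug mode.

```shell
REM Paths below are placeholders for your DynamoRIO / WinAFL install.
C:\DynamoRIO\bin32\drrun.exe -c winafl.dll -debug ^
    -coverage_module SketchUpAPI.dll ^
    -target_module harness.exe -target_method fuzzme ^
    -fuzz_iterations 10 -nargs 1 ^
    -- harness.exe input.png
```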

It’s almost time to start fuzzing!

Corpus Gathering

Figure 17. But wait! There’s more!

Let’s catch our breaths and revisit our progress: we’ve chosen our program, selected the target function and written our harness. Heck, we even tested if the fuzzer could actually see that function being executed. There are only a few more steps left!

To get good results quickly, we first need to gather a good corpus, i.e. a good sample of all the types of images that our API call supports. To improve efficiency, we should ensure that the image files are smaller than 1 MB. Lucky for us, we’re dealing with images, so there are plenty of online resources (https://lcamtuf.coredump.cx/afl/demo/, https://github.com/uclouvain/openjpeg-data, etc.) from which to acquire a variety of images. Do note that we should collect both proper and malformed images.

Run the following command and collect our code coverage logs:

Figure 18. We love batch scripts

This instructs DynamoRIO to execute our harness and write code coverage log files. Thereafter, we install Lighthouse, an IDA plugin for viewing code coverage, and import the log files.
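A sketch of such a batch script, running the harness once per sample under DynamoRIO’s bundled drcov tool (paths and folder names are placeholders):

```shell
REM Run the harness once per sample; each run writes a drcov.*.log
REM coverage file that Lighthouse can import into IDA.
for %%f in (corpus\*) do C:\DynamoRIO\bin32\drrun.exe -t drcov -- harness.exe %%f
```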

Figure 19. Code coverage in action

Lighthouse colours the executed code blocks in blue and aggregates each function’s coverage as a percentage. Once you’ve determined that enough of the function you want to fuzz is executed, it’s time to move to the next step!

To recap, we’ve already collected thousands of image files. We have done code coverage testing to ensure that we have sufficient coverage. There’s one more step to do before we go fuzzing, and that’s corpus minimisation.

Corpus minimisation reduces the size of the corpus; the larger the corpus, the longer it takes to fuzz. Therefore, it’s in our best interest to minimise its size as much as possible so that the corpus contains only interesting files. For example, 10 different files that cause the program to execute the same instructions can be reduced to just 1 file.

Figure 20. WinAFL will let you know if test cases are useless

We’ll be running another helpful tool written in Python 2.7, winafl-cmin.py, which is included in WinAFL’s repo, to minimise the corpus. It takes the entire corpus and produces a minimal set which can be used to fuzz.
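An invocation along these lines would do the job. The flags mirror the winafl-cmin.py options documented in WinAFL’s repo; paths and folder names are placeholders, and @@ is replaced by each test case’s file name.

```shell
REM Minimise the corpus: keep only test cases that add new coverage.
python winafl-cmin.py -D C:\DynamoRIO\bin32 -t 100000 ^
    -i corpus -o corpus_min ^
    -coverage_module SketchUpAPI.dll ^
    -target_module harness.exe -target_method fuzzme -nargs 1 ^
    -- harness.exe @@
```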

Finally! The moment we’ve all been waiting for!

Time to fuzz! For real this time!

We now have all the necessary ingredients, and it’s time to fire up the fuzzer! As fuzzing is a long and resource-hungry process, a beefy machine with lots of RAM will come in handy. We also highly recommend setting up a RAM disk for the fuzzer’s output, as it writes a lot of transient files to disk.
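A launch command would look roughly like this (paths and values are placeholders). Note WinAFL’s three-part syntax: AFL options first, then the instrumentation options after one --, then the harness command line after a second --.

```shell
REM AFL options | instrumentation options | harness command line.
afl-fuzz.exe -i corpus_min -o results -D C:\DynamoRIO\bin32 -t 20000 ^
    -- -coverage_module SketchUpAPI.dll ^
    -target_module harness.exe -target_method fuzzme ^
    -fuzz_iterations 5000 -nargs 1 ^
    -- harness.exe @@
```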

Figure 21. It’s alive!

Lastly, the fuzzing process can be parallelised for greater efficiency: each additional instance of the fuzzer will bind itself to an idle core. WinAFL also includes a handy script to monitor the fuzzing status of all instances on the system, as seen in Figure 22.

Figure 22. Handy tool to see the fuzzing status

2728 unique crashes! Which brings us to…

The Part Nobody Talks About — Triaging

Triaging: the dark and dirty part most articles do not write about. Now that we have our crashes, we need to determine whether any of them are interesting. With over 2000 crashes, there’s clearly no way to do this manually. The quick and dirty way we used was to write a Python script that executes !exploitable, a crash categoriser for WinDbg, and separates the crashes into different categories:

Figure 23. There’s 99 of these “EXPLOITABLE” files! Happy triaging!
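A minimal sketch of such a triage script might look like this. It assumes cdb.exe with the MSEC !exploitable extension is installed; the debugger path, the exact cdb command line and the crash-folder layout are all placeholder assumptions.

```python
import re
import subprocess
from collections import defaultdict
from pathlib import Path

# Hypothetical command line: run the harness on a crasher under cdb,
# load the MSEC extension, ask !exploitable for a verdict, then quit.
CDB_CMD = [r"C:\Debuggers\cdb.exe", "-c",
           ".load msec; !exploitable; q", "harness.exe"]

def classify(cdb_output: str) -> str:
    """Pull the classification out of !exploitable's output."""
    m = re.search(r"Exploitability Classification:\s*(\w+)", cdb_output)
    return m.group(1) if m else "UNKNOWN"

def triage(crash_dir: str) -> dict:
    """Bucket every crash file in crash_dir by its classification."""
    buckets = defaultdict(list)
    for crash in Path(crash_dir).glob("*"):
        out = subprocess.run(CDB_CMD + [str(crash)],
                             capture_output=True, text=True).stdout
        buckets[classify(out)].append(crash.name)
    return buckets
```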

Even as we were doing so, we realised that our quick and dirty crash “triager” was inadequate. Although WinAFL reported each crash as unique, they were not as unique as we hoped for our purposes, i.e. finding an exploitable crash. While this may suffice for now, our crash triager version 2 will have to take into account the state of the stack, which !exploitable outputs as a hash, to further narrow down unique cases.

Some Key Takeaways

This was the first full-cycle fuzzing operation that we conducted, and we definitely learnt a lot from the process: the nuances of writing a harness, and setting up utility scripts to automate parts of triaging. It wasn’t an easy task, but we look forward to more fuzzing adventures and to learning more along the way.

Thanks for reading!
