What is Fuzzing?
The IEEE Standard Glossary of Software Engineering Terminology defines robustness, the property that fuzzing puts to the test, as:
The degree to which a system or component can function correctly in the presence of invalid inputs or stressful environmental conditions.
Breaking it down into simpler terms, fuzzing is a testing technique in which we pass random, invalid input to the target application and then monitor it for unexpected behavior. That behavior could be a crash, a memory leak, etc., triggered by previously unknown niche test cases that go beyond the scope of manual testing.
One thing to keep in mind is that invalid inputs are supposed to be valid enough to be accepted by the target application for processing, rather than being rejected or crashing it right away. Their task is to help us find the hidden exceptions that are yet to be found within the application.
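To make the “valid enough” point concrete, here is a minimal sketch. The parser, its REC1 header and its bug are all made up for illustration. Fully random inputs get rejected at the header check, while inputs that keep the header and only corrupt one field reach the buggy code.

```python
import random

MAGIC = b"REC1"  # hypothetical 4-byte header the toy parser insists on


def parse_record(data: bytes) -> bytes:
    """Toy parser: rejects bad headers early and hides a bug deeper in."""
    if data[:4] != MAGIC:
        raise ValueError("bad header")  # graceful rejection, not a bug
    length = data[4]                    # declared payload length
    payload = data[5:5 + length]
    # Hidden bug: trusts the declared length when indexing the payload.
    return bytes([payload[length - 1]])


def fuzz(inputs) -> dict:
    """Run every input through the parser and bucket the outcome."""
    outcome = {"rejected": 0, "ok": 0, "crashed": 0}
    for data in inputs:
        try:
            parse_record(data)
            outcome["ok"] += 1
        except ValueError:
            outcome["rejected"] += 1
        except IndexError:
            outcome["crashed"] += 1
    return outcome


rng = random.Random(0)
# 1000 fully random 8-byte inputs: they fail the header check.
random_inputs = [bytes(rng.randrange(256) for _ in range(8)) for _ in range(1000)]
# 1000 inputs with a valid header and only the length byte randomized.
near_valid = [MAGIC + bytes([rng.randrange(256)]) + b"abc" for _ in range(1000)]

print("random:    ", fuzz(random_inputs))
print("near-valid:", fuzz(near_valid))
```

The random batch never gets past the header, so it tells us nothing about the code behind it. The near-valid batch is where the crashes show up.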
Fuzzers are tools that aid in automating the fuzzing process and allow us to fine-tune various parameters according to the application being targeted.
Fuzzers can be broadly classified into the following types:
- Mutational Fuzzers are often called dumb fuzzers because they generate inputs by randomly mutating the seed input cases provided for fuzzing.
- Grammar Fuzzers are the ones where we define rules about how the seed input may be mutated.
- Feedback-based Fuzzers are smart fuzzers that observe how a particular input affects the target binary and then mutate the seed input accordingly to optimize fuzzing.
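As a rough sketch of the first two types: the mutate function below blindly overwrites one byte of a seed, while generate builds inputs from a small set of rules. The grammar is a hypothetical toy, and producing inputs from scratch is just one common way a rule-driven fuzzer can work. Feedback-based fuzzing needs a running target, so it is not shown here.

```python
import random


def mutate(seed: bytes, rng: random.Random) -> bytes:
    """Mutational ("dumb") step: overwrite one random byte of the seed."""
    data = bytearray(seed)
    data[rng.randrange(len(data))] = rng.randrange(256)
    return bytes(data)


# Hypothetical toy grammar: a list holds one or two digits.
GRAMMAR = {
    "<list>": [["[", "<digit>", "]"], ["[", "<digit>", ",", "<digit>", "]"]],
    "<digit>": [["0"], ["1"], ["7"]],
}


def generate(symbol: str, rng: random.Random) -> str:
    """Rule-driven step: expand a symbol by picking one of its rules."""
    if symbol not in GRAMMAR:
        return symbol  # terminal symbol, emitted as-is
    rule = rng.choice(GRAMMAR[symbol])
    return "".join(generate(part, rng) for part in rule)


rng = random.Random(1)
print(mutate(b"[1,2]", rng))   # same length as the seed, one byte overwritten
print(generate("<list>", rng)) # always well-formed according to the grammar
```

The mutated output may or may not still look like a list, whereas the generated one always does. That is the trade-off between the two approaches.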
Why Fuzz Applications?
Fuzzing, as mentioned earlier, is a testing process to find bugs in an application. Hence, the first goal of fuzzing is to find bugs in the target application that lie outside the scope of manual testing by a human.
Fuzzing also feeds back into development: niche bugs found earlier by fuzzing other applications can inform coding practices that avoid them in the first place. This saves the time and effort of correcting an issue that could have been avoided entirely because it had already been observed elsewhere, for example, in a previous iteration of the same application.
A lot of software vendors also run responsible disclosure programs where one can disclose previously unknown bugs and get rewarded. So, spending some time running an automated fuzzer on such an application can also make you money.
American Fuzzy Lop
American Fuzzy Lop, or AFL for short, is a smart fuzzer. It mutates the seed input, given at the start of fuzzing, to generate new test cases which it thinks will lead to the discovery of new paths.
Before I explain the above statement, let me introduce you to two terms: code coverage and path coverage. Code coverage refers to the amount of code that was triggered by a particular test case. Path coverage refers to the number of potential sequences of code statements (or paths) that were triggered by a test case.
Let’s take an example. Refer to the pseudo-code below:
if <condition 1>:
    # Statements
if <condition 2>:
    # Statements
In the above pseudo-code, code coverage could be 100% when different test cases trigger the conditional statements separately. Path coverage, however, measures how many of the total execution paths were covered. Leaving aside the run in which neither condition holds, the conditional statements can be triggered in three ways: condition 1 alone, condition 2 alone, and both in the same test run, one after the other. That gives us a total of 3 paths. Assuming our test cases only trigger the two conditions individually, we’ll get 100% code coverage, whereas the path coverage is only 2/3 because we do not have a test case that triggers both conditional statements in the same test run.
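The same arithmetic as a small runnable sketch. The block names are made up, and the set of paths follows the count of three used above (the run that enters neither block is left out).

```python
def run(cond1: bool, cond2: bool) -> tuple:
    """Return the blocks executed, in order, for one test run."""
    executed = []
    if cond1:
        executed.append("block1")
    if cond2:
        executed.append("block2")
    return tuple(executed)


ALL_BLOCKS = {"block1", "block2"}
# The three paths counted above: each block alone, then both in one run.
ALL_PATHS = {("block1",), ("block2",), ("block1", "block2")}

# Two test cases that trigger the conditions individually.
test_cases = [(True, False), (False, True)]
paths_seen = {run(c1, c2) for c1, c2 in test_cases}
blocks_seen = {block for path in paths_seen for block in path}

code_coverage = len(blocks_seen) / len(ALL_BLOCKS)
path_coverage = len(paths_seen & ALL_PATHS) / len(ALL_PATHS)
print(f"code coverage: {code_coverage:.0%}, path coverage: {path_coverage:.0%}")
# prints: code coverage: 100%, path coverage: 67%
```

Adding (True, True) to test_cases brings the path coverage to 100% as well.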
Coming back to the statement we began this section with, AFL takes a set of files (or test cases) that will serve as the seed input to start fuzzing the target. AFL then interacts with the target binary while it’s processing the input passed to it, and monitors which segments of code were triggered and in what sequence, i.e., it keeps track of the code paths being triggered. Based on the paths being triggered, it mutates the seed files to trigger new code paths, thus increasing coverage.
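Here is a toy version of that feedback loop. It is not AFL’s actual algorithm: the target, the mutation and the queue handling are simplified assumptions, but the core idea is the same. An input that triggers a path we have not seen before is kept and mutated further.

```python
import random


def target(data: bytes) -> tuple:
    """Toy target: returns the sequence of branches it took (its path)."""
    path = ["start"]
    if data[0] == ord("F"):
        path.append("F")
        if data[1] == ord("U"):
            path.append("U")
            if data[2] == ord("Z"):
                path.append("Z")
    return tuple(path)


def mutate(data: bytes, rng: random.Random) -> bytes:
    """Overwrite one random byte with a random value."""
    out = bytearray(data)
    out[rng.randrange(len(out))] = rng.randrange(256)
    return bytes(out)


def fuzz(seed: bytes, iterations: int, rng: random.Random):
    """Keep every input that triggers a path not seen before."""
    queue = [seed]
    seen = {target(seed)}
    for _ in range(iterations):
        child = mutate(rng.choice(queue), rng)
        path = target(child)
        if path not in seen:   # new path: remember it and keep the input
            seen.add(path)
            queue.append(child)
    return queue, seen


queue, seen = fuzz(b"aaa", 100_000, random.Random(0))
print(len(seen), "paths found; inputs kept:", queue)
```

Guessing all three bytes at once has odds of about one in 16.7 million per attempt. Because each partial match is saved and built upon, the loop gets to the deepest branch one byte at a time.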
The creator of American Fuzzy Lop, Michal Zalewski, wrote a blog describing how smart AFL is. He starts off by creating a file with the word “hello” in it and then tries to pass that as seed input to an application that expects a JPEG image. After initial crashes, AFL figures out what the application is expecting and starts mutating the seed input to produce valid JPEG image files, from what initially was a text file. This demonstrates how impressive AFL is. Do give Michal’s blog a read as it explains in great detail how AFL ended up pulling JPEGs out of thin air.
Fuzzing with AFL
AFL is extremely easy to use, as we shall see. There’s a set of steps that we need to go through before unleashing AFL on an application.
Here’s AFL’s workflow in brief:
- Compiling the binary for the target application with AFL’s compilers to instrument it.
- Building a test corpus (seed test cases) to start the fuzzing process.
- Running AFL on the instrumented binary of the target application.
- Lastly, analyzing results.
Installing AFL is quite straightforward, but before we install it, we need to have some prerequisites installed on our system.
Note: This setup was tested with Ubuntu 16.04
Let’s start with installing the prerequisites. Run the commands below to install gcc and clang:
sudo apt install gcc
sudo apt install clang
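Now we’ll install AFL. With the afl-latest.tgz release archive downloaded to the current directory, extract it and then, from inside the extracted source directory, run the install step: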
tar -xzvf afl-latest.tgz
sudo make install
AFL comes with multiple compilers to instrument binaries (we’ll talk about what instrumenting a binary means), covering the traditional gcc as well as clang. After the previous step we’re ready to fuzz applications, but the default gcc compiler that comes with AFL is slower than the other compilers it ships with. AFL leverages LLVM capabilities to make the fuzzing process faster. We can enable LLVM mode in AFL with the following commands:
sudo apt-get install llvm-dev llvm
sudo make install
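A quick way to confirm the install worked is to check that the tools ended up on PATH. This is a small sketch, assuming the usual AFL 2.x binary names. As far as I know, afl-clang-fast only shows up once LLVM mode has been built.

```python
import shutil

# Binaries an AFL 2.x install is expected to put on PATH (an assumption);
# afl-clang-fast should only be present after LLVM mode is built.
AFL_TOOLS = ["afl-fuzz", "afl-gcc", "afl-clang-fast"]


def find_afl_tools(names=AFL_TOOLS) -> dict:
    """Map each tool name to its resolved path, or None if it is missing."""
    return {name: shutil.which(name) for name in names}


for name, path in find_afl_tools().items():
    print(f"{name}: {path or 'NOT FOUND'}")
```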
Setting up a Target
Now we’re ready to fuzz applications; all we need is a target. We’ll use fuzzgoat, which is an intentionally vulnerable application written to demonstrate fuzzing. We can clone fuzzgoat from its repository as follows:
git clone https://github.com/fuzzstati0n/fuzzgoat
Compiling with AFL
We start off by compiling the binary for the target application with AFL’s compilers. This is necessary because it lets AFL add some extra code to the compiled binary, which AFL uses to talk to the binary while it’s running so that it can generate new inputs to discover new code paths. The process of including AFL’s additional code in the binary while compiling the application is called instrumentation, the term I promised to explain.
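To give a feel for what that extra code does, here is a conceptual Python sketch modelled on the scheme described in AFL’s technical notes: every branch point gets an ID, and a shared map counts hits per pair of consecutive IDs. The probe IDs and the toy target are made up; the real thing is injected at compile time into the binary and not written by hand.

```python
class CoverageMap:
    """Stand-in for the shared memory region the fuzzer reads."""

    def __init__(self, size: int = 1 << 16):  # 64 KiB, per AFL's notes
        self.size = size
        self.slots = [0] * size
        self.prev_location = 0

    def hit(self, cur_location: int) -> None:
        """What the injected code does, conceptually, at a branch point."""
        self.slots[(cur_location ^ self.prev_location) % self.size] += 1
        self.prev_location = cur_location >> 1


def target(data: bytes, cov: CoverageMap) -> None:
    """Toy target with hand-placed probes; the IDs are arbitrary."""
    cov.hit(0x1A2B)
    if data.startswith(b"{"):
        cov.hit(0x3C4D)
    if data.endswith(b"}"):
        cov.hit(0x5E6F)


def edges_for(data: bytes) -> set:
    """Run the target once and return the set of map slots it touched."""
    cov = CoverageMap()
    target(data, cov)
    return {index for index, count in enumerate(cov.slots) if count}


print(len(edges_for(b"{}")), len(edges_for(b"}")), len(edges_for(b"x")))
# prints: 3 2 1
```

Because each slot depends on both the current and the previous location, the fuzzer can tell apart inputs that visit the same blocks by different routes.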
Note: The following commands are being run inside the fuzzgoat’s root directory
So, to compile the application with AFL’s compilers, we have to explicitly mention which one to use. Generally, it’s best to stick to afl-clang-fast but one can also use afl-g++ depending on the use case. We’ll use
Now, depending on the application, we need to compile the application into a binary. For fuzzgoat, we run the command below, but some applications require us to run ./configure before we use the make command to build the binary. Let’s compile fuzzgoat to a binary:
make
Building Test Corpus
The test corpus is what I’ve been talking about since the beginning of this blog: the ‘seed input files’. It is a set of files (it could also be a single file) used as the initial input to test the binary. It also serves as the starting point from which AFL mutates and generates new test files as it sees fit to discover new code paths.
AFL is smart enough to do a lot of heavy lifting for us, including figuring out what could be good test inputs, as we saw in Michal’s blog. Still, one should build a good test corpus simply because it makes the whole fuzzing process faster. Given good initial test cases, AFL starts off at, say, level ‘X’. AFL could very well have reached ‘X’ starting from a blank text file, but the time it took to get there could’ve been saved. Hence, always try to build good test cases depending upon the target application.
That being said, I’ll still be using a not-so-good test case, one that is effectively random data as far as the target is concerned, because:
- This blog is to learn how to work with AFL to fuzz applications
- Building test cases based on the characteristics of an application goes beyond the scope of this blog
- Since fuzzgoat is intentionally buggy, bad test cases will also yield crashes in a reasonably small period of time
We’ll first make a directory to keep all our test cases. You can name it anything you like. I’ll name it afl_in:
mkdir afl_in
Now, let’s add a test case by copying an arbitrary binary, which is just random, garbage data as far as the target is concerned, into the directory we made above:
cp /bin/ps afl_in/
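If you do want a slightly better corpus, a few small, valid files in the format the target expects are a reasonable start. Below is a hedged sketch that assumes a JSON-parsing target (fuzzgoat is, as far as I know, built on a JSON parser, but treat that as an assumption). The file names and contents are made up for illustration.

```python
from pathlib import Path

# Hypothetical seeds for a JSON-parsing target; adjust to your target's format.
SEEDS = {
    "empty_object.json": b"{}",
    "small_array.json": b"[1, 2, 3]",
    "nested.json": b'{"key": ["value", null, true]}',
}


def write_seeds(directory: str, seeds=SEEDS) -> list:
    """Create the corpus directory and write one small file per seed."""
    corpus = Path(directory)
    corpus.mkdir(parents=True, exist_ok=True)
    written = []
    for name, content in seeds.items():
        path = corpus / name
        path.write_bytes(content)
        written.append(path)
    return written
```

Calling write_seeds("afl_in") would fill the directory used in this blog. Small seeds that each exercise a different feature tend to help more than one large file.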
Running AFL on Target
One last thing before we start fuzzing: we need to make a directory for AFL to store its results, including the files that resulted in a crash or a hang. AFL makes three sub-directories inside this folder:
- crashes holds the test cases which made the application crash,
- hangs holds the test cases which made the application hang and
- queue holds the test cases that AFL found to trigger distinct code paths, along with the seed files. This is the pool AFL keeps mutating.
I’ll name the directory afl_out but again, it can be named anything:
mkdir afl_out
Finally, to fuzz the application we use the following command:
afl-fuzz -i afl_in -o afl_out -- ./fuzzgoat @@
Breaking the above command into parts:
-i afl_in specifies the directory to take the seed test cases from
-o afl_out specifies the directory where AFL can store all result files for crashes, hangs and queue
-- separates the target’s command structure. The left side of the separation is where AFL’s flags are passed and the right side is where the target’s run command is, in this case, ./fuzzgoat @@
@@ defines the position where AFL is supposed to insert the test file in the target application’s command structure
@@ is not mandatory. Without it, AFL passes input to the target through standard input (stdin)
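Conceptually, the @@ handling boils down to something like the sketch below. This is my own simplification, not AFL’s code: it only replaces an argument that is exactly @@, and it assumes the target reads standard input when no @@ is given.

```python
import subprocess


def build_invocation(target_cmd: list, testcase_path: str):
    """Return (argv, use_stdin) for running the target on one test case."""
    if "@@" in target_cmd:
        argv = [testcase_path if arg == "@@" else arg for arg in target_cmd]
        return argv, False
    return list(target_cmd), True  # no @@: feed the file on stdin instead


def run_once(target_cmd: list, testcase_path: str, timeout: float = 5.0) -> int:
    """Run the target once on a test case and return its exit status."""
    argv, use_stdin = build_invocation(target_cmd, testcase_path)
    if use_stdin:
        with open(testcase_path, "rb") as handle:
            return subprocess.run(argv, stdin=handle, timeout=timeout).returncode
    return subprocess.run(argv, timeout=timeout).returncode


print(build_invocation(["./fuzzgoat", "@@"], "afl_in/ps"))
# prints: (['./fuzzgoat', 'afl_in/ps'], False)
```

On POSIX systems, a negative value from run_once means the target was killed by a signal, which is the kind of event a fuzzer records as a crash.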
Running AFL should yield the following interface on your terminal:
The interface and the information are quite self-explanatory but I’d put some emphasis on certain segments that you should definitely keep an eye on:
If last new path shows a big duration, it means AFL is unable to find new paths. In this case, make sure that you’re using the instrumented binary and not just a normally compiled binary.
Let the fuzzer run till it has at least 50
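If you’d rather check these numbers from a script than watch the screen, AFL 2.x also writes a fuzzer_stats file into the output directory, as far as I know in plain "key : value" lines. The sketch below parses that layout. The sample text is made up, and the field names are the ones I recall from AFL 2.x, so verify them against your own file.

```python
def parse_fuzzer_stats(text: str) -> dict:
    """Parse 'key : value' lines into a dictionary of strings."""
    stats = {}
    for line in text.splitlines():
        key, sep, value = line.partition(":")
        if sep:
            stats[key.strip()] = value.strip()
    return stats


def seconds_since_last_path(stats: dict, now: int) -> int:
    """How long ago (in seconds) the fuzzer last found a new path."""
    return now - int(stats["last_path"])


# Made-up sample in the layout AFL 2.x is assumed to use.
SAMPLE = """\
start_time        : 1500000000
last_path         : 1500000300
unique_crashes    : 3
unique_hangs      : 0
"""

stats = parse_fuzzer_stats(SAMPLE)
print(stats["unique_crashes"], seconds_since_last_path(stats, now=1500000900))
# prints: 3 600
```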
All that’s left is looking at the results. Let’s navigate to the directory where AFL has kept all the test cases that resulted in crashes or hangs:
The /crashes and /hangs directories should have files with names resembling (but not identical to) the ones depicted below:
|- id:000003,sig:11,src:000000,op:havoc,rep:32
--- snipped ---
|- /hangs
Now, we can take a look in these files to see what exactly AFL mutated the seed input to and then figure out why it made the application crash or hang. Finally, it’s up to us what we want to do with the bugs we found with AFL.
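The file names themselves carry useful hints. They are comma-separated key:value pairs, so a tiny helper (a sketch, not an AFL tool) can split them up. As I understand the fields, sig is the signal that killed the target (11 is SIGSEGV on Linux), src is the queue entry the input was derived from, and op is the mutation stage that produced it.

```python
def parse_testcase_name(name: str) -> dict:
    """Split an AFL-style file name into its key:value fields."""
    fields = {}
    for part in name.split(","):
        key, sep, value = part.partition(":")
        if sep:
            fields[key] = value
    return fields


info = parse_testcase_name("id:000003,sig:11,src:000000,op:havoc,rep:32")
print(info["sig"], info["op"])
# prints: 11 havoc
```

Grouping crash files by sig and src is a cheap first pass at triage before opening them one by one.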
In essence, fuzzing is a way to discover undiscovered bugs, and AFL makes it as easy as it can be.
AFL is a very powerful tool while remaining almost effortless to use. It makes the fuzzing process a matter of a few steps while it takes care of everything in the background.
Since this blog was meant to be a quick start guide to using AFL, there’s a lot of customization AFL provides that wasn’t covered here. AFL also provides various mechanisms for optimizing the process of fuzzing the target, like parallel fuzzing and optimizing test cases both as a set and individually. I’ll write a follow-up blog on how to optimize fuzzing with AFL and also talk about some built-in tools that AFL comes pre-loaded with.