Fuzz Testing — A Primer

Priyanka Somrah 🦋 · Published in Work-Bench · 8 min read · Sep 9, 2019

In May 2016, any website using ImageMagick, a popular image-processing library, found itself vulnerable to Remote Code Execution (RCE). CDNs quickly deployed preventative measures, but they could only do so much. By the end, even giants like Facebook bore scar tissue from ImageTragick, as the CVE became known.

What is fuzzing?

Fuzzing is an automated software testing technique that generates and runs randomized tests against a piece of code. Feeding the system under test (SUT) randomized data, whether invalid, malformed, or entirely arbitrary, reveals whether those inputs trigger anomalous behavior or exploitable bugs. Because of its random nature, fuzz testing lets developers find critical bugs in open-source software, as well as bugs they might otherwise have missed.
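To make that concrete, here is a minimal sketch of a purely random fuzzer in Python. The `parse_record` target is a hypothetical stand-in for the code under test; this is an illustration of the idea, not a production tool.

```python
import random

def parse_record(data: bytes) -> str:
    """Hypothetical system under test: parses a tiny 'NAME:AGE' record."""
    text = data.decode("utf-8")        # raises on invalid UTF-8
    name, age = text.split(":")       # raises if ':' is missing or repeated
    return f"{name} is {int(age)}"    # raises if age is not an integer

def random_input(max_len: int = 32) -> bytes:
    """Generate an arbitrary input: valid, invalid, or anything in between."""
    return bytes(random.randrange(256) for _ in range(random.randrange(max_len)))

if __name__ == "__main__":
    for _ in range(10_000):
        data = random_input()
        try:
            parse_record(data)
        except Exception as exc:
            # In a real harness you would filter out expected validation
            # errors and keep only the anomalies worth investigating.
            print(f"{data!r} triggered {type(exc).__name__}: {exc}")
```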

Fuzzing, also referred to as monkey testing, relies on the assumption that every program harbors exploitable vulnerabilities; it lets developers tackle edge cases without having to think of every single possibility themselves. Modern fuzz testing mechanisms powered by AI offer a more cognizant and proactive approach than classical testing techniques like manual code review, debugging, and pen testing. In this post, I will discuss the current scope of AI fuzzing in the context of application security and potential use cases across the enterprise.

Market

According to Forrester, the application security market is estimated to exceed $7 billion by 2023. Gartner forecast over $3 billion in spending for 2019, up more than 23% from its 2017 estimate.

Since fuzz testing is automated and highly scalable, there’s an enormous opportunity for it to impact how software is made.

Rise of the security-conscious developer

Although fuzz testing is not an entirely novel approach to application security testing, the practice of securing the software development lifecycle (SDLC) is now top of mind for the security-conscious developer. To secure the CI/CD pipeline, a potential vector for attacks, developers embed security directly into the DevOps toolchain. This toolchain is made up of a set of automated tools used to set up the development, testing, configuration, and deployment environments for each production system.

Application fuzzing is usually conducted at the assembly/integration testing phase of the SDLC, right after the code is developed. The fuzz algorithm continuously monitors the pipeline in an automated feedback loop and tests the code against the established requirements. Functional tests, such as unit tests, integration tests, system tests, and acceptance tests, may be performed in addition to non-functional tests. As a result, fuzzing as a proactive means of security testing allows developers to save significant time and money by finding bugs and vulnerabilities earlier, rather than at the code deployment stage.
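As an illustration, here is a sketch of how a deliberately bounded fuzz pass might sit alongside the functional tests in an automated pipeline. Everything here is hypothetical (the `handle_request` entry point, the iteration budget); full fuzzing campaigns typically run continuously on dedicated infrastructure rather than per commit, a point one of the readings below makes as well.

```python
import random

def handle_request(payload: bytes) -> bytes:
    """Hypothetical service entry point exercised by the fuzz pass."""
    if payload.startswith(b"PING"):
        return b"PONG"
    return b""

def test_fuzz_smoke(iterations: int = 5_000, seed: int = 1234) -> None:
    """A short, deterministic fuzz pass; the fixed seed keeps CI runs reproducible."""
    rng = random.Random(seed)
    for _ in range(iterations):
        payload = bytes(rng.randrange(256) for _ in range(rng.randrange(128)))
        handle_request(payload)  # requirement: never raises on arbitrary input

if __name__ == "__main__":
    test_fuzz_smoke()
    print("bounded fuzz pass completed without crashes")
```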

Some benefits of fuzzing

  • Fewer false positives. In contrast to static analysis, which reasons over an abstraction of the code, fuzzing executes the actual code and reports only bugs that genuinely manifest at runtime. Hence, fuzzing generates fewer false positives than static code analysis.
  • Black box testing. Since application fuzzing doesn’t require access to source code, black-box fuzzers can uncover zero-day vulnerabilities in commercial software products where the source code is not available. Similarly, fuzz testing is one of the means of testing completely closed systems such as SIP/VoIP.
  • Portability. Basic fuzzers such as protocol fuzzers (e.g., an HTML fuzzer) can be used to test different web browsers across multiple vendors, making them quite portable.
  • Complementary. AI fuzzing augments manual testing labor, as even the most basic fuzzer can generate a significant number of input sets to be used in the validation process.

Some disadvantages of fuzzing

  • Code coverage vs. vulnerable code. Current fuzzer algorithms focus on covering as many paths as possible rather than finding the paths most likely to be vulnerable. These fuzzers also tend to treat seed inputs as if they were all equally likely to expose vulnerabilities. This decreases the efficiency of the fuzzer engine because it ends up testing all of the leads instead of prioritizing the ones that require immediate attention.
  • Testing and remediation are siloed. Human intervention is still required after bugs are uncovered through fuzz testing because, even though fuzzing indicates the possible presence of bugs, it does not confirm whether they are security bugs.
  • One fuzzer doesn’t fit all. Highly efficient fuzzers are usually tailored to one specific SUT at a time, and are unlikely to be as efficient against a different SUT.
  • Randomness. While randomized testing has its advantages, each randomized test may produce different results from the last. Without an appropriate benchmark for evaluating each fuzzing algorithm, fuzz testing results may be inconclusive and a waste of time.

How do fuzzers work?

Fuzzing is a form of dynamic testing that performs cost-effective negative testing, i.e., testing that the system doesn’t do things it is not supposed to do. The fuzzing algorithm probes for risky assumptions within the SUT by generating inputs that may cause the system to crash. Once a crash has been recorded, some fuzzers immediately break out of the fuzzing loop. Others keep going in a continuous loop even after the first crash, to collect as many crashes as possible.

The output generated during fuzz testing is then collected and reviewed by software developers, who confirm the presence of bugs and debug the issues to prevent further crashes. In general, a fuzzer that identifies a greater number of unique bugs is deemed more efficient.
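The sketch below illustrates that workflow under stated assumptions: a hypothetical `check_header` target with two latent bugs, a loop that keeps fuzzing after each crash, and a simple signature (exception type plus crash site) used to count unique bugs rather than raw crashes.

```python
import random
import traceback

def check_header(data: bytes) -> None:
    """Hypothetical SUT with more than one latent bug."""
    if data[:1] == b"G":
        _ = data[1]                    # IndexError when data is exactly b"G"
    if b"%" in data:
        _ = int(data.split(b"%")[1])   # ValueError on malformed numbers

def crash_signature(exc: BaseException) -> tuple:
    """Bucket crashes by exception type and crash site, so duplicate
    crashes count as one unique bug."""
    frame = traceback.extract_tb(exc.__traceback__)[-1]
    return (type(exc).__name__, frame.filename, frame.lineno)

if __name__ == "__main__":
    unique = {}
    for _ in range(50_000):
        data = bytes(random.randrange(256) for _ in range(random.randrange(8)))
        try:
            check_header(data)
        except Exception as exc:       # keep looping: collect, don't stop
            unique.setdefault(crash_signature(exc), data)
    for sig, example in unique.items():
        print(f"{sig} first triggered by {example!r}")
```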

Code coverage

Code coverage is the percentage of code blocks exercised by the fuzzing algorithm during testing. Code coverage analysis is used alongside fuzzing to ensure that all code edges are tested for potential flaws. As a result, it is a great way to strengthen fuzzing and improve its overall effectiveness.

Depth

While code coverage refers to the proportion of the code surface tested for potential bugs, code depth refers to the number of gatekeepers, i.e. nested checks and conditions, that must be crossed before reaching a particular segment of code. The higher the number of gatekeepers, the greater the depth.

A highly efficient fuzzer must generate test cases with both high coverage and high depth.
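Here is a toy sketch of what coverage-guided generation looks like, assuming a hypothetical `deep_target` whose nested checks act as gatekeepers. It traces which lines execute and keeps any mutated input that reaches new lines, letting the fuzzer work its way past one gatekeeper at a time.

```python
import random
import sys

def deep_target(data: bytes) -> None:
    """Hypothetical SUT: each nested check is a 'gatekeeper' adding depth."""
    if len(data) > 3:
        if data[0] == ord("F"):
            if data[1] == ord("U"):
                if data[2] == ord("Z"):
                    raise RuntimeError("bug behind three gatekeepers")

def run_with_coverage(data: bytes) -> set:
    """Execute the target while recording which line numbers ran."""
    lines = set()

    def tracer(frame, event, arg):
        if event == "line":
            lines.add(frame.f_lineno)
        return tracer

    sys.settrace(tracer)
    try:
        deep_target(data)
    except RuntimeError:
        lines.add(-1)  # count the crash itself as new, interesting behavior
    finally:
        sys.settrace(None)
    return lines

def mutate(data: bytes) -> bytes:
    """Simplest possible mutation: overwrite one random byte."""
    i = random.randrange(len(data))
    return data[:i] + bytes([random.randrange(256)]) + data[i + 1:]

if __name__ == "__main__":
    corpus = [b"AAAA"]  # seed input
    covered = set()
    for _ in range(100_000):
        candidate = mutate(random.choice(corpus))
        reached = run_with_coverage(candidate)
        if not reached <= covered:  # new lines reached: keep this input
            covered |= reached
            corpus.append(candidate)
    print(f"corpus of {len(corpus)} inputs covers {len(covered)} lines")
```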

Types of fuzzers

  • “Black box” fuzzers. Black box fuzzers are random testing tools that generate arbitrary, unexpected inputs. They have no access to the program’s source code and can only observe whether or not the program crashed. This type of fuzzer usually runs the fastest.
  • “White box” fuzzers. White box fuzzers have access to the application source code, and are able to exploit the semantics of that source code and collect the resulting observations. They are usually the most effective because they have an internal view of the program. However, their analyses may be time-consuming.
  • “Gray box” fuzzers. Unlike white box fuzzers, gray box fuzzers do not need access to source code, yet they are more refined than black box fuzzers. Gray box fuzzers such as AFL and libFuzzer leverage instrumentation to map the basic blocks executed during a run, which lets them glean what happens inside the program and track increases in code coverage during fuzzing. Gray box fuzzers are highly efficient; a sample harness follows this list.
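In practice, gray box fuzzing is usually driven through a small harness handed to the engine. As a sketch, here is roughly what such a harness looks like with Atheris, Google’s libFuzzer-based fuzzer for Python, using its documented `Setup`/`Fuzz` entry points; the choice of `json.loads` as the target is arbitrary and illustrative.

```python
import sys
import atheris  # Google's libFuzzer-based fuzzer for Python: pip install atheris

with atheris.instrument_imports():
    import json  # instrument the code we want coverage feedback from

def TestOneInput(data: bytes) -> None:
    """Called by the engine once per generated input."""
    try:
        json.loads(data)
    except ValueError:
        pass  # invalid JSON is expected; any other exception is a finding

if __name__ == "__main__":
    atheris.Setup(sys.argv, TestOneInput)
    atheris.Fuzz()
```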

Fuzzing using neural nets

To augment existing fuzzing capabilities, Microsoft developed neural fuzzing, a technique that applies deep neural networks (DNNs) to the fuzzing loop. The DNN uses inputs and code coverage generated in previous tests to train the fuzzer engine to distinguish vulnerable code from non-vulnerable code. DNNs are usually inserted into the feedback loop of a gray box fuzzer, where they learn to recognize hidden vulnerability patterns. Given this cognitive aptitude, neural fuzzing has the potential to be more efficient than conventional fuzzers because it can actually identify and target the vulnerable paths over the non-vulnerable ones, resulting in a lot of time saved!
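To show the shape of that feedback loop, here is a heavily simplified toy in which a learned score table stands in for the neural network (a real neural fuzzer would train a DNN on input/coverage pairs): mutations that produced new coverage in the past are sampled more often in the future. The `gate` target and all names are hypothetical.

```python
import random
from collections import defaultdict

def gate(data: bytes) -> int:
    """Hypothetical SUT: reports a crude 'coverage level' it reached."""
    level = 0
    if len(data) > 0 and data[0] == 0x7F:
        level = 1
        if len(data) > 1 and data[1] == 0x7F:
            level = 2
    return level

# Stand-in for the learned component: a score per (position, byte value),
# reinforced whenever mutating that slot produced new coverage. A real
# neural fuzzer would train a DNN on (input, coverage) pairs instead.
scores = defaultdict(lambda: 1.0)

def pick_mutation(length: int):
    """Sample a (position, value) pair in proportion to its learned score."""
    choices = [(pos, val) for pos in range(length) for val in range(256)]
    weights = [scores[c] for c in choices]
    return random.choices(choices, weights=weights, k=1)[0]

if __name__ == "__main__":
    data, best = bytearray(b"\x00\x00"), 0
    for _ in range(5_000):
        pos, val = pick_mutation(len(data))
        trial = bytearray(data)
        trial[pos] = val
        level = gate(bytes(trial))
        if level > best:  # new coverage: reinforce the choice that found it
            scores[(pos, val)] *= 8.0
            data, best = trial, level
    print(f"reached coverage level {best} with input {bytes(data)!r}")
```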

Fuzz testing in the real world

Today, you’d find it difficult to run software that hasn’t benefited from fuzz testing. Efforts like Google’s OSS-Fuzz (built on ClusterFuzz) take open source software and automatically test it for vulnerabilities at massive scale. The AFL component of OSS-Fuzz alone has found exploits in more than 150 popular projects. Nginx, PHP, and SQLite are just a few of the names in AFL’s bug trophy case.

Fuzzing is currently used skillfully by a number of projects and companies.

In summary

Fuzz testing is a way of automatically generating and feeding random inputs into an application. Promising work by Microsoft has shown that combining AI/ML with fuzzing can further increase the efficiency of an already highly automated tool.

Given the growing need for software security, we think fuzzing is a category that holds a lot of promise for bringing security closer to the developer workflow. If you’re working on a startup in or around fuzz testing, get in touch! We’d love to hear from you.

Here is a collection of writings and people to connect with, if you’re interested in learning more:

“Fuzzing: An Old Testing Technique Comes of Age” by Bruce Byfield (The New Stack)

  • “In the past, testing has often been overlooked and automating it is an obvious way to increase the level of testing without increasing costs or enlisting new developers. Noticeably, too, public concern about security motivates corporations and open source projects alike to improve their testing quickly. In open source, in particular, the maturity of applications often means that manual testing is more time-consuming and inefficient than ever before. Under these conditions, today fuzzing is making more and more sense.”

“How to Spot Good Fuzzing Research” by Trent Brunson (Trail of Bits)

  • “Again for lack of a common benchmark suite, fuzzer evaluators should look for research that used one of the above datasets (or, better yet, one native and one synthetic). Doing so can encourage researchers to ‘fuzz to the test,’ which doesn’t do anyone any good. Nevertheless, these datasets provide some basis for comparison. Related: As we put more effort into fuzzers, we should invest in refreshing datasets for their evaluation, too.”

“Fuzz Rising” by Justin Cormack (Cloud Atomic Laboratory)

  • “Fuzzing as a service is available too. Operationally fuzzing is not something you want to run in your CI pipeline, as it is not a test that finishes, it is something that you should run continuously 24/7 on the latest version of your code to find issues, as it still takes a long time to find issues, and is randomized. Services include Fuzzbuzz a fairly new commercial service (with a free tier) who are very friendly, Microsoft Security Risk Detection and Google’s OSS-Fuzz for open source projects.”

People to follow

Ben Nagy

  • Ben Nagy is an industry expert in the field of fuzzing scalability, and has advocated for fuzzing systems built for offensive-side bug hunters.

Brandon Falk

  • Brandon Falk is a security researcher and programmer at Gamozo Labs, a security analysis company where he writes a lot of fuzzing tools.

Matthew Pascucci

  • Matthew Pascucci is a security architect, privacy advocate, and security blogger who has written extensively on fuzz testing on Frontline Sentinel.
