QuickCheck or Fuzzing? Which one to use?
TL;DR: QuickCheck for fast testing of functions with random values as part of a test suite. Fuzzers (model or mutation based) take longer time to run and are best used for world facing interfaces of the software like file or protocol parsers. QuickCheck and fuzzers both input random and usually invalid data. Which one to use? Consider using both.
This text is based on something I wrote to /r/rust.
QuickCheck for doing checks quickly
QuickCheck (I have used the Haskell one, but many languages have their own thing) uses programming language’s type system to define it’s input space. For every integer, try integers. For boolean, try booleans. For strings, try strings. And it doesn’t stop to primitive types. Lists, arrays, structs, vectors, product types, sum types, etc. get their own love from QuickCheck.
QuickCheck can be applied to many functions inside a program. It uses invariants (logical properties that should always apply) for giving a verdict. For example consider
reverse(reverse(list)) == list which we hope is always true. In this case for lists, QuickCheck tries the empty list, list with one element, list with few elements and maybe list with lots of elements. How many unit tests would you have written for this? QuickCheck got you covered already.
QuickCheck can also be used for pair of more complex functions like one for parsing and one for producing PNG file. Someone needs to write these properties and the work should be considered to be equal to writing unit tests.
QuickCheck is best used as part of a project’s test suite and should run pretty quickly. Think seconds or minutes, not hours or days. Well suited for testing every single commit in continuous integration.
Fuzzing is for places which require monkeys hitting keyboard
Fuzzers are usually best to be used on the world facing interfaces of the software. You have library for parsing a file format? Use fuzzer on the parser function. Network protocol? Use fuzzer to test the state machine of the protocol. What is common with file parsers and network protocol parsers? Their input is byte strings.
Since input can be considered just bunch of bytes, fuzzer needs a way to reduce the input space for testing. One byte can have 256 different values, two bytes have 65536 and 10 bytes is already over 10²⁴ values. It’s impossible to test all possible values so there needs to be an intelligent way to reduce it.
Fuzzers can be divided into two categories: Model based and mutation based. Both try to solve the input space problem.
Model based fuzzers
Model based fuzzer has knowledge about the format. For example PNG file fuzzer knows the structure and the semantics of a PNG file and can thus produce valid and invalid PNGs out of thin air.
Model based fuzzer requires someone to build the model, but then the model can contain useful things like calculating lengths of strings or more complex things like whole network protocol state machines. It’s also possible to apply the layers of cryptography, do checksum calculations, etc. so protocols like TLS or IPsec benefit from it greatly (remember Heartbleed?).
Model based fuzzer can also be applied to “black box”. No need for software’s source code or compiling it with some flags or libraries. Just web server with port open and fire up TLS fuzzer and start testing.
Mutation based fuzzers
Mutation based fuzzer takes input samples (for example valid PNG files) and starts to apply mutations to the data. Flip bit here and there, change byte from one value to another. Drop bytes, duplicate bytes, etc. It’s also possible for the fuzzer to try learn the model from the input samples. One example of this is a fuzzer called Radamsa (Github).
File has a checksum? No change for mutation based fuzzer to get checksum correct. It doesn’t know it’s a checksum. However, what’s cool about mutation based fuzzers is the low cost (as in time and work) of getting the testing up and running. Have some input files, build a function to parse the files and fire up a fuzzer. Shouldn’t take long and you can start collecting the sweet results.
Mutation based fuzzers (like AFL and LLVM’s libfuzzer) have instrumentation compiled inside the tested software so the fuzzer can see how the program behaves internally. If a mutation found a new execution path, it’s worth testing it more. If mutation didn’t produce any new paths, maybe it’s not worth the time. Run a mutation based fuzzer for long time and it might produce valid PNG file from testing PNG parser.
A simple example of using libfuzzer can be found from OUSPG’s libfuzzerification project.
Instrumentation in Fuzzing
Now, how to know if and when the fuzzer has found an issue? One method is to look at the tested target from the outside and observe it. Did it crash? Did it consume too much CPU or memory? Mostly fuzzers look for crashes, because those are bad and easy to observe. Pick low hanging fruits first. But using tools like LLVM’s Sanitizers (AddressSanitizer, MemorySanitizer, LeakSanitizer to link a few of them) or Valgrind’s different tools can be used to find memory leaks, undefined behaviour, etc.
Because the input space for fuzzing is so huge, fuzzers are best left to run as long as possible. Hours or even days. After days of work, leave your machine running fuzz testing to your project overnight. Clusters of computers doing fuzzing 24/7 are used in some places (like Google’s oss-fuzz).
Fuzzing State of the Art
We collected Fuzzing State of the Art in one brainstorming session last summer (OUSPG Open 2016–06–28). It’s mostly lists of terms, but it can be helpful together with Googling and other resources. You can find it from OUSPG’s Fuzzing SOTA.
If you read this far, scroll back on top and read the TL;DR again. :)