go-fuzz github.com/arolek/ase

This is a quick tutorial on how to fuzz a simple library using Dmitry Vyukov’s go-fuzz tool.

First, find a target. In this case, while browsing /r/golang I saw a post announcing a package for decoding Adobe Swatch Exchange files. Parsing binary file formats is always a fraught endeavour, so it makes a good target for fuzzing.

Next, download the package.

$ go get github.com/arolek/ase
$ cd `go list -f '{{.Dir}}' github.com/arolek/ase`
$ git reset --hard b1bf7d7a70445821722b29395f07fcd13e940f8c

The third command is only needed if you want to play along at home. My fixes for the crashes have since been merged, so you’ll need to reset the repository to the commit just before them.

Looking at the package documentation on godoc, we see

func Decode(r io.Reader) (ase ASE, err error)

which looks like a good entry point to fuzz.

In order to fuzz, we need two things:

  1. a fuzzing function
  2. sample inputs

Finding sample inputs was pretty easy: the ase package ships with three in its samples directory. I also found a few more with Google, but they weren’t needed.

Next, write the fuzzing function. The function must have the signature:

func Fuzz(data []byte) int

A return value of 0 from Fuzz() means the input gets no special treatment; here, that’s because the parser rejected it with an error. A return value of 1 means the data parsed successfully even though it had been mutated. go-fuzz gives such inputs higher priority during subsequent fuzzing, since they appear to be valid.

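Since our target function takes an io.Reader but Fuzz gives us a []byte, we wrap the slice with bytes.NewReader. Here’s the complete fuzzing file, including the gofuzz build tag. Note that the build-tag line must be followed by a blank line; otherwise Go treats it as an ordinary comment and ignores it. I put this into a file called ‘fuzz.go’.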

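// +build gofuzz

package ase

import "bytes"

// Fuzz is the entry point go-fuzz calls with each mutated input.
func Fuzz(data []byte) int {
	// Decode wants an io.Reader, so wrap the byte slice.
	if _, err := Decode(bytes.NewReader(data)); err != nil {
		return 0 // the parser rejected this input
	}
	return 1 // parsed successfully; go-fuzz gives it higher priority
}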

If you don’t already have go-fuzz installed, install it now.

$ go get github.com/dvyukov/go-fuzz/go-fuzz
$ go get github.com/dvyukov/go-fuzz/go-fuzz-build

Next, build the package with go-fuzz:

$ go-fuzz-build github.com/arolek/ase

While this is building (it can take a while), create a work directory and copy the sample files into its corpus directory. You can put this workdir right inside the package’s GitHub source directory; we’re not going to commit it.

$ mkdir -p workdir/corpus
$ cp samples/*.ase workdir/corpus

When go-fuzz-build finishes, it will have created a file called ‘ase-fuzz.zip’. This contains all the instrumented binaries go-fuzz is going to use while fuzzing.

Next, start the fuzzing process. We have to pass go-fuzz the path to the zip file it just created, and also the path to the workdir with the corpus.

$ go-fuzz -bin=ase-fuzz.zip -workdir=workdir

At this point, your machine will start to heat up. The fuzzer is mutating the samples and passing the resulting byte slices to our fuzzing function. Whenever a mutated input reaches new code, it’s added to the corpus directory, in a file named with the sha1 of its contents.
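To see which file name a given input will get, or to confirm that a corpus or crasher file’s name matches its contents, the sha1 name can be computed with the standard library. A small sketch; corpusName is a hypothetical helper, and it assumes the names are the lowercase hex digest, as the directory listings in this post suggest:

```go
package main

import (
	"crypto/sha1"
	"encoding/hex"
	"fmt"
)

// corpusName returns the lowercase hex SHA-1 of an input, matching the
// style of the file names go-fuzz writes into corpus/ and crashers/.
func corpusName(data []byte) string {
	sum := sha1.Sum(data) // [20]byte
	return hex.EncodeToString(sum[:])
}

func main() {
	fmt.Println(corpusName([]byte("abc"))) // prints a9993e364706816aba3e25717850c26c9cd0d89d
}
```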

The fuzzer will start to print out log lines like this:

2015/08/17 21:02:01 slaves: 4, corpus: 4 (27s ago), crashers: 0, restarts: 1/50, execs: 131922 (4885/sec), cover: 71, uptime: 27s

This says there are currently 4 slaves running for this particular fuzzer (I have a 4-core laptop), there are 4 items in the corpus, and the last one was added 27 seconds ago. The cover value of 71 is how many bits are set in a data structure go-fuzz uses to track code coverage. The details aren’t important, but in general you want that number to go up: it should increase every time a new input is added to the corpus.
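If you want to watch the cover or crashers counts from a script rather than by eye, the status line is easy to pull apart. This is a sketch whose format is inferred only from the two log lines in this post, not from any go-fuzz specification; parseStatus and leadingInt are hypothetical helpers:

```go
package main

import (
	"errors"
	"fmt"
	"strconv"
	"strings"
)

// parseStatus splits a status line such as
// "2015/08/17 21:02:01 slaves: 4, corpus: 4 (27s ago), ..." into a map
// from field name to raw value, e.g. "corpus" -> "4 (27s ago)".
func parseStatus(line string) map[string]string {
	fields := make(map[string]string)
	for _, part := range strings.Split(line, ", ") {
		i := strings.Index(part, ": ")
		if i < 0 {
			continue
		}
		key := part[:i]
		// The first field is preceded by the log timestamp, so keep
		// only the last word before the colon.
		if j := strings.LastIndex(key, " "); j >= 0 {
			key = key[j+1:]
		}
		fields[key] = part[i+2:]
	}
	return fields
}

// leadingInt parses the integer at the start of a value such as
// "131922 (4885/sec)".
func leadingInt(v string) (int, error) {
	f := strings.Fields(v)
	if len(f) == 0 {
		return 0, errors.New("empty value")
	}
	return strconv.Atoi(f[0])
}

func main() {
	line := "2015/08/17 21:02:01 slaves: 4, corpus: 4 (27s ago), crashers: 0, restarts: 1/50, execs: 131922 (4885/sec), cover: 71, uptime: 27s"
	st := parseStatus(line)
	cover, _ := leadingInt(st["cover"])
	fmt.Println("cover:", cover, "crashers:", st["crashers"]) // prints cover: 71 crashers: 0
}
```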

After a while, you’ll probably see something like this:

2015/08/17 21:02:49 slaves: 4, corpus: 6 (11s ago), crashers: 2, restarts: 1/26, execs: 346273 (4615/sec), cover: 304, uptime: 1m15s

The interesting part is ‘crashers: 2’. You’ll also notice we now have 6 items in our corpus (the last one discovered 11 seconds ago) and the cover number has increased to 304. We’ve also been running for just over a minute.

Because go-fuzz has found some crashers, there are two more directories in the work directory: ‘crashers’ and ‘suppressions’. The suppressions directory contains stack traces of crashes to ignore, so that the same crash isn’t reported multiple times. That’s less interesting than the crashers directory itself:

$ ls workdir/crashers/
0394eddcf7c9deced410b556fa6627568b08ff0b 0394eddcf7c9deced410b556fa6627568b08ff0b.quoted
0394eddcf7c9deced410b556fa6627568b08ff0b.output 919cd42975df835d9a41f76a1ae4dd2d17916ea9.output
919cd42975df835d9a41f76a1ae4dd2d17916ea9 919cd42975df835d9a41f76a1ae4dd2d17916ea9.quoted

These files describe our two crashes. Each crash is named with the sha1 of its input. The file with no extension holds the actual data that caused the crash. The ‘.quoted’ file is the same data as a Go string literal, so it can easily be added to a test file. The ‘.output’ file is the output from the panic() or whatever else go-fuzz decided was the crash.

In our example, one of the output files contains:


panic: runtime error: slice bounds out of range

goroutine 1 [running]:
github.com/arolek/ase.(*Group).readName(0xc820010420, 0x7f4f35fac1c0, 0xc8200103f0, 0x0, 0x0)
/tmp/go-fuzz-build383666922/src/github.com/arolek/ase/group.go:39 +0x30e
github.com/arolek/ase.(*Group).read(0xc820010420, 0x7f4f35fac1c0, 0xc8200103f0, 0x0, 0x0)
/tmp/go-fuzz-build383666922/src/github.com/arolek/ase/group.go:22 +0x148
github.com/arolek/ase.Decode(0x7f4f35fac1c0, 0xc8200103f0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, ...)
/tmp/go-fuzz-build383666922/src/github.com/arolek/ase/ase.go:74 +0x932
github.com/arolek/ase.Fuzz(0x7f4f35da8000, 0x14, 0x200000, 0x40c357)
/tmp/go-fuzz-build383666922/src/github.com/arolek/ase/fuzz.go:9 +0x13d
github.com/dvyukov/go-fuzz/go-fuzz-dep.Main(0x553498)
/home/dgryski/work/src/cvs/gocode/src/github.com/dvyukov/go-fuzz/go-fuzz-dep/main.go:44 +0x14c
main.main()
/tmp/go-fuzz-build383666922/src/go-fuzz-main/main.go:10 +0x23
exit status 2
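When there are many crashers, pulling the top frame out of each ‘.output’ file automatically can save some scrolling. A rough sketch that assumes the traceback layout shown above (a ‘goroutine N [running]:’ header, then a function line, then a file:line line); topFrame is a hypothetical helper, not a go-fuzz feature:

```go
package main

import (
	"fmt"
	"strings"
)

// topFrame returns the function name and file:line of the first frame
// after the "goroutine N [running]:" header of a panic trace.
func topFrame(trace string) (fn, loc string, ok bool) {
	lines := strings.Split(trace, "\n")
	for i, l := range lines {
		if strings.HasPrefix(l, "goroutine ") && i+2 < len(lines) {
			fn = strings.TrimSpace(lines[i+1])
			// Drop the argument list, e.g. "(0xc820010420, ...)".
			if p := strings.LastIndex(fn, "("); p > 0 {
				fn = fn[:p]
			}
			loc = strings.TrimSpace(lines[i+2])
			// Drop the program-counter offset, e.g. " +0x30e".
			if p := strings.LastIndex(loc, " +0x"); p > 0 {
				loc = loc[:p]
			}
			return fn, loc, true
		}
	}
	return "", "", false
}

func main() {
	trace := "goroutine 1 [running]:\n" +
		"github.com/arolek/ase.(*Group).readName(0xc820010420, 0x0)\n" +
		"\t/tmp/go-fuzz-build383666922/src/github.com/arolek/ase/group.go:39 +0x30e\n"
	if fn, loc, ok := topFrame(trace); ok {
		fmt.Println(fn)
		fmt.Println(loc)
	}
}
```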

Looking at the source line named at the top of the stack trace (group.go:39), we see:

//	decode our name. we trim off the last byte since it's zero terminated
group.Name = string(utf16.Decode(name[:len(name)-1]))

So the slice expression name[:len(name)-1] is what’s going out of bounds.
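The failure can be reproduced outside the package. This sketch copies the expression from group.go into a hypothetical decodeNameUnchecked function and uses recover so the program can report the panic instead of dying:

```go
package main

import (
	"fmt"
	"unicode/utf16"
)

// decodeNameUnchecked mirrors the expression from group.go: it trims
// the trailing zero terminator without checking that one exists.
func decodeNameUnchecked(name []uint16) string {
	return string(utf16.Decode(name[:len(name)-1]))
}

// tryDecode calls decodeNameUnchecked and converts a panic into an error.
func tryDecode(name []uint16) (s string, err error) {
	defer func() {
		if r := recover(); r != nil {
			err = fmt.Errorf("recovered: %v", r)
		}
	}()
	return decodeNameUnchecked(name), nil
}

func main() {
	fmt.Println(tryDecode([]uint16{'H', 'i', 0})) // a normal, terminated name
	fmt.Println(tryDecode([]uint16{}))            // empty name: len-1 is -1
}
```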

We can import this test case into our test suite with something like the following:

package ase

import (
	"strings"
	"testing"
)

func TestFuzzCrashers(t *testing.T) {
	// Inputs copied from the .quoted files in workdir/crashers.
	var crashers = []string{
		"ASEF\x00\x01000000\xc0\x010000\x00\x00",
	}

	for _, f := range crashers {
		// Any panic inside Decode fails the test.
		Decode(strings.NewReader(f))
	}
}

Since we haven’t fixed anything yet, running go test should crash. Having the crasher in a unit test makes the fix easy to check, and ensures we won’t regress and reintroduce the crash once it’s fixed.
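One limitation of the test above is that the first panic aborts the whole test binary, so later crashers in the list never run. Here is a sketch of an alternative that recovers from each panic and reports which inputs failed. decodeStub is a hypothetical stand-in that panics on empty input, and panickingInputs and didPanic are illustrative helpers:

```go
package main

import (
	"fmt"
	"io"
	"strings"
)

// decodeStub is a hypothetical stand-in for ase.Decode with the same
// kind of bug: it panics when the input is empty.
func decodeStub(r io.Reader) error {
	data, err := io.ReadAll(r)
	if err != nil {
		return err
	}
	_ = data[:len(data)-1] // out of range when len(data) == 0
	return nil
}

// didPanic runs f and reports whether it panicked.
func didPanic(f func()) (panicked bool) {
	defer func() {
		if recover() != nil {
			panicked = true
		}
	}()
	f()
	return false
}

// panickingInputs feeds each input to decode and returns the indices of
// the inputs that panicked, so one crasher cannot hide the others.
func panickingInputs(inputs []string, decode func(io.Reader) error) []int {
	var bad []int
	for i, in := range inputs {
		if didPanic(func() { _ = decode(strings.NewReader(in)) }) {
			bad = append(bad, i)
		}
	}
	return bad
}

func main() {
	inputs := []string{"ASEF", "", "ASEF\x00\x01", ""}
	fmt.Println(panickingInputs(inputs, decodeStub)) // prints [1 3]
}
```

In a real test you would adapt Decode to the func(io.Reader) error shape, for example func(r io.Reader) error { _, err := Decode(r); return err }, and call t.Errorf for each returned index.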

In this case, it’s easy to discover that len(name) is 0, so len(name)-1 is -1, which is not a valid slice index. Boom. It turns out the other crash is an identical bug in the color-name reading code.

A simple patch to check if there is actually a name to read is enough to fix the problem and stop our tests from crashing:

 // Decode a group's name.
 func (group *Group) readName(r io.Reader) (err error) {
+	if group.nameLen == 0 {
+		return
+	}
+
 	// make array for our color name based on block length
 	name := make([]uint16, group.nameLen)
 	if err = binary.Read(r, binary.BigEndian, &name); err != nil {

And now, a quick pull request and we’re done.