Three Golden Testing Patterns in Go

Background

At Yik Yak we quickly built an entirely new stack in Go. It was there that I wrote my first line of Go and was introduced to several testing patterns that were new to me, including table-driven tests and generated goldens, both of which I really like. But when I later changed the functionality of my code, I found some drawbacks to the generated golden pattern. I’ll discuss a few testing approaches and propose some guidelines for when and how to use each pattern.

The Testing Problem

In testing code with simple and stable behavior it’s easy to write a straightforward test. I’ll use a very simple bit of code as an example, but pretend that this is more complex (something like a JSON response). Consider this function that returns a greeting given a name:

func greet(name string) string {
	// An empty name gets a generic greeting.
	if name == "" {
		return "Hi!"
	}
	return fmt.Sprintf("Hey %s!", name)
}

In this example, the correctness of the function is a little squishy. It’s perfectly fine for a greeting for Joe to be “Hey Joe!” or “Joe! How’s it going?” or “Long time no see, Joe!”. Similar code that returns a complex struct could have a lot that can change without affecting the correctness of the response.

A Few Approaches to Testing our Greeter

Static Test

The simplest approach is a static test, where we hard-code the expected response for each input. I’ll use the table-driven test pattern in each example, so if you’re not familiar with it, take a moment to understand what’s going on. It’s a superb pattern.

func TestGreetingStatic(t *testing.T) {
	tests := []struct {
		name     string
		input    string
		expected string
	}{
		{"alice", "Alice", "Hey Alice!"},
		{"bob", "Bob", "Hey Bob!"},
		{"empty", "", "Hi!"},
	}
	for _, test := range tests {
		// Run each case as a named subtest so a failure identifies the case.
		t.Run(test.name, func(t *testing.T) {
			actual := greet(test.input)
			require.Equal(t, test.expected, actual)
		})
	}
}

What works about this Static Test approach:

  • It will identify breakages of the implementation (behavior changes).
  • It is simple to understand.

What sucks:

  • If your output is complex, creating and updating it to reflect the current system behavior is high effort, requiring the developer to modify each test and construct the correct output.

Property Test

With this approach the test checks a property of the response rather than checking for equality. This approach is nice (necessary, even) if you want to test a method that has inherent variance in its output (and you can’t control the entropy). In this case we check that the greeting includes the name of the person. We might also consider a test successful if the greeting uses any one part of a space-separated name (e.g. “Hi Mr Hope” might be an acceptable greeting for “Bob Hope”). In this way a property test can remain flexible, but the checks themselves can grow into complex logic, which is a bad thing for tests.

func TestGreetingPropertyCheck(t *testing.T) {
	tests := []struct {
		name              string
		input             string
		expectedToContain string
	}{
		{"alice", "Alice", "Alice"},
		{"bob", "Bob", "Bob"},
		// Every string contains "", so this case only checks that greet returns.
		{"empty", "", ""},
	}
	for _, test := range tests {
		t.Run(test.name, func(t *testing.T) {
			actual := greet(test.input)
			matchExpected := strings.Contains(actual, test.expectedToContain)
			// Report the expected substring, not the boolean result of the check.
			require.True(t, matchExpected, fmt.Sprintf("Expected to find substring %q in response, got %q", test.expectedToContain, actual))
		})
	}
}
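
The looser check mentioned above, where any one part of a space-separated name is enough, could be sketched as a small helper. This is only an illustration: `mentionsAnyPart` is a hypothetical name that does not appear in the original code, and `greet` is repeated here so the example stands on its own.

```go
package main

import (
	"fmt"
	"strings"
)

// greet mirrors the article's example function.
func greet(name string) string {
	if name == "" {
		return "Hi!"
	}
	return fmt.Sprintf("Hey %s!", name)
}

// mentionsAnyPart is a hypothetical helper (not from the article).
// It reports whether greeting contains at least one space-separated
// part of name. An empty name has no parts, so it returns false and
// the empty case needs its own check in a test.
func mentionsAnyPart(greeting, name string) bool {
	for _, part := range strings.Fields(name) {
		if strings.Contains(greeting, part) {
			return true
		}
	}
	return false
}

func main() {
	fmt.Println(mentionsAnyPart("Hi Mr Hope", "Bob Hope"))      // true
	fmt.Println(mentionsAnyPart(greet("Bob Hope"), "Bob Hope")) // true
	fmt.Println(mentionsAnyPart("Hello there", "Bob Hope"))     // false
}
```

Even this small helper shows the trade-off: the property is flexible, but the test now carries logic of its own that could be wrong.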

What works about this Property Test approach:

  • Robust to change.
  • Checks for the important components of the response.

What sucks:

  • The test code itself must be sophisticated.

Generated Golden Test

A “Golden” is an example of the correct answer to which all test outputs are compared. The “Static Test” approach above uses a golden in the form of a string but often the golden is kept in a file. This approach uses that pattern but includes code to automatically generate the golden output.

In a Generated Golden Test the developer need not manually create output or define correctness with properties. Instead, the developer verifies that the code operates correctly and captures the output, and that output becomes the new “right answer”.

This approach can save a lot of time because there is no need to manually craft a correct response to test against, and it is very easy to update when behavior changes. But these strengths are also its weakness: it becomes so easy to “fix” the test to reflect the current behavior that a developer might overlook a bad behavior change.

var update = flag.Bool("update", false, "update golden files")

func TestGreetingGenGolden(t *testing.T) {
	tests := []struct {
		name  string
		input string
	}{
		{"alice", "Alice"},
		{"bob", "Bob"},
		{"empty", ""},
	}
	for _, test := range tests {
		t.Run(test.name, func(t *testing.T) {
			actual := greet(test.input)
			actualJSON, err := json.MarshalIndent(actual, "", "  ")
			require.NoError(t, err)

			golden := filepath.Join("test-fixtures", test.name+".golden")
			if *update {
				// Record the current output as the new expected result.
				t.Logf("updating golden for %s", test.name)
				err := os.WriteFile(golden, actualJSON, 0644)
				require.NoError(t, err, "Failed to write. Check that path to "+golden+" exists.")
			}

			// A missing or unreadable golden should fail the test rather than compare against "".
			expectedJSON, err := os.ReadFile(golden)
			require.NoError(t, err)
			require.Equal(t, string(expectedJSON), string(actualJSON), test.name+" results didn't match expectation.")
		})
	}
}

To create (or update) the goldens, the developer need only run:

go test -update

What works about this Generated Goldens approach:

  • Reduces effort to write tests, especially in cases of complex output.
  • Makes it very easy to discover unintended behavior changes.
  • Trivial to update tests when behavior changes.

What sucks:

  • There is a risk that developers miss a legitimate defect because they updated the goldens too quickly instead of carefully understanding the change. Code reviewers looking at the golden changes can mitigate this, but if the changes are very large the review has little utility.

Note that this approach also works well for testing a range of input values (although keeping all the outputs for a range in one file may be wise).

A Framework for Choosing the Best Approach

Each of these approaches has a place, but they’re more effective when used together.

For functionality that is not expected to change, static tests are ideal. Their simplicity is the key advantage, and when a test fails it is useful not to have to open a separate file to find the correct output. But even alongside static tests, running a large number of inputs (say, a range of values) through a generated golden can improve edge-case coverage and help catch obscure behavior changes made by a developer who doesn’t realize they are introducing new behavior.

For more sophisticated functionality, a blend of Property Tests and Generated Golden tests is very nice. A small set of simple Property Tests keeps the developer honest (you must actually think about the necessary behavior in order to update them) and is generally robust to intended behavior changes. A large number of Generated Golden tests can easily identify behavior changes, whether intended or not, and they are trivial to update.

Summary

When creating tests and selecting a test methodology, it’s wise to consider not just the current behavior but also how the code may evolve and how future developers will interact with the established tests. Consider layering these approaches to balance developer effort against broad coverage of inputs.
