Property-based testing in Kotlin — Part 2

David Rawson
Published in Trade Me Blog
Jan 6, 2020

Promise breakers

As we talked about in part 1, tests are supposed to help us. But have you ever looked down from the high ground on Mustafar at the burning corpse of your tests and yelled in disappointment “You were the chosen one!”?

Obi-Wan is disappointed with his tests T_T

There are at least three ways our tests let us down:

1. Good tests are easy to understand

It’s hard enough to read production code without reading complex tests. Furthermore, if you follow Gojko Adzic’s advice in Bridging the Communication Gap, the tests are a living specification. The tests live in the code and communicate the spec to the team. That’s pretty difficult if the tests are incomprehensible.

2. Good tests are easy to maintain

A small refactor in a class shouldn’t mean we have to add parameters or extra stubbing to a slew of tests. Fragile tests that are not resilient in the face of refactors are a heavy burden.

3. Good tests assure us our code works properly

Tests should prevent us from introducing bugs. Yet despite all our effort, bugs seem to slip through.

If we know what makes for good tests, then why is test quality so difficult to achieve? Part of the answer lies in the very foundation of our testing practice. To understand where we go wrong, let’s use an ancient analogy.

The shape of an elephant

In a story from Indian philosophy, a king summons a group of blind men and presents them with an unfamiliar animal: an elephant. Each man gets to touch a single part of the elephant. The king asks them: “Do you perceive this elephant? What is it like?” One version of the story gives the following answer:

“…the men who were presented with the head answered, ‘Sire, an elephant is like a pot.’ And the men who had observed the ear replied, ‘An elephant is like a winnowing basket.’ Those who had been presented with a tusk said it was a ploughshare. Those who knew only the trunk said it was a plough; others said the body was a granary; the foot, a pillar; the back, a mortar; the tail, a pestle, the tuft of the tail, a brush.”

Conclusion: it’s possible to perceive individual parts, but not the nature of the whole. Our traditional tests are very similar.

The shape of a class

In each of our promise-breaking tests, we set up a single scenario that we take to be representative of an edge case or an equivalence class.

Even though each test describes only one part of the elephant, we imagine that the test suite as a whole captures the essence of the elephant itself.

We will call these traditional tests “value-based tests” since they make assertions about the behavior of the system under test for a single value.

In this vein, David Saff and Marat Boshernitsan say:

Traditional unit tests… compare a few concrete example executions against the developer’s definition of correct behavior.

But pot + winnowing basket + ploughshare … + brush != elephant. In other words,

A developer knows more about how a program should behave than can be expressed through concrete examples

A baby elephant

Enough analogies! Suppose we have a function myMax that calculates the maximum of a collection of Int:
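The post's original snippet isn't reproduced here. As a minimal sketch, assuming myMax takes a non-empty List<Int> (the signature and the error handling are our assumptions), a straightforward implementation might look like this:

```kotlin
// Hypothetical reconstruction of myMax: signature and empty-list handling are assumptions.
fun myMax(values: List<Int>): Int {
    require(values.isNotEmpty()) { "myMax needs at least one element" }
    var max = values[0]
    // Walk the rest of the list, keeping the largest value seen so far.
    for (value in values.drop(1)) {
        if (value > max) max = value
    }
    return max
}
```

For example, myMax(listOf(3, 7, 5)) returns 7, and an empty list fails fast with an IllegalArgumentException.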

We want to test it, so we write a suite of value-based tests:
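The original test code isn't shown either. A hypothetical, framework-free version might look like the following: each check pins down the result of myMax for one hand-picked input. We use Kotlin's standard check() rather than a test framework so the sketch is self-contained, and the specific inputs are our own choice.

```kotlin
// Implementation under test (a compact variant of the earlier sketch).
fun myMax(values: List<Int>): Int {
    require(values.isNotEmpty()) { "myMax needs at least one element" }
    return values.reduce { a, b -> maxOf(a, b) }
}

// Hypothetical value-based tests: each asserts the result for a single, hand-picked input.
fun runValueBasedTests() {
    check(myMax(listOf(1)) == 1) { "single element" }
    check(myMax(listOf(1, 2)) == 2) { "two ascending elements" }
    check(myMax(listOf(3, 7, 5)) == 7) { "maximum in the middle" }
    check(myMax(listOf(-4, -1)) == -1) { "negative numbers" }
}
```

Each example looks reasonable on its own. Notice, though, that in every input the maximum is either the only element or sits in second position, and nothing in the suite draws attention to that.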

If these tests have really done their job and captured the essence of myMax, then breaking myMax by altering its implementation should cause them to fail:

A more “optimized” implementation ;-)
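Going by the description later in the post (the implementation returns the second value in the list, if there is one), a sketch of the “optimized” version might be the one below. We assume it falls back to the first element for single-element lists. Run against the same hypothetical value-based checks, every assertion still holds:

```kotlin
// "Optimized" (broken) myMax: returns the second element if there is one,
// otherwise the first. The one-element fallback is an assumption.
fun myMax(values: List<Int>): Int {
    require(values.isNotEmpty()) { "myMax needs at least one element" }
    return values.getOrElse(1) { values[0] }
}

// The same hypothetical value-based checks as before: all of them pass.
fun runValueBasedTests() {
    check(myMax(listOf(1)) == 1) { "single element" }
    check(myMax(listOf(1, 2)) == 2) { "two ascending elements" }
    check(myMax(listOf(3, 7, 5)) == 7) { "maximum in the middle" }
    check(myMax(listOf(-4, -1)) == -1) { "negative numbers" }
}
```

Yet myMax(listOf(9, 1, 2)) returns 1, which is plainly wrong.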

Surprised Pikachu face: the tests pass.

Our tests still pass even though our implementation is wrong T_T

Properties

Instead of a series of example values, like so many dots on a piece of paper, what if there were a way to describe a class in the same kind of broad strokes we use to write it?

Describing the elephant using dots

Let’s call these broad strokes “properties” or “theories”. Here’s an example using KotlinTest:
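The KotlinTest code from the original post isn't reproduced here. As a library-free sketch of the same idea, we can state the property “no element is greater than the computed maximum” once and let a tiny hand-rolled runner throw random lists at it. The names findCounterexample, noElementGreaterThan and optimizedMax are hypothetical, and this is not KotlinTest's API.

```kotlin
import kotlin.random.Random

// Correct implementation.
fun myMax(values: List<Int>): Int {
    require(values.isNotEmpty()) { "myMax needs at least one element" }
    return values.reduce { a, b -> maxOf(a, b) }
}

// "Optimized" (broken) implementation: second element if present, else the first.
fun optimizedMax(values: List<Int>): Int = values.getOrElse(1) { values[0] }

// The property: no element of the list is greater than the computed maximum.
fun noElementGreaterThan(max: (List<Int>) -> Int, values: List<Int>): Boolean {
    val m = max(values)
    return values.none { it > m }
}

// Hypothetical mini test runner: generates random non-empty lists and returns the
// first input that falsifies the property, or null if every generated input passes.
fun findCounterexample(
    iterations: Int = 1_000,
    random: Random = Random(42),
    property: (List<Int>) -> Boolean
): List<Int>? {
    repeat(iterations) {
        val input = List(random.nextInt(1, 20)) { random.nextInt() }
        if (!property(input)) return input
    }
    return null
}
```

With this sketch, findCounterexample { noElementGreaterThan(::myMax, it) } returns null, while the same call with ::optimizedMax returns some list whose second element isn't its largest. A real property-testing library does more than this (for instance, mixing in edge cases and shrinking failures); the fixed seed simply keeps the sketch reproducible.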

What happens if we run this test against our “optimized” implementation? Remember, this implementation just returns the second value in the list, if there is one.

Test results in the IDE showing a log where the test runner has falsified the property and shrunk the counterexample
Shrinky, shrinky, shrink

Unlike in value-based testing, we don’t pick individual inputs. Instead, we take the burden of finding inputs away from the programmer and offload it to the test runner.

The test runner throws lists of Int at the property “no elements greater than myMax” and attempts to disprove it. If it finds a counterexample, it runs a shrinker to find a simpler counterexample, in this case the list [1030531734, 774835828].
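To make the shrinking step concrete, here is a hypothetical greedy shrinker in plain Kotlin. It is not KotlinTest's algorithm: it tries dropping one element at a time, then halving one element toward zero, and keeps the first smaller list that still breaks the property, stopping when no candidate does. Because it also shrinks the values themselves, it would typically end up with smaller numbers than the counterexample quoted above.

```kotlin
// "Optimized" (broken) implementation: second element if present, else the first.
fun optimizedMax(values: List<Int>): Int = values.getOrElse(1) { values[0] }

// Property: no element is greater than the computed maximum.
fun noElementGreater(values: List<Int>): Boolean {
    val m = optimizedMax(values)
    return values.none { it > m }
}

// Smaller variants of a list: first with one element removed,
// then with one non-zero element halved toward zero.
fun candidates(values: List<Int>): Sequence<List<Int>> = sequence {
    for (i in values.indices) {
        yield(values.filterIndexed { j, _ -> j != i })
    }
    for (i in values.indices) {
        if (values[i] != 0) {
            yield(values.mapIndexed { j, v -> if (j == i) v / 2 else v })
        }
    }
}

// Hypothetical greedy shrinker: repeatedly replaces the failing input with the first
// non-empty candidate that still falsifies the property, until none does.
fun shrink(failing: List<Int>, property: (List<Int>) -> Boolean): List<Int> {
    var current = failing
    while (true) {
        current = candidates(current).firstOrNull { it.isNotEmpty() && !property(it) } ?: break
    }
    return current
}
```

For example, shrink(listOf(10, 3), ::noElementGreater) returns [1, 0]: single-element lists always satisfy the property, so the shrinker stops at the smallest two-element list it can reach where the first element beats the second.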

Intrigued? Stay tuned for part 3, where we clean up these tests and see if they really do bring balance to the Force.

Part three follows.
