Writing Tests First

There’s a best way to automate your tests: write them before you write the program. If you’ve heard the term “test-driven development,” that’s what it means. If you’ve never seen it done, though, it’s an unintuitive idea. How should you start?

A basic example

Here, we’ll develop a factorial function in a test-driven style. We start with a stub: an empty function. This does nothing, but that’s the point: in effective automated testing, we write the tests first, and only after we have a test do we write the code.

```python
def factorial(n):
    ...
```

Traditionally, automated tests are a series of examples in the form “given input X, I expect output Y.”

For our factorial function, we’ll have examples like “what is the factorial of 5?” Answer: 120. We don’t start at five, though; we start with something much more basic: “what is the factorial of 0?” We’ll write our examples using the simplest method of automated testing, the assert statement:

```python
def factorial(n):
    ...

assert factorial(0) == 1
```

After adding our first test, we run the code. It’s important that we still haven’t written any code in our factorial function, because we want our test to fail. It does:

```
$ python fac.py
Traceback (most recent call last):
  File "fac2.py", line 15, in <module>
    assert factorial(0) == 1
AssertionError
$
```
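The failure comes from the stub itself: a function whose body is only `...` falls off the end and returns None, and None is not equal to 1. A minimal sketch that makes this visible (the `result` variable is ours, added only for illustration):

```python
def factorial(n):
    ...  # stub: the Ellipsis is evaluated and discarded, so the call returns None


result = factorial(0)
print(result)       # None
print(result == 1)  # False, which is why the assert raises AssertionError
```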

Red-green

When we write test-first, we always write a failing test first. Once we have a failing test, we can write the simplest possible code to make the test pass:

```python
def factorial(n):
    return 1

assert factorial(0) == 1
```

More sophisticated test tools report how many tests run and how many fail. When using plain assert statements, however, we don’t get those details. Instead, we assume no news is good news: if we run our program and see no errors, our test passed:

```
$ python fac3.py  # no errors, all's well!
$
```
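For comparison, here is a rough sketch of the same single test under Python’s built-in unittest module, one of the tools that does report how many tests ran and how many failed. The class name `FactorialTest` and the explicit runner are our own choices for illustration; the rest of this walkthrough sticks with plain assert.

```python
import unittest


def factorial(n):
    return 1  # the deliberately naive implementation from this step


class FactorialTest(unittest.TestCase):  # hypothetical test class name
    def test_zero(self):
        self.assertEqual(factorial(0), 1)


# Load and run the tests explicitly so the counts are available afterwards.
suite = unittest.defaultTestLoader.loadTestsFromTestCase(FactorialTest)
result = unittest.TextTestRunner(verbosity=2).run(suite)
print("tests run:", result.testsRun, "failures:", len(result.failures))
```

In a real test file you would more commonly end with `unittest.main()` or run `python -m unittest`; the explicit runner here just keeps the counts in `result` so we can print them.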

But all is not well yet. We know that this isn’t correct: our factorial function returns “1” no matter what we give it. Obviously, there should be an “if” condition here, maybe “if n == 0: return 1”, but we don’t write that because we’re writing test-first and we don’t have a test for the if condition. We could just as easily write “if True: return 1” or “if ‘bananas’: return 1” and all our tests would still pass.
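To see why the single test can’t tell these variants apart, here is a small sketch that runs it against all three. The function names are hypothetical, made up only so the variants can sit side by side in one file.

```python
def factorial_always(n):
    return 1


def factorial_if_true(n):
    if True:
        return 1


def factorial_bananas(n):
    if "bananas":  # any non-empty string is truthy, so this branch always runs
        return 1


for candidate in (factorial_always, factorial_if_true, factorial_bananas):
    assert candidate(0) == 1  # the only test we have so far passes for all three
    assert candidate(2) == 1  # and all three are equally wrong for other inputs
```

The condition only becomes meaningful once a test forces different inputs to produce different outputs.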

We repeat:

1. Write a failing test
2. Write the simplest code to make that test pass

This is called the “red-green” cycle because a failing test is “red” and a passing test is “green.” Let’s write another test:

```python
def factorial(n):
    return 1

assert factorial(0) == 1
assert factorial(2) == 2
```

It fails, as we expect:

```
$ python fac.py
Traceback (most recent call last):
  File "fac4.py", line 17, in <module>
    assert factorial(2) == 2
AssertionError
$
```

We “fix” it, again in the simplest way we can.

```python
def factorial(n):
    if n == 0: return 1
    if n == 2: return 2

assert factorial(0) == 1
assert factorial(2) == 2
```

And all our tests pass:

```
$ python fac.py
$
```

Back to writing another failing test:

```python
def factorial(n):
    if n == 0: return 1
    if n == 2: return 2

assert factorial(0) == 1
assert factorial(2) == 2
assert factorial(5) == 120
```

And verifying that it fails:

```
$ python fac.py
Traceback (most recent call last):
  File "fac6.py", line 14, in <module>
    assert factorial(5) == 120
AssertionError
$
```

And now we make it pass again. We could write “if n == 5: return 120”, but is that really the “simplest” way to make our latest test pass? It’s a matter of judgment, but adding a special case for every new input only grows a lookup table, so we’ll say that the simplest thing really is to write the recursive branch.

```python
def factorial(n):
    if n == 0: return 1
    return n * factorial(n - 1)

assert factorial(0) == 1
assert factorial(2) == 2
assert factorial(5) == 120
```

And it passes. We’ve now built an actual implementation of factorial(n), and because we tested methodically at each step, we can be confident that our tests exercise all the code.

```
$ python fac.py
$
```
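Our three asserts all have the “given input X, I expect output Y” shape, so one option is to keep the examples as data and loop over them. This is a sketch rather than part of the walkthrough: the `EXAMPLES` name and the failure message are our own additions.

```python
def factorial(n):
    if n == 0:
        return 1
    return n * factorial(n - 1)


# (input, expected output) pairs: the same three examples used so far
EXAMPLES = [(0, 1), (2, 2), (5, 120)]

for n, expected in EXAMPLES:
    actual = factorial(n)
    # The message is shown only when the assertion fails.
    assert actual == expected, f"factorial({n}) returned {actual}, expected {expected}"
```

The message after the comma is what a bare assert lacks: on failure, the traceback then says which example broke and what value came back.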

Refactor

Another name for the “red-green” cycle is “red-green-refactor.” The “red-green” part is about writing a correct function; the “refactor” part means that once we’ve written a correct function, we can edit it to make it easier to read.

There are many different ways to write this same function. Whether one is better than another may be obvious, or it may depend on your personal taste. Either way, because we wrote this in a test-driven style, we have a suite of tests that covers our algorithm, so when we edit the code for clarity, we can check easily enough that it’s still correct:

```python
def factorial(n):
    return n * factorial(n-1) if n else 1

assert factorial(0) == 1
assert factorial(2) == 2
assert factorial(5) == 120
```

It still passes:

```
$ python fac.py
$
```

No test suite is perfect: it is possible to introduce breakage that your test suite doesn’t detect. Still, building the suite systematically, one failing test at a time, is the best way we know to get close to that ideal. Here, we’ve just rearranged the code to implement the same algorithm in a slightly different way. If we were to change the algorithm during refactoring (to implement an iterative algorithm, for example), we might need to write new tests to cover the new code.
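As a rough sketch of what such a change might look like, here is one possible iterative version run against the same three tests. The loop and the extra test for n == 1 are our own illustration of “new tests for new code,” not something the original walkthrough prescribes.

```python
def factorial(n):
    result = 1
    # Multiply by 2, 3, ..., n; the range is empty when n is 0 or 1.
    for k in range(2, n + 1):
        result *= k
    return result


# The existing suite still applies unchanged.
assert factorial(0) == 1
assert factorial(2) == 2
assert factorial(5) == 120

# A new test for the new code: n == 1 is the largest input
# for which the loop body never runs.
assert factorial(1) == 1
```

The old tests confirm that behavior is preserved; the new one pins down a boundary that exists only because of the loop.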

This is a good start, but we’ve only done the easy part so far. The hard part of most programs is error handling, so we’ll look at negative testing next.