Tautology Tests

Roy Williams
Aug 11, 2017

Tests are an incredibly important part of producing quality software quickly, but, as with all things in life, they can cause more harm than good when used incorrectly. Consider the following overly simple function and test. In this case, the author wants to insulate their tests from external dependencies, so they rely on mocks.
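A minimal sketch of what that function and its mock-heavy test might look like. The `get_key` name and the 'hello'/'world' inputs come from the error output later in the post; the mock setup details are illustrative:

```python
import hashlib
from unittest import mock


def get_key() -> str:
    # BUG: in Python 3, md5() and update() require bytes, not str --
    # but the mocked test below never notices
    hasher = hashlib.md5('hello')
    hasher.update('world')
    return hasher.hexdigest()


def test_get_key():
    # Mock out hashlib so the test never touches the real implementation
    with mock.patch('hashlib.md5') as md5_mock:
        md5_mock.return_value.hexdigest.return_value = 'digest'
        result = get_key()
        md5_mock.assert_called_once_with('hello')
        md5_mock.return_value.update.assert_called_once_with('world')
        md5_mock.return_value.hexdigest.assert_called_once_with()
        assert result == 'digest'
```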

Looks great! Fully tested, with four assertions to ensure the code executed as expected. The tests even pass!

$ python3.6 -m pytest test_simple.py
========= test session starts =========
collected 1 items

test_simple.py .
======= 1 passed in 0.03 seconds ======

Of course, the problem is that this code is wrong: md5 only accepts bytes, not str (see this blog post for an explanation of how bytes and strings changed in Python 3). The test case had little value; it really only tested string formatting, granting us a false sense of security. We thought we had written our code correctly, and we had test cases to prove it!

Thankfully mypy catches these issues:

$ mypy test_simple.py
test_simple.py:6: error: Argument 1 to "md5" has incompatible type "str"; expected "Union[bytes, bytearray, memoryview]"
test_simple.py:8: error: Argument 1 to "update" of "_Hash" has incompatible type "str"; expected "Union[bytes, bytearray, memoryview]"

OK great, we fix the underlying code to first encode our strings as bytes:
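The fix might look like this (a sketch of the corrected `get_key`, encoding each string before hashing):

```python
import hashlib


def get_key() -> str:
    # Encode str to bytes before hashing -- md5 rejects plain str
    hasher = hashlib.md5('hello'.encode('utf-8'))
    hasher.update('world'.encode('utf-8'))
    return hasher.hexdigest()
```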

Our code now works, but we still have a problem. Let’s say someone else comes through and simplifies our code even further to just a couple of lines:
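For example, a hypothetical simplification that hashes the same bytes in a single call:

```python
import hashlib


def get_key() -> str:
    # Same bytes as before, hashed in one call instead of two
    return hashlib.md5(b'helloworld').hexdigest()
```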

This code is functionally identical to the original code. For the same set of inputs it will always return the same output. Even so, our tests fail:

E AssertionError: Expected call: md5(b'hello')
E Actual call: md5(b'helloworld')

There’s clearly a problem with this simple test. It’s both been subject to Type I Error (failed when the underlying production code was fine) and to Type II Error (passed when the underlying production code was broken). In an ideal world tests would fail if and only if the underlying code was broken. In an even more perfect world if our tests passed we would have 100% confidence our production code was correct. While neither of these ideals is achievable, they are paragons worth pursuing.

I call these kinds of tests “Tautology Tests”. They assert that the code is correct by ensuring the code executes as written which, of course, assumes the way it is written is correct.

Comic by xkcd, licensed under CC-BY-NC 2.5, available https://xkcd.com/703/

I argue that Tautology Tests are a net negative for your code for a few reasons:

  1. Tautology Tests give engineers a false sense of security that their code is correct. They might look at the high code coverage and feel good about their project. Others coming into the code base will feel confident pushing changes as long as tests pass, even though those tests aren't really testing anything.

  2. When a Tautology Test fails, the path of least resistance is to blindly update the test to match the new code rather than ask whether the change itself was correct, encouraging a bad habit.

  3. Tautology Tests break on every refactor, so they must be maintained constantly even when the code's behavior never changed.

In short, Tautology Tests frequently miss real issues, encourage the bad habit of blindly fixing tests, and cost substantially more to maintain than the value they provide.

Now consider if we rewrite the test to just test the expected output:
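A sketch of that output-based test: it pins down only the public return value of `get_key` (the expected digest is just the md5 of b'helloworld'):

```python
import hashlib


def get_key() -> str:
    return hashlib.md5(b'helloworld').hexdigest()


def test_get_key():
    # Assert only on the public output; the internals are free to change
    assert get_key() == 'fc5e038d38a57032085441e7fe7010b0'
```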

Now my test doesn’t care about the internal details of get_key and will break only if get_key returns an incorrect value. I can change the internals of get_key as I see fit without having to update tests (unless I change the public behavior). My test is also succinct and easier to understand.

While this is a contrived example, it's easy to find tests in real code that, for example, assume the output of an external service matches whatever the implementation expects, just to increase code coverage.

How to Find Tautology Tests

  1. Tests that, when they fail, get updated much more often than the code they're testing. Every time this happens, it's part of the cost we pay for test coverage. If that cost starts to exceed the value derived from the test, it's a strong hint that our test code is too tightly coupled to the implementation. A related problem: a small production code change that requires updating a much larger number of tests.

How to Fix Tautology Tests

  • Keep I/O separate from logic. I/O is one of the most frequent reasons engineers reach for mocks. I/O is, of course, essential; without it all we can do is spin CPU cycles and heat up our computers. But it should be pushed to the peripheries of your code instead of being interleaved with logic. The Sans-I/O working group in the Python community has some great documentation on this topic, and Cory Benfield covered it well at PyCon 2016 in his talk Building Protocol Libraries The Right Way.
  • Use fixture data. If the dependency being mocked is an external service, consider creating a common set of fakes or using a mock server to provide fixture data. Centralizing the implementation of the fake allows for careful emulation of the real implementation’s behavior and minimizes how much test code has to change if the underlying implementation changes.
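A sketch of the first point, using a hypothetical greeting protocol (the `parse_greeting` and `read_greeting` names are illustrative): the parsing logic is a pure function over bytes, while the socket read is a thin wrapper at the edge.

```python
# Pure logic: operates on plain bytes, so tests can feed it fixture
# data directly -- no mocks required
def parse_greeting(payload: bytes) -> str:
    return payload.decode('utf-8').strip().title()


# I/O at the periphery: a thin, boring wrapper around the pure core
def read_greeting(sock) -> str:
    return parse_greeting(sock.recv(1024))
```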

It’s better to leave a line of code untested than to give the illusion that it is well tested.

Keep an eye out for Tautology Tests during code review as well. Ask yourself what a test is actually testing, not just whether it covers some lines of code.

Remember, Tautology Tests are bad because they are not good.


Thanks to Kent Beck, Simon Stewart, Ben Hamilton, and Josh Cincinnati for feedback!
