A passing test is not always enough

Published in

Ordergroove Engineering

5 min readJun 1, 2023

Let’s say you’re writing some new code for a feature. You finish the code and it seems to work when you run the service directly, so it’s time for the final step — writing the tests. You bang out an exhaustive test suite, they all pass, and you’re done — right?

Not quite. All you’ve proven is that the tests pass. But is that enough?

For instance, the following test passes, even though it doesn’t assert anything useful!

test('reticulate splines', () => {
  const result = reticulateSplines();
  expect(true).toBe(true);
});

This is a very contrived example, but it’s still useful to illustrate that just because a test exists doesn’t mean it actually tests the code you wrote. Without knowing that your test can fail, you don’t know that your tests are protecting you from anything at all.

Best case, the code behind your never-failing test is doing what it’s supposed to do. But worst case, the code you wrote is secretly broken, and having a passing test is giving you false confidence. A test that never fails is worthless.

Wait, doesn’t TDD solve this?

I’m sure some of you are yelling by this point — this is why I use test-driven development! And sure, that’s one way to solve this problem. If you have a failing test before you write any code, then you know the test will fail if something breaks that code in the future.

But here’s the problem: I don’t always want to do TDD. TDD is one way to ensure that your tests have the potential to catch regressions in your code, but it isn’t the only way. Even if you’re not using TDD, you should make sure that your test is protecting what you think it is.

Making the test fail

So how do you make the test fail? There are many ways to go about this, but what I normally do is either revert the new behavior I just added or tweak it in a way so that I would expect the test to fail.

To give a simplified example, if I was asserting on the behavior of an add function that adds two numbers together, I could make the test fail by returning a hard-coded value or swapping out the addition for a subtraction.

function add(a, b) {
  return a + b;
}

// failing version 1: hard code incorrect value
function add(a, b) {
  return -30;
}

// failing version 2: bad implementation
function add(a, b) {
  return a — b;
}

You might be tempted to simply invert the assertion. For instance, if you had the following test, you might make it fail by expecting false instead of true.

test('add works', () => {
  const result = add(2, 4);
  expect(result === 6).toBe(true);
}

And sure, that will work for many cases. However, it doesn’t actually verify that breaking the production code will also break the test. I have had multiple situations where my test was calling the wrong function entirely. In this case, inverting the test’s assertion would have caused it to fail, but breaking the production code would not have broken the test. It was only by changing the code under test that I was able to ensure that breaking the implementation caused the test to fail.

This is another reason why TDD advocates for writing the failing test first — if you write a failing test, and then write code to make that test pass, you have the certainty that the code you wrote is covered by the test — but as mentioned, your mileage with that technique may vary.

A real-world example

We had front-end code executing some conditional logic and asserting that the DOM was updated properly. In this case, we were rendering a custom element using Lit. Let’s say we had the following element (simplified for the purposes of this illustration):

import { LitElement, html } from 'lit-element';

class PrepaidOptions extends LitElement {
  constructor() {
    super();
    this.options = [];
  }
  render() {
    const sortedOptions = sort(this.options);
    return html`
      <og-select .options=${sortedOptions}></og-select>
    `;
  }
}

customElements.define('og-prepaid-options', PrepaidOptions);

We wanted to write some tests that asserted that the options were being sorted properly. The first test tested the case where the items were already sorted:

test('displays already sorted options', () => {
  renderElement([3, 6, 9]);
  const element = document.querySelector('og-prepaid-options');
  const select = element.shadowRoot.querySelector('og-select');
  expect(select.options).toBe([3, 6, 9]);
});

Then the subsequent test tested the case where the items were not sorted:

test('sorts shuffled items', () => {
  renderElement([6, 3, 9]);
  const element = document.querySelector('og-prepaid-options');
  const select = element.shadowRoot.querySelector('og-select');
  expect(select.options).toBe([3, 6, 9]);
});

But there was a catch: this test passed, even when the sorting logic was removed. Our query logic assumed there would only be one options element, but there were actually two — the first test wasn’t being cleaned up properly, and our second test was retrieving the element from the first test. Because that first test was on an already sorted array, both tests continued to pass, regardless of whether the sorting happened or not.

We fixed this by creating a dedicated container to render the elements into and removing that container once the test completed.

beforeEach(() => {
  const testContainer = document.createElement('div');
  testContainer.id = TEST_CONTAINER_ID;
  document.body.appendChild(testContainer);
});

afterEach(() => {
  const testContainer = document.getElementById(TEST_CONTAINER_ID);
  testContainer.remove();
});

There’s alternative ways to fix this issue as well, such as using a testing framework that handles this cleanup for you. Still, the fact remains that not seeing a test fail can mask all sorts of bugs.

And here are some other historical examples:

Older versions of Python’s Mock would allow calling any method on them by default. This means you could misspell “assert” or call a method like assert_called_once and think you’re checking something, but the method would just quietly succeed. This was such a common issue that there were multiple blog posts on the subject. In more recent Python versions, the mock will throw on non-existent assert methods.
Jest has a .rejects helper that can test whether a Promise rejects or not. However, you need to remember to await it — in previous Node versions, forgetting to await it would log a UnhandledPromiseRejectionWarning warning to the console but otherwise let the test succeed. Since Node 16, the unhandled rejection will crash the test process instead.

Who tests the tests?

The problem with everything I discuss in this post is that it only solves for one point in the test’s lifecycle — the moment of creation. There’s not an easy way to know the test will stay in that “can-be-failed” state. And that’s because detecting whether or not a test can fail is a hard problem — the only way to know is to pull down the code, tweak it, and run it yourself. This means the “fail-ness” of a test can’t be easily verified during code review. Some of the issues I mention above can be caught via lint rules, but you can’t catch everything.

This is especially on my mind with the rise of AI-powered coding tools that are able to write the unit tests for you. If you trust your entire test suite to AI, can you also trust that those tests will fail when they should? Or are you going to need to pull them apart yourself?

For now, the point where we have the most control over tests is when we write them, and the best we can do is at least make sure that they fail then.

A passing test is not always enough

Wait, doesn’t TDD solve this?

Making the test fail

A real-world example

Who tests the tests?

Written by Geoff Rich