Avoiding False Negatives: When Your Tests Pass But Production Is Broken

Patrick J. Sparrow
Broadlume Product Development
6 min read · Jun 21, 2019

At AdHawk, we rely on suites of feature specs, request specs, and unit tests to give us confidence that our applications function properly. We run these tests in CircleCI before merging pull requests and locally while developing features. A passing test suite generally signifies that our production environment is stable. However, that is not always the case. Sometimes bugs make it into production. When this happens, there is almost always a lesson to learn about testing.

A test suite ill-equipped to catch bugs can yield false negatives. In the medical field, a false negative is a test result that incorrectly indicates the absence of a condition. This blog post explores some common reasons why your test suite may return false negatives and how to avoid them.

1. You Don’t Have Enough Test Coverage

The most common reason your tests pass even though a bug is present is that the test suite does not cover the functionality that is broken. For example, if your app’s homepage returns a white screen of death but your test suite does not cover that page, the suite cannot indicate a problem with the homepage.

The first step in addressing this issue is to measure your test coverage. This can be done by integrating with CodeClimate or using a library like simplecov. These tools highlight lines of code that are not executed when your test suite runs. While a high coverage percentage does not guarantee the absence of bugs, a low coverage percentage means large portions of your app can break without a single test failing.

Example simplecov output
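If you go the simplecov route, a minimal setup (assuming RSpec; the 90% threshold is an arbitrary example) loads the library before any application code:

```ruby
# spec/spec_helper.rb (must come before any application code is required)
require "simplecov"

SimpleCov.start do
  add_filter "/spec/"   # don't count the tests themselves as covered code
  minimum_coverage 90   # example threshold: fail the run below 90% coverage
end
```

From then on, every test run writes an HTML report highlighting uncovered lines.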

Mutation testing is another valuable practice: tools like mutant introduce small changes (mutations) to your codebase that should cause your tests to fail. If your tests still pass after a mutation, you may be lacking semantic coverage over certain execution paths.
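The idea in miniature (a hypothetical method, not mutant’s actual output): a mutation that flips an operator should break at least one check, and if it doesn’t, a case is missing.

```ruby
# Hypothetical method with an untested boundary.
def free_shipping?(total)
  total > 100
end

# These checks only exercise values far from the boundary...
raise "failed" unless free_shipping?(150)
raise "failed" if free_shipping?(50)

# ...so a mutant that rewrites `>` to `>=` would survive: both checks
# above still pass, revealing that behavior at exactly 100 is
# unspecified. Adding a check for free_shipping?(100) kills the mutant.
```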

An additional good practice in test-driven development is to add coverage at various levels. The first level includes a feature or request spec. The next level involves unit tests for all objects created or modified when building the feature. If you’re using a frontend framework like React or Vue.js, the test suite may also extend to cover frontend components. Testing each level separately makes it easier to identify and debug problems before they make it into production.

When you encounter a bug, try writing a failing test that exposes it. Once you make that test pass, you can be more confident that the bug has been squashed.

2. You Aren’t Testing What You Think You’re Testing

Even if you get to 100% test coverage, your app can still experience bugs. One reason for this may be that your tests aren’t actually testing what you think they are testing. Here’s an example:

The above expectation tests that all items in the result array are published. This test will pass even if the find_published_stores method returns an empty array. Let’s rewrite the test to prove that the method actually returns published stores.

It helps to double-check that the description of the test corresponds to the expectation. It’s easy to copy and paste an example, change the text, run the tests, and forget to actually modify the example code.

3. You Aren’t Testing Failures

Code can fail in unexpected ways. If you aren’t proactive about testing failures, you may get unexpected results in production.

For example, if you’re only testing the success case and you forget to wrap your queries in a transaction, your production database could get littered with orphaned records created before an error occurs. It’s a good idea to test that these records are not created.
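One way to pin that down, sketched here with an in-memory stand-in instead of a real database (all names are hypothetical):

```ruby
# Minimal in-memory sketch: a checkout that creates an order, then
# fails while charging. The transaction rolls the insert back, and the
# final assertion verifies no orphaned record survives the failure.
class FakeDB
  attr_reader :orders

  def initialize
    @orders = []
  end

  def transaction
    snapshot = @orders.dup
    yield
  rescue StandardError
    @orders = snapshot   # roll back on failure
    raise
  end
end

class Checkout
  def initialize(db)
    @db = db
  end

  def call
    @db.transaction do
      @db.orders << { id: 1 }
      raise "payment failed"   # simulate a downstream error
    end
  end
end

db = FakeDB.new
begin
  Checkout.new(db).call
rescue StandardError
end
raise "orphaned record!" unless db.orders.empty?
```

The same shape works against a real database: trigger the failure, then assert the record count did not change.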

If you’re testing a form and you only test the happy path, you might not realize that you are not displaying validation errors to the visitor. Try testing that the error message gets displayed.
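In miniature (a hypothetical validator; a real feature spec would assert the message appears on the rendered page):

```ruby
# Hypothetical form validator.
def validate_store(params)
  errors = []
  errors << "Name can't be blank" if params[:name].to_s.strip.empty?
  errors
end

# Happy path only: this passes and tells you nothing about errors.
raise unless validate_store(name: "AdHawk").empty?

# Sad path: assert the error message the visitor should actually see.
raise unless validate_store(name: "") == ["Name can't be blank"]
```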

Think about how annoyed you get when you receive obscure error messages when using a web application. Do your users a favor and make sure your app handles failures in an acceptable way.

4. Your Test Environment Does Not Reflect Production

It is difficult to test code that functions differently across environments. Our FlooringStores team encountered this problem when we excluded products lacking enhanced manufacturer data in the production environment only. The benefit was that we could temporarily filter out non-UI-friendly products in production while continuing to develop and test without a ton of overhead.

Carpet swatches on flooringstores.com

Everything was going well until we started refactoring the code. The specs were passing. The development environment looked good. It was time to deploy. Within a few hours someone noticed a problem: most of our product images were missing. After some high-stress debugging, we realized that the code that conditionally excluded certain products had been removed. Our product images weren’t missing. We were displaying products that were previously excluded because they had no images!

We were able to put a band-aid on it quickly, but the long-term solution was to make our test data factories create UI-friendly products so we could remove the conditional and make our test suite align with production.

If your app functions differently across environments, your tests likely do not cover all of the production functionality.

5. Your Test Objects Don’t Reflect Real Objects

The Ruby community has built many tools that make automated testing easy. While these tools can be very powerful and help you write fast, isolated tests, they can also lead to discrepancies between the test and production environments.

RSpec’s mocking library provides a double method for creating objects that return canned responses. These doubles can be used in place of dependencies to both speed up and isolate unit tests. Although convenient, this can lead to creating test objects with different interfaces than their corresponding production objects.

If we use instance_double and class_double instead, they will verify that the methods we are stubbing exist on the actual objects and classes and raise exceptions if we stub methods that do not exist.

VCR is a Ruby library for recording and replaying web requests. It reduces the number of requests to external web services and brings speed and reliability to your test suite. The downside is that changes in the remote web service can go unnoticed by your test suite, because recorded cassettes keep passing even after the real API no longer behaves the way they describe. At AdHawk, we avoid this by excluding VCR cassettes from our repositories and expiring them after seven days. This keeps our tests up to date and makes API upgrades much simpler.
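A configuration sketch of the expiry idea: VCR’s `re_record_interval` option (in seconds) re-records a cassette once it is older than the interval, so stale recordings refresh themselves.

```ruby
# spec/support/vcr.rb (illustrative paths; seven days to match the
# practice described above)
require "vcr"

VCR.configure do |config|
  config.cassette_library_dir = "spec/cassettes"
  config.hook_into :webmock
  config.default_cassette_options = {
    re_record_interval: 7 * 24 * 60 * 60 # re-record cassettes after 7 days
  }
end
```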

Take advantage of tools that make testing easier, but be aware that the more your test code differs from production, the less confident you can be about the value of a passing test.

6. Your Code Does Too Much

All of the above examples address different ways to approach testing that make uncaught bugs less likely. However, this raises the question: maybe the problem is not the test suite — maybe it’s the code!

Without getting too academic, the “S” in the SOLID object-oriented design principles can be very helpful in writing code that is easier to test. It stands for the Single Responsibility Principle, or SRP for short. It states that a class should have only one reason to change, which can be simplified to mean, “Do one thing.” The fewer responsibilities a class or method has, the easier it is to test.

A service object might validate input, perform some business logic, and handle exceptions. The corresponding unit test should test a range of valid and invalid inputs, test the business logic, and also test the handling of exceptions. With that many examples, it will be easy to miss something.

By extracting the responsibility of input validation to a separate class, the spec for the service object can focus on testing the success and failure of the business logic. Then, a separate spec for the validator can provide a comprehensive test of valid and invalid inputs.
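A sketch of that split (class names, validation rules, and pricing are all hypothetical):

```ruby
# After extracting validation, each class has one reason to change.
class OrderValidator
  def valid?(params)
    !params[:email].to_s.empty? && params[:quantity].to_i > 0
  end
end

class PlaceOrder
  PRICE = 10

  def initialize(validator: OrderValidator.new)
    @validator = validator
  end

  def call(params)
    return { ok: false, error: "invalid input" } unless @validator.valid?(params)

    { ok: true, total: params[:quantity].to_i * PRICE }
  end
end

# The PlaceOrder spec can now stub the validator and focus on the
# business logic; OrderValidator gets its own exhaustive input spec.
```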

Identify the entities in your codebase that have too many responsibilities. Try extracting them into smaller classes and methods. You will find that they are much easier to understand, test, and maintain.

That covers it (pun intended)! I hope these suggestions nudge you to re-examine your codebase and help keep you off your team’s #incident-response Slack channel when you’re on the treadmill.
