The Broken Code with 100% Coverage

5 min readJan 15, 2017

Every once in a while, you see the below statement on the Internets (exact wording may vary):

“We can fearlessly modify our code in any way we like, because we have tests”

Every time I hear that statement, something at the back of my brain tells me that this cannot be as easy as it sounds. Until now, I hadn’t been able to put my finger on what caused that feeling. Now, I have some insights, and using a concrete example, I will try to demonstrate why that statement does not necessarily hold.

Example E-Commerce System

Suppose we have an e-commerce website. Say there are 2 major use cases in our system:

Receiving orders from customers
Reporting the total sales given a time interval

Every time a user makes a purchase, we save orders in a (simplified) table like this:

And we have a reporting page which is used to monitor monthly sales. The reporting logic is simple:

Select the rows from the database that correspond to the given date interval
Add up all the amounts and display the sum on the screen

Here are the tests that we need to write in order to test this system thoroughly:

Test that whenever a purchase is made, a new row is inserted in the database. Verify that the SKU and AMOUNT are correct.
Test that no row is inserted if the payment fails
Test that the reporting code returns the sum of the rows that are within the given interval
Test that the reporting code returns zero if no rows are found in the given interval

We can add more tests (e.g. verify the behavior when reporting interval start is greater than interval end, check for null values, etc.) but they are not important for our purposes.

At this point, we have 100% code coverage and we are confident that our system is working as expected.

Adding the Refund Feature

Product management informs us that we need to support returned products. So we need to be able to mark some of our orders as refunded in the system. We decide to add a status column into the orders table, as shown below:

We update the code that saves the orders, add the refund logic and update the tests as follows.

The updated tests (changes are in bold):

Test that whenever a purchase is made, a new row is inserted in the database with status SALE. Verify that the SKU and AMOUNT are correct.
Test that no row is inserted if the payment fails
Test that the reporting code returns the sum of the rows that are within the given interval
Test that the reporting code returns zero if no rows are found in the given interval
NEW TEST: Test that whenever an order is refunded, the order’s status is updated to REFUNDED

aand Bingo !

At this point, we have 100% code coverage, all tests are green, but our system is broken in a very bad way.

We forgot to update the reporting code, and the reporting code is still summing up all the records it finds, including REFUNDED records ! Therefore, it’s displaying wrong sales numbers, which can be catastrophic for the business.

We concentrated on implementing the refund logic, and in the meantime we completely forgot about the reporting use case. The reporting tests are still passing because the added STATUS column does not affect them.

How Did This Happen?

Both use cases are correct according to their current specification (in the BDD sense, expressed by the tests that specify the system’s behavior). That’s why our tests are passing despite the situation.

However, the current specification of our system is incomplete. The reporting tests are not wrong per se, but they are not specific enough. That is, the specification needs to be more specific :)

To fix the situation, we could change the test number 3 above like this:

Test that the reporting code returns the sum of the rows having status SALE that are within the given interval

When we change the test code and data with new rows that include REFUNDED rows, the test will start failing. And fixing the test would restore the system into working state again.

Analysis

So, a change in one part of the code causes changes in the specification of other parts of the code. However, detecting and correcting this requires human intervention.

Simply having tests does not mean that you can carelessly change your code and expect your tests to save you. Also, having 100% test coverage does not guarantee that your system is bug free.

Now, it seems all too easy in such a simplified example, and you may think I’m a fool for forgetting to update the reporting tests. However, real life systems consist of tens or hundreds of such use cases that are interacting with each other in intricate ways, and it’s not very hard to break one place while fixing somewhere else.

Note that throwing panaceas (TDD, functional programming, strong typing, CQRS/ES, you name it) at this problem does not seem to help. Or, saving refunds in a different table would not help by itself. Interestingly, changing the amounts of refunded items to negative values (arguably, one of the worst design decisions you could make) would not break the reporting functionality :)

What Can We Do About It?

An obvious solution to this problem does not seem to exist, at least to me. Every time you make a change in the system, you need to manually check all the tests (i.e. specification) for other parts that might be affected by the change.

Specifying the external behavior of the system in clear, readable tests (maybe in BDD syntax) might help alleviate the problem, since it will be easier to scan the specification for potential problems.

Also, if we’re using TDD, we could consider adding a new step to our TDD workflow. If you’re modifying a large system, after you write a failing test and make it pass, then you should look for other tests that should be changed/added in other parts of the code, make them fail and fix them.

Another lesson is, we should be careful when changing the meaning of data. In our example, every row in the ORDERS table had the implicit status SALE before we introduced the refund concept. After adding the STATUS column, now a row can mean either a sale or a refund. This affects other code which implicitly assumed that every row in the table was a SALE row.

So, when we change a data format, it could be a nice starting point to look for code that’s coupled to the same data, and check their specifications to see if any modifications are needed.