Test Coverage is Dead — Long Live Mutation Testing

In today’s industry, having a great product might not be enough. Competition lurks around the corner, and a key advantage for market leaders is their ability to move fast without breaking things.

Companies are becoming more and more quality-obsessed. As end users, we are all spoiled. How many of you have thought, or even said out loud: “I hate this SOME_FREE_TOOL! It failed to load once when I needed it.” As a result, many companies put product quality on a pedestal and regard it as their holy grail.

A simple web search for how to guarantee quality as part of a Continuous Integration / Continuous Deployment (CI/CD) cycle makes it clear that automated tests are a must, and that unit tests are one of their keystones.

But “how many tests?” one might ask the input box of their favorite search engine. Unfortunately, the answer here is vague.

Searching the web, you will find that code coverage is the key index out there, and that depending on your code type (functionality, logic, etc.) you can pick an arbitrary coverage percentage to enforce. In practice, 100% turns out to be almost impossible in most cases. And even if 80% is good for your code, is 100 really better than 80? If you add more tests, does that mean your test suite is stronger? Or is it just adding exaggerated overhead to every new piece of functionality?

There are pitfalls in pursuing an arbitrary coverage index. From my personal experience as a hard-working developer, I sometimes feel I am investing too much time creating and modifying tests throughout the development and maintenance phases. On the other hand, I once heard from a sneaky and lazy developer friend that some developers add tests just to improve coverage, even though those tests do not fully cover the functionality — OMG!!!

Improving coverage is easy: just call the function, or load the object, use different inputs, and you are done. And it doesn’t always happen on purpose. Sometimes we simply didn’t think about all the edge cases of the unit under test, perhaps because we were so focused on killing the red lines of uncovered code.

Stop Killing Red Lines — Kill Mutants

One great solution is to stop obsessing over test code coverage and take it with a grain of salt. I urge you to adopt Mutation Testing as the key index of test suite strength.

Mutation Testing means changing the code — a small part at a time — to create mutants, and then running the unit test suite against each one. If the tests still pass, meaning they were not able to detect the mutant (which may represent wrong behavior), we can conclude that the suite is weak, and vice versa.
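To make the loop concrete, here is a minimal, purely illustrative sketch of what a mutation testing tool does under the hood (a real tool mutates the AST and runs your actual test framework; this toy version just rewrites a source string and checks a couple of hand-rolled assertions):

```javascript
// Toy mutation-testing loop (illustrative only, not a real tool).
// A mutant "survives" if every test still passes, and is "killed" if any test fails.

const originalSource = "(temp) => temp > 1000";

// One common mutation: relational operator replacement ('>' becomes '>=').
const mutatedSource = originalSource.replace(" > ", " >= ");

// A tiny "test suite": each case returns true when its expectation holds.
const tests = [
  (fn) => fn(500) === false,
  (fn) => fn(2000) === true,
];

function suitePasses(source) {
  const fn = eval(source);          // re-evaluate the (possibly mutated) code
  return tests.every((t) => t(fn)); // true means every test passed
}

const survived = suitePasses(mutatedSource);
console.log(survived ? "mutant survived (weak tests)" : "mutant killed");
// Prints "mutant survived (weak tests)": neither 500 nor 2000
// exercises the boundary at 1000, where '>' and '>=' differ.
```

Adding a boundary test such as `(fn) => fn(1000) === false` would kill this mutant: the mutated function returns true at 1000, so that test fails against it while still passing against the original.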

For example, let’s say I need to write a function which checks whether my nuclear reactor should be cooled down or it will suffer a nuclear meltdown (not good) — this is true if the temperature is higher than 1000 °C. A simple implementation of the function and its tests would look something like this:

function isDangerous(temp) {
  if (temp > 1000) {
    return true; // run away now!
  } else {
    return false; // it’s a very nice…
  }
}

test("isDangerous", () => {
  expect(isDangerous(500)).toBeFalsy();
  expect(isDangerous(2000)).toBeTruthy();
});

The tests pass and the test coverage is 100% — yay! However, as we know, code coverage is not the best indicator of test strength and thoroughness.

If we were to run a mutation testing tool, it would rerun these tests against a simple mutation that changes ‘>’ to ‘>=’ (a common mutation) and would expect at least one test to fail (since the functionality may have changed). The surviving mutant exposes a weakness in my tests: I forgot to verify that the number 1000 is handled correctly. So while my tests may be good, they are still not thorough enough. Fixing this is easy, and just requires adding:

expect(isDangerous(1000)).toBeFalsy();

This testing methodology can provide us with a much better answer to the question of “How good/strong are our tests?”

A great tool for this is Stryker (stryker-mutator), which is simple to run from the CLI, supports the main testing frameworks (Karma, Jest, Jasmine, etc.), and whose output is a solid guide for improving your test strength and software resiliency.
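If you want to try Stryker on a Jest-based JavaScript project, a minimal setup looks roughly like this (package names and commands reflect Stryker’s documented npm packages at the time of writing; check the current docs, as they may change between versions):

```shell
# Install Stryker's core and its Jest runner as dev dependencies
npm install --save-dev @stryker-mutator/core @stryker-mutator/jest-runner

# Run mutation testing; the report lists which mutants survived your suite
npx stryker run
```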

At AppsFlyer, our SDK runs on 9 out of 10 mobile devices worldwide, and our servers currently handle around 50 billion events daily. We consider ourselves customer-obsessed, and as such, we cannot fail our customers while moving fast and adding features in a rapid CI/CD environment. Therefore, we need transparency into the real quality of our system’s resiliency and tests — and I feel mutation testing provides exactly that in many of our projects.

Not afraid of mutants? — Join us!