Introduction to Mutation Testing with Stryker

Published in DigIO Australia, Jun 3, 2019 (8 min read)

In this post I will be looking at mutation testing: what it is, an example to show what it can do, limitations to consider, and how it can be added to a JavaScript project using Stryker. The example used in this post can be found here.

What is Mutation Testing?

“Quis custodiet ipsos custodes?”, which in English means “Who will guard the guards themselves?”. Mutation testing is a way of testing the quality of our tests by introducing changes into application code and seeing whether the test suite detects them. It is a type of white-box testing that is mainly used at the unit-test level. The changes are kept extremely small so that they do not alter the overall objective of the program. This helps assess the quality of the test cases, which should be robust enough to fail when the mutated code is injected. The method is also known as fault-based testing, as it involves creating a fault in the program.

“Mutation testing is based on two hypotheses. The first is the competent programmer hypothesis. This hypothesis states that most software faults introduced by experienced programmers are due to small syntactic errors. The second hypothesis is called the coupling effect. The coupling effect asserts that simple faults can cascade or couple to form other emergent faults.”

- Wikipedia

Why Mutation Testing?

Traditional test coverage (i.e. line, statement, branch, etc.) only looks at which code is executed by our tests. It does not check that our tests are actually able to detect faults in the executed code.

One of the more extreme examples of poor testing is a test with no assertions. Fortunately, these are uncommon in most code bases. Much more common is code that is only partially tested by a test suite. Partially tested code can still have all of its branches executed, yet the tests may not cover all the scenarios.
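
To make the gap between coverage and fault detection concrete, here is a minimal sketch. The function shippingCost, its faulty variant and the two checks are hypothetical and not taken from the example project. Both branches are executed, so branch coverage would be complete, yet a fault at the boundary goes unnoticed.

```javascript
// Hypothetical function under test: free shipping from 50 upwards.
function shippingCost(orderTotal) {
  if (orderTotal >= 50) {
    return 0;
  }
  return 5;
}

// A faulty variant: the comparison at the boundary is wrong.
function faultyShippingCost(orderTotal) {
  if (orderTotal > 50) {
    return 0;
  }
  return 5;
}

// Two checks that execute both branches of the function they are given.
function suitePasses(fn) {
  return fn(100) === 0 && fn(10) === 5;
}

console.log(suitePasses(shippingCost));       // true
console.log(suitePasses(faultyShippingCost)); // true: the fault goes unnoticed
console.log(shippingCost(50), faultyShippingCost(50)); // 0 5
```

A check for an order total of exactly 50 would tell the two versions apart.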

How does it work?

Faults are introduced into the source code by creating multiple versions of it; each version is called a mutant. Each mutant contains a single fault, and the goal is to make the test suite fail against the mutant, which demonstrates the effectiveness of the test cases.
The test cases are run against both the original program and the mutant program.
The results from the original and the mutant are compared.
If the original and the mutant produce different output, the mutant is killed by the test case. The test case is therefore good enough to detect the change between the original and the mutant.
If the original and the mutant produce the same output, the mutant stays alive. In that case, more effective test cases are needed to kill it.
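
The comparison step can be sketched in a few lines. This is only an illustration of the idea, not how any particular tool implements it; the functions original and mutant and the helper isKilled are hypothetical names.

```javascript
// Hypothetical program under test and one mutant containing a single fault.
const original = (a, b) => a + b;
const mutant = (a, b) => a - b; // "+" replaced by "-"

// A mutant is killed when some test input makes its output differ from the original's.
function isKilled(originalFn, mutantFn, inputs) {
  return inputs.some(([a, b]) => originalFn(a, b) !== mutantFn(a, b));
}

const weakInputs = [[0, 0], [5, 0]];   // a + 0 equals a - 0, so no difference shows up
const betterInputs = [[0, 0], [2, 3]]; // 2 + 3 is 5 but 2 - 3 is -1

console.log(isKilled(original, mutant, weakInputs));   // false: the mutant stays alive
console.log(isKilled(original, mutant, betterInputs)); // true: the mutant is killed
```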

Suggested tools

Stryker: JavaScript
PIT: Java or Kotlin

In this post, we’re going to talk about Stryker and how it can be used for mutation testing.

Automate Mutation Testing with Stryker

Mutation testing can be extremely time consuming and complicated to carry out manually. Every line of code can give rise to several mutants, and each mutant needs its own test run, so covering a whole code base by hand takes a great deal of effort.

Sounds complicated? Don’t worry: Stryker makes this process a lot easier and faster to run. It mutates only our source code, not our tests, which helps avoid false positives.

Example:

Here is an example from the Stryker blog. You need to check the age of a user for a credit card application tool. Users can only proceed if they are 18 or older. Say we write the following function to confirm whether a user can continue:
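
A minimal sketch of such a function is shown here. The name isUserOldEnough and the user object shape are assumptions; the return statement matches the original line quoted in the reporter output later in this post.

```javascript
// Hypothetical name; the ">= 18" check matches the reporter output quoted later in this post.
function isUserOldEnough(user) {
  return user.age >= 18;
}

console.log(isUserOldEnough({ age: 18 })); // true
console.log(isUserOldEnough({ age: 17 })); // false
```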

Stryker will then modify the code by locating the return statement and changing it into different variations, such as:

return user.age > 18;
return user.age < 18;
return false;
return true;

These modifications are called mutants. After the mutants have been generated, they are applied one by one and our tests are executed against each of them. If at least one of our tests fails, the mutant is killed. And that’s what we want! If no tests fail, the mutant has survived. The better our tests, the fewer mutants should survive.
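
The kill-or-survive decision can be simulated by hand. The sketch below is not how Stryker works internally; it rewrites the four mutants listed above as plain functions and runs a hypothetical one-test suite against each. The single test (an 18-year-old may proceed) is an assumption chosen for illustration.

```javascript
// The four mutants listed above, written as functions of a user object.
const mutants = {
  'user.age > 18': (user) => user.age > 18,
  'user.age < 18': (user) => user.age < 18,
  'false': () => false,
  'true': () => true,
};

// Hypothetical test suite with a single test: an 18-year-old may proceed.
// Each test returns true when it passes.
const tests = [
  (isUserOldEnough) => isUserOldEnough({ age: 18 }) === true,
];

// A mutant is killed as soon as at least one test fails against it.
function status(mutantFn) {
  const allPass = tests.every((test) => test(mutantFn));
  return allPass ? 'survived' : 'killed';
}

for (const [name, mutantFn] of Object.entries(mutants)) {
  console.log(`return ${name}; -> ${status(mutantFn)}`);
}
// return user.age > 18; -> killed
// return user.age < 18; -> killed
// return false; -> killed
// return true; -> survived
```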

Stryker can output the results in a variety of different formats. One of the easiest to read is the clear-text reporter (default reporter for Stryker). This is the output from Stryker using the clear-text reporter:

Mutant killed: /Path/filePath.js: line 1:9
Mutator: BinaryOperator
-   return user.age >= 18;
+   return user.age > 18;

Mutant survived: /Path/filePath.js: line 1:9
Mutator: RemoveConditionals
-   return user.age >= 18;
+   return true;

The clear-text reporter shows exactly how our code was modified and which mutator was used. It then tells us whether the mutant was killed, meaning that at least one test failed, or whether it survived. In the output above, the second mutant has been marked as a survivor. This means a test is most likely missing: one that explicitly checks an age lower than 18.
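
A sketch of what the missing test could look like follows. It uses Node's built-in assert module so that it runs without a test runner; in a real project it would be written as a test in your framework of choice. The function name isUserOldEnough is an assumption.

```javascript
const { strictEqual } = require('assert');

// Hypothetical function name; the check matches the reporter output above.
function isUserOldEnough(user) {
  return user.age >= 18;
}

// Existing-style test: an 18-year-old is allowed through.
strictEqual(isUserOldEnough({ age: 18 }), true);

// The missing test: a 17-year-old must be rejected.
// Against the surviving mutant (return true) this assertion would throw.
strictEqual(isUserOldEnough({ age: 17 }), false);

console.log('all assertions passed');
```

With this second assertion in place, the mutant that replaces the condition with true no longer passes the suite, so it is killed.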

Try it yourself with the same example

Clone the repository:

git clone https://github.com/pjagga/mutation-testing-stryker.git
cd mutation-testing-stryker
git checkout step1
npm install

This contains the function ageIsOldEnough and one unit test. Run the test using:

npm run test

You will notice the test has run and passed.

Let’s check the coverage using:

npm run coverage

Coverage is 100% and everything passes. Now, let’s see what mutation testing has to say about it.

Let’s start by installing stryker-cli and initialising Stryker:

npm i -g stryker-cli
stryker init

Choose the following options in the questionnaire:

? Do you want to install Stryker locally?: npm
? Are you using one of these frameworks? Then select a preset configuration. None/other
? Which test runner do you want to use? If your test runner isn't listed here, you can choose "command" (it uses your npm test command, but will come with a big performance penalty) jest
? What kind of code do you want to mutate? javascript
? [optional] What kind transformations should be applied to your code? (Press <space> to select, <a> to toggle all, <i> to invert selection)
? Which reporter(s) do you want to use? html, clear-text, progress
Note: Use spacebar for multiple selection or choose html and press enter
? Which package manager do you want to use? npm

This will generate a stryker.conf.js file in your repo, which looks like this:

Now add the source file you want to generate mutants for, in this case ‘./index.js’. After adding mutate: ["./index.js"], stryker.conf.js looks as follows:

Now run stryker run. This generates mutants for the file listed under mutate (./index.js), runs the tests in test.js against each mutant, and checks whether the mutants are killed. It also prints a report through the clear-text reporter (shown below) with the mutation score as a percentage, and it points out which surviving mutants need more test coverage.

HTML Reporter coverage looks like this:

Note: check out branch step2 in the same project to see stryker.conf.js and the reports.

So now we can see that, although we have 100% code coverage, mutation testing with Stryker gives us a mutation score of only 66.67%. Our test provides great statement coverage but does not cover all possible scenarios; for example, the conditions userAge < 18 and userAge = 18 have not been covered. If we add a couple more tests for these conditions, we can kill the remaining mutants.

Check out branch step3 to see the new tests, then run stryker run again. The new report will look like this:
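
The arithmetic behind the score can be sketched as follows. This uses a simplified definition (killed mutants divided by generated mutants), and the figure of 4 killed out of 6 is an inference from the 66.67% score and the 6 mutants reported for this source file, not a number quoted from the report.

```javascript
// Simplified mutation score: the percentage of generated mutants that the tests killed.
function mutationScore(killed, total) {
  return (killed / total) * 100;
}

console.log(mutationScore(4, 6).toFixed(2)); // 66.67
console.log(mutationScore(6, 6).toFixed(2)); // 100.00
```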

Reading this report, 6 mutants were generated for the source code, 3 tests ran, and 6/6 mutants were killed.

HTML report:

Some more advantages and disadvantages of Mutation Testing:

Advantages

Comparing the number of mutants that survived with the number killed is a more reliable metric than simply using line coverage. It gives much stronger evidence that your unit tests are testing what they should be.
It catches many small programming errors, as well as holes in unit tests that would otherwise go unnoticed.

Disadvantages

Running a mutation testing framework against an entire complex codebase is computationally very expensive and requires a lot of processing power to complete.
Runs can take up to several hours, making them unsuitable within a fast release process. Of course, you can run a framework overnight and check the report later. As with line coverage, you get value from identifying general trends as well as specific failures. You can set aside time to go through a report in more detail and start to identify some obvious deficiencies in your tests.
Mutation testing requires brainpower to sort ‘junk’ mutations from useful ones. Not every surviving mutant points to a real gap in the tests, so you can get an unfavourable signal-to-noise ratio. In these cases it is potentially still useful to compare trends over a period of time, to ensure the number of surviving mutants doesn’t keep increasing.
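
One kind of ‘junk’ mutant is a mutant that behaves exactly like the original for every input, so no test can ever kill it. A minimal sketch, using a hypothetical function that is not part of the example project:

```javascript
// Hypothetical function: the larger of two numbers.
function largest(a, b) {
  return a > b ? a : b;
}

// Mutant: ">" replaced by ">=". The two versions only take different branches
// when a === b, and in that case both branches return the same value.
function largestMutant(a, b) {
  return a >= b ? a : b;
}

const samples = [[1, 2], [2, 1], [3, 3], [-4, -4], [0, 7]];
const sameEverywhere = samples.every(
  ([a, b]) => largest(a, b) === largestMutant(a, b)
);
console.log(sameEverywhere); // true
```

A survivor like this is noise rather than a sign of a missing test, which is why a report needs a human to review it.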

Conclusion

At the end of the day, mutation testing and line coverage are not direct competitors; both have their place in ensuring code quality. Line coverage tells us whether lines of code have been executed somewhere along the way. Mutation testing tells us whether our tests would actually notice if that code were wrong, which gives a much clearer view of what our unit tests are really testing. With both, you get value from general trends as well as specific failures. I urge you to give it a try; you may be surprised at what your tests are missing.

Go kill some mutants!