We all know that unit tests help us make sure that code works as expected. One of the metrics we can use with unit tests is Code Coverage.
But is it a good metric? Does it make practical sense, and can we really trust it? If we remove all assert lines from the tests, or just replace them with assertSame(1, 1), we will still have 100% Code Coverage while our tests prove nothing!
How confident are you in the project’s test suite? Does it cover all branches of the code? Does it test anything at all?
Mutation testing gives us the answers to these questions.
Mutation testing is a testing methodology that involves modifying a program in small ways and analyzing how the test suite reacts to these modifications. If the tests still pass after the code is changed, then either the changed line is not covered by tests, or the tests are not effective for the mutated piece of code.
Basics of Mutation Testing
Let’s define some key concepts. Mutation Testing starts with Source Code and Unit Tests. For simplicity, we will call all automated tests unit tests.
Once we have the source code and unit tests for it, we can start changing (mutating) the source code and analyze how our unit tests behave with the changed code.
A single small change of the source code is called a Mutation. For example, changing the binary operator + to the binary operator - is a mutation.
The result of a mutation is called a Mutant: the new, mutated source code. In the example above, the mutant contains $c = $a - $b.
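Here is a minimal plain-PHP sketch of that example. The function names sum() and sumMutant() are hypothetical. A test with a meaningful assertion tells the original and the mutant apart. A poorly chosen input (here $b = 0) does not.

```php
<?php
// Original code: the function the tests are written for.
function sum(int $a, int $b): int
{
    $c = $a + $b;
    return $c;
}

// Mutant: the same function after the "+" to "-" mutation.
function sumMutant(int $a, int $b): int
{
    $c = $a - $b;
    return $c;
}

// A meaningful assertion distinguishes the two versions...
var_dump(sum(2, 3) === 5);       // bool(true)
var_dump(sumMutant(2, 3) === 5); // bool(false): this check would kill the mutant

// ...while an input with $b = 0 cannot: both versions return 2.
var_dump(sum(2, 0) === sumMutant(2, 0)); // bool(true)
```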
Each mutation of any node in the code leads to a new Mutant. In a real project, we will have thousands of them.
Besides changing binary operators, there are plenty of other Mutation Operators (or Mutators):
- Negating conditionals;
- Changing return values;
- Changing method’s visibility;
- and so on.
So, mutation testing creates mutants from the source code, runs the unit tests against them, and checks whether any of the unit tests start to fail.
If the tests fail, the mutant is considered Killed, and this is a positive outcome: the tests caught the error and detected that something was wrong in the mutant’s source code.
If the tests pass, we say that the mutant has survived the unit tests (a Survived, or Escaped, Mutant). There are two reasons why a mutant can survive:
- The mutated line of code is not covered by tests;
- The tests are not effective for this line of code.
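The create-mutants, run-tests, classify loop described above can be sketched in a few lines. This is a hypothetical mini harness, not how Infection is implemented. A "test suite" here is a callable that receives the code under test and returns true when it is green. The mutants imitate a few of the mutators listed earlier.

```php
<?php
// Hypothetical mini harness: runs the test suite against every mutant and
// classifies it. Failing tests kill a mutant; passing tests let it escape.
function runMutants(callable $testSuite, array $mutants): array
{
    $results = [];
    foreach ($mutants as $name => $mutant) {
        $results[$name] = $testSuite($mutant) ? 'Survived' : 'Killed';
    }
    return $results;
}

// Code under test (hypothetical): is the number strictly positive?
$original = function (int $n): bool { return $n > 0; };

// Mutants produced by some of the operators described above.
$mutants = [
    'negated conditional'  => function (int $n): bool { return !($n > 0); },
    'return value changed' => function (int $n): bool { return true; },
    '> replaced with >='   => function (int $n): bool { return $n >= 0; },
];

// A test suite that never checks the boundary value 0.
$testSuite = function (callable $isPositive): bool {
    return $isPositive(5) === true && $isPositive(-5) === false;
};

var_dump($testSuite($original)); // bool(true): green on the original code
print_r(runMutants($testSuite, $mutants));
```

The first two mutants are Killed, while the ">= " mutant Survives because no test uses 0. That survivor points straight at the missing test case.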
It is important to note that Mutation Testing is not a random set of code modifications. It is a predictable, deterministic process that always generates the same mutants for the same source code.
Let’s look at an example. We will use Infection, a Mutation Testing Framework for PHP. (If you use another programming language, keep reading: there is almost certainly a mutation testing framework for your language with very similar functionality.)
Consider a Filter, written in an object-oriented way, that selects users older than 18 years from a collection.
For this filter, we have a unit test.
The unit test is pretty straightforward: we add two users and expect only one of them, the one aged 20, to be returned in the collection.
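The filter's source is not reproduced in this text, so here is a minimal sketch of what it could look like. The class name comes from the article. The body, the > 18 comparison and the second user's age (15) are assumptions. The test uses plain PHP checks instead of PHPUnit so that the example runs on its own. __invoke() deliberately has no return type, which matches the situation discussed with the second mutation below.

```php
<?php
// Sketch of the filter: keeps users older than the threshold (assumed "> 18").
class UserFilterAge
{
    private const AGE_THRESHOLD = 18;

    // No return type on purpose (see the second mutation).
    public function __invoke(array $collection)
    {
        return array_filter(
            $collection,
            function (array $user): bool {
                return $user['age'] > self::AGE_THRESHOLD;
            }
        );
    }
}

// The test described above: two users, only the 20-year-old is returned.
$filter = new UserFilterAge();
$users = [
    ['age' => 20],
    ['age' => 15], // hypothetical second user
];

$adults = array_values($filter($users));
var_dump(count($adults) === 1 && $adults[0]['age'] === 20); // bool(true)
```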
Did you notice that this test alone already gives us 100% Code Coverage of the UserFilterAge class? OK, let’s run mutation testing and analyze the results.
Despite 100% Code Coverage, the Covered Code MSI (Mutation Score Indicator) is only 67%, which doesn’t look good.
How is MSI calculated?
Mutation Score Indicator (MSI): 47%
Mutation Code Coverage: 67%
Covered Code MSI: 70%
Mutation Score Indicator (MSI)
MSI in this example is 47%. It means that 47% of all generated mutants were detected (killed, timed out, or caused errors). MSI is the main metric of mutation testing. Compared with a code coverage of 65%, there is an 18-point difference, so Code Coverage was a terrible quality measure in this example.
TotalDefeatedMutants = KilledCount + TimedOutCount + ErrorCount;
MSI = (TotalDefeatedMutants / TotalMutantsCount) * 100;
Mutation Code Coverage
This metric is 67% in the example above. On average, it should be roughly equal to the Code Coverage percentage.
TotalCoveredByTestsMutants = TotalMutantsCount - NotCoveredByTestsCount;
CoveredRate = (TotalCoveredByTestsMutants / TotalMutantsCount) * 100;
Covered Code Mutation Score Indicator
MSI for code that is actually covered by tests was 70% (ignoring untested code). This shows how effective the tests really are.
TotalCoveredByTestsMutants = TotalMutantsCount - NotCoveredByTestsCount;
TotalDefeatedMutants = KilledCount + TimedOutCount + ErrorCount;
CoveredCodeMSI = (TotalDefeatedMutants / TotalCoveredByTestsMutants) * 100;
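The three formulas fit in one small PHP function. This is a sketch that follows the formulas above, not Infection's own code. The counts passed in are hypothetical (not from a real run) and were chosen so that the results reproduce the 47%, 67% and 70% from the example.

```php
<?php
// Mutation metrics computed with the formulas above. Returns percentages
// rounded to two decimals and guards against division by zero.
function mutationMetrics(int $total, int $killed, int $timedOut, int $errors, int $notCovered): array
{
    $defeated = $killed + $timedOut + $errors;  // TotalDefeatedMutants
    $coveredByTests = $total - $notCovered;     // TotalCoveredByTestsMutants

    $percent = function (int $part, int $whole): float {
        return $whole === 0 ? 0.0 : round($part * 100 / $whole, 2);
    };

    return [
        'msi' => $percent($defeated, $total),
        'mutationCodeCoverage' => $percent($coveredByTests, $total),
        'coveredCodeMsi' => $percent($defeated, $coveredByTests),
    ];
}

// Hypothetical counts: 100 mutants, 40 killed, 5 timed out, 2 errors, 33 not covered.
$metrics = mutationMetrics(100, 40, 5, 2, 33);
printf("Mutation Score Indicator (MSI): %.0f%%\n", $metrics['msi']);                // 47%
printf("Mutation Code Coverage: %.0f%%\n", $metrics['mutationCodeCoverage']);       // 67%
printf("Covered Code MSI: %.0f%%\n", $metrics['coveredCodeMsi']);                   // 70%
```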
If you examine these metrics, the standout issue is that the MSI of 47% is 18 points lower than the reported Code Coverage of 65%. These unit tests are far less effective than Code Coverage alone would suggest.
Let’s see what mutations have been generated by Infection.
The first one:
The tests are green for this mutant, which means that, from the tests’ point of view, the mutated code behaves the same as the original. In fact, this is not true.
When you write tests for code that works with intervals, you must test the boundary values.
Let’s kill the Mutant!
We’ve added one more test case with the boundary value 18. Now, if you run the test suite against the first mutant again, it fails.
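The first mutation's diff is not reproduced in this text. The sketch below shows why the boundary test matters, assuming the filter keeps users with age > 18 and the mutant turned > into >=. The closures are hypothetical stand-ins for the filter and its mutant.

```php
<?php
// Hypothetical stand-ins for the original filter and the first mutant.
$original = function (array $users): array {
    return array_values(array_filter($users, function (array $u): bool {
        return $u['age'] > 18;
    }));
};
$mutant = function (array $users): array {
    return array_values(array_filter($users, function (array $u): bool {
        return $u['age'] >= 18; // mutated comparison
    }));
};

// The original test: two users, only the 20-year-old is expected back.
$initialTest = function (callable $filter): bool {
    $result = $filter([['age' => 20], ['age' => 15]]);
    return count($result) === 1 && $result[0]['age'] === 20;
};

// The added test case with the boundary value 18.
$boundaryTest = function (callable $filter): bool {
    return count($filter([['age' => 18]])) === 0;
};

var_dump($initialTest($original), $initialTest($mutant));   // true, true: mutant survives
var_dump($boundaryTest($original), $boundaryTest($mutant)); // true, false: mutant killed
```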
The second mutation:
It’s not immediately obvious what is going on here. This is quite an interesting mutation operator: it replaces a function call in the expression return functionCall(); with functionCall(); return null;.
But why did this mutation happen? Is it correct to return null when we expect an array to be returned? No. It happens because the __invoke() function has no return type. The Mutation Testing Framework (MF) sees that nothing prevents the function from returning null and tries to change the return value.
Infection is smart enough not to apply this mutation when the function has a non-nullable return type. To kill this mutant, we can simply add the array return type hint to __invoke().
Now the function signature is clear. We pass an array and expect a filtered array to be returned by Filter.
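Here is a small sketch of why the return type hint removes this mutant, using hypothetical function names. Without a return type, "call the function, then return null" is perfectly legal PHP. With : array, the same code fails with a TypeError as soon as it runs, so a framework can safely skip the mutation once the type hint is present.

```php
<?php
// The FunctionCall mutant without a return type: legal, silently returns null.
function filterWithoutType(array $users)
{
    array_filter($users, function (array $u): bool { return $u['age'] > 18; });
    return null;
}

// The same mutant with ": array": PHP refuses the null at runtime.
function filterWithType(array $users): array
{
    array_filter($users, function (array $u): bool { return $u['age'] > 18; });
    return null;
}

var_dump(filterWithoutType([['age' => 20]])); // NULL

try {
    filterWithType([['age' => 20]]);
} catch (TypeError $e) {
    // The exact message wording differs between PHP versions.
    echo 'TypeError: ', $e->getMessage(), PHP_EOL;
}
```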
If we run Infection again, the number of mutations decreases (because of the added type hint), and all mutants are killed.
Now we have not only 100% Code Coverage but also 100% MSI, which is a much more reliable metric. In a sense, mutation testing takes you to “more than 100% test coverage”.
If you are still not impressed, let’s look at other powerful mutation operators: PublicVisibility and ProtectedVisibility. These mutators verify that the visibility of a method is really necessary. If the visibility of a method can be reduced from public to protected, or from protected to private, without failing tests, this may indicate that the public API of a class is larger than it should be.
An escaped ProtectedVisibility mutant means that no child class overrides the protected method, so its visibility can be safely changed to private.
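The sketch below uses hypothetical classes to show what a ProtectedVisibility mutant changes when a child class does override the method. In PHP, a private method is not overridden. When greet() is declared in the parent, $this->name() resolves to the parent's private name(), so the child's version is silently ignored and a test on greet() kills the mutant. If no child overrides the method, the mutant behaves like the original and escapes.

```php
<?php
// Original: a protected method that a child class overrides.
class Greeter
{
    public function greet(): string
    {
        return 'Hello, ' . $this->name();
    }

    protected function name(): string
    {
        return 'world';
    }
}

class TeamGreeter extends Greeter
{
    protected function name(): string
    {
        return 'team';
    }
}

// ProtectedVisibility mutant, applied by hand: name() is now private.
class GreeterMutant
{
    public function greet(): string
    {
        return 'Hello, ' . $this->name(); // always calls GreeterMutant::name()
    }

    private function name(): string
    {
        return 'world';
    }
}

class TeamGreeterMutant extends GreeterMutant
{
    protected function name(): string // no longer an override
    {
        return 'team';
    }
}

echo (new TeamGreeter())->greet(), PHP_EOL;       // Hello, team
echo (new TeamGreeterMutant())->greet(), PHP_EOL; // Hello, world
```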
For example, running Infection for FOSUserBundle shows that the visibility of the isLegacy method can easily be reduced:
./infection.phar --threads=4 --show-mutations --mutators=PublicVisibility,ProtectedVisibility
Besides these two outcomes, killed and escaped mutants, we can also get timeouts. Changing the unary operator ++ for a counter variable to -- can lead to an infinite loop. A mutation testing framework should handle this situation correctly and mark such a mutant as Timeout. This is a positive result: a timed-out mutant is not considered escaped.
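A mutation testing framework typically handles this by putting a time limit on each test run. The sketch below imitates that with an iteration budget so that it stays deterministic. countTo() and the budget value are hypothetical. The counter step of +1 stands for the original $i++, and -1 stands for the mutated $i--.

```php
<?php
// Runs a counting loop but gives up after $maxIterations,
// the way a framework gives up after its time limit.
// Returns the number of iterations, or null for a "Timeout".
function countTo(int $limit, int $step, int $maxIterations): ?int
{
    $iterations = 0;
    for ($i = 0; $i < $limit; $i += $step) {
        if (++$iterations > $maxIterations) {
            return null; // budget exhausted: report the mutant as Timeout
        }
    }
    return $iterations;
}

var_dump(countTo(10, 1, 1000));  // int(10): the original ($i++) terminates
var_dump(countTo(10, -1, 1000)); // NULL: the mutant ($i--) would loop forever
```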
Infection requires PHP 7.0+ and either Xdebug or phpdbg installed to generate Code Coverage.
It is recommended to use Infection as a PHAR distribution, because then the mutation framework does not conflict with your dependencies, and you can use
At the moment, Infection supports two testing frameworks: PHPUnit (5, 6+) and
On the first run, you will be asked some general questions about the source and excluded folders, and an infection.json.dist config file will be created. Commit it to VCS if you are going to use Infection in your CI setup.
Mutation testing requires human analysis, which is why all generated mutations and information about escaped, killed and timed-out mutants are saved to
The most interesting options Infection can be run with:
--threads: if you want to run tests for mutated code in parallel, set this to a value greater than 1. It dramatically speeds up the mutation process. Note that if your tests depend on each other or use a database, this option can lead to failing tests and many false positives. Make sure to analyze
--show-mutations: shows colorized diffs of mutated files in the console.
--mutators: a comma-separated list of the particular mutators to run, for example --mutators=PublicVisibility,ProtectedVisibility.
See the complete list of available mutators.
These two options are useful when you run Infection as a step of your CI process.
--min-msi: a minimum threshold for the Mutation Score Indicator (MSI), as a percentage. This option forces you to write more tests with each push (build).
--min-covered-msi: a minimum threshold for the Covered Code Mutation Score Indicator, as a percentage. This option forces you to write more effective and reliable tests, without requiring new tests for code that is not yet covered.
They can be used separately or in conjunction.
./infection.phar --min-msi=80 --min-covered-msi=95
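Here is a hypothetical sketch of what the two thresholds check. It is not Infection's source. A null threshold means the option was not passed, and a score equal to the threshold is assumed to pass. A CI runner would turn a false result into a non-zero exit code, which makes the build red.

```php
<?php
// Returns true when the scores satisfy every threshold that was set.
function thresholdsMet(float $msi, float $coveredMsi, ?float $minMsi, ?float $minCoveredMsi): bool
{
    if ($minMsi !== null && $msi < $minMsi) {
        return false; // not enough mutants defeated overall
    }
    if ($minCoveredMsi !== null && $coveredMsi < $minCoveredMsi) {
        return false; // tests on covered code are not effective enough
    }
    return true;
}

// Equivalent of --min-msi=80 --min-covered-msi=95 with example scores:
var_dump(thresholdsMet(85.0, 96.0, 80.0, 95.0)); // bool(true): build stays green
var_dump(thresholdsMet(85.0, 90.0, 80.0, 95.0)); // bool(false): build turns red

// Using only --min-covered-msi=95:
var_dump(thresholdsMet(50.0, 96.0, null, 95.0)); // bool(true)
```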
Using Infection with Travis CI
- wget https://github.com/infection/infection/releases/download/0.6.0/infection.phar
- wget https://github.com/infection/infection/releases/download/0.6.0/infection.phar.pubkey
- chmod +x infection.phar
script:
- ./infection.phar --min-covered-msi=90 --threads=4
Each release is signed with an OpenSSL private key, so you need to download the public key in order to work with the PHAR. If you rename
infection, then also rename the key from
How to use Mutation Testing?
How can you use mutation testing with your work or pet projects? Is it possible to use it with an existing one? Where should you start?
Daily usage for developers
Mutation testing can be useful for writing new tests. The workflow is the following:
- You write a new class, e.g. that
- This class is already covered by tests;
- To check the efficiency of these tests, you run mutation testing (MT) just for this file.
./infection.phar --threads=4 --filter=UserFilterAge.php --show-mutations
You see the feedback right in your terminal and try to reach 100% Covered Code MSI. It doesn’t sound very difficult, and it will give you a much more reliable test suite.
After using mutation testing for a while, you will notice that the code you write is less verbose, your unit tests are better, and you write them with branch coverage in mind instead of line coverage.
Daily usage for projects
It is possible to use mutation testing in Continuous Integration. Depending on your project size, it can run on every build or, for example, just once a day if it takes too much time. The main point is to read the log file and continually improve your tests.
While reading the log file from time to time can help identify useless tests, it’s better to run the MF with options such as --min-covered-msi that fail the build.
Fun fact: Infection runs mutation testing on itself, and when its metrics drop below the threshold, we get red builds.
Sometimes it is not possible to have 100% MSI
Sometimes, a mutation doesn’t change the behaviour of the system at all. When this happens, we call the mutant an equivalent mutant. Adding unit tests or changing one of the existing unit tests won’t kill it.
Example of the equivalent mutant from a real project:
Multiplying a value by ±1 always gives the same result as dividing it by ±1, so this mutation can’t be killed.
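The original code for this example is not reproduced here. Assuming the mutation replaced * with / next to a -1 operand, a quick check shows that the two versions cannot be told apart. For integers, PHP's / returns an int when the division is exact, so even the result types match. The function names are hypothetical.

```php
<?php
// Original: multiplies by -1.
function negate(int $x): int
{
    return $x * -1;
}

// Equivalent mutant: "*" replaced with "/".
function negateMutant(int $x): int
{
    return $x / -1; // exact integer division, so PHP returns an int
}

foreach ([-7, 0, 1, 42] as $x) {
    var_dump(negate($x) === negateMutant($x)); // bool(true) for every input
}
```

No assertion on negate() can kill this mutant, which is exactly what makes it equivalent.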
Another interesting example of an equivalent mutation is break -> continue:
The first case is a great and useful mutation, but what about the second one? Is it even valid syntax in PHP? Surprisingly, yes.
Note: In PHP the switch statement is considered a looping structure for the purposes of continue. continue behaves like break (when no arguments are passed).
Because of this feature, Infection does not mutate
So, when it comes to MSI, don’t expect 100% for your project. Just use a value you are comfortable with.
Abstract Syntax Tree
An Abstract Syntax Tree (AST) is a tree representation of the abstract syntactic structure of source code.
Building an AST from PHP source code is possible thanks to the incredible PHP-Parser library.
Let’s look at what source code mutation involves. In its simplest, token-based form, mutating the code means we have to:
- Split the given source into PHP tokens (with the token_get_all() function) and store them in an array;
- Loop through these tokens and decide whether a particular token should be replaced according to one of the mutation operators;
- Reconstitute the new source code (the mutant) from the updated array of tokens.
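The three steps above can be sketched with PHP's own token_get_all(). This is an illustration of the token-based approach, not Infection's or Humbug's code. mutatePlusToMinus() is a hypothetical helper, and it deliberately skips the extra checks discussed next.

```php
<?php
// Token-based mutation: replaces the N-th "+" operator in $source with "-".
// Returns null when there is no such "+" to mutate.
function mutatePlusToMinus(string $source, int $occurrence = 0): ?string
{
    // Step 1: split the source into PHP tokens.
    $tokens = token_get_all($source);

    // Step 2: walk the tokens and decide which one to replace.
    // Single-character tokens such as "+" come back as plain strings;
    // everything else is an array [tokenId, text, line], so "+=" (T_PLUS_EQUAL)
    // and "++" (T_INC) are never matched here.
    $seen = 0;
    $mutated = false;
    foreach ($tokens as $index => $token) {
        if ($token !== '+') {
            continue;
        }
        if ($seen === $occurrence) {
            $tokens[$index] = '-';
            $mutated = true;
            break;
        }
        $seen++;
    }
    if (!$mutated) {
        return null;
    }

    // Step 3: rebuild the mutant's source from the updated tokens.
    $mutant = '';
    foreach ($tokens as $token) {
        $mutant .= is_array($token) ? $token[1] : $token;
    }
    return $mutant;
}

echo mutatePlusToMinus('<?php $c = $a + $b;'), PHP_EOL;      // <?php $c = $a - $b;
echo mutatePlusToMinus('<?php $x = 1 + 2 + 3;', 1), PHP_EOL; // <?php $x = 1 + 2 - 3;
var_dump(mutatePlusToMinus('<?php $i += 1;'));               // NULL
```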
Example of tokens:
T_OPEN_TAG ('<?php ')
T_WHITESPACE (' ')
But in reality, the process is much more complicated, because deciding whether a token should be replaced depends on several conditions:
- Are we in a function body? (We don’t want to replace T_OPEN_TAG ('<?php '), right?);
- Will the code still be valid after the mutation? For example, the array union ['a'] + ['b'] is valid code, but the array subtraction ['a'] - ['b'] is a Fatal Error. The MF should skip this mutation.
With just an array of tokens, it is very hard for the code to answer the questions above.
In contrast, an AST lets us operate on objects that represent the source code (such as Node\Expr\Array_), which makes it much less painful.
Here is an example of the Plus mutation operator in Infection, which replaces + with - and checks the case with arrays:
Let’s compare it with Humbug’s implementation based on
Obviously, using an AST simplifies things a lot. With an Abstract Syntax Tree, it is:
- Much easier to support code;
- Much easier to write new Mutators;
- Much easier to handle false positives and edge cases, e.g. deciding whether a mutation should be applied in a tricky situation.
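To see why working with objects makes the array check easy, here is a sketch of a Plus-style mutator over a tiny, hypothetical node hierarchy. The classes are invented for illustration. They are not PHP-Parser's Node\Expr classes or Infection's mutator API. The code needs PHP 7.4+ for typed properties.

```php
<?php
// Minimal hypothetical AST node classes.
interface Node {}

final class ArrayLiteral implements Node
{
    public array $items;
    public function __construct(array $items) { $this->items = $items; }
}

final class Variable implements Node
{
    public string $name;
    public function __construct(string $name) { $this->name = $name; }
}

final class BinaryOp implements Node
{
    public string $operator;
    public Node $left;
    public Node $right;

    public function __construct(string $operator, Node $left, Node $right)
    {
        $this->operator = $operator;
        $this->left = $left;
        $this->right = $right;
    }
}

// "Plus" mutator: replaces "+" with "-", but skips array unions,
// because "array - array" is an error rather than a valid mutant.
function mutatePlus(Node $node): ?BinaryOp
{
    if (!($node instanceof BinaryOp) || $node->operator !== '+') {
        return null; // not a "+" expression: nothing to mutate
    }
    if ($node->left instanceof ArrayLiteral || $node->right instanceof ArrayLiteral) {
        return null; // array union: leave it alone
    }
    return new BinaryOp('-', $node->left, $node->right);
}

$sum = new BinaryOp('+', new Variable('a'), new Variable('b'));
$union = new BinaryOp('+', new ArrayLiteral(['a']), new ArrayLiteral(['b']));

var_dump(mutatePlus($sum)->operator); // string(1) "-"
var_dump(mutatePlus($union));         // NULL
```

A single instanceof check replaces the token bookkeeping. This sketch only recognizes array literals, though. An operand like $items that merely holds an array at runtime is invisible to a purely syntactic check.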
To sum it up: mutation testing is a very powerful methodology for improving the quality of the project’s test suite. You definitely should give it a try.
Already have experience with MT? Please share it in the comments.