Beyond Unit Testing: How Mutation Testing Helps to Improve Code Quality and Reliability

May 31, 2023

Faster software delivery has become critical in today’s competitive business landscape. Organisations that can deliver high-quality software faster than their competitors gain a significant advantage.

“We see continued evidence that software speed, stability, and availability contribute to organizational performance (including profitability, productivity, and customer satisfaction). Our highest performers are twice as likely to meet or exceed their organizational performance goals.” DORA, Accelerate: State of DevOps 2019

By integrating automation testing into continuous delivery journeys, organisations can accelerate their software delivery, improve the quality of their code, and gain a competitive advantage. At the Emirates Group, we are highly focused on automating tests and integrating them into the delivery journey for faster developer feedback.

While automation testing is an essential tool for us to achieve faster software delivery, over time we have had to face the challenges that come with it. One of the biggest difficulties of having a highly automated test suite is ensuring the quality of tests and maintaining sufficient test coverage. With many automated tests, our engineers had to assess which tests were providing the most value and which tests were redundant or unnecessary.

Unit testing — Key to high-quality software development

Unit testing plays a critical role for us in achieving faster software delivery by allowing developers to catch bugs at the lowest level, early in the development process. To make sure unit tests provided enough coverage, we had been measuring code coverage as the key metric and enforcing a minimum code coverage percentage in our continuous testing journeys. Over the years we observed that code coverage is not enough on its own to ensure comprehensive testing. Code coverage simply measures the percentage of code that has been executed by tests, without considering how effective those tests are at identifying bugs. It is possible to have high code coverage but still miss critical defects in the code.
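To illustrate the gap between coverage and effectiveness, here is a minimal sketch. The function apply_discount and both tests are hypothetical and not taken from our codebase. The first test executes every line of the function, so line coverage reports 100%, yet it would still pass if the arithmetic were wrong. The second test checks the result.

```python
def apply_discount(price, percent):
    """Return the price after taking off the given percentage."""
    return price - price * percent / 100


def test_apply_discount_runs():
    # Executes every line of apply_discount, so line coverage is 100%,
    # but nothing checks the returned value.
    apply_discount(200, 10)


def test_apply_discount_value():
    # A stronger test: it pins the expected result.
    assert apply_discount(200, 10) == 180


test_apply_discount_runs()
test_apply_discount_value()
```

Both tests contribute the same coverage figure, but only the second one would notice a defect in the calculation.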

Mutation testing comes to the rescue!

Here’s a bit of history. According to Wikipedia, the concept of mutation testing was first introduced by Richard Lipton in the 1970s, and it has since gained popularity among software developers and testers. Mutation testing is a valuable tool for identifying the gaps in test coverage. It has grown in popularity lately due to the growing complexity of software systems, and the availability of new tools and frameworks that make it easier to implement.

How Mutation Testing works

The basic process of mutation testing involves generating minor changes, or mutations, to the source code of a program. These mutations are designed to simulate common programming errors, such as changing a logical operator from || to && or swapping the order of two lines of code. The idea is to introduce defects into the code to see if the test cases can find them.

Once the mutations are introduced, the test suite is run against the mutated source code. If at least one test case fails, the mutation has been detected and the mutant is considered “killed”. If every test still passes, the mutation has gone undetected and the mutant is considered to have “survived”.
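The process described above can be sketched in a few lines. This is a simplified illustration rather than how a real mutation testing tool works internally: the function can_access, the helper make_check and the two test cases are hypothetical, and the "mutation" is simulated by swapping the operator passed into the function instead of rewriting source code.

```python
import operator


def make_check(op):
    """Build a hypothetical access check around the given boolean operator."""
    def can_access(is_admin, is_owner):
        return op(is_admin, is_owner)
    return can_access


def run_tests(can_access):
    """Return True when every test case passes, False otherwise."""
    try:
        assert can_access(True, False) is True
        assert can_access(False, False) is False
        return True
    except AssertionError:
        return False


original = make_check(operator.or_)   # behaves like: is_admin || is_owner
mutant = make_check(operator.and_)    # mutation: || changed to &&

# The test suite must pass on the unmodified code before mutating it.
assert run_tests(original)

# A failing test run means the mutant was detected.
status = "survived" if run_tests(mutant) else "killed"
print(status)  # killed
```

The first test case fails on the mutant because a user who is an admin but not an owner no longer gets access, so this mutant is killed.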

Here is an example to explain how mutation testing works. Suppose we have the following code:

def divide_(x,y):
  return x/y

We also have a test suite containing one test case that checks that the function works correctly for valid input –

def test_divide_():
  assert divide_(4,2) == 2

If we introduce a mutation by changing the division operator to an addition operator, the code will look like:

def divide_(x,y):
  return x+y

If we run the test case against this mutated code, it should fail, as the expected result of dividing 4 by 2 is 2, not 6, which means the mutant is killed.

Consider another mutation that changes the division operator to a subtraction operator:

def divide_(x,y):
  return x-y

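If we run the test case against this mutated code, it will pass, as the result of subtracting 2 from 4 is also 2, which means the mutant has survived. A surviving mutant like this points to a gap in the test suite: the single test case cannot tell division apart from subtraction.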
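One way to close the gap exposed by the surviving mutant is to add a second input for which division and subtraction give different results. The sketch below is illustrative only: the helper passes and the extra input (6, 3) are assumptions added for this example.

```python
def divide_(x, y):
    return x / y


def divide_minus_mutant(x, y):
    # The surviving mutant from the text: "/" replaced by "-".
    return x - y


def passes(func):
    """Run the strengthened checks against func; True if all of them hold."""
    return func(4, 2) == 2 and func(6, 3) == 2


print(passes(divide_))              # True
print(passes(divide_minus_mutant))  # False
```

With the extra check, the subtraction mutant returns 3 instead of 2 for the second input, so it is now killed.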

Mutation Score is the metric that shows the quality of a test suite. It is calculated as follows:

Mutation Score = (Killed mutants / Total mutants) × 100%, where:

Killed mutants: the number of mutants that were detected and killed by the test suite.

Total mutants: the total number of mutants that were generated by the mutation testing tool.
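As a worked example of the formula, the divide_ walkthrough above produced two mutants, of which one was killed and one survived. The function name mutation_score is an assumption made for this sketch.

```python
def mutation_score(killed, total):
    """Mutation Score = (killed mutants / total mutants) x 100."""
    if total == 0:
        raise ValueError("no mutants were generated")
    return killed / total * 100


# One of the two mutants (the "+" mutant) was killed.
print(mutation_score(1, 2))  # 50.0
```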

How Mutation Testing is helping us to write better code

Introducing mutation testing into our continuous integration journey opened up significant potential to identify areas for improvement, fill testing gaps and measure the quality of our unit tests. This quantitative approach helps developers write better, more functionally focused tests, and the improved code quality gives engineers more confidence in their code changes.

Introducing mutation testing provided us with several benefits

While mutation testing gives us more insight to act on, it also comes with its own complexity. One of the greatest challenges we face is that this type of testing is computationally expensive, because the test suite has to be run against every mutant. Running mutation testing on a large, complex repository can take many hours, which makes it difficult to add to a continuous integration journey. There is also a risk of overfitting: if the test suite is optimised to kill specific mutants rather than to represent the functionality of the code, a high mutation score can give a misleading picture of test quality.

The practical mutation testing approach

While mutation testing is a great technique for closing quality gaps, it is challenging to find a balanced approach that brings out the most value from it. Here is our approach to making mutation testing effective:

Selecting appropriate mutant operators
Defining a minimum mutation score
Running mutation tests frequently
Prioritising detected mutants
Using mutation testing to guide test case development

Pitest — the mutation testing tool

There are several mutation testing tools available in the market, each with its own pros and cons. Some mutation testing tools are designed for specific programming languages, while others are more versatile. No single mutation testing tool is perfect for all use cases. When choosing a mutation testing tool, we considered factors such as speed, ease of use, and the range of mutations that the tool can generate. We decided to use Pitest, as it is well aligned with the tech stack (Java Spring Boot) in which most of our middleware services are written.

Configuring PIT with Spring Boot microservices –

Setting up Pitest for Spring Boot based microservices is fairly straightforward. You need to add the Pitest plugin and dependencies to the pom.xml file:

<plugin>
    <groupId>org.pitest</groupId>
    <artifactId>pitest-maven</artifactId>
    <version>LATEST</version>
    <configuration>
        <targetClasses>
            <param>com.example.service.**</param>
        </targetClasses>
        <targetTests>
            <param>com.example.service.**</param>
        </targetTests>
    </configuration>
</plugin>

To generate the Pitest report, you need to execute the following command –

mvn clean test org.pitest:pitest-maven:mutationCoverage

This will run all the unit tests and generate a mutation coverage report for the Spring Boot service. Once the command is executed, an HTML report should be generated under target/pit-reports.

To exclude certain test classes from Pitest analysis, list them in an <excludedTestClasses> element in the configuration section of the Pitest plugin in the pom.xml file.

Configuring mutants in Pitest

When performing mutation testing, it is essential to focus on mutants that are likely to expose defects in the code. With the Pitest Maven plugin, the mutation operators and the classes and tests to target are configured directly in the configuration section of the plugin in pom.xml, not in a separate file: <mutators> lists the operators to apply, while <targetClasses> and <targetTests> limit what is mutated and which tests are run. The available mutator names differ between Pitest releases, so the list should be checked against the version in use.

<build>
  <plugins>
    <plugin>
      <groupId>org.pitest</groupId>
      <artifactId>pitest-maven</artifactId>
      <version>LATEST</version>
      <configuration>
        <targetClasses>
          <param>com.example.TargetClass</param>
        </targetClasses>
        <targetTests>
          <param>com.example.TargetTest</param>
        </targetTests>
        <mutators>
          <mutator>INCREMENTS</mutator>
          <mutator>CONDITIONALS_BOUNDARY</mutator>
          <mutator>NEGATE_CONDITIONALS</mutator>
          <mutator>RETURN_VALS</mutator>
          <mutator>VOID_METHOD_CALLS</mutator>
        </mutators>
        <outputFormats>
          <format>XML</format>
          <format>HTML</format>
        </outputFormats>
      </configuration>
    </plugin>
  </plugins>
</build>
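With the XML output format enabled, the report can also be processed by a script, for example to list surviving mutants for prioritisation. The sketch below is an illustration under assumptions: the embedded sample is hypothetical and simplified, modelled on the general shape of Pitest's mutations.xml (a status attribute on each mutation element, with child elements such as mutatedClass, lineNumber and mutator), and the function name summarise is made up for this example. The exact fields should be checked against a report produced by the Pitest version in use.

```python
import xml.etree.ElementTree as ET
from collections import Counter

# Hypothetical, simplified sample shaped like a Pitest XML report.
SAMPLE_REPORT = """<mutations>
  <mutation detected="true" status="KILLED">
    <mutatedClass>com.example.TargetClass</mutatedClass>
    <lineNumber>12</lineNumber>
    <mutator>NEGATE_CONDITIONALS</mutator>
  </mutation>
  <mutation detected="false" status="SURVIVED">
    <mutatedClass>com.example.TargetClass</mutatedClass>
    <lineNumber>27</lineNumber>
    <mutator>CONDITIONALS_BOUNDARY</mutator>
  </mutation>
  <mutation detected="false" status="NO_COVERAGE">
    <mutatedClass>com.example.TargetClass</mutatedClass>
    <lineNumber>40</lineNumber>
    <mutator>VOID_METHOD_CALLS</mutator>
  </mutation>
</mutations>"""


def summarise(report_xml):
    """Count mutants per status and collect the survivors."""
    root = ET.fromstring(report_xml)
    mutations = list(root.iter("mutation"))
    statuses = Counter(m.get("status") for m in mutations)
    survivors = [
        (m.findtext("mutatedClass"),
         int(m.findtext("lineNumber")),
         m.findtext("mutator"))
        for m in mutations
        if m.get("status") == "SURVIVED"
    ]
    return statuses, survivors


statuses, survivors = summarise(SAMPLE_REPORT)
print(dict(statuses))
for cls, line, mutator in survivors:
    print(f"{cls}:{line} survived {mutator}")
```

In a real project the string would be read from the report file that Pitest writes under target/pit-reports instead of being embedded in the script.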

Running mutation coverage on code changes

scmMutationCoverage is a goal of the Pitest Maven plugin that limits mutation analysis to the files that source control reports as changed, by default those added or modified locally since the last commit. It requires Maven's SCM settings to be configured for the project and is run with mvn org.pitest:pitest-maven:scmMutationCoverage. This is useful for measuring how effective the tests are on the code that has just changed, for identifying areas that need more focus in testing, and for keeping run times low. A minimum mutation score can be enforced with the <mutationThreshold> setting, which fails the build when the score falls below the given percentage:

<build>
  <plugins>
    <plugin>
      <groupId>org.pitest</groupId>
      <artifactId>pitest-maven</artifactId>
      <version>LATEST</version>
      <configuration>
        <mutationThreshold>85</mutationThreshold>
      </configuration>
    </plugin>
  </plugins>
</build>

Adding Mutation Tests in CI: scmMutationCoverage on changes to the codebase

In conclusion, mutation testing is a valuable tool for teams looking to deliver more reliable, efficient software to their customers. By intentionally introducing small changes to the code and measuring the ability of the unit test suite to detect them, mutation testing helps to identify gaps in the tests and optimise testing efforts. Ultimately, this can lead to faster software delivery, as bugs or other issues are caught early in the development process and addressed before they cause significant problems. I hope you enjoyed reading this blog. Happy testing!