Weak Unit Tests? Mutants to the rescue!

Published in TestAutonation, Feb 17, 2023

As a software quality and testing professional (SDET, QA Engineer, QA Specialist, Test Automation Engineer, etc.), you have probably been asked a few times: “What is our current testing coverage?” The person asking typically expects a percentage in reply, and assumes that a value close or equal to 100% confirms that the product is “thoroughly” tested, ready to deploy, and bug-free.

Well, whatever the value is, it does not necessarily indicate any of this. The answer also depends on whom the question is put to. A backend or frontend developer will think about the unit test coverage of their services and quickly get that value from the latest CI pipeline job run, which reports a line coverage percentage using a tool such as JaCoCo or Istanbul.

A software tester, on the other hand, would think about test coverage from a different perspective. They might consider the different types of tests that have been run, such as integration tests, system tests, and acceptance tests. They might also look at the E2E tests of the business user scenarios that have been covered, and whether those scenarios are sufficient to ensure the quality of the product, keeping in mind how the product will be utilised and which risks might have the highest impact. They would consider the functional test coverage of their latest test plan execution, and report a percentage taken from the HTML report of a tool like Selenium, Cypress or Playwright.

In both cases, the focus is on test coverage, but the definition of coverage and the methods used to measure it can vary greatly. As we go further up the imaginary testing pyramid, both the measurement and what it measures become more abstract and more opinionated. However, at the lowest level, i.e. unit tests, things are clearer.

Both views provide important metrics that help in tracking the progress and effectiveness of the testing efforts during the development cycle. However, they do not give an indication of the quality or strength of the tests themselves. This is where Mutation Testing comes into the picture.

Unit test coverage != Unit test strength

Mutation testing is a technique used to evaluate the quality of a set of unit tests by introducing small, intentional changes, known as mutations, into the code under test. These mutations are designed to simulate the types of bugs or defects that the unit tests are expected to find. The idea is that if at least one unit test fails when run against the mutated code, the tests are strong enough to find real defects of that kind.

For example, consider the following Java code:

package com.mutant.example;

public class IntegerIdentification {
   
  boolean isPositive(final int num) {
    return num >= 0;
  }
}

Suppose we have a unit test that calls this method and checks if the result is as expected:

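package com.mutant.example;

// JUnit 4 imports, assuming the Assert and @Test used below come from JUnit 4.
import org.junit.Assert;
import org.junit.Test;

public class IntegerIdentificationTest {

  // A clearly positive input must be reported as positive.
  @Test
  public void isPositiveTrue() {
    IntegerIdentification integerIdentification = new IntegerIdentification();
    Assert.assertTrue("Integer was not positive", integerIdentification.isPositive(2));
  }

  // A clearly negative input must not be reported as positive.
  @Test
  public void isPositiveFalse() {
    IntegerIdentification integerIdentification = new IntegerIdentification();
    Assert.assertFalse("Integer was positive", integerIdentification.isPositive(-1));
  }

}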

A mutation testing tool like Pitest would create a mutation in the isPositive method by replacing the >= operator with a > operator, resulting in the following code:

boolean isPositive(final int num) {
    return num > 0;
}

If the original unit test suite detects this change, i.e. at least one test fails, the mutant is killed and the tests are strong enough to detect real defects (or unintentional changes) in the code. On the other hand, if all the tests still pass, the mutant survives, which indicates that the tests are weak and that there are conditions which they do not cover. Here, neither test exercises the boundary value 0, the only input for which >= and > give different results, so both tests pass against the mutated code. Running Pitest in this case will therefore report that the existing tests are weak, as a change in the implementation was not caught by any test case.
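To make the surviving mutant concrete, here is a minimal plain-Java sketch that plays the mutation by hand, without running Pitest or JUnit. The class name MutantDemo and its helper methods are hypothetical and only serve the illustration: the two predicates stand in for the original and the mutated isPositive, and the two "suites" mirror the existing tests with and without an extra check on 0.

```java
import java.util.function.IntPredicate;

class MutantDemo {

    // The original implementation from the example above.
    static final IntPredicate ORIGINAL = num -> num >= 0;

    // Hand-written stand-in for the mutant: >= replaced by >.
    static final IntPredicate MUTANT = num -> num > 0;

    // Mirrors the two existing tests: 2 must be positive, -1 must not be.
    static boolean existingSuitePasses(IntPredicate isPositive) {
        return isPositive.test(2) && !isPositive.test(-1);
    }

    // Adds one check on the boundary value 0, where >= and > disagree.
    static boolean boundarySuitePasses(IntPredicate isPositive) {
        return existingSuitePasses(isPositive) && isPositive.test(0);
    }

    public static void main(String[] args) {
        System.out.println("existing suite vs original: " + existingSuitePasses(ORIGINAL)); // true
        System.out.println("existing suite vs mutant:   " + existingSuitePasses(MUTANT));   // true, mutant survives
        System.out.println("boundary suite vs original: " + boundarySuitePasses(ORIGINAL)); // true
        System.out.println("boundary suite vs mutant:   " + boundarySuitePasses(MUTANT));   // false, mutant killed
    }
}
```

In the JUnit suite, the equivalent fix would be a third test asserting that isPositive(0) returns true, which is the behaviour the original implementation defines for zero.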

Mutation testing helps in identifying the weaknesses in the unit tests. Fixing those weaknesses, in turn, increases confidence in the quality of the product and reduces the risk of defects reaching the final product. Pitest ‘attacks’ the code with mutants and then reports a Mutation Coverage and a Test Strength percentage alongside the traditional Line Coverage percentage.
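As a rough guide to how the two mutation percentages relate, the sketch below computes them from mutant counts. It assumes the definitions as I understand them from the Pitest documentation: mutation coverage is killed mutants over all generated mutants, while test strength is killed mutants over only those mutants that some test actually executed. The class name MutationMetrics and the counts are hypothetical, not output from a real run.

```java
class MutationMetrics {

    // Percentage of all generated mutants that the tests killed.
    static double mutationCoverage(int killed, int total) {
        return 100.0 * killed / total;
    }

    // Percentage of killed mutants among those reached by at least one test.
    // Mutants in code that no test executes (noCoverage) are excluded.
    static double testStrength(int killed, int total, int noCoverage) {
        return 100.0 * killed / (total - noCoverage);
    }

    public static void main(String[] args) {
        // Hypothetical counts: 10 mutants, 6 killed, 2 in untested code.
        int total = 10;
        int killed = 6;
        int noCoverage = 2;

        System.out.println("Mutation coverage: " + mutationCoverage(killed, total) + "%");         // 60.0%
        System.out.println("Test strength:     " + testStrength(killed, total, noCoverage) + "%"); // 75.0%
    }
}
```

Under these assumptions, a gap between the two values points at code that is not executed by any test, whereas a low test strength points at tests that run the code but assert too little.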

Tools are helpful as long as the team is on board

Introducing a tool can improve the team’s output as long as the team is on board with adopting it, with the right mindset. Pitest allows you to define a mutation coverage and/or test strength threshold so that the codebase does not degrade below a defined percentage. This can be used as another quality gate within CI/CD, so that the build fails when the thresholds are not met. Having the team on board means that corrective measures are taken and unit tests are improved as much as possible. Otherwise, the additional check will be quickly turned off the next time a build fails and something needs “to be quickly deployed”. So make sure that the team understands the concept and the benefits before adopting Mutation Testing! Let’s defeat those mutants!

Written by Dwane Debono