Mutants to the Rescue

Published in Takealot Engineering, Oct 6, 2022

By Rowan Pillay and Filipe Pinto Teixeira

Unit testing is a well-established and accepted best practice in modern software engineering. Anyone who disagrees with this statement should probably be kept far away from any code destined for production.

In fact, almost every engineer would readily agree that unit testing improves code quality, allows early detection of bugs, increases confidence in deployment and is an enabler for shorter deployment cycles. While this is true, it is only true for well-written unit tests. It is that “well-written” part that is often underemphasised.

The Code Coverage “Deception”

Statement coverage is the simplest metric of unit testing and shows the developer how much of the code base is covered by unit tests. This is often reported as the percentage of a particular service’s total lines of code executed during testing.

While it is generally good to have, or at least aim for, a high code coverage percentage, this value says nothing about the quality of the tests that were executed. For example, it is possible to achieve 100% code coverage with no assertions at all. Similarly, 100% code coverage can be achieved with a test suite that misses important cases. A service with 100% code coverage from a large number of unit tests may therefore still be poorly tested.
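To make this concrete, here is a minimal sketch in plain Scala, without a test framework. The `applyDiscount` function and both checks are hypothetical and exist only for illustration. Both checks execute every line of the function, so a coverage tool would report the same figure for each, but only the second one verifies anything.

```scala
object CoverageDemo {
  // Hypothetical function under test: 10% off totals of 100 or more.
  def applyDiscount(total: Int): Int =
    if (total >= 100) total - total / 10 else total

  // Executes both branches, so statement coverage is 100%, yet nothing is verified.
  def assertionFreeCheck(): Unit = {
    applyDiscount(50)
    applyDiscount(200)
  }

  // Same coverage, but the results are actually checked.
  def assertingCheck(): Unit = {
    assert(applyDiscount(50) == 50)
    assert(applyDiscount(200) == 180)
  }
}
```

If the discount logic were broken, `assertionFreeCheck` would still complete without complaint, while `assertingCheck` would fail.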

Quality Over Quantity

So while it remains important for developers to aim for high code coverage, they should also aim to write high-quality unit tests. Many factors make up the quality of unit tests, but one in particular is that good unit tests verify the behaviour of the code being tested.

It sounds obvious that if the code under test is doing what it is meant to be doing, tests should pass. Likewise, if the business logic of the code under test is changed for whatever reason, perhaps during a refactor, tests should fail. Good unit tests properly assert the validity of the original logic and should consistently fail when that business logic has been changed, thereby preventing software regression.

We’re Only Human

At Takealot, developers are expected to leverage the pull request peer review process to ensure that the quality of service tests is maintained. However, it is unrealistic to expect fellow developers to find every issue in every test of every pull request in a large-scale system. It is best to automate as much of this process as possible.

This is why the Takealot Engineering Division has introduced Mutation Testing to its CI/CD process.

Mutants to the Rescue

Mutation Testing is a mechanism that can be used to prove the quality of tests, particularly the presence and quality of the test assertions. Simply put, mutation testing is a method developers may use to test the tests. It solves the exact problem laid out previously when only code coverage is used as a tool to measure how well a code base is tested. It does this by testing what happens to the existing unit tests when bugs (called mutants) are introduced to the code. This proves to be a great way of testing the quality of the tests that are already present and also highlights tests that are missing.

During mutation testing, different bugs (or mutants) are automatically inserted into the code. The unit tests are then run for each of these mutants added to the code. If the tests are well written, one or more tests should fail. Since bugs have been explicitly added to the code, a failing unit test shows that the test is indeed correct, i.e. that it is effectively testing the business logic. In fact, it is one of the few times developers should be happy when a test fails.

If one or more tests fail after the mutant is added, the mutant is said to have been killed. This is good. If all the tests pass after a mutant is added, the mutant is said to have survived. This is bad. It indicates a lack of assertions in one or more tests or that tests are missing.

The percentage of mutants killed is referred to as the “mutation score”. A score of 100% means all mutants have been killed; a score of 0% means all mutants have survived, i.e. no tests have failed. A high mutation score means that if a future developer were to make a change that breaks the business logic, the tests would likely pick it up. This is exactly what tests are there for: to prevent future regression of the software.
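The definitions above can be sketched in a few lines of Scala. This is an illustrative model only, not how any particular tool represents its results; the names `Status`, `Killed`, `Survived` and `score` are invented for this sketch, and returning 0.0 for an empty run is an assumption.

```scala
object MutationScore {
  sealed trait Status
  case object Killed extends Status    // at least one test failed for this mutant
  case object Survived extends Status  // every test still passed for this mutant

  // Percentage of mutants killed; 0.0 for an empty run (an assumption of this sketch).
  def score(results: Seq[Status]): Double =
    if (results.isEmpty) 0.0
    else 100.0 * results.count(_ == Killed) / results.size
}
```

For example, a run with four killed mutants and one survivor gives a mutation score of 80%.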

A Mutant Example

Perhaps the best way to understand the value that mutation testing provides is by way of an example. Imagine the following method:

def multiplyOrDivide(a: Int, b: Int): Int = {
 if (a <= b)
   return a*b
 else
   return a/b
}

The method multiplies the arguments when a is less than or equal to b, and divides a by b otherwise. Below are the imaginary tests that were written for this particular code.

"multiply or divide" must {
 "return the multiple value when a < b" in {
   StrykerDemo.multiplyOrDivide(2,3) mustBe 6
 }

 "return the division value when a > b" in {
   StrykerDemo.multiplyOrDivide(4,2) mustBe 2
 }
}

There is nothing inherently wrong with these tests. They are both valid. However, the problem here is that there is a missing test. There is no test to validate the method’s behaviour when the two parameters are equal. So, continuing with the imaginary scenario, let’s pretend that a developer comes along and makes an erroneous change to the business logic of the method. The naughty developer replaces the “<=” with a “<”.

def multiplyOrDivide(a: Int, b: Int): Int = {
 if (a < b) // replaced <= with <
   return a*b
 else
   return a/b
}

While this is a contrived and simplistic example, it represents how software regressions happen in practice. In this example, the tests will still pass after the developer makes the erroneous change, because neither test covers the case where a equals b. Mutation testing protects the software from such regressions by highlighting missing test cases or assertions.
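One way to see why the change goes unnoticed is to compare the two versions directly. The sketch below is plain Scala with names (`MutantDiff`, `original`, `mutant`, `differing`) chosen for illustration; it lists the inputs in a small range for which the original and the changed method disagree.

```scala
object MutantDiff {
  // The method as originally written.
  def original(a: Int, b: Int): Int = if (a <= b) a * b else a / b

  // The same method after "<=" was replaced with "<".
  def mutant(a: Int, b: Int): Int = if (a < b) a * b else a / b

  // All input pairs in 1..3 where the two versions return different results.
  val differing: Seq[(Int, Int)] =
    for {
      a <- 1 to 3
      b <- 1 to 3
      if original(a, b) != mutant(a, b)
    } yield (a, b)
}
```

In this range the versions only disagree for (2, 2) and (3, 3), so only a test with equal arguments can detect the change. Note that (1, 1) would not expose it, because 1 * 1 and 1 / 1 both equal 1, so the choice of test values matters as well.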

Mutation Testing Tools

There are many different mutation testing tools available for various languages. The following are some of the well-known mutation testing tools:

Stryker-Mutator — A tool that may be used for Scala, JavaScript and C# mutation testing. This is the tool currently being used by the first Engineering Division at takealot.com to standardise mutation testing and is the tool used for demo purposes in this article.
pitest — A tool that may be used for Java applications. Arguably the most mature of all the tools.
cosmic-ray — A tool that may be used for Python applications.
mutmut — A tool that may be used for Python applications.
Infection — A tool that may be used for PHP applications.

These libraries, and in this case Stryker Mutator, will report on surviving mutants. Stryker Mutator provides a nice coverage report showing exactly which line of code failed the mutation test and which mutant survived. The image below is a small screenshot of this report.

It clearly shows the line of code that was not sufficiently tested and which mutant survived, i.e. a <= b was mutated to a < b. This mutation should have caused a failure in at least one test but it did not. Adding the test below would help.

"return the multiple value when a == b" in {
  StrykerDemo.multiplyOrDivide(2,2) mustBe 4
}

After adding the above test case to the existing two, and re-running the mutation test, Stryker reports a 100% mutation score.

Stryker Mutator supports various types of mutators, which are listed in its documentation, and some of them may be excluded through configuration. This particular method has 5 mutants applied: 2 Boolean Literal mutants and 3 Equality Operator mutants. Below is the decompiled mutated code for this method (note the impact of compiling and decompiling on the actual code).

The above snippet shows that mutants 8 through 12 were applied to the code. The tests are executed for every active mutant. It was mutant 10 that “survived” because it did not cause a failure in the tests before the extra unit test was added. The extra test asserts the scenario where the parameters are equal, and so mutant 10 would now be “killed”.
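The same experiment can be approximated by hand. The sketch below is plain Scala and does not use Stryker; the harness and its names are invented for illustration. It assumes the five mutants are the condition replaced with true and with false, plus `<=` swapped for `<`, `>=` and `==`. That set is an assumption, and the sketch makes no attempt to match Stryker's mutant numbering.

```scala
object MutantHarness {
  type Impl = (Int, Int) => Int

  // Condition variants standing in for the five mutants (assumed set, see above).
  val mutants: Seq[(String, Impl)] = Seq(
    "true"   -> ((a: Int, b: Int) => a * b),
    "false"  -> ((a: Int, b: Int) => a / b),
    "a < b"  -> ((a: Int, b: Int) => if (a < b) a * b else a / b),
    "a >= b" -> ((a: Int, b: Int) => if (a >= b) a * b else a / b),
    "a == b" -> ((a: Int, b: Int) => if (a == b) a * b else a / b)
  )

  // Each test case is (a, b, expected result).
  val originalTests: Seq[(Int, Int, Int)] = Seq((2, 3, 6), (4, 2, 2))
  val withEqualCase: Seq[(Int, Int, Int)] = originalTests :+ ((2, 2, 4))

  // A mutant survives when every test case still passes against it.
  def survivors(tests: Seq[(Int, Int, Int)]): Seq[String] =
    mutants.collect {
      case (name, impl) if tests.forall { case (a, b, expected) => impl(a, b) == expected } => name
    }
}
```

Under these assumptions, `survivors(originalTests)` contains only the `a < b` variant, and `survivors(withEqualCase)` is empty, which is consistent with the report described above.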

Different types of mutators

In general, the most common types of mutators are as follows.

String Literal — Take a string of characters and make it a different string of characters.
Boolean Literal — Invert any explicit boolean values.
Arithmetic Operator Changes — Find a mathematical function such as an addition and make it a subtraction for example. This particular mutant is not supported by Stryker Mutator for Scala.
Logical Gate Reversals — Reverse logic gate checks, for example == to !=, or > to <.
Iteration Limitator Removals — Find a variable iterator limit and artificially increase or decrease it.
Null Injections — Find a place where a null can be injected and do it.

The above is not an exhaustive list. Different mutation testing tools have their own types of mutators; Stryker Mutator, the tool used for the demo above, lists its supported mutators in its documentation. It is important to understand these mutators when adopting mutation testing, as you might wish to exclude a certain type of mutator for certain services. For example, it may be useful to exclude String Literal mutants if a service contains many String Literals that do not affect business logic.
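To illustrate what a mutator does, and what excluding one means, here is a toy sketch over a tiny expression type. It is not how Stryker Mutator is implemented. The `Expr` types, the `mutate` function and the mutator name "LogicalReversal" are all hypothetical, and the String Literal rule (empty the string, or fill an empty one) is just one possible choice.

```scala
object ToyMutators {
  sealed trait Expr
  case class BoolLit(value: Boolean) extends Expr
  case class StrLit(value: String) extends Expr
  case class Compare(op: String, left: String, right: String) extends Expr

  // Returns (mutator name, mutated expression) pairs, skipping excluded mutators.
  def mutate(expr: Expr, excluded: Set[String] = Set.empty): Seq[(String, Expr)] = {
    val candidates: Seq[(String, Expr)] = expr match {
      case BoolLit(v)          => Seq("BooleanLiteral" -> BoolLit(!v))
      case StrLit(s)           => Seq("StringLiteral" -> StrLit(if (s.isEmpty) "mutated" else ""))
      case Compare("==", l, r) => Seq("LogicalReversal" -> Compare("!=", l, r))
      case Compare(">", l, r)  => Seq("LogicalReversal" -> Compare("<", l, r))
      case _                   => Seq.empty
    }
    candidates.filterNot { case (name, _) => excluded.contains(name) }
  }
}
```

Calling `mutate(StrLit("hello"), excluded = Set("StringLiteral"))` yields no mutants, which mirrors the effect of excluding String Literal mutants for a service where string values do not affect business logic.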

Stryking

As mentioned before, Stryker Mutator was the mutation testing tool of choice. This is because most of the services in the Engineering Division at takealot.com that initially trialled mutation testing are developed in Scala. Stryker4s is Stryker Mutator’s Scala library and integrates easily with the Scala build tool. The steps to perform mutation testing are quite simple. They involve:

Adding the stryker plugin to sbt (Scala build tool).

addSbtPlugin("io.stryker-mutator" % "sbt-stryker4s" % "0.14.3")

Adding a stryker4s configuration file to the root directory of the project.

stryker4s {
  mutate: [ "core/app/**.scala", "!dist/**"...]
  reporters: ["console", "html"]
  excluded-mutations: ["StringLiteral"]
}

Executing stryker via sbt by running the following command.

sbt stryker

Note that if the Scala project is a multi-project service, the particular project would need to be referenced when executing stryker, as below.

sbt "project <project-name>" stryker

Mutation Testing for an Entire Software Division

Running mutation testing on a single service is one thing, but using it across an entire software division requires more planning and effort. Here are some lessons learned through the process.

Find the right framework — The first step is to find a framework that is compatible with the languages your engineering teams use.
Create a small proof of concept — Start with a proof of concept locally and only after this is shown to work, expand it to a broad variety of services in your architecture.
Make use of exclusions — Exclude mutators that do not add value. An initial mistake made during the team’s adoption of Stryker Mutator was that all exclusions were avoided. This led to many surviving mutants in the reports that did not point to real gaps in the tests, which discouraged adoption. Once the right exclusions were added, the reports gave more actionable feedback.
Configure a Reporting Pipeline — The teams ensured that the results of the mutation test job runs are published to Google BigQuery, and the results are made available on a Looker Dashboard, along with other Unit Testing metrics. Additionally, implementers should seek to distribute results (such as the HTML reports) to developers as easily as possible. Ideally, a unified hosting solution should be used, but failing that we found that team-specific Slack notification channels work.
Iterate — Focus on service adoption first and then on mutation score. Aiming for a 100% mutation score will delay the rollout.

Mutation testing is a computationally intensive process. This is due to running the same test suite multiple times and mutating code in memory. Frameworks do optimise for this using concurrency, and multiple test runners, but even with this, mutation testing can take a long time to execute, especially for large services. As a result, you should take care when adding mutation testing to your build pipelines. As part of our adoption, the following standards were implemented:

Don’t add mutation testing at the Pull Request level. This will take too long and frustrate developers looking to get their code into staging or production. However, developers should be encouraged to run mutation testing locally and explore optimisation as part of their normal workflow.
Run mutation tests on a scheduled basis as opposed to on every commit or pull request. Our mutation testing jobs trigger after midnight. After executing, each job sends a Slack message with the result and a report to a team notification group and publishes the results to our reporting tool.
Only run mutation tests for services that have changed — Our daily job checks for commits to the service’s development branch since the last successful mutation job run and only runs a mutation testing job if it finds changes.
Mutation testing of any large monoliths should run as a separate job.
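The "only run when something changed" rule above can be sketched as a small decision function. This is not the actual Takealot job; `NightlyMutationJob` and `shouldRun` are hypothetical names, and the commit timestamps are assumed to come from whatever the pipeline uses to query the service's development branch. Running when there is no previous successful run is also an assumption of this sketch.

```scala
import java.time.Instant

object NightlyMutationJob {
  // Run only if at least one commit landed after the last successful mutation run.
  // With no previous successful run, always run (an assumption of this sketch).
  def shouldRun(lastSuccess: Option[Instant], commitTimes: Seq[Instant]): Boolean =
    lastSuccess match {
      case None       => true
      case Some(last) => commitTimes.exists(_.isAfter(last))
    }
}
```

Keeping the decision in a pure function like this makes it easy to unit test separately from the scheduler and the version control queries.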

Conclusion

Unit testing is critical for production-grade software and provides many benefits to developers and software teams. However, most if not all of these benefits are predicated on the unit tests being well written. Developers write unit tests, and just as they introduce bugs into software, they often miss critical assertions or even whole tests (developers are human after all!). Issues with tests should be picked up in Pull Request reviews, but it would be foolhardy to believe that developers will find every issue in large-scale systems.

Mutation testing provides an automated mechanism to establish the quality of tests and highlight issues for fixing. It is not a novel concept and has been around for many years but despite this many developers are not aware of it or of its benefits. Takealot’s Engineering Division has definitely benefited from the rollout of mutation testing across the teams’ services, and it has already helped to discover a number of critical missing tests.

Thank You

Adding mutation testing with Stryker Mutator was incredibly easy. A few config changes are all that is needed. This is made possible by the very supportive and active team behind the tool. And so I would like to give a huge shout-out to the Stryker Mutator team and specifically Hugo van Rijswijk, for being a huge help throughout the process.
