Mutation Testing in .NET: An Experiment

Published in Compare the Market, Sep 28, 2017

Update 10th September 2018: a lot’s changed since I wrote this post. I’ve now open sourced my own mutation testing tool for C#: Fettle (read about it in part two), and the team from Stryker have also written their own tool for .NET.

A while ago my team at comparethemarket.com were looking to add some features to a C# project where correctness was particularly important. Although the project had been written in a TDD fashion with high levels of code coverage, I thought it would be a great opportunity to use mutation testing to get an insight into the quality of its tests.

If you’re not familiar with mutation testing, you can read what I’ve written about the subject previously.

But there was a problem. C# (and .NET in general) is severely lacking in mutation testing tools. There have been some valiant attempts but they seem to be unfinished and/or unmaintained. And the ones I found didn’t seem to be compatible with the version of .NET we use, or just plain didn’t work.

So I decided to have a go at writing my own.

Rather than making a generic tool, I focused on the minimum I needed. This was something that could mutate the particular C# project I was interested in (a .NET 4.5.2 project that had NUnit 3 tests).

Mutating

First off I had to decide how to implement the mutating. As far as I could tell I had two options:

Mutating the source code
The Roslyn API allows you to parse C# code into a data structure called a syntax tree. Once code is in syntax trees, you’re able to inspect and modify its behaviour before compiling it.
Mutating the built assemblies
The Cecil library allows you to modify compiled .NET assemblies. It does this by offering a view on the CIL byte-code that CLI languages such as C# are compiled into.

I was able to write proof-of-concepts that mutated a very simple program using both approaches. After a bit of thought, it was the second approach (mutating assemblies via their CIL) that I went with. This was mainly because of its apparent simplicity. For example, it seemed that an if statement could take a few different forms when expressed in C# code but would end up the same when expressed in CIL (I would later learn that things were not that straightforward).

The overall flow would be (in pseudo-code):

for every instruction that can be mutated:
  for every appropriate type of mutation:
    mutate the instruction
    write the mutated assembly to disk
    run tests (via the NUnit 3 console runner)
    if no tests failed:
      report that a mutant survived

Tests First

After the proof-of-concepts were complete it was time to get cracking. I chose to use F#, for two main reasons:

Although I’d used F# for small utilities, katas etc., I was keen to use it for a bigger project.
I wanted to experiment with the more expressive ways of writing tests that F# allows.

The starting point was the first end-to-end NUnit test:

[<Test>]
let ``Some surviving mutants`` () =
  Given "the app under test's code will produce surviving mutants"
    (fun () -> ValidInput |> WithSurvivingMutants)
  |> When "mutating the assembly"
    (fun parameters -> mutateAssembly parameters)
  |> Then "the result contains all the surviving mutants expected"
    (fun output ->
      match output with
      | Invalid _ -> Assert.Fail(...)
      | Results r -> r |> Seq.length |> should equal 3;
        output
    )

The test uses a Behaviour-Driven style set of Given/When/Then functions that are nice and simple to create using F# (example here). If you’re not familiar with F#, the general scenario the test describes is:

Given the app under test's code will produce surviving mutants
When mutating the assembly
Then the result contains all the surviving mutants expected

The test relies on a dummy C# app:

public static class ExampleMaths
{
   public static int Sum(int a, int b)
   {
      return a + b;
   }

   // ...
}

…and some dummy unit tests. The tests are deliberately flawed, in such a way that they don’t fail when the app is mutated, thus generating surviving mutants. For example:

[Test]
public void ExampleMaths_test_with_no_assertions()
{
    // Mutants generated in ExampleMaths.Sum() survive.
    // That's because there are no assertions, and so
    // nothing changes when the implementation of Sum
    // changes.
    ExampleMaths.Sum(1, 2);
}

// ...

The First Mutation

Armed with an approach for testing I began to build up the functionality. A key part of this was how to mutate the code. For example, if we take the method from earlier…

public static int Sum(int a, int b)
{
   return a + b;
}

…a simple mutation the tool needed to support was changing the + to a -. I started by using the handy ILSpy tool to look at the byte-code that the compiler generated:

IL_0000: nop
IL_0001: ldarg.0
IL_0002: ldarg.1
IL_0003: add
IL_0004: stloc.0
IL_0005: br.s IL_0007
IL_0007: ldloc.0
IL_0008: ret

If you’re not familiar with CIL then you can read about it here, but it’s not essential for you to know the details. The important bits to notice are these three instructions:

IL_0001: ldarg.0  // Push the first method arg ("a") onto the stack   
IL_0002: ldarg.1  // Push the second method arg ("b") onto the stack
IL_0003: add      // Add the two values from the stack together

The key is to replace the add instruction with one that subtracts: sub.

IL_0001: ldarg.0
IL_0002: ldarg.1
IL_0003: sub       // <-- replaces add

Cecil makes this reasonably straightforward. You can read an assembly from disk into a ModuleDefinition. A ModuleDefinition contains a collection of classes (TypeDefinition), each of which has a collection of methods (MethodDefinition). Once you’ve found the right method, you can then look within its body (MethodDefinition.Body) for the appropriate instruction (Cil.Instruction) to mutate.

Changing the add to a sub is a case of changing the instruction’s OpCode property, then saving the mutated assembly to disk:

let newOpCode = Mono.Cecil.Cil.OpCodes.Sub
instruction.OpCode <- newOpCode 
   
moduleDefinition.Write(pathToAssembly)

A passing test confirmed that this worked. But to double check I decompiled the mutated assembly back into C# (something else ILSpy lets you do). Lo and behold, just changing that instruction did mutate the C# + into a -:

public static int Sum(int a, int b)
{
  return a - b;
}
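To see why swapping a single opcode is enough to change the method’s behaviour, here’s a toy stack machine in Python. It’s only a sketch: the `run` function is a hypothetical helper that understands just four CIL-like opcodes, and the instruction list is a simplified version of the Sum body (the nop, stloc/ldloc and br.s instructions are left out). It isn’t how Cecil or the CLR work internally.

```python
def run(instructions, args):
    """Execute a tiny subset of CIL-like opcodes on an evaluation stack."""
    stack = []
    for op in instructions:
        if op.startswith("ldarg."):
            # ldarg.N pushes the N-th method argument onto the stack.
            stack.append(args[int(op.split(".")[1])])
        elif op == "add":
            b, a = stack.pop(), stack.pop()
            stack.append(a + b)
        elif op == "sub":
            b, a = stack.pop(), stack.pop()
            stack.append(a - b)
        elif op == "ret":
            # ret returns whatever is on top of the stack.
            return stack.pop()
        else:
            raise ValueError(f"unsupported opcode: {op}")

# Simplified body of Sum(a, b), then the same body with add mutated to sub.
sum_body = ["ldarg.0", "ldarg.1", "add", "ret"]
mutated_body = ["sub" if op == "add" else op for op in sum_body]

print(run(sum_body, (1, 2)))      # 3
print(run(mutated_body, (1, 2)))  # -1
```

Everything around the arithmetic instruction stays the same. Only the operation applied to the two values on the stack changes, which is why the decompiled method differs by a single operator.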

Something Useful

It turns out that this kind of simple replacement of instructions worked for quite a few types of mutation, such as:

Other mathematical operations (*, /, %)
Booleans (true, false)
Equality (==, !=)
Conditionals (&&, ||)

Other types of mutations were a little more complicated. For example, swapping > for a >= involved adding a couple of instructions as well.

Soon I had something I could run against a real project. When I did, it had some surviving mutants, which indicated gaps in the tests:

✗ 20 mutants survived:...

Hooray!

But…

I started investigating the surviving mutants with the aim of making the relevant tests better or adding new ones. But some of them didn’t make sense.

It seems these results came about because at compile time, parts of your C# code can be extended or modified by the compiler before being finally converted to CIL. For example, this…

string result = "";
switch (input)
{
  case "a": result = "F"; break;
  case "b": result = "G"; break;
  case "c": result = "H"; break;
  ...
  case "j": result = "O"; break;
}

…can be modified by the compiler into something like this (which I believe is an optimisation):

string result = "";
uint num = <PrivateImplementationDetails>.ComputeStringHash(input);
if (num <= 3826002220u)
{
  if (num <= 3775669363u)
  {
    if (num != 3758891744u)
    {
      if (num == 3775669363u)
      {
        if (input == "d")
        {
          result = "I";
        }
      }
    }
    else if (input == "e")
    {
      ...

This then confuses the mutation testing tool, which treats what the developer intended to be a switch statement as a series of ifs. When it finds a surviving mutation, it presents the result in a way that’s confusing if you only know the original code:

Surviving mutants:
DummyType.DummyMethod, mutation: "if" changed to "not if" (DummyMethod.cs line 12)

There were other cases where code was changed, or even added, by the compiler. These included how Dispose methods and async calls are handled.

The End?

I tried to make the code handle these situations in a smarter way but it became increasingly difficult. It seemed that I wasn’t going to fix all the issues satisfactorily without a lot of hard work.

There was a fundamental problem with my entire approach: I was relying on the implementation details of the compiler. This meant:

I would have to understand the compiler’s behaviour in detail to implement the tool correctly.
Changes to the compiler’s implementation in new versions could break the tool.

Although choosing to mutate via the CIL byte-code was easy to start with, it was now starting to look very complicated to make a fully robust tool.

The End of the Beginning

So I’ve ended up with a tool that can find some gaps in tests for a particular type of C# project. It sometimes produces false positives and tricky-to-diagnose output, so it’s not ready for wider use, but I find it useful.

I also had a lot of fun learning about CIL, the C# compiler and F# development.

But what about a fully-fledged, robust tool that others could use? Well, it may make sense to go down the route of modifying the source code directly via Roslyn (the option I discounted originally). Relying on Roslyn (a documented, maintained API) and C# (again, well documented) sounds better to me now than relying on how a compiler behaves. Supporting .NET Core would also make sense for the longer term.

There’s also the potential to make the tool much faster. It currently runs all tests for every mutation. Although it stops when it finds the first failure, the overall process still takes many minutes for a relatively small project. Instead, it could run only the subset of tests that cover a particular mutated instruction. This would require some sort of test-coverage analysis to be done on startup.
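As a rough idea of what that coverage-based selection could look like, here’s a small Python sketch. It’s purely illustrative and the tool doesn’t do this today. The coverage data is made up, it works at method level rather than instruction level, and the `index_tests_by_location` helper and every name other than ExampleMaths.Sum are hypothetical.

```python
from collections import defaultdict

def index_tests_by_location(coverage):
    """Invert {test name: covered locations} into {location: [test names]}."""
    index = defaultdict(list)
    for test_name, locations in coverage.items():
        for location in locations:
            index[location].append(test_name)
    return index

# Made-up coverage data, as if gathered by one instrumented test run on startup.
coverage = {
    "test_sum": {"ExampleMaths.Sum"},
    "test_product": {"ExampleMaths.Product"},
    "test_both": {"ExampleMaths.Sum", "ExampleMaths.Product"},
}

index = index_tests_by_location(coverage)

# Only these tests need to run when something inside Sum is mutated.
print(index.get("ExampleMaths.Sum", []))     # ['test_sum', 'test_both']

# No covering tests at all: the mutant can be reported without running anything.
print(index.get("ExampleMaths.Unused", []))  # []
```

Building the index once up front means each mutation costs a dictionary lookup plus the handful of tests that could possibly notice it, rather than the whole suite.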

I doubt whether I’ll have the time or energy to make all of these changes any time soon. But in a final twist to the tale, as I was coming to the end of my work on the tool I saw this tweet from Seb Rose:

So maybe a modern mutation testing tool for .NET is on the cards after all? Let’s hope so!