On Writing Tests as an Induction Proof of System Correctness

Introduction

In my career I have witnessed (and personally contributed) many badly written unit tests. Many developers write great unit tests and there are thousands of online resources that provide guidance on how to structure great unit tests. However, in my experience, developers often misunderstand exactly what to test in a unit test. The line between unit and integration test is often very blurred.

As we continue to move into a world of continuous deployment and strict automation unit testing discipline is becoming more and more necessary for business success. The days of half hearted unit testing as a change detection mechanism are over. Developers today and tomorrow will need to be experts in crafting tests that minimally and effectively verify a system is functional, performant and visually appealing. The second two categories are a much bigger discussion and tend to be difficult to verify quickly (and accurately) so I will leave them for a future article. So — how can we verify a system is fully functional with the fewest tests possible?

Some argue that the only way to properly test a system is to actually run the system and assert inputs and outputs against the running system. This style of test is commonly known as Integration Testing and is generally the slowest form of testing. Integration is a critical component of overall system verification — as I will describe further on. So — the question remains — how should one craft a balance of unit and integration tests to maximize test coverage of functionality while at the same time putting the absolute minimal effort into writing tests.

I myself struggle with this problem — more so in the past. At some point in my career I suddenly had a revelation. I realized that unit testing (good unit testing) is a sort of philosophical induction proof of a system’s correctness. This definition may seem somewhat abstract at first sight but I have successfully coached many senior developers using this analogy and it seems to resonate well.

This article is really an attempt to write this thought process down and share it more widely in the hopes that others find it useful.

Unit Testing vs Integration Testing

Before we dive into the specifics of my argument we need to nail down a few terms. There is a strong philosophical difference between the idea of unit testing and integration testing. Let’s start with defining what unit testing refers to in this article:

A unit test is an automated piece of code that invokes a unit of work in the system and then checks a single assumption about the behavior of that unit of work. [1]

The take home message from this definition: A unit test only tests a single functional “unit of work” at a time. This means when we test a unit in the system (say Unit A) that depends on another unit (Unit B) we must test A in such a way as to completely remove B from the tests functional assertions. This is typically done via mock or stub testing techniques where the dependencies of Unit A are mocked out and thus can have their behaviour simulated. If Unit A is tested with a concrete implementation of unit B we are no longer testing A in isolation and thus are really testing the integration of A and B together. There are thousands of resources on the internet that detail many great guidelines for writing good unit tests that are easy to debug and engineer so I will not go into detail around how to actually write unit test. Suffice to say — for the purpose of this argument, unit tests must strive for maximum reproducibility which means they attempt to remove as many sources of entropy as possible:

  • Use of Mock objects for dependency units of work.
  • All logic and verification run in a single thread of execution.
  • They are whitebox by nature: meaning every logical operation in the unit has an associated verification to ensure that operation is behaving as expected.

So — what is an integration test? Integration testing is basically any test that runs with multiple units integrated together without the use of mocks or stubs. This is a rather weak definition and there are tons of edge cases that will violate the statement above but for the purpose of this article it doesn’t matter. Tests that verify functionality are really either of the unit or integration variety. The question that matters is really — how do we structure tests suites as a balance of unit and integration tests that all work together to verify system behavior in a directed and thoughtful manner than simply writing unit tests and integration tests without thought to how they relate.

As a side note — the ad hoc nature of many unit / integration test suites are generally further exacerbated by organizational lines in many development organizations. Integration tests are generally the responsibility of a QA team (with a different management structure) while unit tests are generally the responsibility of the development team. This separation leads to broken communication lines that result in testing overlap and gaps.

What is an Induction Proof?

A good definition of induction proof is

“A method of proving mathematical results based on the principle of mathematical induction: An assertion A(x), depending on a natural number x, is regarded as proved if A(1) has been proved and if for any natural number n the assumption that A(n) is true implies that A(n+1) is also true.” [2]

In other words, an induction proof is a proofing mechanism that allows for starting with what is considered proven and attempting to extend that logic to other numbers in the sequence. Induction proofs are mostly associated with natural number sequences but can also be used with trees.

Induction proof have two major “steps”: The “Induction Step” and the “Base Case”. The base case is concrete. It generally proves the theorem about a single number (or node) in the sequence. Using the base case — the induction step extends that proof to other numbers in the sequence (or tree).

Let’s consider a trivially simple system that is composed of three components arranged as follows:

If we consider this system as a tree of components (with dependencies to other components) than we can construct a proof (not a mathematical proof obviously) along the lines of:

“Assuming A only uses components that are verified individually (B and C) and also that A uses B and C in the correct manner and finally as long as any of A’s internal logic is properly verified in isolation of B and C than we can say with certainty that A is properly verified”.

More generically — assuming a component with no dependencies works in isolation (the base case) and that any additional components only rely on previously verified components (the induction step) than we have a strong argument for suggesting the system is verified without requiring a full end to end integration test of every single feature and combination of input.

This logic has four predicates:

  1. Components B and C are verified
  2. Component A only has direct dependencies on B and C (which are verified)
  3. Component A is verifiably using components B and C correctly
  4. Component A has verified internal logic (verified outputs for given inputs).

In a sense (loosely), predicates #1 and #2 form the induction step while predicates #3 and #4 form the base case.

How Does This Relate to Testing?

I propose that predicates #2 and #4 can be verified by unit tests while predicate #3 can be expressed as integration tests.

Let’s start with predicate #4 (the first base case)

Component A has verified internal logic

This predicate is basically stating that component (or unit) A has a deterministic function between inputs and outputs. In other words, “for a given input, A will yield a consistent output that can be verified for accuracy.” This is the type of input to expected output assertion that unit tests are well suited for. As you will see, our future induction steps will depend on the “safety” that any dependency has been verified already and thus is “safe to use”. This also means that the usage of any dependency components is also verified.

In the simplest case, where component A has no dependencies and thus can be trivially verified in isolation, this predicate clearly represents a pure base case. In the more complex case where a component A is performing internal logic that depends on features from B and C there is a mixture of base case and induction (the internal logic not depending on B and C is still the base case while the logic that integrates B and C into A’s higher level purpose is being verified as an induction over B and C).

The tests described above are best written as unit tests.

The second base case is predicate #3:

Component A is verifiably using components B and C correctly

This really implies two things:

  1. That the behaviour of B and C is well understood and,
  2. The behaviour of B and C can be verified in some manner.

We can only assume we understand the behavior of internal components (well within the control of the development team to understand and fix). Third party components do not fall into this assumption.

In order to trust third party components we must wrap the functionality in an internal interface contract and then verify the behavior of that internal implementation. This pattern isolates the usage of the third party component into a well documented single unit which assists in verifying the component against its known internal usage. Once a third party component has been isolated, and its usage verified, it may be added to the “trusted” set of dependencies and can used in the induction step.

I would argue that all the tests suggested above are actually integration tests. They create in internal interface contract, wire a simple implementation to a third party component and then verify the inputs and outputs of that wrapped component. It is not appropriate to mock out the third party component as you are no longer actually verifying that you understand how to use it. Rather you are verifying that you are using a component as you assume you should use it (which is not the same thing). This verification of leaf dependencies using targeted integration tests also represent a base case argument.

Predicate #1 is purely inductive:

Components B and C are verified

I propose that this logical predicate can be expressed as “B and C are tested elsewhere thus can be assumed to be verified already” This also applies to third-party components (though the degree of trust must be considered” If the third party components testing is in question than a good internal suite of verification tests must be created to satisfy this predicate for any component that will use the third party component (more on this later). If we assume this predicate is true (based on testing diligence) than we are free to write unit tests that mock verified interfaces

Predicate #2 is a little more tricky:

Component A only has direct dependencies on B and C

This is really an inductive extension of predicate #1. In other words: if unit A’s dependencies are all verified in isolation and are known to behave correctly and A is only using B and C (plus some internal logic which is also verified) than A’s behavior is also verified.

Let’s take a second for that to settle in… This is the core leap of faith that leads to order of magnitude testing performance improvements. Predicates #1 and #2 give development teams licence to test components in isolation and inductively build up a tree of verification (as opposed to permutation testing the entire system from a high level).

Next Steps

This article is already too long and thus I will post a follow (eventually) that covers the following additional topics

  1. A proper example of how to test the induction step and a few common “base cases”. I will wait for comments to decide which examples to create.
  2. A discussion of “unit test coverage” and how (given the above) unit test coverage is a meaningful metric for quality (and how integration test coverage is not).

References

[1] http://artofunittesting.com/definition-of-a-unit-test

[2]Hazewinkel, Michiel, ed. (2001), “Mathematical induction”, Encyclopedia of Mathematics, Springer,ISBN 978–1–55608–010–4