PRACTICAL SHIFT-LEFT ON TESTING

Should You Unit Test?

Brian Elgaard Bennett
Published in CodeX · Jun 25, 2021

This post will only scratch the surface of a very large topic.

First, I would like to thank Vladimir Khorikov for his comments on a draft of this article. You would expect the author of Unit Testing Principles, Practices, and Patterns to disagree with the opinions expressed in this post, but Vlad and I very much agree about using larger units in unit tests, as they allow us to cover units of behaviour instead of units of code.

Is There a World Without False Positives?

For as long as I can remember, at least when it comes to my professional life as a software developer, I have been obsessed with correctness.

So I have meticulously unit tested all my code.

For many years, I was baffled to find myself in the middle of a battle between two groups of developers: those who believed in unit testing and those who did not.

How could that be? Correctness is good, and testing is needed to ensure that code is correct, right? So how can unit testing be a matter of belief?

It slowly dawned on me when I started working at Saxo Bank a decade ago. On my first day, I pulled the source code for the trading platform and ran all the unit tests. Half of them failed. Why? What did I do wrong?

I probably did a lot of things wrong, but that was not it.

The reason for the failing tests was that the team had stopped focusing on unit tests. Another newcomer and I fixed the failing tests, but that did not help for long, because it did not address the core problem.

Unit tests have never revealed any real bugs.

The core problem was false positives. The team had experienced again and again that unit tests failed when even minor changes were made to production code. Not because the tests revealed regressions in production code, but more often than not because changes in production code broke the many implicit assumptions baked into unit tests.
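
To make this concrete, here is a small, hypothetical sketch of such a false positive; OrderService and its two collaborators are invented for illustration, not taken from any real code base:

```csharp
// Hypothetical sketch of a false positive. OrderService, IDiscountCalculator and
// ITaxCalculator are invented names, not from any real code base.
using Moq;
using Xunit;

public interface IDiscountCalculator { decimal Apply(decimal amount); }
public interface ITaxCalculator { decimal Apply(decimal amount); }

public class OrderService
{
    private readonly IDiscountCalculator _discounts;
    private readonly ITaxCalculator _taxes;

    public OrderService(IDiscountCalculator discounts, ITaxCalculator taxes)
        => (_discounts, _taxes) = (discounts, taxes);

    // Observable behaviour: apply the discount first, then the tax.
    public decimal Total(decimal amount) => _taxes.Apply(_discounts.Apply(amount));
}

public class OrderServiceTests
{
    [Fact]
    public void Total_applies_discount_before_tax()
    {
        var discounts = new Mock<IDiscountCalculator>();
        var taxes = new Mock<ITaxCalculator>();
        discounts.Setup(d => d.Apply(100m)).Returns(90m);
        taxes.Setup(t => t.Apply(90m)).Returns(99m);

        var sut = new OrderService(discounts.Object, taxes.Object);

        // This assertion checks behaviour and is fine.
        Assert.Equal(99m, sut.Total(100m));

        // These interaction checks encode implicit assumptions about *how* the total
        // is computed. Merge the two collaborators, inline the discount logic, or
        // restructure the calls, and the test fails even though the observable
        // behaviour is unchanged. That failure is the false positive.
        discounts.Verify(d => d.Apply(100m), Times.Once());
        taxes.Verify(t => t.Apply(90m), Times.Once());
    }
}
```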

The consensus became that unit tests have never revealed any real bugs.

Naturally, I fought the anti-unit test junta. False positives are not a problem if you write proper unit tests, right? And if it is not possible to write proper unit tests, then it’s because your code is poorly structured. Which again means that the process of unit testing will drive you towards refactoring your code to improve its structure. So it’s obvious that unit testing not only leads to functionally correct code, it also leads to well-structured code. There are books written about this. What’s not to like?

In short, what’s not to like is that it will most often not work out like that.

Call it constraints in resources and processes, or call it human behaviour. For some reason, several developers would either not maintain the freshly green suite of unit tests or, when coerced, would happily write a hundred lines of setup code per test without inferring that a refactoring might be in order.

But worse, even when we put pressure on certain developers so that the unit tests were actually maintained, we still did not become confident that the code was correct. Testers told us that unit testing is not testing; they essentially ignored the unit tests and started from scratch on real testing, whatever that was.

By now, you have probably guessed where I am going, possibly hinted by the title of this post.

But please don’t get me wrong, I very much like the benefits of unit testing. It’s fantastic to be able to run all the code in a way which gives me fast and deterministic feedback. I can run the tests locally, or run them as part of a PR build for each commit, as part of a master build, and so on. Anywhere, anytime. That’s unlike end-to-end tests or integration tests, which require some kind of environment. I will come back to that.

If only the feedback from running tests could make me truly confident that my code is correct.

If only I could get confidence before I do my refactoring. That would make it so much easier to convince my peers to refactor in time — before the code rots.

Personally, I will do needed refactoring, scout’s honour and all, but I need a safety net while doing it. OK, I admit, I may not always do all the refactoring that I ought to do, but I still want confidence and I still want to avoid false positives as much as possible.

Oh, and I want testers to feel the same degree of confidence. I want them to continue my testing, I want them to work with me, and I certainly do not want them to ignore my testing efforts. In fact, I want to cooperate with testers continuously, so that there is hardly any hand-over.

Is that too much to ask for?

Maybe, but that’s pretty much our reality today.

I will explain shortly, but first a small detour.

Books on unit testing — the good one

I mentioned that there are books on unit testing, on how to write proper tests and how to structure your code so that it’s possible to write proper tests.

The best book I have read on that topic so far is Vladimir Khorikov’s Unit Testing Principles, Practices, and Patterns. It’s eloquently written and easily read, and once you are through it you will have a really good understanding of the topic.

That book also confirmed my opinion that classic unit testing should be kept to a minimum. Well, actually not only the book itself, but especially one article in Vladimir’s newsletter on false positives vs. flaky tests.

In the book, Vladimir writes that,

End-to-end tests are immune to false positives.

A commenter writes,

I feel the opposite is true. End-to-end tests have unreliable external dependencies, involve test code with higher cognitive complexity, and are verified more unreliably.

Vladimir explains that his definition of false positives relates to one component of a good test, namely resistance to refactoring, whereas the flakiness that the commenter hints at is related to maintainability.

We want tests to be immune to false positives and to flakiness.

By the way, Vladimir’s other components of a good test are protection against regressions (essentially, the safety net I mentioned earlier) and fast feedback. Yep, we certainly want those as well.

What we need beyond unit testing

Isn’t it blatantly obvious that what we need is the best of both worlds? We want our tests to be immune to false positives, like end-to-end tests, but without the flakiness.

This may sound like wanting to have our cake and eat it too. But don’t we all want that, if we can get it? I say: yes, we can.

Let’s consider why end-to-end tests are supposedly immune to false positives. That’s easy: end-to-end tests focus on testing externally exposed behaviour. If a refactoring breaks such tests, it is the explicit assumptions about required behaviour, the very assumptions the tests check, that have been violated. That is the purpose of those tests.

I have so far conveniently ignored the inherent flakiness of end-to-end tests. End-to-end tests may fail due to unreliable external dependencies in a runtime environment. Also, I have ignored that testing in such environments can be really slow. We need to control the runtime environment along with all dependencies, so that our tests can be fast and deterministic.

Obviously, we can get rid of flakiness if we skip the runtime environment entirely and mock external dependencies. It’s obvious, because that’s what we have always done while unit testing.

In conclusion, we need to,

  • Focus on externally exposed functionality (so we are immune to false positives).
  • Control all external dependencies (so we don’t introduce flakiness).
  • Run tests in-memory (so we can have fast feedback and be independent of possibly unreliable environments).

A corollary of the above is that you can obtain functional coverage, not only code coverage (measured in lines or blocks of code), but that requires that you mock only truly external dependencies.

How to do all that, you ask? That’s easy peasy: read my previous post and you will know how. That post does it for ASP.NET Core web services, but the concept is the same for other units of software; only the technical details differ.
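
To give a taste of the idea (a minimal sketch, not the exact setup from that post), here is a level-zero test that runs a whole ASP.NET Core service in-memory with WebApplicationFactory. Startup, IQuoteClient, FakeQuoteClient and the /quotes/MSFT endpoint are assumed names, and only the truly external market data dependency is replaced:

```csharp
// Minimal sketch of a level-zero test. Startup, IQuoteClient, FakeQuoteClient and the
// /quotes/MSFT endpoint are assumptions; only the truly external dependency is swapped.
using System.Net;
using System.Threading.Tasks;
using Microsoft.AspNetCore.Mvc.Testing;
using Microsoft.AspNetCore.TestHost;
using Microsoft.Extensions.DependencyInjection;
using Xunit;

public class QuoteEndpointTests : IClassFixture<WebApplicationFactory<Startup>>
{
    private readonly WebApplicationFactory<Startup> _factory;

    public QuoteEndpointTests(WebApplicationFactory<Startup> factory) => _factory = factory;

    [Fact]
    public async Task Known_instrument_returns_a_quote()
    {
        // The whole web service is the unit. It runs in-memory with its production
        // IoC registrations, except for the external market data client.
        var client = _factory
            .WithWebHostBuilder(builder => builder.ConfigureTestServices(services =>
                services.AddSingleton<IQuoteClient>(new FakeQuoteClient())))
            .CreateClient();

        var response = await client.GetAsync("/quotes/MSFT");

        // Assert on externally exposed behaviour only; internal refactorings
        // cannot turn this into a false positive.
        Assert.Equal(HttpStatusCode.OK, response.StatusCode);
    }
}

// Deterministic stand-in for the service's own (assumed) abstraction over its
// external market data dependency.
public class FakeQuoteClient : IQuoteClient
{
    public Task<decimal> GetLatestPriceAsync(string symbol) => Task.FromResult(123.45m);
}
```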

Note that I have so far only considered the lowest level of tests, level zero, not the top of a test pyramid, so to speak. My purpose with this post is to focus on this lowest level, but also to raise the bar: quite a lot of real testing can be done at this low level.

In later posts, I will write about higher levels of tests. Spoiler alert: Those tests will be conceptually and syntactically similar to level zero tests and will run (at least partly) out-of-process, but will most often have good control of external dependencies and environments — and still be fairly fast.

As always, the devil is in the details — and this post would be way too long if I delved into many of those details.

Here are a couple of examples of such details,

  • Do I really suggest that an entire <insert a huge lump of (legacy) code here> should be the smallest unit to test? Yes, absolutely, if at all possible.
  • Do I really suggest that testing must target the functionality exposed by <insert the public interface, e.g. REST or gRPC>? Yes, as a rule of thumb. However, it can be beneficial to target a sub-system while still keeping the large unit size, as illustrated below (a code sketch follows the illustration).
If A is a REST controller class, we prefer that our tests target A, but it can also make sense to target sub-system C. Either way, keep the larger unit and control it through your IoC container. For the lowest level zero of tests, either mock the database and all other external dependencies or run them in-memory.
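
As a hedged sketch of that option, with invented names (IPortfolioCalculator standing in for sub-system C, IPositionStore for the database-facing dependency), the test can let the production IoC container compose the object graph and only swap out the external pieces:

```csharp
// Sketch of targeting sub-system C while keeping the large unit. IPortfolioCalculator,
// IPositionStore and InMemoryPositionStore are invented names; the object graph is
// still composed by the production container, with only the database-facing store
// swapped out.
using Microsoft.AspNetCore.Mvc.Testing;
using Microsoft.AspNetCore.TestHost;
using Microsoft.Extensions.DependencyInjection;
using Xunit;

public class PortfolioSubSystemTests : IClassFixture<WebApplicationFactory<Startup>>
{
    private readonly WebApplicationFactory<Startup> _factory;

    public PortfolioSubSystemTests(WebApplicationFactory<Startup> factory)
        => _factory = factory.WithWebHostBuilder(builder =>
            builder.ConfigureTestServices(services =>
                // Level zero: the database behind the store becomes an in-memory fake;
                // everything else is wired up exactly as in production.
                services.AddSingleton<IPositionStore>(new InMemoryPositionStore())));

    [Fact]
    public void Empty_account_has_zero_portfolio_value()
    {
        using var scope = _factory.Services.CreateScope();

        // Resolve sub-system C from the container instead of newing it up with mocks.
        var calculator = scope.ServiceProvider.GetRequiredService<IPortfolioCalculator>();

        Assert.Equal(0m, calculator.TotalValue("empty-account"));
    }
}
```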

What’s next

If you have read my previous post, you have probably discovered a theme here.

  • In Should you unit-test API/MVC controllers in ASP.NET Core?, Andrew Lock argues that you should test a larger unit than just a controller class. This is essentially about not unit testing controller classes.
  • In Should You Unit-Test in ASP.NET Core? I argue that you should test an entire web service as a single unit. This is essentially about not unit testing units smaller than an entire web service.
  • In Should You Unit Test?, this post, I argue that you should generally test larger units. This is essentially about not unit testing at all, or at least about keeping classic unit testing to a minimum.

But everything is not as it seems, despite that theme.

While I find it important to minimize traditional unit testing and instead focus on externally exposed functionality, with external dependencies and environments under control, this is but a means to an end.

The end I am aiming for is my old obsession, to achieve correctness.

This requires quite a few things, e.g.

  • People with tester and developer roles, respectively, must work together continuously during software development.
  • This in turn requires that these people understand and speak the same language, i.e. that they refer to the same concepts when it comes to testing.

The general concepts are outlined in the paper A Practical Method for API Testing in the Context of Continuous Delivery and Behavior Driven Development, so I will not repeat them here (if you are not a member of IEEE, you cannot access the link to the paper, but you can see a recording from the conference here). Suffice it to say that the thoughts behind BDD and combinatorial testing help a lot.
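
As a tiny, hypothetical illustration of that shared language (TransferApi and the parameter values are invented, and the paper’s approach is richer, e.g. generated parameter combinations), a level-zero scenario can be phrased so that a tester can read it directly, with the interesting combinations spelled out:

```csharp
// Hypothetical flavour of a BDD-style, combinatorial level-zero test. The scenario
// wording, TransferApi.RequestAsync and the parameter values are all invented.
using System.Threading.Tasks;
using Xunit;

public class TransferScenarios
{
    [Theory]
    // Each row is one combination of externally meaningful parameters that a tester
    // and a developer can review together.
    [InlineData("DKK", "savings", true)]
    [InlineData("DKK", "pension", false)]
    [InlineData("EUR", "savings", true)]
    [InlineData("EUR", "pension", false)]
    public async Task Transfer_is_accepted_only_for_allowed_account_types(
        string currency, string accountType, bool expectedAccepted)
    {
        // Given an account of the given type and currency,
        // When a transfer is requested through the public API,
        // Then it is accepted or rejected as specified.
        bool accepted = await TransferApi.RequestAsync(currency, accountType);

        Assert.Equal(expectedAccepted, accepted);
    }
}
```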

There is a lot more to say, especially about practicalities, but this post is already long enough, so let’s stop here for now.

Thank you for reading this far! Stay tuned, there’s more to come.

Resources (except for silly links)

Vladimir Khorikov: Unit Testing Principles, Practices, and Patterns

Brian Elgaard Bennett: If it ain’t broke, don’t fix it. Or, what?

Vladimir Khorikov: Vladimir’s newsletter

Andrew Lock: Should you unit-test API/MVC controllers in ASP.NET Core?

Brian Elgaard Bennett: Should You Unit-Test in ASP.NET Core?

Brian Elgaard Bennett: A Practical Method for API Testing in the Context of Continuous Delivery and Behavior Driven Development (you can see a recording from the conference here).


Brian Elgaard Bennett

I write about life, universe and software development. Before Medium, I used to post here: https://elgaard.blog/. I work at Unity Technologies.