Notes on Unit Testing and Other Things

People often ask me what constitutes a good amount of unit testing. I think the answer is usually a very high level of coverage but it’s fair to say that there are smart people that disagree. The truth is probably that there is not one answer to this question. Notwithstanding this, I have been able to give some pretty consistent guidance with regard to unit testing which I’d like to share. And not surprisingly it doesn’t prescribe a specific level of testing.

No coverage is never acceptable

While it’s true that some things are harder to test than others I think it’s fair to say that 0% coverage is universally wrong. So the minimum thing I can recommend is everything have at least some coverage. This helps in a number of ways not the least of which is that each subsequent test is much easier to add.

Corollary: “My code can’t be unit tested” is never acceptable

While you may chose to test more lightly (see below) it is always possible to unit test code; there are no exceptions. No matter how crazy the context, it can be mocked. No matter how entangled the global dependencies, they can be faked. The greater the mess, the more valuable it will be to disentangle the problems to create something testable. Refactoring code to enhance its testability is inherently valuable and getting the tests to “somewhere good” is a great way to quantify the value of refactoring that might otherwise go unnoticed.

Unit testing drives good developer behavior overall.

Unit Tests Cross Check Complex Intent

The very best you can do with a unit test, in fact the only thing you can do, is verify that the code does what you intended it to do. This is not the same as verifying correctness but it goes a long way.

  • you create suitable mocks to create any simulated situation you need
  • you observe the side-effects on your mocks to validate that your code is taking the correct actions
  • you observe the publicly visible state of your system to ensure that it is moving correctly from one valid state to the next valid state
  • you add test-only methods as needed (if needed) to expose properties that need testing but are otherwise not readily visible with the public contract (it’s better if you do this minimally, but for instance test constructors are a common use case)

Given that unit tests are good for the above you then have to consider “Where will those things help?”

When creating telemetry you can observe that you log the correct things the correct number of times.

  • shipping logging that is wrong is super common and it can cost days or weeks to find out, fix it, and get the right data…
  • it doesn’t take weeks to unit test logging thoroughly, so great candidate to move fast by testing
  • logged data that is “mostly correct” is likely to become the bane of your existence

When you have algorithms with complex internal state, the state can be readily verified.

  • the tests act as living documentation for the state transitions
  • they protect you against future mistakes
  • well meaning newcomers and others looking to remove dead code and/or refactor can do so with confidence
  • if you’re planning such a refactor, “tests first, refactor second” is a great strategy

When you have algorithms with important policy, that policy can be verified.

  • many systems require “housekeeping” like expiration, deletion, or other maintenance
  • unit tests can help you trigger those events even if they normally take days, weeks, years

When you have outlying but important success/failure cases, they can be verified.

  • similar to the above the most exotic cases can be tested
  • important failure cases where we need to purge the cache or so some other cleanup action that are not exactly policy but are essential for correctness can be tested
  • these cases are hard to force in end to end tests and easily overlooked in manual testing
  • in general, situations that are not on the main path but are essential are the most important to verify

Areas under heavy churn can have their chief modes of operation verified.

  • anywhere there is significant development there is the greatest chance of bugs
  • whatever mistakes developers tend to make, write tests that will find them and stop them
  • even if developer check-ins are 99% right that means in any given week something important is gonna bust, that’s the math of it…

In case of complex threading, interleaves can be verified.

  • By mocking mutex, critical section, or events, every possible interleave can be simulated
  • weakness: you can only test the interleaves you thought of (but that goes a long way)
  • that’s actually the universal weakness of unit tests

The above are just examples, the idea being that you use the strength areas of unit tests cross-checked against your code to maximize their value. Even very simple tests that do stuff like “make sure the null cases are handled correctly” are super helpful because it’s easy to get stuff wrong and some edge case might not run in manual testing. These are real problems that cause breakage in your continuous integration and customer issues in production. Super dumb unit tests can stop many of these problems in their tracks.

Once you have some coverage it’s easy to look at the coverage reports and then decide where you should invest more and when. This stuff is great for learning the code and getting people up to speed!

A few notes on some of the other types of testing might be valuable too.

When to Consider Integration Tests

If you can take an entire subsystem and run it largely standalone, or wired out for logging, or anything at all like that really, it creates a great opportunity for what I’ll loosely call “integration testing.” I’m not sure that term is super-well defined actually, but the idea is that more live components can be tested. For instance, maybe you can use real backend servers, or developer servers, and maybe drive your client libraries without actually launching the real UI — that would be a great kind of integration test. Maybe this is done with test users and a stub UI; maybe it’s some light automation; maybe a combination. These test configurations can be used to validate large swaths of code, including communication stacks and server reconfigurations.

It’s important to note that no amount of unit testing can ever tell you if your new server topology is going to work. You can certainly verify that your deployment is what you think it is with something kind of like a unit test (I can validate my deployment files at least) but that isn’t really the same thing.

Something less than your full stack, based on your real code, can go a long way to validating these things. Perhaps without having to implicate UI or other systems that add unnecessary complexity and/or failure modes to the test.

This level of testing can also be great for fuzzing, which is a great way to create valid inputs and/or stimulation that you didn’t think of in unit tests. When fuzzing finds failures, you may want to add new validations to your suite if it turns out you have a big hole.

When to Consider End to End Tests

Again, it would be a mistake to think that these tests have no place in a testing ecosystem. But like unit tests, it’s important to play to their strengths. Do not use them to validate internal algorithms; they’re horrible at that. They don’t give you instant API level failures near the point of failure, there’s complex logging and what not.

But consider this very normal tools example: “I am adopting these new linker flags for my main binary” — in that situation the only test that can be used is an end to end test. Probably none of the unit tests even build with those optimizations on, and if they did they would not be any good at validation anyway, the flags probably behave differently with mini-binaries. Whereas a basic end-to-end suite of “does it crash when I try these 10 essential things” is invaluable.

Likewise, performance, power, and efficiency tests are usually end to end — because the unit tests don’t tell us what we need to know.

Fuzzing may have to happen at this level, but it can be challenging to do all your fuzzing via the UI. Repro steps can get harder and harder as we go down this road as well. External failures that aren’t real problems are also more likely to crop up.

Conclusion

It’s no surprise that a blend of all of these is probably the healthiest thing. Getting to greater than 0% unit-test-coverage universally (i.e. for all classes) is a great goal, if only so that you are then ready to test whatever turns out to be needed.

Generally, people report that adding tests consistently finds useful bugs, and getting them out, while making code more testable, is good for the code base overall. As long as that continues to be the case on your team it’s probably prudent to keep investing in tests, but teams will have to consider all their options to decide how much testing to do and when.

One popular myth, that unit tests have diminishing returns, really should be banished. The thing about unit tests is that any given block of code is about as easy to test as any other. Once you have your mocks set up you can pretty much force any situation. The 100th block of code isn’t harder to reach than the 99th was and it’s just as likely to have bugs in it as any other. People who get high levels of coverage generally report things like “I was sure the last 2 blocks were gonna be a waste of time and then I looked and … the code was wrong.”

But even if returns aren’t diminishing that doesn’t mean they are free. And bugs are not created equal. So ultimately, the blend is up to you.

Like what you read? Give Rico Mariani a round of applause.

From a quick cheer to a standing ovation, clap to show how much you enjoyed this story.