Sociable or solitary unit tests — choose your tradeoffs
What should you choose as the unit for a test? Should the test focus on a single module (whatever that means in your programming language) and its narrow responsibilities, or on a group of collaborating modules? Should it be sociable and include the System Under Test (SUT) dependencies, the dependencies of those dependencies, and so on, or should it be solitary, mocking the closest neighbors? A corollary question is if and when to introduce test doubles: never, sometimes, or without any restrictions?
In discussions of these questions, I have never seen anybody bring up the complexity of the tested production code and the quality of the tests, that is, whether they cover all inputs and check every execution path. In this article, I’m going to present the advantages and disadvantages of sociable and solitary unit tests, taking into account the SUT complexity and the test maintenance effort.
Going straight to the conclusion (and hoping that you’ll follow along after reading it):
It doesn’t matter how big or small your unit is (or whether you mock or go real): you’re equally doomed.
Removing the sensationalism from the claim above: you have to choose your tradeoffs carefully, deciding what you need to sacrifice to get the particular benefits of each type of test.
Sociable vs solitary tests
When you go for sociable unit tests (or follow the classicist TDD school), your tests cover broader behaviors. Introducing test doubles is discouraged; they are used only at the boundaries of your domain/system or to reduce test smells (duplication of test code, obscure tests).
Solitary (or isolated) unit tests check smaller chunks of behavior (1), focusing on a specific computation or on the orchestration of collaborators. They use test doubles to isolate dependencies and focus only on concrete behavior. They rely on the chained contract principle: mocks define contracts that are fulfilled and verified in the tests of the collaborators.
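To make the distinction concrete, here is a minimal sketch of the same behavior tested both ways. The `TaxPolicy` and `PriceCalculator` classes are hypothetical, invented only for this illustration, and amounts are plain integers to keep the arithmetic exact.

```python
import unittest
from unittest.mock import Mock


class TaxPolicy:
    """Hypothetical collaborator: returns a 20% tax for a net amount."""

    def tax_for(self, net: int) -> int:
        return net * 20 // 100


class PriceCalculator:
    """Hypothetical SUT: adds the tax from its collaborator to the net price."""

    def __init__(self, tax_policy):
        self.tax_policy = tax_policy

    def gross(self, net: int) -> int:
        return net + self.tax_policy.tax_for(net)


class SociableTest(unittest.TestCase):
    def test_gross_includes_real_tax(self):
        # Real collaborator: PriceCalculator and TaxPolicy are covered together.
        calculator = PriceCalculator(TaxPolicy())
        self.assertEqual(calculator.gross(100), 120)


class SolitaryTest(unittest.TestCase):
    def test_gross_adds_whatever_tax_the_policy_returns(self):
        # Test double: only PriceCalculator's own behavior is checked.
        tax_policy = Mock(spec=TaxPolicy)
        tax_policy.tax_for.return_value = 7
        calculator = PriceCalculator(tax_policy)
        self.assertEqual(calculator.gross(100), 107)
        # The expected interaction is the "contract" with the collaborator.
        tax_policy.tax_for.assert_called_once_with(100)
```

Saved in a module, both test classes can be run with `python -m unittest`. Note that the solitary test says nothing about the 20% rate; that belongs to a separate test of `TaxPolicy`.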
What should I choose?
If you choose the sociable approach
- You get broader-scope tests that are closer to acceptance tests. The broader the scope, the closer the covered behavior gets to the full behavior perceived by the customer.
- You can make larger refactors — your SUT is bigger and covers more modules.
- You can test emergent behaviors coming from the more extensive module structure. Emergence is the appearance of new behaviors in a complex system due to interactions between its constituent parts or its special arrangement.
- You have to define fewer (internal) contracts between modules using mocks. There is less chance of breaking them or of introducing a mismatch between them that the tests won’t catch.
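The last point, a contract mismatch that mock-based tests don’t catch, can be sketched as follows. `DiscountPolicy` and `discounted_price` are hypothetical names made up for this example; the bug is that the SUT and its stub share the same wrong assumption about the collaborator.

```python
from unittest.mock import Mock


class DiscountPolicy:
    """Hypothetical collaborator: returns the discount as a percentage (10 means 10%)."""

    def discount_for(self, customer_id: str) -> int:
        return 10


def discounted_price(price_cents: int, customer_id: str, policy) -> int:
    """Hypothetical SUT that wrongly assumes the discount is an amount in cents."""
    return price_cents - policy.discount_for(customer_id)


# Solitary check: the stub encodes the same wrong assumption, so the result looks right.
stub = Mock(spec=DiscountPolicy)
stub.discount_for.return_value = 2000
solitary_result = discounted_price(20000, "customer-1", stub)

# Sociable check: the real collaborator exposes the mismatch.
sociable_result = discounted_price(20000, "customer-1", DiscountPolicy())

print(solitary_result)  # 18000, the price the test author expected
print(sociable_result)  # 19990, not the 18000 that a 10% discount should give
```

In the solitary style, this kind of fault is only caught if the stubbed return value is backed by a matching test of the real `DiscountPolicy`.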
If you choose the solitary approach
- You get test feedback on your micro-design. Tests can show you clearly whether the module has too many responsibilities or awkward collaboration with its peers.
- Tests are good documentation of particular modules. You can reason clearly about what the module does: how specific inputs produce particular outputs, which queries and data it gets from its collaborators, and what side effects are triggered (for an explanation of direct and indirect inputs and outputs, check my article What is the System Under Test? A tale from Gallic Wars).
- Fixtures are small and easy to maintain. They are test-specific, minimal, and don’t need to be shared. Most probably you won’t need any sophisticated test helper methods or patterns like Shared Fixture, Object Mother, or test-specific Builders. You never get to the point where fixture maintenance, or just figuring out what needs to be changed to fix the test or to verify new conditions, takes much more time than developing the production code.
- Test failures are easy to pinpoint. If there is a broken assertion, you should be able to find the offending spot in the production code quickly.
- You will have much less repetition and duplication in tests. You just verify the desired behavior directly. With sociable tests, to reach that behavior you may have to repeat the same setup or verification (that is, create the same test environment) again and again. Or you sweep the setup complexity under the rug by encapsulating it in utility methods and external fixtures. As an alternative, you can cover that specific behavior (or some edge cases) with isolated tests in addition to the broader-scope tests, with all the disadvantages of duplication. Both kinds of tests (sociable and isolated) need to be maintained, and if they both fail due to the same fault, it’s harder for test maintainers (us!) to locate the defect.
Choose your own evil (or good)
You have certainly noticed that the virtues of one approach are the vices of the other. I visualized the tradeoffs in the picture below as a relationship between the System Under Test complexity covered by a single test and the effort required to maintain that test:
The horizontal axis represents the System Under Test complexity covered by a single test. I’m thinking about complexity in terms of cyclomatic complexity (the number of independent paths through the code), boundary values and equivalence classes in the test inputs, and combinations of predicate inputs in conditional expressions and branching statements. Basically, the more code paths and logical conditions there are to cover, the more test cases you’ll need to get decent confidence in your production code.
The vertical axis represents the test maintenance effort: the thinking and keyboard time spent on understanding the test and adjusting it to the forces acting on it (especially adapting test fixtures, meaning everything you need to execute the SUT and verify test outcomes).
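As a small illustration of how paths and boundary values drive the number of test cases, consider the hypothetical `shipping_cost` function below (invented for this sketch). It has two decision points, so its cyclomatic complexity is 3, and covering every combination of its two predicates takes four cases.

```python
def shipping_cost(weight_kg: int, express: bool) -> int:
    """Hypothetical function with two decision points (cyclomatic complexity 3)."""
    cost = 5
    if weight_kg > 10:
        cost += 10
    if express:
        cost *= 2
    return cost


# One case per combination of the two predicates, using the boundary around 10 kg.
cases = [
    # (weight_kg, express, expected_cost)
    (10, False, 5),   # boundary value: exactly 10 kg is not heavy
    (11, False, 15),  # just above the boundary
    (10, True, 10),
    (11, True, 30),
]

for weight_kg, express, expected_cost in cases:
    assert shipping_cost(weight_kg, express) == expected_cost
```

Every additional condition in the code covered by a single test adds more rows to a table like this one.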
There are two main forces that move you to change a test:
- refactoring/redesign of the internal structure of a system
- observable behavior changes (adding or modifying features)
If a single test checks a low-complexity SUT and you perform a high-complexity refactor, then you’ll probably have to change many individual test cases. On the other hand, changing the SUT behavior (in a properly designed system) will require only minor surgery in a few tests.
In the case of a single test covering a high-complexity SUT, broad refactors won’t modify much of the test code, but behavior changes of the SUT will require complex adjustments of test cases. Assuming that your tests cover all execution paths and condition outcomes, you’ll have a large number of test cases and supporting test utility methods (e.g. encapsulating fixture creation or result verification).
As I wrote above, there is no one-size-fits-all approach. Your choice may depend on your preferences or the context. Personally, I tend to go the solitary way and test smaller behaviors, introducing test doubles either to reduce test complexity and repetition or to design new collaborators. However, when faced with poorly designed or poorly tested code (i.e. logic I don’t grasp), I lean towards sociable tests that help me understand what the underlying modules are doing and make an intervention, mostly a refactor, possible.
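A rough back-of-the-envelope sketch shows why the number of test cases grows with the scope of a single test. The module names and path counts are hypothetical, and the calculation assumes that the paths of each module are independent and that every combination is reachable, so the product is an upper bound rather than a prediction.

```python
from math import prod

# Hypothetical number of execution paths in each of three collaborating modules.
paths_per_module = {"parser": 3, "validator": 4, "pricing": 5}

# Solitary tests cover each module's paths separately, so the counts add up.
solitary_cases = sum(paths_per_module.values())

# Sociable tests that cover all path combinations through the three modules
# need, in the worst case, the product of the counts.
sociable_cases_upper_bound = prod(paths_per_module.values())

print(solitary_cases)  # 12
print(sociable_cases_upper_bound)  # 60
```

In practice, many combinations are unreachable or uninteresting, so the real number usually sits somewhere between the two figures.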
(1) I define behavior as a computation that transforms an input domain into an output codomain and performs some side effects.