Unit Testing: Values and Principles

What’s a unit test? A function test? Can it interact with the outside world? These questions aren’t very meaningful unless we start with why we test in the first place.

Luís Soares
CodeX
10 min read · May 6, 2023


Photo by Jakob Owens on Unsplash

Why do we write tests? I’ve heard answers like “because we need to have coverage”, “if you have a public class, it needs a test”, and “because it’s part of our methodology”. None of that addresses why we test things, so we end up with a mechanical, performative approach to testing.

Defining ‘unit test’ upfront does more harm than good. People start working to appease the definition rather than the original goal (when a measure becomes a target, it ceases to be a good measure). Definitions can be handy for practical communication purposes, but they should come last, after alignment on values and principles. So I propose we start with why we do testing: identifying the reasons provides direction and purpose.

Values and principles

Automated testing is just a practice, so values and principles should guide it. Values represent our mental compass; they drive our principles and provide direction and purpose. Values are ultimately why we do things. By uncovering the values and principles behind testing, we can arrive at a better definition of the practice and a better testing strategy; when in doubt, we can always ask, “Why do we write automated tests?”. Questions like “What kind of test should I do?”, “Is it worth testing this?”, or “Should we have contract testing?” all trace back to our values and principles.

Values and practices are an ocean apart. Values are universal. Ideally, my values as I work are exactly the same as my values in the rest of my life. Practices, however, are intensely situated. […] Bridging the gap between values and practices are principles. Principles are domain-specific guidelines for life. Extreme Programming Explained

Values ➡ Principles ➡ Practices
Practices should be derived from principles, which in turn should be derived from values [image source]

There are only two ways to influence human behavior: you can manipulate it or you can inspire it. Start with Why

What are the values behind unit testing? It depends on the context, but usually, it’s about quality, maintainability, feedback, and a safety net (being able to sleep well at night). We could ask why and explore them further; we would possibly talk about the joy of crafting quality software and making people’s lives better, but that’s not the goal here. What about principles? Here’s a typical list:

  • Automation: tests should be automated so they can be run repeatedly and efficiently, both locally and in the CI/CD pipeline. Running the same suite by pointing and clicking is too tedious and error-prone.
  • Determinism: the same test case should always yield the same outcome.
  • Quick tests: the test suite should be fast, or people will skip running it. The economy of time should always be on our minds.
  • Testable code: write code that is easy to test, which involves adhering to certain design principles and best practices. TDD is key here because tests are the code’s first clients.
  • Early and frequent feedback: write tests early in the development process and run them frequently to detect defects and bugs as early as possible.
  • Refactor confidently: make code changes without fear of breaking existing functionality.
  • Refactor smoothly: make changes to the code without tests being a burden. You shouldn’t spend most of your time updating tests, or they lose most of their value. It is critical not to depend on implementation details.
  • Integrate and deploy frequently: automated testing is an essential enabler of CI/CD, emphasizing frequent and automated testing, integration, and delivery of changes.
  • Documentation: tests provide clear examples of how to use the code in different scenarios. They serve as living documentation because, unlike code comments and separate documents, they’re always up to date.

These principles are closely related to the automated testing goals. It’s all about building a safety net and achieving a good measure of maintainability and evolvability.

It should be clear by now that the team’s definition of ‘unit test’ should come up as a consequence of the team’s principles. Each team may have different principles with different weights. The testing strategy should also be derived from the principles. It’s essential to have these conversations, or distinct values and principles will guide each team member.

The “unit is a function” myth

I often see codebases that rely on low-level unit tests to test everything; there are just a few high-level tests for happy cases. This emerges from the myth that unit tests must be low-level. It’s probably also because these tests are straightforward to write.

Unit tests are not function tests; otherwise, they would be called function tests (the same applies to classes). Due to the “unit test is a function test” myth, many codebases end up with only the top and bottom parts of the testing pyramid: just a few end-to-end tests that can’t possibly cover everything and dozens of fine-grained tests that are focused on details and create a false sense of safety through an elusive high test coverage.

You’re too dependent on implementation details if you write many micro/solitary tests. Tests become about the technology rather than the domain. Refactoring becomes painful; the weight of thousands of tiny tests constantly drags you down, and you’re too exposed to changes. Instead, tests should fail for domain reasons, not technical reasons. For example:

  • A test should fail not because you changed a CSS class name but because a button is not visible (your users don’t care about test/HTML identifiers; these are technical and volatile details).
  • A test should fail because a user was created but cannot log in, not because a mock was not called.

A function is a low-level detail that you should be able to refactor freely without constantly updating tests. Classes and even software layers are implementation details. Indeed, some components may need to be tested in isolation (e.g., because they’re reusable or have an intrinsic complexity), but that is the exception, not the rule.

Don’t test technical things like utilities — that’s an antipattern (even software layers can be seen as implementation details, which is why mocks are to be avoided). Units are behaviors of the domain (i.e., the problem the software solves) rather than technical aspects. Your tests should revolve around those units. Naming your test files after the unit they exercise (e.g., TestProfilePage.kt, test_upgrade_user.py, EnterEconomyMode.test.ts) can help foster this mindset.
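A minimal sketch of such a behavior-focused test (all names here are hypothetical, not from the article): the test exercises the domain behavior “a registered user can log in” through the service’s public interface, using an in-memory fake instead of mocks, so it keeps passing when internals are refactored.

```python
# Hypothetical domain: user registration and login.

class InMemoryUserRepo:
    """A fake collaborator: behaves like a real repository, kept in memory."""
    def __init__(self):
        self._users = {}

    def save(self, email, password):
        self._users[email] = password

    def find(self, email):
        return self._users.get(email)


class UserService:
    def __init__(self, repo):
        self._repo = repo

    def register(self, email, password):
        self._repo.save(email, password)

    def can_log_in(self, email, password):
        return self._repo.find(email) == password


def test_registered_user_can_log_in():
    service = UserService(InMemoryUserRepo())
    service.register("ada@example.com", "s3cret")
    # Assert on observable behavior, not on which internal calls were made.
    assert service.can_log_in("ada@example.com", "s3cret")
    assert not service.can_log_in("ada@example.com", "wrong")


test_registered_user_can_log_in()
```

Note that the test never checks that `save` or `find` was called; swapping the repository’s internals (say, from a dict to SQLite) would not require touching it.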

But not all unit testers use solitary unit tests. Indeed when xunit testing began in the 90’s we made no attempt to go solitary unless communicating with the collaborators was awkward (such as a remote credit card verification system). Unit Test

Code structure (e.g., a class referencing other classes) is an implementation detail.

Variable unit size

According to the Cambridge Dictionary, a unit is a single thing or a separate part of something larger. In software, a unit means an isolatable and cohesive part, not the tiniest piece of software. Each testing pyramid level represents a different unit size.

Tests can target units of varying sizes. Tests of the smallest units are called solitary; at the other end of the spectrum are sociable tests.

  • Solitary: The unit is a lower-level concept if it has a stable interface, is potentially reusable, and has rich behavior that shouldn’t leak outside. For example, the unit may be an adapter, as you may want to test auto-retrying or caching. In a UI component library, the unit is the component (e.g., auto-complete dropdown, date range picker, filterable table).
  • Sociable: These can interact with databases or fake services under the same umbrella. In an app, the unit is a behavior: it represents a use case. Generally, bigger units represent a better strategy.
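To make the solitary case concrete, here is a sketch of the auto-retrying example mentioned above (the `with_retry` helper and its interface are hypothetical). Isolation is justified here because retrying is rich, reusable behavior with a stable interface:

```python
# Hypothetical lower-level unit: a retry wrapper worth testing in isolation.

def with_retry(fn, attempts=3):
    """Call fn, retrying up to `attempts` times if it raises."""
    last_error = None
    for _ in range(attempts):
        try:
            return fn()
        except Exception as error:
            last_error = error
    raise last_error


def test_retries_until_success():
    calls = {"count": 0}

    def flaky():
        # Fails twice, then succeeds, simulating a transient error.
        calls["count"] += 1
        if calls["count"] < 3:
            raise RuntimeError("transient failure")
        return "ok"

    assert with_retry(flaky) == "ok"
    assert calls["count"] == 3


test_retries_until_success()
```

A sociable test, by contrast, would exercise a whole use case and only meet this wrapper indirectly, if at all.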

[A unit] is not necessarily a class or a method or a function, but it’s whatever YOU, as the developer writing the code and the test, decide it to be based on your design and your boundaries. And, of course, if after determining the depth of your test you still need some stubs or mocks underneath its boundaries, go for it. But let’s stop mocking and stubbing just because we can or think we have to. Mockists Are Dead. Long Live Classicists

I haven’t strictly defined what a unit test is because I don’t like the distinction between unit tests and other types of tests. There’s no unit testing, only automated testing, which displays a spectrum of abstraction levels working toward the testing goals.

Observable behavior

The unit is the system under test (or test subject). It’s a discrete (isolatable) part that provides value on its own by exhibiting a set of behaviors. You can isolate UI components, whole systems, or anything in between, but the unit must exhibit discernible behavior. Unit tests should target that observable behavior, regardless of how it’s implemented. It’s all about the usage standpoint.

A unit is about observable behavior (as its user), not implementation details like structure.

If you think of a unit as managing state, a behavior can be either obtaining part of that state or manipulating it. This is known as command-query separation, which applies at any unit size, from a function to a whole system.
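Command-query separation can be sketched on a tiny stateful unit (a hypothetical example, not from the article): commands mutate state and return nothing, queries report state without changing it, and tests drive commands while asserting through queries.

```python
# Hypothetical stateful unit illustrating command-query separation.

class Thermostat:
    def __init__(self):
        self._target = 20

    def set_target(self, celsius):
        """Command: changes state, returns nothing."""
        self._target = celsius

    def target(self):
        """Query: observes state without changing it."""
        return self._target


t = Thermostat()
t.set_target(23)           # drive the unit with a command
assert t.target() == 23    # observe the outcome only via a query
```

The test needs no knowledge of how the target is stored; it only uses the unit’s observable interface.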

Unit testing is about the isolation of behaviors. It’s about capturing and exercising specific behaviors of a system from the outside in. That can be achieved through its interfaces (i.e., entry points).

  • While testing a website, the unit is a page and everything that supports it; the actions are clicks and keystrokes.
  • In a web API, the unit is each endpoint; the actions are the use cases.
  • In end-to-end web app testing, the unit is a user journey (although no one calls these tests ‘unit tests’, the same principles apply). On higher levels, the observable behaviors match the apps’ use cases as they describe functionalities in usage terms.
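For the web API case above, a sketch might look like this (the endpoint, its request/response shapes, and all names are hypothetical). The test enters through the same interface a real client would use, so validation and domain logic are exercised together:

```python
# Hypothetical endpoint: the unit is the endpoint's behavior, not a function.

def create_user_endpoint(request: dict) -> dict:
    """A minimal endpoint: validates input and returns a response dict."""
    email = request.get("email", "")
    if "@" not in email:
        return {"status": 400, "error": "invalid email"}
    return {"status": 201, "user": {"email": email}}


def test_create_user_rejects_invalid_email():
    response = create_user_endpoint({"email": "not-an-email"})
    assert response["status"] == 400


def test_create_user_accepts_valid_email():
    response = create_user_endpoint({"email": "ada@example.com"})
    assert response["status"] == 201


test_create_user_rejects_invalid_email()
test_create_user_accepts_valid_email()
```

In a real app, the same idea applies with an HTTP test client against the framework’s routing layer rather than a bare function.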

Indeed, the entry points may be functions, but that doesn’t make them the test subjects. They’re merely interfaces to the actual behaviors, the real test subject. Many functions will be tested only indirectly.

Make sure you always test as a user (an interface user) from the outside, knowing as little as possible about the implementation. Testing only using interfaces is a great way to ensure the decoupling of tests from implementation (e.g., don’t share constants between implementation and tests).
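The advice about not sharing constants can be shown in a tiny sketch (hypothetical names): the test states its expectation literally, as a user would read it, instead of importing the implementation’s constant, so a wrong constant can’t silently validate itself.

```python
# Hypothetical implementation detail: a constant the tests must NOT import.
WELCOME_MESSAGE = "Welcome aboard!"


def greet(name: str) -> str:
    return f"{WELCOME_MESSAGE} {name}"


def test_greeting_as_a_user_would_read_it():
    # Literal expectation, not `WELCOME_MESSAGE + ...`: if the constant
    # changes by accident, this test fails instead of silently agreeing.
    assert greet("Ada") == "Welcome aboard! Ada"


test_greeting_as_a_user_would_read_it()
```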

The more your tests resemble the way your software is used, the more confidence they can give you. (Kent Dodds)

Testing strategy

It follows from the principles and the elastic nature of the unit size that not everything must be tested the same way. For example, you may have lower-level testing when you want to do many combinations and variations. You may have higher-level testing (like smoke testing) for things on the critical path. There’s no universal testing strategy. The ideal depends on the nature of the system under test; it’s the one that best respects your principles.

Ideally, we would test everything as a user, if only all such tests were fast, served as good documentation, and could pinpoint failures. Since they can’t, trade-offs should happen strategically rather than by decree. The practical testing pyramid is a side effect of those trade-offs rather than an ultimate goal.

Let me leave you with some additional advice that can help you shape your testing strategy:

  • Don’t focus too much on guardrails like “it can’t contact the network”, “every class needs a test”, or “it must be under 50 milliseconds”. Instead, focus on why we do testing in the first place — when in doubt, revisit your principles. My mnemonic is “safety net, docs, ease of refactoring, pinpointing ability”.
  • Don’t waste time distinguishing unit testing from integration testing and end-to-end testing… The line dividing them is so thin and arbitrary that I prefer to ignore it. The team can develop definitions derived from their principles.
  • Tests should never feel like a burden. Feeling annoyed because you have to update tests, or because they’re slow or brittle, is a clear sign that the testing strategy is flawed. Tests should feel like a blessing; otherwise, you’re doing something wrong. Tests should always pay for their cost.
  • Tests are not a goal. They’re just a supporting mechanism. They exist to assist us in delivering quality software. We need to find ways to get the most out of them while doing the least of them.
  • Beware of the fallacy “tests are good; therefore, the more tests, the better”. For each test, do an informal cost/benefit analysis, asking if it’s worth it while guided by the agreed-upon principles. Those principles will also help define the size of the unit under test and determine if the test is worth writing in the first place. I get paid for code that works, not for tests, so my philosophy is to test as little as possible to reach a given confidence level. (Kent Beck)
  • Trying to come up with a testing strategy from the start to follow from there on is a waterfall mindset. We uncover it only by trial and error. We keep adjusting it as we find we’re deviating from the principles. Ten or twenty years from now we’ll likely have a more universal theory of which tests to write, which tests not to write, and how to tell the difference. In the meantime, experimentation seems in order. (Kent Beck)

Conclusion

Many people enjoy strict definitions; it’s about feeling in control and repeating the same methods without questioning them. However, trying to force-fit things into boxes can be wasteful because things can’t always be organized so nicely. It can also be harmful: people end up writing tests for their own sake, forgetting the original goals.

Tools, metrics, and practices should never be decided upfront; they should come from values and principles, so I propose you start there. For inspiration purposes, I still recommend reading some definitions. Tests should have some desirable universal properties: the test desiderata.

Writing tests to appease formal definitions may bring consistency and high coverage, but it doesn’t ensure a good safety net or code that is easy and enjoyable to refactor. For example, tests can become burdensome because they’re slow or need constant updates. We may be all well-intentioned, but if our values and principles are not aligned, we are all optimizing for different goals; thus, we are doing a different practice.

Besides, I’m tired of all the energy-draining discussions around the difference between ‘unit testing’ and ‘integration testing’. I propose we redirect those efforts toward defining values and principles. Whenever I don't know what to do, I revisit them (as a pair or as a team), and that guides most of my decisions. This allows for more flexibility in each case. There’s no hard rule. I may decide on a different type of test per case, depending on the confidence I need to feel, the costs of the test, how decoupled they need to be from the implementation, how they’re documenting the code, etc.


I write about automated testing, Lean, TDD, CI/CD, trunk-based dev., user-centric dev, domain-centric arch, ...