Automated Testing Goals
In an agile mindset, the team is responsible for the delivered software, which includes quality assurance (QA). QA is a role adopted by the team as a whole: developers should be writing automated tests. Manual testing does not scale because it's hard to reproduce, tedious, error-prone, and incompatible with continuous delivery. If there are QA specialists, they should focus on other types of work (e.g. exploratory testing), but I'll leave that for another article.
Automating everything — from build to tests, deployment and infrastructure — is your only way forward. Ham Vocke, The Practical Test Pyramid
Automated tests are fundamental to software quality. “… but this is only a test” is always a bad argument.
[…] we developers make a mental divide. On one hand we have the application code, and we know that it’s vital to keep it clean and easy to evolve. On the other hand we have the test code, which we know isn’t part of our production environment. The consequence is that the test code often receives considerably less love. This is a dangerous fallacy because from a maintenance perspective there’s really no difference between the two. If our tests lack quality, they will hold us back. Software Design X-Rays
All goals have their relevance, but the ones I personally use to support test-related decisions are tests as specification, tests as documentation, tests as a safety net, and tests as defect locators. Let’s analyze them.
Tests as specification
I always bear in mind how precious and precise a test is, given that it forces a specific implementation (the how) to faithfully respect its spec (the what). Tests act like the floor plan of a house (although in software development we can iterate): they define the system's behavior, what is expected, as in a contract. As soon as tests go green, they lock in an implementation. Tests are a self-verifying executable specification, since you can run them and they will automatically report the outcome.
The very act of thinking through various scenarios in enough detail to turn them into tests helps us identify those areas where the requirements are ambiguous or self-contradictory. Such analysis improves the quality of the specification, which improves the quality of the software so specified. xUnit Test Patterns
TDD (test-driven development) takes the concept of specification further by actually driving the implementation:
- only implement what the test requires, and nothing more;
- if there's no failing test, don't write new production code;
- refactor freely, as long as all the tests still pass.
A good side effect of TDD is that it forces you to think separately about the what (public API and behavior) and the how (implementation details), so you put dedicated brainpower into each (without TDD, you think about both simultaneously). Tests created with TDD target behavior, so they are more likely to be black-box. In fact, TDD is sometimes called test-driven design, because it guides the design: the test is the very first client of the implementation.
📝 A typical misconception is that TDD defines your system architecture. It may give you hints about it, though. For example, when a test is relatively hard to write (e.g. lots of test doubles or a big setup), there's probably an implementation smell.
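To make the red-green-refactor loop concrete, here is a minimal sketch in Python. The `slugify` function and its behavior are hypothetical, chosen only for illustration: the tests are written first and pin down the what, then the smallest implementation makes them pass.

```python
# Step 1 (red): write the tests first; they specify the what.
def test_slugify_lowercases_and_hyphenates():
    assert slugify("Hello World") == "hello-world"

def test_slugify_strips_surrounding_whitespace():
    assert slugify("  Trim me  ") == "trim-me"

# Step 2 (green): the smallest implementation that makes the tests pass.
def slugify(title: str) -> str:
    return "-".join(title.lower().split())

# Step 3 (refactor): change the internals freely; the green tests lock the contract.
test_slugify_lowercases_and_hyphenates()
test_slugify_strips_surrounding_whitespace()
```

Note that the tests only exercise the public function, never its internals, which is what makes the later refactoring step safe.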
Tests as documentation
Tests are the best documentation of a system. Unlike code comments, readme files, or ADRs, tests are live documentation. Whenever I look at a pull request or a new codebase, I start with the tests, especially the higher-level ones; this lets me understand what the code actually does. Like an instruction manual, they answer the question "as a user, what are the features of this system and how can I use them?". I don't need to disassemble the washing machine to use it, and I don't need to walk through the implementation code to understand the system's features.
Repelling bugs isn’t the only thing the tests can do for us. They can also show the test reader how the code is supposed to work. Black box component tests are — in effect — describing the requirements of that software component. […] Without automated tests, we would need to pore over the SUT code trying to answer the question, “What should be the result if . . . ?” xUnit Test Patterns
📝 I don't recommend code comments in general. In practice, they quickly get outdated and nobody maintains them. They're also an excuse to write bad code with good explanations; you can't do the same with tests. Comments are relevant when you have users other than your own development team (e.g. a library or a public API).
When a test fails, the first thing to do is understand it. Remember that tests are the what and the implementation is the how, so you can start by looking at the failing test to see which behavior is broken. The implementation can change as long as the specs are respected.
When building a new feature or fixing a bug, you're supposed to create tests for it (we test by creating examples). This enforces the idea that tests capture the behaviors of a system. The fact that tests are documentation has consequences for the patterns and anti-patterns that arise in their source code; for example:
- Make tests more DAMP (Descriptive And Meaningful Phrases) and less DRY;
- For high-level tests, consider creating methods resembling a DSL;
- Follow the Arrange, Act, Assert pattern.
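As an illustration of the Arrange, Act, Assert pattern, here is a small Python sketch (the `ShoppingCart` class is a hypothetical example, not from this article):

```python
class ShoppingCart:
    """Minimal cart used only to demonstrate test structure."""
    def __init__(self):
        self._items = []

    def add(self, name: str, price: float, quantity: int = 1):
        self._items.append((name, price, quantity))

    def total(self) -> float:
        return sum(price * qty for _, price, qty in self._items)

def test_total_sums_price_times_quantity():
    # Arrange: put the system under test in a known state.
    cart = ShoppingCart()
    cart.add("book", 10.0, quantity=2)
    cart.add("pen", 1.5)

    # Act: exercise exactly one behavior.
    total = cart.total()

    # Assert: verify the observable outcome.
    assert total == 21.5

test_total_sums_price_times_quantity()
```

The three sections keep each test readable as a self-contained example of one behavior, which is exactly what makes tests work as documentation.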
Tests as a safety net
I've heard stories of developers who would get their code peer-reviewed by multiple people to reduce the fear of changes. That's like driving a car without a seat belt, with everyone continuously on the lookout for danger. There are also anecdotes of code being merged only after being reviewed by the team leader; the bus factor there is ridiculous.
Ideally, the system should be resilient and protected from mistakes. Feeling safe is the strongest reason to write tests. I've seen bugs arise from refactors done without a high-level test covering the feature first.
📝 Tests alone are not enough to feel safe. Roughly speaking, you need a combination of:
- Lean and gradual approach: deliver features in small batches; do the minimum needed to have something useful; do not create long-standing branches with big changes; avoid purely technical tasks (e.g. just a backend endpoint); deliver user value in baby steps.
- Tests at one or more levels of the testing pyramid. Every test you write also becomes a regression test, as it ensures that what was done before isn't harmed by new features.
- Fast feedback loop and automated build system: CI/CD automatically triggers a build for every push, running the tests first; if a single test fails, the system is not deployed to production; if all the tests pass, you have a green build and the system is deployed. It's all or nothing.
- Easy way to revert: if something goes wrong, going back should be quick: just revert the commit and wait for another automated build.
- Mature team: technology alone is not enough to feel safe; the team should see mistakes as part of the learning process, avoid blaming, and strive to continuously improve.
I don't like the rule "test all classes, functions, and methods"; you should test behavior (i.e. the public API of things) instead. A behavior may be supported by multiple programming artifacts, and many things are tested indirectly (POJOs, database schemas, etc.).
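A sketch of this idea in Python: instead of writing one test per class or function, test the public behavior and let internal collaborators be covered indirectly. The names below (`UserRegistry`, `_normalize`) are invented for illustration.

```python
# Internal helper: no dedicated test; it is exercised through the public API.
def _normalize(email: str) -> str:
    return email.strip().lower()

class UserRegistry:
    """Public API: the behavior we actually specify and test."""
    def __init__(self):
        self._users = set()

    def register(self, email: str) -> bool:
        email = _normalize(email)
        if email in self._users:
            return False
        self._users.add(email)
        return True

def test_registering_the_same_email_twice_is_rejected():
    registry = UserRegistry()
    assert registry.register("Ada@Example.com") is True
    # The normalization helper is verified indirectly by this behavior.
    assert registry.register("  ada@example.com ") is False

test_registering_the_same_email_twice_is_rejected()
```

If `_normalize` is later inlined or rewritten, no test needs to change, because the test pins down the behavior rather than the implementation artifact.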
“The more tests the better” is a fallacy; the safety net doesn’t imply that you should test everything (e.g. CSS is not typically tested). At the end of the day, what matters is how confident you feel in the codebase. Find the sweet spot between the minimum set of tests and the maximum level of confidence.
I get paid for code that works, not for tests, so my philosophy is to test as little as possible to reach a given level of confidence. Kent Beck
You just need to be careful with anti-patterns that can hinder the safety net. For example, slow or randomly failing tests end up being disabled by developers, defeating the purpose of tests as a safety net.
Tests as defect locators
If you push some changes and a test fails in the CI/CD pipeline, you don't want to spend an hour figuring out what went wrong while the build is red and nothing goes to production. Ideally, you should be able to quickly identify the culprit. High-level tests are not ideal for that because they're slow, have more code (they might even need DSLs), and target a wide scope of implementation code. This is why we also need low-level tests. It's also a good counterargument to "I only need customer tests".
If our unit tests are fairly small (i.e., we test only a single behavior in each one), we should be able to pinpoint the bug quickly based on which test fails. This specificity is one of the major advantages that unit tests enjoy over customer tests. xUnit Test Patterns
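The point about pinpointing defects can be sketched like this: each test checks a single behavior, so a red test immediately names the broken rule. The `discount` function is hypothetical, chosen only for illustration.

```python
def discount(price: float, percent: float) -> float:
    """Apply a percentage discount to a price."""
    if not 0 <= percent <= 100:
        raise ValueError("percent must be between 0 and 100")
    return round(price * (1 - percent / 100), 2)

# One behavior per test: whichever test fails points straight at the defect.
def test_applies_percentage():
    assert discount(200.0, 25) == 150.0

def test_zero_percent_keeps_price():
    assert discount(80.0, 0) == 80.0

def test_rejects_invalid_percent():
    try:
        discount(10.0, 150)
    except ValueError:
        pass
    else:
        raise AssertionError("expected ValueError")

test_applies_percentage()
test_zero_percent_keeps_price()
test_rejects_invalid_percent()
```

If a change breaks only the validation logic, only `test_rejects_invalid_percent` goes red, which is far more informative than a failing end-to-end checkout test.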
Whenever I have to create a new test, I think about where to place it in the testing pyramid. Then, I ask if that placement fulfills the goals:
- Tests as specification: Are the tests driving the implementation? Do they fail when the specified behavior changes?
- Tests as documentation: Can the tests tell you what the system under test is capable of? Do they document the behaviors?
- Tests as a safety net: How confident do you feel when refactoring? Does your CI/CD pipeline prevent the deployment of failing builds?
- Tests as defect locators: Can you quickly pinpoint the cause when a test fails?
We can also identify tests as a support tool for refactoring as another testing goal; it's a direct consequence of all of the above. Refactorings should be safe, not painful. I recommend learning more about testing goals in chapter 3 of xUnit Test Patterns.