Your Test Should Verify If The Code Solves The Problem, Not If It Runs
If it runs, that doesn't mean it works
Programmers seem to have forgotten the real purpose of tests, which is to prove that the code can solve a real-world problem and that all the parts fit together.
Imagine you have been assigned to build a report containing the user information from several sources. That report uses an entirely fictitious provider called “Fancybook.”
Part of the task is the pre-fill step. The pre-fill step receives the form fields from the UI and fills them with some data from the provider.
The rules are:
- If the field name from the provider matches the ones in the form fields, then pre-fill them with the provider value. Otherwise, leave the form field as it is.
- For the "age" form field, calculate the age and pre-fill using the provider's "date of birth."
There are many mapping rules. However, for the sake of brevity, let’s use the "age" field as the only example.
For this design, you divide the problem into functions. Also, you go by the rule that one function should have one test file.
It looks like this:
- The primary function is called prefillFields(). It takes two arguments: one Object Literal representing the form fields to pre-fill, and another Object Literal representing the fields coming from the provider. For both arguments, the key is the field name; the value is the field value. The function returns the form fields pre-filled.
- The second function is called getValue(). It takes two arguments: one String which represents the field name, and an Object Literal representing the fields coming from the provider. The getValue() function is called by prefillFields() to retrieve the form field value from the provider. The function returns a String.
- The third function is called getAgeFromDate(). It takes one argument, which is a String in the format of year, month, day. It returns another String that represents the value for the "age" form field.
prefillFields() imports and calls the function getValue(). The function getValue() imports and calls the function getAgeFromDate().
Here's the code:
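A minimal sketch of how the three functions could fit together. It assumes dates arrive as 'YYYY-MM-DD' strings and that the provider key is 'date of birth'; both are illustrative assumptions. For brevity the functions share one file instead of importing each other:

```javascript
// Assumption: dateOfBirth is a 'YYYY-MM-DD' string.
function getAgeFromDate(dateOfBirth) {
  const [year, month, day] = dateOfBirth.split('-').map(Number);
  const today = new Date();
  let age = today.getFullYear() - year;
  // Subtract one year if this year's birthday has not happened yet.
  const beforeBirthday =
    today.getMonth() + 1 < month ||
    (today.getMonth() + 1 === month && today.getDate() < day);
  if (beforeBirthday) age -= 1;
  return String(age);
}

// Returns the provider value for a form field, or undefined when there is none.
// Assumption: the provider exposes the birth date under the key 'date of birth'.
function getValue(fieldName, providerFields) {
  if (fieldName === 'age' && 'date of birth' in providerFields) {
    return getAgeFromDate(providerFields['date of birth']);
  }
  return providerFields[fieldName];
}

// Returns a copy of the form fields, pre-filled wherever the provider has a value.
function prefillFields(formFields, providerFields) {
  const result = {};
  for (const [name, current] of Object.entries(formFields)) {
    const value = getValue(name, providerFields);
    result[name] = value === undefined ? current : value;
  }
  return result;
}
```

Note that the age rule is buried two calls deep: nothing in the signature of prefillFields() reveals that it exists.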
In this case, the function prefillFields() eventually calls getAgeFromDate() somewhere in the dependency tree. The function getAgeFromDate() is well covered with tests. Therefore, you assume that there’s no need to test the age conversion logic in the tests that exercise the function prefillFields().
However, if you do that, you'll create a situation where the tests for the function prefillFields() won't fail if there are breaking changes to the age conversion logic. What happens if you replace the getValue() call with another function that does the same thing but doesn't have the age conversion logic implemented correctly?
I'll tell you what happens: the system breaks in production, and all the tests pass.
If the structure of the code allows you to make a breaking change without a test failure, the code is fragile.
You have designed the structure of the code in a way that the tests for the function prefillFields() won't fail if you change the behavior of the function getValue(). That's dangerous. In the face of that issue, some developers tend to abuse tools such as Proxyquire to mock dependencies. With Proxyquire, you can monkey patch the imports for the function prefillFields() at runtime and point the dependencies to a mocked module.
That creates more problems than it solves.
The tests should keep passing during a refactoring operation. Otherwise, that's not refactoring. If you make changes in the code that don't change behavior, the tests should not fail. Keep that in mind.
If the business renames the "age" form field to "years since birth" and wants to keep naming consistency throughout the codebase, you have to rename the function getAgeFromDate().
If you have one test per function, one function per file, and multiple Proxyquire calls, the coupling is so high that every time you rename a function, you have to update every single import in all files, including the tests!
That doesn't sound like refactoring.
Imagine you have the tests running every time you save. In this situation, it's impossible to change how the components integrate and still keep the tests green. In the test code, you need to fix each import and each mock at the same time as the production code!
If you have a high degree of coupling between the code and its test files, the code is fragile.
In the example above, try to follow the rule that every time you make a change in one line, you have to run the tests. Now, try to make as many changes as you want to rename the “prefill fields” file without breaking the tests.
Here's how you can do it:
- Duplicate the file and give it a new name.
- Rename the old imports to point to the new file.
- Then delete the old file.
However, that's a significant effort for a simple change. Imagine if each component was inside a Distributed System. Changes in one component would require changes everywhere in the network. That's when you know you're dealing with a Distributed Monolith.
You can design the structure of the system to limit the number of reasons that can force you to change everything at the same time: design the code to be closed for modification but open for extension.
To limit the number of reasons to touch every part of the system when a change comes, start with the problem first.
Here’s how you do it.
In the pre-fill problem, there are some key topics: a pre-fill, mapping logic, form fields, and provider fields.
Without the form fields, the pre-fill doesn't work; they are mandatory to solve the problem. However, the mapping logic is not required. Without the mapping logic to derive the “age” from the “date of birth” field, the pre-fill can still work, although with a subset of the functionality. The mapping logic is pluggable. Therefore, the pre-fill functionality can evolve incrementally.
The provider, though, is required. Otherwise, there's no source where you can pull the field values. If you don't want the provider, you might as well return the form fields without passing them through the pre-fill. However, you don't need to couple the pre-fill to the Fancybook provider. As long as the provider conforms to a standard interface, you can inject any instance, including a stub for the tests.
The pre-fill, on the other hand, doesn't need to contain any complex logic whatsoever. It serves only to orchestrate the mapping logic between the form fields and the provider. If you look at the problem, that's the actual role of a pre-fill!
Here's what a test for the "age calculation" mapping logic would look like under the new design:
Here’s what a test for the "simple matching" would look like under the new design and an alternative coding style:
Regardless of the coding style, both examples share the same vocabulary. Now, the structure of the code is the same as the components of the Problem Domain. Most test cases can construct the same Prefill, only with different form fields, mapping logic, and provider arguments.
It doesn't make sense to test everything separately for this problem anymore.
The new design converts the problem into pluggable pieces that are easier to reason about as a whole. If you look at the code, you can see if all the pieces are in the right places. Therefore, you can reason if the system is correct. You have abstracted the code away from the logical complexity that requires testing.
You can write one or two integration tests for the Production Code to verify if the pieces integrate. That should be enough to give you confidence the system won’t break in production.
Forget 100% test coverage for lines of code.
If you separate the structure of the code according to the structure of the problem, you won't have to change the structure of the code unless the structure of the problem changes completely.
The new design also doesn't follow the rule of one test per model. Therefore, you can refactor the internals of all the pieces. As long as each piece doesn't change its interface with the Prefill, or its visible behavior, the tests won't break.
The purpose of a test is also to prove the interface of each piece contains the right affordances to fit together.
If you structure the code according to the topics of the Problem Domain, you'll come up with more efficient patterns for testing.
Think about how you would solve this problem in real life.
Imagine your job is to make it easy for people to drink tea. You decide to build a teacup. The measure of success for a teacup is clear: A person should be able to pour hot water and drink tea with it.
You should design the affordances for the interfaces of your product to fulfill the following goals:
- The user should be able to put boiling water with tea in it.
- The user should be able to grab the tea and drink it from the cup.
- The user should not burn their fingers in the process.
- If the user spills the water around the cup, it should not damage their furniture.
The value of a test lies in its ability to prove the code can solve a problem, not to merely verify if the code runs.
To test the cup, call one or a few users and ask them to drink the tea using the cup with different tea types. Every time you change the internals of the cup, such as the color, or the design of the stamp, you expect no external impact. If you call the user again, the test should pass.
That's called refactoring.
If the test fails, then that means you made the wrong change, like a color that hurts the eyes or a stamp that is insulting to some people.
In the teacup example, the tester is the user paid to act in a controlled environment. The actual test is the script you verbally ask them to follow, like drinking with different water temperatures or different cup handles. The whole cup with all the parts together is the System Under Test (SUT).
In the pre-fill example, the tester is the test runner. The actual test is the script that you ask the machine to follow, like loading up the Prefill model and verifying some behavior. The Prefill model with different mapping logic, provider, and form fields is the System Under Test (SUT).
You can only test if the teacup works when you verify all the pieces together. In this circumstance, it makes no sense to test the handle in isolation if it only provides value when attached to the cup. The same goes for the Prefill model: it makes no sense to test the mapping logic in isolation if it only provides value when attached to the Prefill.
Only by simulating the problem and verifying that all the pieces fit together can you be confident that the code works.
The first design does not map the Problem Domain correctly. Its tests cover the wrong things and therefore have a high degree of coupling. It's a mess.
The second design maps the Problem Domain in a better way. It shows the importance of having a design with one common structure to control how the pieces fit together.
There’s no such rule as “one test file by model” or “one test by file.” The reason that's a general habit is that we don't think about the problem first and model the code accordingly.
There are cases where the level of cohesion of one model forces you to have one test file for it. However, that's not a rule. You need to evolve the structure of the test code differently from how you evolve the structure of the production code.
That is called Test Contravariance.
[…] Design the structure of your tests to be contra-variant with the structure of your production code. […] The structure of your tests should not be a mirror of the structure of your code […] As development proceeds, the behavior of the tests becomes ever more specific. The behavior of the production code becomes ever more generic. The two behaviors move in opposite directions along the generalization axis.
— Robert Martin on TestContravariance
The first design creates a rigid structure. It aimed at the solution, not the problem. It’s a smell that there's not enough analysis and design upfront. In that case, it’s more likely there will be unintended changes in the behavior of the system, and when they happen, the tests won’t help you.
The second design creates a structure similar to the structure of the Problem Domain. There’s just enough design upfront, not Big Design Up Front. In that case, it’s less likely that there will be unintended changes in the behavior of the system, and when they happen, the tests will help you.
Nobody cares about dogmatic practices that can drive you to create a brainless doll.
What matters is how you structure the code for the problem you are trying to solve.
Everything else is just a function call.