Your Test Should Verify If The Code Solves The Problem, Not If It Runs

Just because it runs doesn't mean it works

A picture of two dolls. The doll on the left is made out of Amazon packages; the doll on the right is made out of blue plastic. The one on the left holds its arms forward in a gesture that means "calm down." The one on the right raises its arms in an expression of anger. Picture by Guillermo Viciano, Source

Programmers seem to have forgotten the real purpose of tests: to prove the code can solve a real-world problem and that all the parts fit together.

Imagine you have been assigned to build a report containing user information from several sources. That report uses an entirely fictitious provider called "Fancybook."

Part of the task is the pre-fill step. The pre-fill step receives the form fields from the UI and fills them with some data from the provider.

The rules are:

  • If the field name from the provider matches the ones in the form fields, then pre-fill them with the provider value. Otherwise, leave the form field as it is.
  • For the "age" form field, calculate the age and pre-fill using the provider's "date of birth."

There are many mapping rules. However, for the sake of brevity, let’s use the "age" field as the only example.
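
To make the rules concrete, here's a small sketch of the data shapes involved. The field names, values, and date format are assumptions for illustration only:

```javascript
// Form fields coming from the UI (key: field name, value: field value):
const formFields = { name: '', age: '' };

// Fields coming from the Fancybook provider:
const providerFields = { name: 'John', 'date of birth': '1990-01-01' };

// Expected result of the pre-fill step (assuming today is some day in 2018):
// { name: 'John', age: '28' }
```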

For this design, you divide the problem into functions. Also, you go by the rule that one function should have one test file.

It looks like this:

  • The primary function is called prefillFields(). It takes two arguments: one Object Literal representing the form fields to pre-fill, and another Object Literal representing the fields coming from the provider. For both arguments, the key is the field name; the value is the field value. The function returns the form fields pre-filled.
  • The second function is called getValue(). It takes two arguments: one String representing the field name, and an Object Literal representing the fields coming from the provider. getValue() is called by prefillFields() to retrieve the form field value from the provider. The function returns a String.
  • The third function is called getAgeFromDate(). It takes one argument, which is a String in the format of year, month, day. It returns another String that represents the value for the "age" form field.

The function prefillFields() imports and calls the function getValue(). The function getValue() imports and calls the function getAgeFromDate().

Here's the code:

The runnable code for the design of the pre-fill system.
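
The original embed isn't reproduced here, so what follows is a minimal sketch of what that design could look like, with the three functions collapsed into one listing instead of the original one-file-per-function split; the "year-month-day" date format and the field names are assumptions:

```javascript
// getAgeFromDate.js: converts a "year-month-day" String into an age String.
const getAgeFromDate = dateOfBirth => {
  const [year, month, day] = dateOfBirth.split('-').map(Number);
  const birthDate = new Date(year, month - 1, day);
  const now = new Date();
  const hadBirthdayThisYear =
    now.getMonth() > birthDate.getMonth() ||
    (now.getMonth() === birthDate.getMonth() && now.getDate() >= birthDate.getDate());
  return String(now.getFullYear() - birthDate.getFullYear() - (hadBirthdayThisYear ? 0 : 1));
};

// getValue.js: retrieves the value of one form field from the provider
// fields, converting "date of birth" into "age" when necessary.
const getValue = (fieldName, providerFields) => {
  if (fieldName === 'age' && providerFields['date of birth']) {
    return getAgeFromDate(providerFields['date of birth']);
  }
  return providerFields[fieldName];
};

// prefillFields.js: returns the form fields pre-filled with the provider
// values; fields the provider doesn't know about stay as they are.
const prefillFields = (formFields, providerFields) => {
  const prefilled = {};
  for (const fieldName of Object.keys(formFields)) {
    const value = getValue(fieldName, providerFields);
    prefilled[fieldName] = value !== undefined ? value : formFields[fieldName];
  }
  return prefilled;
};

// prefillFields({ name: '', age: '' }, { name: 'John', 'date of birth': '1990-01-01' })
// → { name: 'John', age: '<calculated age>' }
```
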
A diagram that represents the tree of dependencies for the design of the pre-fill system. There's one block at the bottom with the caption "pre-fill fields." The block has two arrows. One arrow points to a block with the caption "get value." The other arrow points to an opaque block with the caption "adjust field values." The blocks "get value" and "adjust field values" also have other arrows pointing to other dependencies ad infinitum.

In this case, the function prefillFields() eventually calls getAgeFromDate() somewhere in the dependency tree. The function getAgeFromDate() is well covered with tests. Therefore, you assume there's no need to test the age conversion logic in the tests that exercise the function prefillFields().

However, if you do that, you'll create a situation where the tests for the function prefillFields() won't fail if there are breaking changes for the age conversion logic. What happens if you replace the getValue() call with another function that does the same thing but doesn't have the age conversion logic implemented correctly?

I'll tell you what happens: the system breaks in production, and all the tests pass.

If the structure of the code allows you to make a breaking change without a test failure, the code is fragile.
The runnable code with the function "prefill fields" broken after refactoring. As you can see, the tests still pass, even if you break the code.
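
To make that failure mode concrete, here's a sketch of such a breaking "refactor." getValueSimple() is a hypothetical replacement, and the test shown is an assumption about what the original suite covers:

```javascript
const assert = require('assert');

// A hypothetical replacement that "does the same thing" as getValue()
// but silently drops the age conversion logic:
const getValueSimple = (fieldName, providerFields) => providerFields[fieldName];

const prefillFields = (formFields, providerFields) => {
  const prefilled = {};
  for (const fieldName of Object.keys(formFields)) {
    const value = getValueSimple(fieldName, providerFields);
    prefilled[fieldName] = value !== undefined ? value : formFields[fieldName];
  }
  return prefilled;
};

// The only test for prefillFields() covers simple matching, trusting
// getAgeFromDate()'s own test file for the age logic, so it still passes:
assert.deepStrictEqual(
  prefillFields({ name: '' }, { name: 'John' }),
  { name: 'John' }
);

// Meanwhile, in production, the "age" field is broken:
// prefillFields({ age: '' }, { 'date of birth': '1990-01-01' })
// → { age: '' }; the field is never pre-filled, and no test catches it.
```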

You have designed the structure of the code in such a way that the tests for the function prefillFields() won't fail if you change the behavior of the function getValue(). That's dangerous. In the face of that issue, some developers tend to abuse tools such as Proxyquire to mock dependencies. With Proxyquire, you can monkey patch the imports of the function prefillFields() at runtime and point the dependencies to a mocked module.

That creates more problems than it solves.

The tests should keep passing during a refactoring operation. Otherwise, that's not refactoring. If you make changes to the code that don't change its behavior, the tests should not fail. Keep that in mind.

If the business renames the "age" form field to "years since birth" and wants to keep naming consistency throughout the codebase, you have to rename the function getAgeFromDate() to getYearsSinceBirthFromDate() everywhere.

If you have one test file per function, one function per file, and multiple Proxyquire calls, then due to the high degree of coupling, every time you refactor to rename a function, you have to rename every single import in all files, including the tests!
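
For instance, under the assumed one-function-per-file layout (the file names here are assumptions), a business-driven rename ripples through every import line, in production code and test code alike:

```javascript
// Before the rename, every consumer and every test hardcodes the name:
const getAgeFromDate = require('./get-age-from-date');

// After the business renames "age" to "years since birth," every one of
// those lines, in every file, has to change:
const getYearsSinceBirthFromDate = require('./get-years-since-birth-from-date');
```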

That doesn't sound like refactoring.

Imagine you have the tests running every time you save. In this situation, it's impossible to change how the components integrate and still keep the tests green. In the test code, you need to fix each import and each mock at the same time as the production code!

If you have a high degree of coupling between code and their test files, the code is fragile.
The runnable code that shows the use of Proxyquire as a mocking tool in the tests.
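
Here's a minimal sketch of that pattern. The file names and export shapes are assumptions about the original layout, but Proxyquire's API is real:

```javascript
const assert = require('assert');
const proxyquire = require('proxyquire');

// Monkey patch the './get-value' import of './prefill-fields' at load
// time, pointing it to an inline fake that skips the age conversion:
const prefillFields = proxyquire('./prefill-fields', {
  './get-value': (fieldName, providerFields) => providerFields[fieldName]
});

assert.deepStrictEqual(
  prefillFields({ name: '' }, { name: 'John' }),
  { name: 'John' }
);
```

Note how the test hardcodes the path './get-value'. Rename either file and this test breaks, even though no behavior changed.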

In the example above, try to follow the rule that every time you change one line, you have to run the tests. Now, try to rename the “prefill fields” file, making as many changes as you need, without breaking the tests.

Here's how you can do it:

  • Duplicate the file and give it a new name.
  • Rename the old imports to point to the new file.
  • Then delete the old file.

However, that's a significant effort for a simple change. Imagine if each component were part of a Distributed System. Changes in one component would require changes everywhere in the network. That's when you know you're dealing with a Distributed Monolith.

You can design the structure of the system to limit the number of reasons that can force you to change everything at the same time. Instead, design the code closed for modification, but open for extension.

The first step is to look at the characteristics of the Problem Domain, not the coding solution.

To limit the number of reasons to touch every part of the system when a change comes, start with the problem first.

Here’s how you do it.

In the pre-fill problem, there are some key topics: a pre-fill, mapping logic, form fields, and provider fields.

Without the form fields, the pre-fill doesn't work; they're mandatory to solve the problem. However, the mapping logic is not required. Without the mapping logic to pull the "age" out of the "date of birth" field, the pre-fill can still work, although with a subset of the functionality. The mapping logic is pluggable. Therefore, the pre-fill functionality can evolve incrementally.

The provider, though, is required. Otherwise, there's no source from which to pull the field values. If you don't want the provider, you might as well return the form fields without passing them through the pre-fill. However, you don't need to couple the pre-fill to the Fancybook provider. As long as the provider conforms to a standard interface, you can inject any instance, including a stub for the tests.

The pre-fill, on the other hand, doesn't need to contain any complex logic whatsoever. It serves only to orchestrate the mapping logic between the form fields and the provider. If you look at the problem, that's the actual role of a pre-fill!

Here's how a test for the "age calculation" mapping logic would look under the new design:

The test code above shows a "Prefill" instance receiving a "Stubbed Provider" instance as the argument. The code assigns the result to a variable called "stubbed prefill." The "stubbed prefill" has a method "add." The code calls the method "add" with a "Mapping Logic" instance. The code creates the "Mapping Logic" instance with an argument "for age calculation." The code also calls the method "prefill" from the "stubbed prefill" variable passing an instance of the "form fields." The code stores the result in a variable called “prefilled form fields.” The assertion tests that the string representation of the “prefilled form fields” is correct.
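
Here's a hedged reconstruction of that embed. The stand-in classes exist only so the snippet runs; their internals are assumptions, and only the shape of the test follows the description above:

```javascript
const assert = require('assert');

// Minimal stand-ins; the real implementations are not in the source.
class FormFields {
  constructor(fields) { this.fields = fields; }
  toString() {
    return Object.entries(this.fields)
      .map(([name, value]) => `${name}: ${value}`)
      .join(', ');
  }
}

class StubbedProvider {
  constructor(fields) { this.fields = fields; }
  get(fieldName) { return this.fields[fieldName]; }
}

class MappingLogic {
  constructor(mapFn) { this.mapFn = mapFn; }
  apply(formFields, provider) { return this.mapFn(formFields, provider); }
}

class Prefill {
  constructor(provider) {
    this.provider = provider;
    this.mappings = [];
  }
  add(mappingLogic) { this.mappings.push(mappingLogic); }
  prefill(formFields) {
    return this.mappings.reduce(
      (fields, mapping) => mapping.apply(fields, this.provider),
      formFields
    );
  }
}

// A hypothetical "age calculation" strategy, simplified to year arithmetic.
const forAgeCalculation = (formFields, provider) => {
  const [year] = provider.get('date of birth').split('-').map(Number);
  return new FormFields({
    ...formFields.fields,
    age: String(new Date().getFullYear() - year)
  });
};

// The test itself:
const stubbedPrefill = new Prefill(new StubbedProvider({ 'date of birth': '1990-01-01' }));
stubbedPrefill.add(new MappingLogic(forAgeCalculation));

const prefilledFormFields = stubbedPrefill.prefill(new FormFields({ age: '' }));

assert.strictEqual(
  prefilledFormFields.toString(),
  `age: ${new Date().getFullYear() - 1990}`
);
```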

Here's how a test for the "simple matching" would look under the new design and an alternative coding style:

The test code above shows a function "Prefill" with capital case receiving two arguments, a "Stubbed Provider" and a "Mapping Logic" instance. The "Mapping Logic" receives one argument "for simple matching." The code stores the result of the "Prefill" call in a variable called "prefill" in lowercase. The code calls the "prefill" variable as a function with the "form fields" as the only argument. The code stores the result in a variable called “prefilled form fields.” The assertion tests that the string representation of the “prefilled form fields” is correct.
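
And a hedged reconstruction of the functional variant, reusing assert, FormFields, StubbedProvider, and MappingLogic from the previous sketch; forSimpleMatching is a hypothetical strategy:

```javascript
// Functional style: Prefill is a factory that returns the prefill function.
const Prefill = (provider, mappingLogic) =>
  formFields => mappingLogic.apply(formFields, provider);

// A hypothetical "simple matching" strategy: copy provider values whose
// names match the form field names; leave everything else as it is.
const forSimpleMatching = (formFields, provider) =>
  new FormFields(
    Object.fromEntries(
      Object.keys(formFields.fields).map(name => {
        const value = provider.get(name);
        return [name, value !== undefined ? value : formFields.fields[name]];
      })
    )
  );

const prefill = Prefill(
  new StubbedProvider({ name: 'John' }),
  new MappingLogic(forSimpleMatching)
);

const prefilledFormFields = prefill(new FormFields({ name: '', age: '' }));

assert.strictEqual(prefilledFormFields.toString(), 'name: John, age: ');
```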

Regardless of the coding style, both examples share the same vocabulary. Now, the structure of the code matches the components of the Problem Domain. Most test cases can construct the same Prefill, only with different form fields, mapping logic, and provider arguments.

It doesn't make sense to test everything separately for this problem anymore.

A diagram that represents the tree of dependencies of the new design for the pre-fill system. There's one block in the center with the caption "Prefill." Around that block, arrows are pointing to other blocks with the captions "Form Fields," "Mapping Logic" and "Provider."

The new design converts the problem into pluggable pieces that are easier to reason about as a whole. If you look at the code, you can see if all the pieces are in the right places. Therefore, you can reason if the system is correct. You have abstracted the code away from the logical complexity that requires testing.

You can write one or two integration tests for the Production Code to verify that the pieces integrate. That should be enough to give you confidence the system won't break in production.

Forget 100% test coverage for lines of code.

If you separate the structure of the code according to the structure of the problem, you won't have to change the structure of the code unless the structure of the problem changes completely.

The new design also doesn't follow one test file per model. Therefore, you can refactor the internals of all the pieces. As long as each piece doesn't change its interface of communication with the Prefill, or its visible behavior, the tests won't break.

The purpose of a test is also to prove the interface of each piece contains the right affordances to fit together.

If you structure the code according to the topics of the Problem Domain, you'll come up with more efficient patterns for testing.

Think about how you would solve this problem in real life.

Imagine your job is to make it easy for people to drink tea. You decide to build a teacup. The measure of success for a teacup is clear: A person should be able to pour hot water and drink tea with it.

You should design the affordances for the interfaces of your product to fulfill the following goals:

  • The user should be able to put boiling water with tea in it.
  • The user should be able to grab the tea and drink it from the cup.
  • The user should not burn their fingers in the process.
  • If the user spills the water around the cup, it should not damage their furniture.
The value of a test lies in its ability to prove the code can solve a problem, not to merely verify if the code runs.

To test the cup, bring in one or a few users and ask them to drink different types of tea from it. Every time you change the internals of the cup, such as the color or the design of the stamp, you expect no external impact. If you call the user again, the test should pass.

That's called refactoring.

If the test fails, then that means you made the wrong change, like a color that hurts the eyes or a stamp that is insulting to some people.

In the teacup example, the tester is the user paid to act in a controlled environment. The actual test is the script you verbally ask them to follow, like drinking with different water temperatures or different cup handles. The whole cup with all the parts together is the System Under Test (SUT).

In the pre-fill example, the tester is the test runner. The actual test is the script that you ask the machine to follow, like to load up the Prefill model and verify some behavior. The Prefill model with different mapping logic, provider and form fields is the System Under Test (SUT).

You can only test if the teacup works when you verify all the pieces together. In this circumstance, it makes no sense to test the handle in isolation if it only provides value when attached to the cup. Same goes for the Prefill model. In this circumstance, it makes no sense to test the mapping logic in isolation if it only provides value when attached to the Prefill.

Only by simulating the problem and verifying that all the pieces fit together can you be confident that the code works.

The first design does not map the Problem Domain correctly. Its tests cover the wrong things and therefore have a high degree of coupling. It's a mess.

The second design maps the Problem Domain in a better way. It shows the importance of having a design with one common structure to control how the pieces fit together.

There's no such rule as "one test file per model" or "one test file per function." The reason that's a common habit is that we don't think about the problem first and model the code accordingly.

There are cases where the level of cohesion of one model forces you to have one test file for it. However, that's not a rule. You need to evolve the test code structure differently from how you evolve the structure of the production code.

That is called Test Contravariance.

[…] Design the structure of your tests to be contra-variant with the structure of your production code. […] The structure of your tests should not be a mirror of the structure of your code […] As development proceeds, the behavior of the tests becomes ever more specific. The behavior of the production code becomes ever more generic. The two behaviors move in opposite directions along the generalization axis.
— Robert Martin on TestContravariance

The first design creates a rigid structure. It aimed at the solution, not the problem. It’s a smell that there's not enough analysis and design upfront. In that case, it’s more likely there will be unintended changes in the behavior of the system, and when they happen, the tests won’t help you.

The second design creates a structure similar to the structure of the Problem Domain. There’s just enough design upfront, not Big Design Up Front. In that case, it’s less likely that there will be unintended changes in the behavior of the system, and when they happen, the tests will help you.

Forget about always trying to achieve 100% test coverage for lines of code. Forget about always creating "one test file per function" or "one test file per class."

Nobody cares about dogmatic practices that can drive you to create a brainless doll.

What matters is how you structure the code for the problem you are trying to solve.

Everything else is just a function call.


Thanks for reading. If you have some feedback, reach out to me on Twitter, Facebook or Github.