When you need to test an object that uses collaborators and want to be sure that the tested behavior comes only from the code in that object, you will probably need test doubles.
Using test doubles, you can suppress or control the behavior of the collaborators. This way you isolate the object from any other influence. Test doubles also let you avoid costly dependencies that are out of your control or that introduce nondeterminism or performance penalties.
But, let’s start from the beginning.
How we test a piece of software
We test a piece of software by comparing the result of executing it against some defined criteria. We usually name this piece of software the subject under test or SUT.
These pieces of software can be one of these two types, but not both:
A query retrieves some information about the system's state but doesn’t produce any side effects.
A command produces a change in the state of the system but doesn’t return a response.
Yes, this is the Command-Query Separation principle by Bertrand Meyer.
Queries are pretty easy to test because we only need to get the response and compare it with the expected outcome.
When testing queries we could use test doubles to replace expensive dependencies and to define their behaviors for the different scenarios. Usually, we will be using stubs or fakes for that.
Commands, on the other hand, are a bit harder to test. We need to verify that we get the expected outcome by looking for their effect on the system. Sometimes, however, we can’t do that, because we can’t use the real dependency and have to replace it with a test double.
In that case, we need doubles that both define the collaborators' behavior and let us verify that we produce the desired effect, by checking that we send the proper messages to them. These kinds of doubles are called spies or mocks.
An example is when we need to test a service that sends an email. There is no way to check that we send a real email. Even if we could do it, looking in some specific mailbox, the performance and reliability of the test would be a complete disaster. So, instead of that, we ensure that we call the appropriate methods of the mailer library with the correct message. We can see an example here:
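A minimal sketch of such a test, written in plain PHP so it runs without a framework. MailerService, Message, MailerSpy and SendWelcomeEmail are hypothetical names for illustration, and in a PHPUnit test the two final assert() calls would be assertion methods:

```php
<?php
declare(strict_types=1);

// Hypothetical message and mailer contract.
final class Message
{
    public string $receiver;
    public function __construct(string $receiver) { $this->receiver = $receiver; }
}

interface MailerService
{
    public function send(Message $message): void;
}

// Spy: records every message instead of sending it.
final class MailerSpy implements MailerService
{
    /** @var Message[] */
    private array $sent = [];

    public function send(Message $message): void { $this->sent[] = $message; }

    public function calls(): int { return count($this->sent); }

    public function lastReceiver(): ?string
    {
        $last = end($this->sent);
        return $last === false ? null : $last->receiver;
    }
}

// Hypothetical command under test: its only effect is sending an email.
final class SendWelcomeEmail
{
    private MailerService $mailer;
    public function __construct(MailerService $mailer) { $this->mailer = $mailer; }

    public function execute(string $email): void
    {
        $this->mailer->send(new Message($email));
    }
}

$mailer = new MailerSpy();
$service = new SendWelcomeEmail($mailer);
$service->execute('jane@example.com');

assert($mailer->calls() === 1);
assert($mailer->lastReceiver() === 'jane@example.com');
```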
The side effect of the command is tested in the last two lines, where we interrogate the MailerSpy about the calls performed and the receiver of the message.
Introducing test doubles
We will be using a test example to introduce the different types of test doubles. Imagine that we are building a feature to greet our customers for their birthday, sending them an email, maybe with a promo code or another goodie. Here is the setUp:
To unit test the use case GreetCustomerForBirthday, we will need to double all of its collaborators.
Dummies are test doubles that have no behavior: all their methods return null or nothing. We use them because we need to comply with some interface and we are not particularly interested in what they do.
Let’s start with the Logger. We may want a logger in the use case, and we are not worried about how it is used, but we still need one to instantiate the GreetCustomerForBirthdayHandler. This is the case for a dummy.
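A hand-written dummy could look like the sketch below. The two-method Logger interface and the handler body are assumptions made to keep the example small; the real handler surely has more collaborators:

```php
<?php
declare(strict_types=1);

// Hypothetical, trimmed-down logger contract.
interface Logger
{
    public function info(string $message): void;
    public function error(string $message): void;
}

// Dummy: satisfies the type and does nothing at all.
final class DummyLogger implements Logger
{
    public function info(string $message): void {}
    public function error(string $message): void {}
}

// Stand-in for the handler; only the logger dependency is shown.
final class GreetCustomerForBirthdayHandler
{
    private Logger $logger;
    public function __construct(Logger $logger) { $this->logger = $logger; }

    public function handle(): void
    {
        $this->logger->info('Greeting customers'); // swallowed by the dummy
    }
}

$handler = new GreetCustomerForBirthdayHandler(new DummyLogger());
$handler->handle(); // nothing is logged and nothing breaks
```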
Stubs are test doubles that should have a predefined behavior that we want to control. Imagine that we need to test a service that gets some information talking to an external API. We have an adapter to talk with this API so we will need to double it instead of calling the real API.
In one of the possible scenarios, we can simulate that the API returns a correct value. In another, we can simulate that the API is down, so we test that our service can manage that situation gracefully. In every possible scenario, we define a stubbed behavior so we can verify our piece of software.
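The two scenarios can be sketched with one stub per scenario. Everything here (ExchangeRateApi, PriceConverter and the stub names) is hypothetical and only illustrates the idea:

```php
<?php
declare(strict_types=1);

// Hypothetical adapter contract for the external API.
interface ExchangeRateApi
{
    /** @throws RuntimeException when the remote service is unavailable */
    public function rateFor(string $currency): float;
}

// Stub for the happy path: always answers with the configured rate.
final class WorkingApiStub implements ExchangeRateApi
{
    private float $rate;
    public function __construct(float $rate) { $this->rate = $rate; }
    public function rateFor(string $currency): float { return $this->rate; }
}

// Stub for the failure scenario: behaves like an API that is down.
final class BrokenApiStub implements ExchangeRateApi
{
    public function rateFor(string $currency): float
    {
        throw new RuntimeException('API unavailable');
    }
}

// Hypothetical service under test: returns null when the API fails.
final class PriceConverter
{
    private ExchangeRateApi $api;
    public function __construct(ExchangeRateApi $api) { $this->api = $api; }

    public function convert(float $amount, string $currency): ?float
    {
        try {
            return $amount * $this->api->rateFor($currency);
        } catch (RuntimeException $e) {
            return null; // degrade gracefully instead of crashing
        }
    }
}

$whenUp = (new PriceConverter(new WorkingApiStub(2.0)))->convert(10.0, 'USD');
$whenDown = (new PriceConverter(new BrokenApiStub()))->convert(10.0, 'USD');
var_dump($whenUp);   // float(20)
var_dump($whenDown); // NULL
```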
The use case in our example needs to get the current date to find the customers celebrating a birthday. Working with dates is always tricky, so, instead of accessing the real system clock, we abstract it in the form of a ClockService. This way, we simply need to stub a fixed date and store a crafted Customer in the test repository.
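As a rough sketch, a clock stub only needs to return the date we give it. The today() method and the isBirthday() helper are assumptions; the article does not show the real ClockService contract:

```php
<?php
declare(strict_types=1);

// Hypothetical contract for the clock abstraction.
interface ClockService
{
    public function today(): DateTimeImmutable;
}

// Stub: always answers with the date fixed by the test.
final class FixedClockStub implements ClockService
{
    private DateTimeImmutable $date;

    public function __construct(string $date)
    {
        $this->date = new DateTimeImmutable($date);
    }

    public function today(): DateTimeImmutable { return $this->date; }
}

// Hypothetical helper standing in for the birthday logic of the use case.
function isBirthday(DateTimeImmutable $birthDate, ClockService $clock): bool
{
    return $birthDate->format('m-d') === $clock->today()->format('m-d');
}

$clock = new FixedClockStub('2024-05-17');
var_dump(isBirthday(new DateTimeImmutable('1985-05-17'), $clock)); // bool(true)
```

Because the date is fixed, the result no longer depends on the day on which the test runs.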
Sometimes we need a collaborator that has the same behavior as the real dependency but in a cheaper way. For example, instead of having a database-backed repository, we could use memory storage to provide the same functionality without the performance penalty. This kind of double is called a Fake.
Fakes should pass the same tests as the original collaborator: they are alternative implementations.
We decided to use a FakeCustomerRepository for this test using a memory collection implementation.
When we are testing commands we need to verify that they produce the expected outcome. If we need to double the dependency we won’t be able to check that outcome. Instead, we will need to verify that we send the right message, in the object-oriented programming sense, to the collaborator.
Spies are test doubles that can register how they are used so we can interrogate them after executing the subject under test and make assertions about that.
The side effect of this use case is to send an email. We need to create a Spy that can verify that we call the send method of the MailerService with the appropriate data.
Mocks resolve the same problem as spies: we use them to verify that the correct message is sent to the collaborator. The difference is that a mock expects to be used in a certain way so you could say that it carries the assertion with it.
The problem with spies and mocks is that they introduce a certain degree of fragility given that they couple the test with the implementation of the code under test.
Also, tests using mocks are more difficult to understand because they hide the assertion inside the expectation. It is preferable to use spies instead.
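To make the difference concrete, here is a framework-free sketch of both doubles. The hand-rolled MailerMock and its verify() method are assumptions meant to imitate what a mocking library does behind the scenes:

```php
<?php
declare(strict_types=1);

// Hypothetical message and mailer contract.
final class Message
{
    public string $receiver;
    public function __construct(string $receiver) { $this->receiver = $receiver; }
}

interface MailerService
{
    public function send(Message $message): void;
}

// Mock: the expectation is set up front and checked by the double itself.
final class MailerMock implements MailerService
{
    private string $expectedReceiver;
    private int $calls = 0;

    public function __construct(string $expectedReceiver)
    {
        $this->expectedReceiver = $expectedReceiver;
    }

    public function send(Message $message): void
    {
        if ($message->receiver !== $this->expectedReceiver) {
            throw new LogicException('Unexpected receiver: ' . $message->receiver);
        }
        $this->calls++;
    }

    public function verify(): void
    {
        if ($this->calls !== 1) {
            throw new LogicException('Expected exactly one call to send()');
        }
    }
}

// Spy: only records; the test states the assertion afterwards.
final class MailerSpy implements MailerService
{
    /** @var Message[] */
    public array $sent = [];
    public function send(Message $message): void { $this->sent[] = $message; }
}

$mock = new MailerMock('jane@example.com');
$mock->send(new Message('jane@example.com')); // stands in for the code under test
$mock->verify();                              // the assertion lives inside the mock

$spy = new MailerSpy();
$spy->send(new Message('jane@example.com'));  // same call, made through a spy
assert(count($spy->sent) === 1);              // the assertion is visible in the test
assert($spy->sent[0]->receiver === 'jane@example.com');
```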
But, do we really need to use doubles?
There is some controversy about whether to use doubles in tests at all. Most of the concerns are with spies and mocks because of the coupling they introduce. Other objections have to do with the fact that we could consider the behavior of an object as a composition of the behaviors of its collaborators, and test that through the public API of the subject under test. Those are valid points, but using doubles is a trade-off we need to accept in several circumstances.
Doubles as boundaries
Functional tests are those that verify the behavior of the subject under test. We can have different types of functional tests according to the scope of the test:
Unit tests: check a unit of software in isolation. We should double any collaborator of the unit to ensure that the outcome is produced only by the code of the unit. Nevertheless, you can use real collaborators instead of doubling them if they have no external dependencies.
Integration tests: check subsystems to verify that their components communicate properly. We need to double the dependencies used by the subsystems that don’t belong to them.
End-to-end tests: check the behavior of a system from the outside, using its entry and output points. We should double things that we don’t own, that are expensive in terms of performance, or that have undefined behavior, such as external services, mostly via fakes or crafted versions.
As you can see, doubles help to define the boundaries of the test scope. You use a double whenever you need to set a boundary that you don’t want to cross in testing.
Those boundaries often match architectural boundaries, the ones that put your tests at risk if you cross them.
You don’t want to include in your test the performance penalty of talking to a real database at the unit level, so you will double the database adapter to achieve this.
You don’t want the risk of pre-existing data making the behavior of the subject under test unpredictable, or the overhead of having to keep the database state clean.
You don’t want to cross your fingers trusting some external API to be up and running or returning the expected responses for your tests. Consequently, you will need to fake it in some way.
You don’t want to be dependent on the machine or the concrete set-up on which the test runs, so you use doubles to isolate from those. A typical situation is a test that has to manage dates or times: you should stub the behavior of the system clock to avoid tests that fail or pass depending on the machine, the timezone, or even the date or time when you run them.
So, doubles are one of the tools that we count on to isolate tests from all the details that we don’t own or cannot control.
Outside-in test-driven development
The so-called London school of TDD uses doubles as first-class citizens in the outside-in approach to test-driven development.
In the inner unit test loops, mocks are used extensively as design tools, doubling the collaborators used by the component under development to define and refine their interfaces and the communication between units. The drawbacks of this approach, such as the test-implementation coupling, are compensated by the fact that we are designing the implementation and it probably won’t change.
Best practices with doubles
You don’t share the behavior of doubles between tests, except fakes.
Shared doubles can introduce dependencies between tests, something that leads to low reliability and low trustworthiness. The needs of one test are not the same as those of another, and they should be able to evolve separately.
So, you can share doubles when:
- They are fakes in the sense of alternative, low-cost fully-owned implementations of a given dependency.
- They are dummies, so they have no behavior, and, therefore, they don’t introduce cross dependency.
- They are configurable stubs, so you can control the behavior from the test.
You only stub the behavior that is interesting for the current test.
In addition, you don’t want to have one-size-fits-all doubles. You only stub the behavior you need for the test at hand, forgetting about other possible scenarios. Remember that you want to have isolated tests, even inside the same test case.
You set the minimum expectations needed to verify a side effect in mocks.
Expectations are useful to assert side effects in mocks, but you don’t use them to verify calls to stubs.
I mean: in a query, you may need to stub some behavior in a collaborator. The test will execute the call to that collaborator and get the stubbed response. The test will pass if the code under test handles the collaborator's response the right way and returns the correct result. In this situation, you don’t need to assert that the message was sent to the collaborator: it is implied in the successful test execution.
When testing a command, you will need to verify the side-effect of calling a specific collaborator, so an expectation should be set to check that this really happens, and only for that.
Ideally, you will have only one expectation per test, except when you need some triangulation.
How to create doubles
There are several techniques to create doubles:
Using real objects
This technique is especially useful when creating fakes, but you can use it for other types of doubles you would like to share between tests, like dummies.
It’s pretty straightforward: create and use a new implementation for the interface you need to double. The following is a pretty simple example of a CustomerRepository implementation only for tests:
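A minimal in-memory sketch could be the following. The Customer shape and the store() and findBornOn() methods are assumptions, since the real CustomerRepository contract is not shown here:

```php
<?php
declare(strict_types=1);

// Hypothetical entity with just the data the example needs.
final class Customer
{
    public string $email;
    public DateTimeImmutable $birthDate;

    public function __construct(string $email, DateTimeImmutable $birthDate)
    {
        $this->email = $email;
        $this->birthDate = $birthDate;
    }
}

// Hypothetical repository contract.
interface CustomerRepository
{
    public function store(Customer $customer): void;

    /** @return Customer[] */
    public function findBornOn(DateTimeImmutable $date): array;
}

// Fake: a real, working implementation backed by an array.
final class FakeCustomerRepository implements CustomerRepository
{
    /** @var array<string, Customer> */
    private array $customers = [];

    public function store(Customer $customer): void
    {
        $this->customers[$customer->email] = $customer;
    }

    public function findBornOn(DateTimeImmutable $date): array
    {
        // Compare month and day only, ignoring the year.
        return array_values(array_filter(
            $this->customers,
            static function (Customer $customer) use ($date): bool {
                return $customer->birthDate->format('m-d') === $date->format('m-d');
            }
        ));
    }
}

$repository = new FakeCustomerRepository();
$repository->store(new Customer('jane@example.com', new DateTimeImmutable('1985-05-17')));
$found = $repository->findBornOn(new DateTimeImmutable('2024-05-17'));
```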
To use it, we only need to do this:
Let’s see an example of a spy:
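Something along these lines, assuming a MailerService interface with a single send method. The query methods timesCalled() and messageAt() are hypothetical names:

```php
<?php
declare(strict_types=1);

// Hypothetical message and mailer contract.
final class Message
{
    public string $receiver;
    public function __construct(string $receiver) { $this->receiver = $receiver; }
}

interface MailerService
{
    public function send(Message $message): void;
}

final class MailerServiceSpy implements MailerService
{
    /** @var Message[] */
    private array $messages = [];

    // Part of the MailerService contract: record instead of sending.
    public function send(Message $message): void
    {
        $this->messages[] = $message;
    }

    // Test-only query methods, absent from the interface.
    public function timesCalled(): int
    {
        return count($this->messages);
    }

    public function messageAt(int $index): ?Message
    {
        return $this->messages[$index] ?? null;
    }
}
```

The test injects the spy as a regular MailerService and, after running the subject under test, asks it how many times send was called and with which message.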
The main problem with this type of implementation is that it is not compliant with the Liskov Substitution Principle, given that we add some query methods to be able to verify the expected outcomes. This is a trade-off that we can accept because its use is limited to the test environment.
Using anonymous classes
Anonymous classes are a great way to create disposable doubles for a specific test without using a mocking framework and keeping the advantages of real objects. Here, you have an example:
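A sketch of a disposable spy built this way. The MailerService and Message shapes and the extra calls() method are assumptions made for the example:

```php
<?php
declare(strict_types=1);

// Hypothetical message and mailer contract.
final class Message
{
    public string $receiver;
    public function __construct(string $receiver) { $this->receiver = $receiver; }
}

interface MailerService
{
    public function send(Message $message): void;
}

// Disposable spy defined right where the test needs it.
$mailer = new class implements MailerService {
    private int $calls = 0;

    public function send(Message $message): void
    {
        $this->calls++;
    }

    // Extra method that is not part of MailerService.
    public function calls(): int
    {
        return $this->calls;
    }
};

$mailer->send(new Message('jane@example.com')); // stands in for the code under test
assert($mailer->calls() === 1);
```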
The previous example violates the Liskov Substitution Principle, so you can be stricter:
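One possible stricter version, and only a guess at the intended approach, keeps the double limited to the interface methods and records the calls in an object owned by the test. The use of ArrayObject as the recorder is an assumption:

```php
<?php
declare(strict_types=1);

// Hypothetical message and mailer contract.
final class Message
{
    public string $receiver;
    public function __construct(string $receiver) { $this->receiver = $receiver; }
}

interface MailerService
{
    public function send(Message $message): void;
}

// The test owns the recorder; the double exposes nothing beyond send().
$sent = new ArrayObject();

$mailer = new class($sent) implements MailerService {
    private ArrayObject $sent;

    public function __construct(ArrayObject $sent)
    {
        $this->sent = $sent;
    }

    public function send(Message $message): void
    {
        $this->sent->append($message);
    }
};

$mailer->send(new Message('jane@example.com')); // stands in for the code under test
assert(count($sent) === 1);
assert($sent[0]->receiver === 'jane@example.com');
```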
As you can see, we don’t even need to extract the class to another file, but you could use a factory if it makes sense for your use case.
Using mocking frameworks
Mocking frameworks can be an easy way to build doubles that are expensive to create using the other techniques. For example, in PHP, the PSR Logger Interface is huge, with no fewer than nine methods to implement. It is way simpler to use a mocking framework like this and get a dummy double:
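A sketch with PHPUnit, which has to be installed to run it. To stay self-contained, it declares a local LoggerInterface that mirrors the PSR-3 method names; with psr/log installed you would pass Psr\Log\LoggerInterface::class instead. It also assumes PHPUnit 8.4 or later, where createStub() is available:

```php
<?php
declare(strict_types=1);

use PHPUnit\Framework\TestCase;

// Local stand-in mirroring the nine PSR-3 methods (an assumption for this sketch).
interface LoggerInterface
{
    public function emergency($message, array $context = []): void;
    public function alert($message, array $context = []): void;
    public function critical($message, array $context = []): void;
    public function error($message, array $context = []): void;
    public function warning($message, array $context = []): void;
    public function notice($message, array $context = []): void;
    public function info($message, array $context = []): void;
    public function debug($message, array $context = []): void;
    public function log($level, $message, array $context = []): void;
}

final class DummyLoggerTest extends TestCase
{
    public function testDummyLoggerAcceptsAnyCall(): void
    {
        // One line replaces nine empty method implementations.
        $logger = $this->createStub(LoggerInterface::class);

        $logger->info('ignored');
        $logger->error('also ignored');

        $this->assertInstanceOf(LoggerInterface::class, $logger);
    }
}
```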
Every mocking framework has its own syntax. I will be using the native PHPUnit one in these examples, but you will get the point.
To build a stub, you only have to define a response to a method call, like this:
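For instance, a ClockService stub could be sketched as below. It requires PHPUnit, and the today() method is an assumed name for the clock contract:

```php
<?php
declare(strict_types=1);

use PHPUnit\Framework\TestCase;

// Hypothetical contract for the clock abstraction.
interface ClockService
{
    public function today(): DateTimeImmutable;
}

final class ClockStubTest extends TestCase
{
    public function testClockReturnsTheConfiguredDate(): void
    {
        $today = new DateTimeImmutable('2024-05-17');

        // Define the canned response for the only method we care about.
        $clock = $this->createStub(ClockService::class);
        $clock->method('today')->willReturn($today);

        // The stub answers with exactly the object we configured.
        $this->assertSame($today, $clock->today());
    }
}
```

In a real test you would pass $clock to the subject under test instead of calling it directly.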
This is an example of a MailerService mock that expects the method send to be called with a message.
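A sketch of what that could look like with PHPUnit, which must be installed to run it. The Message and MailerService shapes are assumptions, and the direct call to send stands in for the code under test:

```php
<?php
declare(strict_types=1);

use PHPUnit\Framework\TestCase;

// Hypothetical message and mailer contract.
final class Message
{
    public string $receiver;
    public function __construct(string $receiver) { $this->receiver = $receiver; }
}

interface MailerService
{
    public function send(Message $message): void;
}

final class MailerMockTest extends TestCase
{
    public function testSendIsCalledOnceWithAMessage(): void
    {
        $mailerMock = $this->createMock(MailerService::class);

        // The expectation is the assertion: PHPUnit verifies it when the test ends.
        $mailerMock->expects($this->once())
            ->method('send')
            ->with($this->isInstanceOf(Message::class));

        // Stands in for the code under test, which would receive the mock.
        $mailerMock->send(new Message('jane@example.com'));
    }
}
```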
In this snippet we have defined a mock of the MailerService interface, setting the expectation that we will be calling its send method once with a Message object.
We can create a spy, instead of a pure mock:
In this example, you inject the $mailerMock as a dependency, but you check the outcomes by asking the $mailerSpy. It is pretty weird, but other mocking frameworks offer better interfaces for this feature.
In fact, you will probably be better served by crafting your own doubles instead of using a mocking library.
Test doubles are a tool that we will need to use sooner or later. Their purpose is to help us isolate the code under test by replacing its dependencies.
We can use several techniques to create them, ranging from implementing our own doubles to using a mocking library.