The journey from “When to Mock” to “How to Mock”

Oleksii Lomako · Wise Engineering · Jun 3, 2020
Let’s build. Photo by Todd Quackenbush on Unsplash

When I read the When to Mock article for the first time, I had mixed feelings. At first I was nodding along: well, this is obvious. No mocks — bad. Too many mocks — bad. Mock across your architectural boundaries — good. That’s an easy “check” for any of our services. But the “Write your own mocks” part is where I got confused. Why would anyone want to write another Mockito or Spock? None of the arguments seemed right to me.

It turned out I might not have fully understood what it means to write your own mocks. The remarkable bit is that we already had them in our production codebase. It’s just that nobody in the team called them mocks or used them as such. Let’s first introduce an example.

The domain

At the beginning of 2020, TransferWise launched support for Direct Debit payments from its multi-currency accounts.

In a nutshell, a Direct Debit is an instruction from a customer to their bank authorising a merchant to collect money from their account. Or, if you’re not into the clerk lingo, you can think of it this way: you’re telling your bank that the gym can pull money from your bank account for as long as your membership lasts (usually till the summer 🏖️).

To let our customers pay their bills via Direct Debit in the UK, the Eurozone (EEA), the US, and Australia, we created a core service that models these real-world scenarios.

Following domain-driven design, we created domain entities that represent an instruction, a payment and a customer:

Instruction, Transaction and Payer are entities representing a direct debit instruction, payment and a customer
Part of Direct Debit domain
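A minimal sketch of these entities (the fields and methods shown here are illustrative, not the actual model):

```java
import java.math.BigDecimal;
import java.util.UUID;

// Illustrative only: the real entities are richer than this
class Payer {
    UUID id;                      // the customer who authorised the direct debit
}

class Instruction {
    UUID id;
    UUID payerId;                 // the payer this instruction belongs to
    InstructionStatus status;     // e.g. PENDING or ACTIVE (values are assumptions)
    boolean payerApproved;        // the "Payer Approved" flag mentioned later in the post

    boolean isPending() { return status == InstructionStatus.PENDING; }
    void markPayerApproved() { payerApproved = true; }
}

class Transaction {
    UUID id;
    UUID instructionId;           // the instruction this payment is collected under
    BigDecimal amount;
}

enum InstructionStatus { PENDING, ACTIVE }
```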

The problem

Sometimes our customers undergo internal checks. During this time we have to put all their instructions on hold. We re-activate them once the checks are completed.

The actual class that re-activates a customer’s instructions looks roughly like this:
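(A sketch: the class name ApprovePayer, the repository methods countByFilters(), findByFilters() and save(), and the batch size of 1000 are real; the other names are illustrative.)

```java
import java.util.List;
import java.util.UUID;

// Re-activates a payer's pending instructions, batch by batch
public class ApprovePayer {

    private static final int BATCH_SIZE = 1000; // hardcoded, not configurable

    private final InstructionRepository instructionRepository;

    public ApprovePayer(InstructionRepository instructionRepository) {
        this.instructionRepository = instructionRepository;
    }

    public void approve(UUID payerId) {
        // we only care about this payer's instructions in the pending state
        InstructionFilters filters = InstructionFilters.pendingFor(payerId);

        long total = instructionRepository.countByFilters(filters);
        long pages = (total + BATCH_SIZE - 1) / BATCH_SIZE;

        for (int page = 1; page <= pages; page++) {
            List<Instruction> batch = instructionRepository.findByFilters(filters, page, BATCH_SIZE);
            batch.stream()
                    // the data access layer forces us to re-check the status in code
                    .filter(Instruction::isPending)
                    .forEach(instruction -> {
                        instruction.markPayerApproved(); // sets the "Payer Approved" flag
                        instructionRepository.save(instruction);
                    });
        }
    }
}
```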

Instructions are processed in batches, and sadly we also have to filter them in code, due to the current implementation of the data access layer. On top of that, BATCH_SIZE is set to 1000 and is not configurable 😊.

In our team we use the Spock framework to write tests, which I personally adore for its expressiveness, the BDD-style structure of its tests, and the ability to write them in Groovy. (So you should give it a try too, if you haven’t yet.)

Here’s how this use-case was tested. As usual, the test method has three logical parts: setup, stimulus and response. Let’s go through each of them.

The pain

The setup. We use fixture object factories (aPayer(), anInstruction()) that help us create domain objects of any shape and state. So this part is pretty plain:
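Roughly (the Mock() wiring and the builder calls are approximations):

```groovy
given: 'a mocked repository, the service under test, and 1001 instructions from the fixture factories'
def instructionRepository = Mock(InstructionRepository)
def approvePayer = new ApprovePayer(instructionRepository)
def payer = aPayer().build()
def instructions = (1..1001).collect { anInstruction().pending().build() }
```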

Now the stimulus is clear — we just call the method we want to test:
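(Assuming the service method is called approve().)

```groovy
when:
approvePayer.approve(payer.id)
```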

The juiciest part is the response:
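Every interaction with the mocked repository has to be declared. A sketch, assuming a filters object with payerId and status fields; the cardinalities 1, 2 and 1001 are the ones discussed below:

```groovy
then:
1 * instructionRepository.countByFilters({ it.payerId == payer.id && it.status == PENDING }) >> 1001
2 * instructionRepository.findByFilters({ it.payerId == payer.id && it.status == PENDING }, _, _) >>
        instructions[0..999] >> instructions[1000..1000]
1001 * instructionRepository.save({ it.payerApproved })
```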

Let’s recall what the input, the logic, and the output of the test method are.

The input is 1001 instructions (so we could have 2 pages 😊).

The logic being tested is:

  • countByFilters() is called with expected filters — we only need instructions of a particular payer in pending state.
  • findByFilters() is called 2 times with the same filters for page 1 and 2.
  • instructionRepository.save() is called 1001 times.

The output is that all fetched instructions now have the Payer Approved flag set.

What caught my eye here was that:

  • The expected input has leaked into our logic checks — we assert the parameters of the countByFilters() and findByFilters() method calls.
  • The actual input is invalid from the use-case perspective — an instruction created by the anInstruction() factory couldn’t be returned by the findByFilters() method in production, given the filters provided.
  • The declared interactions are tightly coupled to the way data is fetched — we have to describe mock interactions for both countByFilters() and findByFilters() methods and can’t treat the service under test as a black box.

The hope

Of course, you can say that we could improve the service itself so it can be tested with fewer mock interactions. And that’s absolutely correct — we should follow the advice the tests give us. They are indeed our guides to a better design.

But before jumping into refactoring, let’s compare how the test would look if we wrote it using a real repository.
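With an in-memory implementation of InstructionRepository wired into the service, the whole test could look roughly like this (builder and method names are approximations again):

```groovy
given: 'a payer with 1001 pending instructions saved in an in-memory repository'
def instructionRepository = new InMemoryInstructionRepository()
def approvePayer = new ApprovePayer(instructionRepository)
def payer = aPayer().build()
(1..1001).each { instructionRepository.save(anInstruction().belongingTo(payer).pending().build()) }

when:
approvePayer.approve(payer.id)

then:
instructionRepository.findAll().every { it.payerApproved }
```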

Now that’s an improvement in readability and separation of concerns:

  • The input is enforced to be correct — every instruction is actually pending and belongs to the payer.
  • The test is decoupled from the implementation details — we’re free to change the batch size or remove batching altogether.
  • The output stays clear.

How can I have the same concise unit test?

The answer?

Handwritten mocks. Or, in this particular case, in-memory repository implementations.

We had had them since the beginning of the project, but they were just annoying classes that always failed compilation when someone changed a repository interface. Most of them had dummy method implementations that returned a default value: a null, a 0 or an empty Optional.
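They looked something like this (the interface is narrowed to the methods mentioned in this post; the exact signatures are assumptions):

```java
import java.util.List;
import java.util.Optional;
import java.util.UUID;

// The "abandoned" fake: every method compiles but just returns a default value
class InMemoryInstructionRepository implements InstructionRepository {

    @Override
    public long countByFilters(InstructionFilters filters) {
        return 0;
    }

    @Override
    public List<Instruction> findByFilters(InstructionFilters filters, int page, int batchSize) {
        return List.of();
    }

    @Override
    public List<Instruction> findAll() {
        return List.of();
    }

    @Override
    public Optional<Instruction> findById(UUID id) {
        return Optional.empty();
    }

    @Override
    public Instruction save(Instruction instruction) {
        return null;
    }
}
```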

Abandoned in-memory repository

It took only a little time to make them implement all the methods correctly; thanks to Java collection streaming it was quite simple. Of course, the in-memory repositories also need to be tested, but that was easy too, since I just reused the actual integration tests written for the JDBC repository implementations.
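A “healed” repository looks roughly like this (InstructionFilters.matches() and the getters are assumptions):

```java
import java.util.List;
import java.util.Map;
import java.util.Optional;
import java.util.UUID;
import java.util.concurrent.ConcurrentHashMap;
import java.util.stream.Collectors;

// The "healed" fake: a real, if simplistic, implementation backed by a map
class InMemoryInstructionRepository implements InstructionRepository {

    private final Map<UUID, Instruction> storage = new ConcurrentHashMap<>();

    @Override
    public long countByFilters(InstructionFilters filters) {
        return storage.values().stream().filter(filters::matches).count();
    }

    @Override
    public List<Instruction> findByFilters(InstructionFilters filters, int page, int batchSize) {
        return storage.values().stream()
                .filter(filters::matches)
                .skip((long) (page - 1) * batchSize)
                .limit(batchSize)
                .collect(Collectors.toList());
    }

    @Override
    public List<Instruction> findAll() {
        return List.copyOf(storage.values());
    }

    @Override
    public Optional<Instruction> findById(UUID id) {
        return Optional.ofNullable(storage.get(id));
    }

    @Override
    public Instruction save(Instruction instruction) {
        storage.put(instruction.getId(), instruction);
        return instruction;
    }
}
```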

Healed in-memory repository
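As for reusing the tests between the JDBC and the in-memory implementations, one way to structure it (a sketch, not necessarily our actual layout) is an abstract specification that both concrete specs extend:

```groovy
import spock.lang.Specification

// Shared contract: the same feature methods run against every implementation;
// anInstruction() is the fixture factory mentioned earlier
abstract class InstructionRepositoryContractSpec extends Specification {

    abstract InstructionRepository createRepository()

    def "a saved instruction can be read back"() {
        given:
        def repository = createRepository()
        def instruction = anInstruction().build()

        when:
        repository.save(instruction)

        then:
        repository.findAll().contains(instruction)
    }
}

// The fast unit-level spec runs the contract against the in-memory fake
class InMemoryInstructionRepositorySpec extends InstructionRepositoryContractSpec {
    InstructionRepository createRepository() { new InMemoryInstructionRepository() }
}

// A JdbcInstructionRepositoryIntegrationSpec would extend the same contract,
// wiring the real JDBC implementation against a test database
```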

Why do I think they won’t be a burden anymore and the team will be able to keep them up to date? Because now we know why we have them in the first place. They help us write clean, self-explanatory unit tests. They force us to replicate the same input that exists in the production environment. And this is a big win: a good test should not only give confidence in the code, it can also serve as documentation of how the code works with real data.

The final thoughts

A few things I’d like to mention here before the wrap-up.

Writing an in-memory repository is only easy if it doesn’t rely on sophisticated queries. Otherwise, I don’t think it makes much sense to redo the rocket science in Java code, and it’s better to stick with good old mocks.

A thoughtful reader could spot that in the second version of the test we don’t check the filters used to fetch the data. In other words, if I swapped the findByFilters() call for findAll() inside the ApprovePayer class, the test would still pass. There are two ways to mitigate this. First, I could save some instructions that don’t match the expected filters and then check that they weren’t changed by the ApprovePayer methods. Alternatively, I could make use of the spying capabilities of the Spock framework and check the parameters passed to the findByFilters() calls.
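A sketch of the second option, assuming the same filter shape as before: a Spock Spy wraps the real in-memory repository, keeps its behaviour, and still lets us verify the arguments.

```groovy
given: 'the service built on top of a spied in-memory repository'
def instructionRepository = Spy(InMemoryInstructionRepository)
def approvePayer = new ApprovePayer(instructionRepository)
def payer = aPayer().build()
(1..1001).each { instructionRepository.save(anInstruction().belongingTo(payer).pending().build()) }

when:
approvePayer.approve(payer.id)

then: 'the data was fetched with the expected filters'
(1.._) * instructionRepository.findByFilters({ it.payerId == payer.id && it.status == PENDING }, _, _)

and: 'every instruction got the Payer Approved flag'
instructionRepository.findAll().every { it.payerApproved }
```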

Last but not least, I allowed myself to call in-memory repositories “mocks” only because that’s what Robert C. Martin calls them in his article. In other articles you may find the terms “test double” or “fake object” describing the same concept.

Further reading

  1. When to Mock by Robert C. Martin (Uncle Bob).
  2. Mocks Aren’t Stubs by Martin Fowler.
  3. Interaction Based Testing by The Spock Framework Team.

P.S. Interested in joining us? We’re hiring. Check out our open Engineering roles.
