Avoiding mocks, Part II

You commonly want to test an app that requires databases and consumes web APIs. How do we deal with dependencies when testing while maintaining a good safety net?

Luís Soares
CodeX
7 min read · Jan 5, 2024


Photo by CHUTTERSNAP on Unsplash

❗️ Make sure you start with Avoiding Mocks, Part I.

Stubbing and mocking the dependency

Here, stubs/mocks of the dependency are done from outside the app’s boundaries.

[entrypoint][domain][adapter][http cli] ┋ [external dep]️⬅️

With stubs/mocks from the outside, we run all production code (including the adapter and the network client) because we only observe network calls after they leave the app. To the caller, the test double is indistinguishable from the actual dependency. This is desirable since our preferred testing units are the app’s behaviors.

⚠️ Never share spies and mocks in your test suite as they introduce state dependencies, undermining tests' independence. Stubs, spies, and mocks should be defined per test (co-locality) to ensure you immediately see the relationship of cause and effect (“given a mock, then effect happens”).

How?

A web server is launched in place of the actual dependency for each test. A spy can record the network calls that commands make, which you can assert later. It can also respond with predetermined stubs (for queries). As an alternative to spies, you can mock the dependency: you set up the expected calls and assertions beforehand. In this case, only these calls get a reply; any non-matching call fails the test. Here’s the general recipe:

  1. Arrange: Launch the webserver to represent the third party. You can do it inline and programmatically (there is no need to set up anything outside the test). Then, set up the stubs/mocks for the test in place (you can have shared functions to prevent repetition and to insulate API changes — beware, this is not the same as sharing mocks, only their factories). Finally, create the test subject, ensuring you inject the web server’s URL.
  2. Act: Run the behavior you want to test. The caller is talking to the external stubs/mocks (localhost) through an adapter, “believing” it’s the third party.
  3. Assert: Make the necessary verifications (also, ensure you shut down the web server).
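The recipe above can be sketched with nothing but Python's standard library. This is a minimal illustration, not a reference implementation: `RatesGateway`, the `/rates/EUR` path, and the JSON shape are hypothetical names I made up for the example. The server both spies (it records every call) and stubs (it replies with canned JSON).

```python
import json
import threading
import urllib.request
from http.server import BaseHTTPRequestHandler, HTTPServer

# Ignore proxy environment variables so localhost is reached directly.
_OPENER = urllib.request.build_opener(urllib.request.ProxyHandler({}))


def start_stub_server(stub_body, recorded_calls):
    """Launch a throwaway server: records requests (spy), replies with canned JSON (stub)."""

    class Handler(BaseHTTPRequestHandler):
        def _reply(self):
            length = int(self.headers.get("Content-Length", 0))
            body = self.rfile.read(length).decode() if length else ""
            recorded_calls.append((self.command, self.path, body))
            payload = json.dumps(stub_body).encode()
            self.send_response(200)
            self.send_header("Content-Type", "application/json")
            self.send_header("Content-Length", str(len(payload)))
            self.end_headers()
            self.wfile.write(payload)

        do_GET = do_POST = _reply

        def log_message(self, *args):  # keep test output quiet
            pass

    server = HTTPServer(("127.0.0.1", 0), Handler)  # port 0: the OS picks a free port
    threading.Thread(target=server.serve_forever, daemon=True).start()
    return server


class RatesGateway:
    """Hypothetical adapter: production code that calls the third party over HTTP."""

    def __init__(self, base_url):
        self.base_url = base_url

    def rate(self, currency):
        with _OPENER.open(f"{self.base_url}/rates/{currency}") as response:
            return json.load(response)["rate"]


def test_reads_rate_from_third_party():
    # Arrange: the stub lives inside the test; its URL is injected into the subject
    calls = []
    server = start_stub_server({"rate": 1.1}, calls)
    gateway = RatesGateway(f"http://127.0.0.1:{server.server_port}")
    try:
        # Act: the adapter and the HTTP client run for real
        rate = gateway.rate("EUR")
        # Assert: the stubbed reply came back and the spy saw exactly one call
        assert rate == 1.1
        assert calls == [("GET", "/rates/EUR", "")]
    finally:
        server.shutdown()
        server.server_close()
```

Note that `start_stub_server` is a shared factory, while the stub data and the recorded calls stay local to each test.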

Stubbing/mocking from the outside varies greatly depending on the language, tool, and runtime. In Go, this is a native technique. Otherwise, you can use pytest-httpserver, WireMock, MockServer, Mocks Server, API Simulator, or JSON Server. Alternatively, you can use a lightweight web framework (e.g., WEBrick, Koa, or Javalin). For that approach, I showcase examples in multiple languages in a repository.

Are you worried about the slowness of having a web server per test? Lightweight web frameworks like Javalin or Koa are swift, making them perfect for testing. I ran a test where I started Koa, made a POST request, and stopped it. Repeating this 5000 times took approximately 5 seconds.

📝 Consumer-driven contract testing (e.g., using pact.io) is like mocking the external dependency (the provider), except that the mocks belong to a contract shared between the provider and its consumer(s). It adds an extra layer of safety for internal dependencies, catching API-breaking changes early on. However, it can’t protect against breaking changes in third-party APIs. Running CI/CD tests against them every minute is not viable (and creating pipeline dependencies on third parties is not ideal). Instead, monitor the system and set up alerts for issues like error status codes (≥ 500) per minute.

Stubbing and mocking the network calls

In some languages or runtimes, running an inline web server in the test might be complicated or lengthy, so what can be done? We can stub and mock the network calls. We are removing the network calls from the equation in the tests. This means something captures the network calls before they cross the app’s boundaries (at the HTTP client).

[entrypoint][domain][adapter][http cli]⬅️ ┋ [external dep]️

This technique relies on monkey patching, which is why it’s more common to see it in dynamic languages.

⚠️ Beware, mocking the network changes the behavior of the underlying communication mechanism, which can couple you to specific HTTP clients or other preconditions. For example, if you change from native fetch to Axios, you must rewrite your mocks.

How?

We can mock the network call with libraries like WebMock, Mock Service Worker, or responses. We set expected calls beforehand as mocks and stubs (their generators can be shared to insulate API changes and prevent repetition). If unexpected calls happen, the test will fail. We can do additional assertions in the assertion phase because mocks also spy.
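To make the idea concrete without pulling in any of those libraries, here is a rough sketch that uses the standard library's `unittest.mock` as a stand-in: it patches `urllib.request.urlopen`, so the call is intercepted at the HTTP client and never leaves the process. `fetch_rate` and the `rates.example.com` URL are hypothetical. Dedicated libraries offer richer request matching, but the principle is similar.

```python
import io
import json
import urllib.request
from unittest import mock


def fetch_rate(currency):
    """Hypothetical adapter that uses urllib as its HTTP client."""
    url = f"https://rates.example.com/rates/{currency}"
    with urllib.request.urlopen(url) as response:
        return json.load(response)["rate"]


def test_fetch_rate_with_patched_client():
    # Arrange: swap the HTTP client's entry point for a canned response
    canned = io.BytesIO(json.dumps({"rate": 1.1}).encode())
    with mock.patch("urllib.request.urlopen", return_value=canned) as patched:
        # Act: no network traffic happens here
        rate = fetch_rate("EUR")
    # Assert: the patch also spied on the call
    assert rate == 1.1
    patched.assert_called_once_with("https://rates.example.com/rates/EUR")
```

The coupling mentioned in the warning is visible here: if `fetch_rate` switched to another HTTP client, the patch target would have to change even though the behavior did not.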

⚠️ There are network spies that record the network calls made during a test so you can assert them afterward (e.g., number of calls, arguments). Usually, this is done using cassette-based recorders like VCR, VCR.py, or ExVCR. This is snapshot testing, which should be avoided at all costs: it bloats your project with snapshot files and kills the cohesion between the test and its expectations.

Faking the dependency

Faking the dependency is the ideal option. All production code runs, and the tests depend on neither function calls nor network calls. A fake is a standalone mechanism that replaces the external dependency by behaving as a look-alike. Unlike stubs and mocks, a fake has neither hardcoded data nor assertions tailored per test. A fake has its own identity and state, making it generic and reusable. Typical examples include a testing database, an in-memory database, and a fake payment provider.

[entrypoint][domain][adapter][http cli] ┋ [external dep]️⬅️
(where ⬅️ points to what’s being replaced by the test double)

📝 Besides testing, fakes can be used for development purposes (e.g., running the app locally against them).

Fakes are the only way to apply dogfooding when a subject has dependencies: we can assert a command’s success by leveraging the existing queries (e.g., call the “view movements” query to verify the “transfer money” command). Often, you don’t need a test because the command is used during other tests’ setups (arrange phase). For example, you don’t need a test for “create customer” because you already have a test for “upgrade customer” that relies on it to be set up.
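Here is a small sketch of dogfooding, assuming an in-memory SQLite database as the fake. The `Bank` class, its table, and the account names are hypothetical. The test asserts the “transfer money” command only through the “view movements” query.

```python
import sqlite3


class Bank:
    """Hypothetical app with one command (transfer) and one query (movements)."""

    def __init__(self, connection):
        self.db = connection
        self.db.execute(
            "CREATE TABLE IF NOT EXISTS movements (account TEXT, amount INTEGER)"
        )

    def transfer(self, source, target, amount):  # command
        with self.db:  # commit both rows together, or neither
            self.db.execute("INSERT INTO movements VALUES (?, ?)", (source, -amount))
            self.db.execute("INSERT INTO movements VALUES (?, ?)", (target, amount))

    def movements(self, account):  # query
        rows = self.db.execute(
            "SELECT amount FROM movements WHERE account = ? ORDER BY rowid",
            (account,),
        )
        return [amount for (amount,) in rows]


def test_transfer_shows_up_in_movements():
    # Arrange: the fake is an in-memory database, fresh for each test
    bank = Bank(sqlite3.connect(":memory:"))
    # Act
    bank.transfer("alice", "bob", 30)
    # Assert through the app's own query, not by inspecting tables or calls
    assert bank.movements("alice") == [-30]
    assert bank.movements("bob") == [30]
```

The test never mentions SQL or table names, so the storage details can change without touching it.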

A fake is the ideal option to replace a dependency since the tests exercise all production code. Besides, your tests have no dependencies on technical details — we don’t check calls made or arguments passed.

📝 Asserting with mocks means we do ‘behavior verification’ (don’t confuse this with our testing units, which are the apps' behaviors). To reduce the reliance on mocks, we verify the impact on the system after the change, known as state verification. This way, we test what the code intends to do, not how it does it. If you have dependencies, only fakes enable this to its full extent.

How?

It depends on the nature of what you’re faking. You can rely on a turnkey solution like Fake-AWS, or use Docker (e.g., RabbitMQ in Docker) or Testcontainers.

You can also handcraft fakes using API Simulator. For end-to-end testing, you can rely on cloud services such as Mailslurp. The most flexible solution is creating a new app using a lightweight web framework. Setting up a separate app is unnecessary (except for end-to-end testing) — you can do it programmatically in many languages. Here’s the general recipe:

  1. Arrange: Launch the fake (before all or before each) and instantiate the test subject with the fake’s connection details. Some fakes, like databases, may require a reset per test.
  2. Act: Run the behavior you want to test. The caller talks to the fake through an adapter, “believing” it’s the third party. This will impact the fake’s state.
  3. Assert: If possible, rely on your app’s queries to assert the commands (dogfooding). Under the hood, these queries will use the fake’s state to reply. Dogfooding is not possible when the subject’s dependency is write-only. This is usually an API you only write to (e.g., pay, send email, post order, notify). In these cases, we need a fake that offers a query to help us assert (e.g., to know if a payment, email, order, or notification was submitted).
Sometimes, indirect observation is our only option. Use it only if the subject can’t be observed directly. In that case, why is a fake still preferable to a mock? With a fake, there’s no coupling to the internal calls (e.g., their arguments) made to the external dependency.
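As a sketch of the write-only case, here is a handcrafted fake built programmatically with Python's standard library. Everything named here is an assumption for illustration: `FakePaymentProvider`, `pay_order`, the `/payments` path, and the payload shape. The fake keeps its own state and exposes it (the `payments` list) as the query that helps us assert.

```python
import json
import threading
import urllib.request
from http.server import BaseHTTPRequestHandler, HTTPServer

# Ignore proxy environment variables so localhost is reached directly.
_OPENER = urllib.request.build_opener(urllib.request.ProxyHandler({}))


class FakePaymentProvider:
    """Hypothetical stateful fake of a write-only payments API."""

    def __init__(self):
        self.payments = []  # the fake's own state, generic across tests
        fake = self

        class Handler(BaseHTTPRequestHandler):
            def do_POST(self):
                length = int(self.headers.get("Content-Length", 0))
                fake.payments.append(json.loads(self.rfile.read(length)))
                self.send_response(201)
                self.send_header("Content-Length", "0")
                self.end_headers()

            def log_message(self, *args):  # keep test output quiet
                pass

        self.server = HTTPServer(("127.0.0.1", 0), Handler)
        threading.Thread(target=self.server.serve_forever, daemon=True).start()
        self.url = f"http://127.0.0.1:{self.server.server_port}"

    def reset(self):  # call between tests when the fake is shared
        self.payments.clear()

    def stop(self):
        self.server.shutdown()
        self.server.server_close()


def pay_order(provider_url, order_id, amount):
    """Hypothetical behavior under test: submits a payment to the provider."""
    request = urllib.request.Request(
        f"{provider_url}/payments",
        data=json.dumps({"order": order_id, "amount": amount}).encode(),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with _OPENER.open(request) as response:
        return response.status == 201


def test_pay_order_submits_payment():
    fake = FakePaymentProvider()  # Arrange: launch the fake, inject its URL
    try:
        assert pay_order(fake.url, "o-1", 50)  # Act
        # Assert on what the fake ended up holding, not on how it was called
        assert fake.payments == [{"order": "o-1", "amount": 50}]
    finally:
        fake.stop()
```

The same fake could be started once before all tests and reset before each one; no per-test expectations are baked into it.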

Conclusion

The typical mocking approach (with or without a mocking library) replaces the adapter. An HTTP spy is better, but ideally, we should set up a test double outside (as a mock or fake) since we want to observe commands at the last moment.

Make sure you understand what a unit test is. Stop trying to please (fake) definitions and instead focus on the automated testing goals (i.e., safety net, supporting refactoring, documentation, etc.). The behavior is the ideal unit of an app because it’s what the app is meant for. Notice that we left the production code intact when we stubbed/mocked or faked the dependency.

Be economical in tests because every test has a price. Many queries don’t need to be directly tested because they’re part of other tests’ assert phases. Many commands don’t need to be directly tested because they’re part of other tests’ arrange phases (there’s no point in implementing a command if it can’t be observed).

We covered the entire behavior/command span by observing its effect at the system’s boundaries. You may call it an integration test; I call it the sleep-well-at-night test. You may argue that there is too much code to target per test, but that concern should encourage you to organize the (implementation and test) codebase by use case.

Mocking/spying is not necessarily bad as long as we’re careful not to leave out essential parts of the app; otherwise, we end up with very few tests that exercise all the code behind each behavior. Mocks and spies are okay if they’re done from outside the app’s boundaries. Still, the best option is to rely on fakes, which, thanks to their statefulness, enable black-box testing through dogfooding (i.e., state verification).

“Using fakes, we can make assertions based on the final states of the respective systems rather than relying on complicated spying. […] With mocks/stubs, we’d have to set up each dependency to handle certain scenarios, return certain data, etc. This brings behaviour and implementation detail into your tests, weakening the benefits of encapsulation.” (Learn Go with tests)

