The Practical Test Pyramid

Denis Brandi
Tide Engineering Team
7 min read · Jun 30, 2020

Why do we write tests?

Every company knows this absolute truth: bugs cost money.

NASA’s Mars Climate Orbiter bug: $125 million

Ariane 5 Flight 501 bug: $500 million

EDS Child Support System bug: over $1 billion

Japanese bitcoin exchange Mt. Gox bug: over $500 million

This truth creates the need for testing before going live.
Startups and small companies rely heavily on Manual Testing, and their focus is on delivering the minimum viable product.
Code quality is not a priority, which is understandable given the limited funds and resources.

However, this comes with a cost.

The Butterfly Effect

Photo by Jimmy Chan from Pexels

After the initial MVP, subsequent releases are built on top of clunky, untestable code; manual testers have more and more journeys to cover, and even integrating small changes can create major issues.
These issues are discovered only at a later stage, and fixing and retesting them almost doubles the initial development effort (and that’s only after the second sprint; it gets worse over time).
The team has now started suffering the butterfly effect.

Manual testing requires a lot of time and so does writing new code: it’s time to speed up and automate the process…

The team can’t yet write isolated tests because the code is untestable (god objects, no dependency injection, unmockable asynchronous logic), so it has to start with the tests that only require automating UI interactions.
If some parts of the codebase have been refactored release after release, they are most probably in a better state now, so some integration or even unit testing may also be attempted.
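To make the shape of that refactoring concrete, here is a minimal Kotlin sketch (all names are invented for illustration): a god object that constructs its own collaborators becomes unit-testable once the dependency is hidden behind an interface and injected through the constructor.

```kotlin
// Before (untestable): the collaborator is constructed inside the class,
// so a test can never replace it with a double.
//
//   class CheckoutManager {
//       private val gateway = HttpPaymentGateway() // hard-wired network call
//       fun pay(amountPence: Long) = gateway.charge(amountPence)
//   }

// After: the collaborator hides behind an interface and is injected.
interface PaymentGateway {
    fun charge(amountPence: Long): Boolean
}

class CheckoutManager(private val gateway: PaymentGateway) {
    fun pay(amountPence: Long): Boolean {
        require(amountPence > 0) { "amount must be positive" }
        return gateway.charge(amountPence)
    }
}

// A test can now pass in a fake: no server, no framework required.
class FakePaymentGateway : PaymentGateway {
    val charged = mutableListOf<Long>()
    override fun charge(amountPence: Long): Boolean {
        charged += amountPence
        return true
    }
}
```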

The Test Ice-Cream Cone (Anti-Pattern)

The team now has automated tests and things are clearly better than before; however, this testing approach is still very expensive.

UI tests take a long time to write, they are slow and flaky (they fail intermittently even when nothing is broken), and they require the whole system to be up and running: servers, databases, dependencies, frameworks… plus all the test data you might possibly need.
They also don’t tell you where the issues are; in fact, you might not see any reference to your code at all.
And they don’t give you enough coverage: it is nearly impossible to cover all the scenarios, because of all the combinations of the separate units involved.

Integration tests make up a smaller share: they run faster, require less setup, and are far less flaky. Still, a failing test will not tell you where the issue is, and covering every scenario remains hard and expensive.

Unit tests, which are almost absent, run in a few milliseconds, require almost no setup, and any failure points to the specific line where the error is.

At this stage, most of the tests are slow, non-deterministic, and expensive to write and maintain.

The team now realizes that the distribution of the tests is wrong, and that it is better to focus on the cheaper tests and improve the code quality.

From Cone to Pyramid

Icons by svgrepo.com

In 2003, Mike Cohn and Lisa Crispin came up with the Test Automation Pyramid (described in Cohn’s 2009 book Succeeding with Agile): this was a metaphor used to explain how to group different kinds of tests and how many tests of each group we should have.
The pyramid defines three levels of tests: small, medium, and large (bottom, middle, and top respectively).

Unit Tests should be the primary focus: they are cheap, fast, and point exactly to the line of code where the bug is.
These tests verify that the logic in the code does what the programmer thought it would.
Having a wide set of Small Tests means the code is highly testable, and hence also clean and flexible.
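As an illustration (the class and amounts are invented for this example), a unit test over pure logic needs no framework beyond JUnit and fails with a precise pointer to the broken line:

```kotlin
import org.junit.jupiter.api.Assertions.assertEquals
import org.junit.jupiter.api.Test

// Pure logic under test: no I/O, no framework, so tests run in milliseconds.
class FeeCalculator {
    fun fee(amountPence: Long): Long = when {
        amountPence <= 0L -> 0L
        amountPence < 10_00L -> 20L   // flat 20p below £10
        else -> amountPence / 100L    // 1% otherwise
    }
}

class FeeCalculatorTest {
    private val calculator = FeeCalculator()

    @Test
    fun `charges a flat fee below ten pounds`() {
        assertEquals(20L, calculator.fee(5_00L))
    }

    @Test
    fun `charges one percent otherwise`() {
        assertEquals(25L, calculator.fee(25_00L))
    }
}
```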

Service Tests are not as productive as Unit Tests; they are meant only for checking that all the units from the service layer down to the infrastructure layer work well together, and that external dependencies are integrated correctly.

UI Tests (or End-to-End tests, which are the same thing for Mike Cohn) are the least productive: they take an order of magnitude longer to run than other tests, and they are also more fragile, since even a small UI change can break them.
For these reasons, they should be used sparingly, only for the important end-to-end-style smoke tests.
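On Android, a single happy-path smoke test at this level might look like the following Espresso sketch (LoginActivity and the R.id references are hypothetical placeholders for your own screens):

```kotlin
import androidx.test.espresso.Espresso.onView
import androidx.test.espresso.action.ViewActions.click
import androidx.test.espresso.action.ViewActions.closeSoftKeyboard
import androidx.test.espresso.action.ViewActions.typeText
import androidx.test.espresso.assertion.ViewAssertions.matches
import androidx.test.espresso.matcher.ViewMatchers.isDisplayed
import androidx.test.espresso.matcher.ViewMatchers.withId
import androidx.test.espresso.matcher.ViewMatchers.withText
import androidx.test.ext.junit.rules.ActivityScenarioRule
import androidx.test.ext.junit.runners.AndroidJUnit4
import org.junit.Rule
import org.junit.Test
import org.junit.runner.RunWith

// One happy-path smoke test: log in and check the dashboard appears.
// LoginActivity and the R.id values are placeholders for your own app.
@RunWith(AndroidJUnit4::class)
class LoginSmokeTest {

    @get:Rule
    val activityRule = ActivityScenarioRule(LoginActivity::class.java)

    @Test
    fun userCanLogInAndSeeDashboard() {
        onView(withId(R.id.username)).perform(typeText("demo@example.com"))
        onView(withId(R.id.password)).perform(typeText("secret"), closeSoftKeyboard())
        onView(withId(R.id.login_button)).perform(click())
        onView(withText("Dashboard")).check(matches(isDisplayed()))
    }
}
```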

Although this pyramid gives a clear picture of the levels and their distribution, it is not obvious what Service Tests really mean.
This middle layer confuses many, and Mike Cohn himself observed that it is usually skipped, which makes developers jump to the top level even when it is unnecessary.

Also, what happened to Manual Testing? Can we really be confident without it?

The “Practical” Testing Pyramid

Alister Scott improved on Cohn’s original pyramid, putting more emphasis on the medium-level tests and on manual exploratory tests.
Three levels of the pyramid are now dedicated to Automated Tests, and the top level to Manual Tests.

Service Tests

The level that Cohn called Service Tests is now a group consisting of the following tests:

  • Component Tests: these are meant to be a cheaper version of E2E tests: instead of starting from user interactions, they start from the service layer and go down to the infrastructure layer (database, network services…).
    In modern layered architectures, the service layer is represented by Application Services/Use Case Interactors (in Clean, Onion, and Hexagonal architectures).
    If you are familiar with Presentation-Domain-Data layering, the name Component refers to the Domain-Data block (where Application is part of Domain). See the first sketch after this list.
Component Testing
  • API Tests or Contract Tests: the integration with external services cannot be tested through Integration Tests: they would return unpredictable results, and it would be impossible to set up a test that verifies a specific condition, such as a failure or an update operation.
    For these reasons, Contract Testing uses dedicated mocking libraries such as Pact. These “special mocks” record the Consumer’s (client’s) expectations into a contract file (a pact file), which is shared through a Broker and verified against the Provider (server).
    A failed verification means that either the Consumer or the Provider has broken the contract at some point. See the second sketch after this list.
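A minimal sketch of a Component Test in Kotlin, with all names invented: the test enters at the use case (the service layer) and exercises the real repository beneath it, stubbing only the outermost boundary.

```kotlin
import org.junit.jupiter.api.Assertions.assertEquals
import org.junit.jupiter.api.Test

data class Account(val id: Int, val name: String)

// Outermost boundary: this is the only thing the component test replaces.
interface AccountApi {
    fun fetchAll(): List<Account>
}

// Real production classes, wired together exactly as in the app.
class AccountRepository(private val api: AccountApi) {
    fun activeAccounts(): List<Account> = api.fetchAll().sortedBy { it.name }
}

class GetAccountsUseCase(private val repository: AccountRepository) {
    operator fun invoke(): List<Account> = repository.activeAccounts()
}

// The test starts from the service layer, not from the UI.
class GetAccountsComponentTest {

    private val stubApi = object : AccountApi {
        override fun fetchAll() = listOf(Account(2, "Business"), Account(1, "Personal"))
    }

    @Test
    fun `returns accounts sorted by name`() {
        val useCase = GetAccountsUseCase(AccountRepository(stubApi))
        assertEquals(listOf("Business", "Personal"), useCase().map { it.name })
    }
}
```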
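And a consumer-side Contract Test sketch using pact-jvm’s JUnit 5 support (package names vary across pact-jvm versions; the provider name, state, and endpoint are invented for illustration). The test runs against a Pact mock server and records the expectations into a pact file for the Provider to verify later.

```kotlin
import au.com.dius.pact.consumer.MockServer
import au.com.dius.pact.consumer.dsl.PactDslJsonBody
import au.com.dius.pact.consumer.dsl.PactDslWithProvider
import au.com.dius.pact.consumer.junit5.PactConsumerTestExt
import au.com.dius.pact.consumer.junit5.PactTestFor
import au.com.dius.pact.core.model.RequestResponsePact
import au.com.dius.pact.core.model.annotations.Pact
import java.net.HttpURLConnection
import java.net.URL
import org.junit.jupiter.api.Assertions.assertEquals
import org.junit.jupiter.api.Test
import org.junit.jupiter.api.extension.ExtendWith

@ExtendWith(PactConsumerTestExt::class)
@PactTestFor(providerName = "accounts-service")
class AccountsContractTest {

    // The Consumer's expectation, recorded into the pact file.
    @Pact(consumer = "mobile-app")
    fun accountExists(builder: PactDslWithProvider): RequestResponsePact =
        builder
            .given("account 42 exists")
            .uponReceiving("a request for account 42")
            .path("/accounts/42")
            .method("GET")
            .willRespondWith()
            .status(200)
            .body(PactDslJsonBody().integerType("id", 42).stringType("name", "Personal"))
            .toPact()

    // The client call runs against the Pact mock server, not the real Provider.
    @Test
    fun `consumer can read an account`(mockServer: MockServer) {
        val connection = URL(mockServer.getUrl() + "/accounts/42")
            .openConnection() as HttpURLConnection
        assertEquals(200, connection.responseCode)
    }
}
```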

Manual Exploratory Tests

The second improvement of Scott’s pyramid is that it makes clear that manual testing is still needed.
No matter how diligent your automation effort is, you can’t be 100% sure that everything is covered, and not everything can be automated, design and usability for example.
For these reasons, you should add Manual Exploratory Testing on top of your automated tests.

With Exploratory Testing, you try to find ways to break your application, and you record all the lags, slow responses, design issues, and non-user-friendly behaviors.
If you find a bug, check whether you can write an automated test for it; if you can, it means something went wrong during development, so keep it in mind and avoid making the same mistake in the future.

Conclusion

Testing terminology has become more and more confusing over the years, so don’t get too attached to names; just make sure your team shares a common vocabulary.

For new implementations, start from the bottom of the pyramid and focus on the design of the code: those are the tests that give you coverage and are easier to implement; the higher up the pyramid you go, the more testing knowledge is required.
For legacy code, instead, make sure you have some wider tests that give you enough confidence before you start refactoring.

Thanks for reading!
