Balancing the Test Automation Pyramid
The test pyramid is a model first put into print by Mike Cohn in his book Succeeding with Agile, and further popularised by a blog post on Martin Fowler’s site. It is a visual representation of how to most effectively structure the automated tests for a project. The pyramid in Cohn’s model has three layers: unit tests, service tests and UI tests. The general idea the pyramid is trying to get across is that your testing strategy should be built on a strong foundation of unit tests, as these are generally quick, deterministic and relatively simple. This foundation should support a smaller layer of service tests, which tend to give slower feedback and, because they involve multiple components, can be more prone to reliability issues. Finally, the pyramid should be topped by a carefully curated set of UI tests, as these exercise the whole stack, are prone to non-determinism, and take a comparatively long time to run.
One of the ideas underpinning the model harks back to one of the first things you learn as a tester: bugs discovered early in the development cycle can be fixed more quickly and as a result cost less money. If we catch an issue in the unit test layer, we catch it during the build process, get very clear feedback about the failure, and fix it before we even deploy the service anywhere. If we instead catch the issue in our UI test layer, then depending on how the pipeline is organised we may have deployed the service(s) to one or more environments by that point. In addition, the feedback from UI tests tends to be less specific, and issues found here demand more detailed, and therefore more costly, investigation.
One of the core questions that needs to be asked when attempting to apply the test pyramid to a project is — for a given test case, what level of automation is appropriate? Cohn puts forward the example of a calculator application in Succeeding with Agile — given a service which provides multiplication for a given pair of numbers, we can conduct the majority of the tests, including boundary test cases, at the service level to prove the multiplication function itself is acting as expected. This means that at the UI level we just need to verify that the controls on the frontend are hooked up correctly to the underlying service, and that the inputs/outputs appear in the expected places.
This is a simple example, and for most applications the divide will be a bit more nuanced, but the overriding principle is that we should be pushing tests down to lower levels as much as possible without compromising the purpose of the test itself.
The Test Pyramid is, conceptually, a simple model and can be found as the cornerstone of many a test automation strategy. More recently the model has faced criticism among the testing community that it is outdated, and there has been considerable discussion about potential issues with the pyramid model and possible evolutions (from band-pass filters to honeycombs). However, despite this the pyramid is still a useful heuristic to guide test automation for a project, as long as it is applied with caution. Having seen a number of teams work towards their own interpretation of the model, there are a number of common pitfalls that are worth being mindful of.
The Ice Cream Cone
This is a relatively well-known anti-pattern in which a team or organisation has put the majority of its effort into automating at the UI layer and has largely neglected the lower layers. The result is an inverted pyramid shape: an ‘ice cream cone’. Ultimately this leads to long feedback loops for the team and large amounts of time spent maintaining unreliable tests, which over time reduces confidence in the tests and, by extension, the testing effort.
The Hourglass
This is another anti-pattern, where there are a large number of UI-level tests, normally maintained by the testers, and a large number of unit tests, normally maintained by the development team, but few or no tests at any layer in between. This isn’t as cumbersome as the ice cream cone: there is a safety net of unit tests which will potentially catch issues early, and a suite of UI-level tests which traverse common user journeys. However, there is a missed opportunity, in that some (most likely many) of the UI-level tests could be pushed down to the middle layer, giving faster feedback at a lower maintenance cost.
The ‘Test Team’ Pyramid
As Testers or Software Engineers in Test (SETs), we’re most likely to be doing the majority of our daily work in the upper echelons of the pyramid — working on end-to-end UI tests and service tests. However the lower levels shouldn’t be forgotten — the developers are (hopefully) writing unit tests and integration tests which will form the foundations of the pyramid. It’s always worth talking to the developers implementing the feature about what coverage they have included, and — if possible — reading the tests they have written. This information can then be used to inform testing at the higher levels.
If a team attempts to adopt a pyramid-shaped test strategy but only considers the testing efforts coordinated by the testing team then there will likely be significant duplication and the tests in the various layers are unlikely to complement each other. Just as with any test effort, the pyramid needs buy-in and coordination from the wider team.
Don’t Repeat Yourself Unless It’s Worth It
Talking to developers and understanding the tests they have written is essential in planning how we test at the upper levels, and helps us avoid duplicating effort. However, there are cases where you might want to repeat the same test at different levels. For example, we might have an authentication test at the service layer which verifies that you cannot log in with incorrect credentials and that the correct HTTP status code is returned. We might then perform the same action at the UI level to verify that the correct error message is shown to the user. Used thoughtfully, similar tests at different levels can complement each other, even if for the most part we are looking to avoid duplication.
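A sketch of how the two levels divide their assertions for that login example. Here `api_login` is a hypothetical stand-in for an HTTP call to the authentication service, and `ui_login` for driving the login form in a browser; the names, credentials and messages are all invented for illustration.

```python
def api_login(username, password):
    # Hypothetical stand-in for POST /login via an HTTP client.
    if (username, password) != ("alice", "s3cret"):
        return {"status": 401, "error": "invalid_credentials"}
    return {"status": 200, "token": "..."}


def ui_login(username, password):
    # Hypothetical stand-in for filling in the login form via a browser
    # driver and reading back the message shown to the user.
    response = api_login(username, password)
    if response["status"] == 401:
        return "Incorrect username or password."
    return "Welcome back!"


# Service-level test: same action, assertion on the protocol contract.
assert api_login("alice", "wrong")["status"] == 401

# UI-level test: same action, assertion on what the user actually sees.
assert ui_login("alice", "wrong") == "Incorrect username or password."
```

The scenario is deliberately duplicated, but each level asserts something the other cannot: the service test pins the HTTP contract, the UI test pins the user-facing message.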
Over-Adherence to the Model
The original test pyramid has three layers; however, as with many things in testing, context is king. Depending on your project, your pyramid may have more levels, and teams shouldn’t get hung up on the number of layers or the terminology used in the model. Each team and organisation uses terms such as ‘service test’ or ‘integration test’ in slightly different ways, but as long as everyone understands the terms in context and the basic principles guide the test architecture, the terminology is relatively unimportant.
Similarly, teams should not mandate numbers (relative or otherwise) of tests at each level. Mandating numbers in testing is a general anti-pattern, as it often leads to unintended and counterproductive behaviour when team members focus on the proportions or counts rather than the value the tests bring. Here specifically, the number of tests at each level matters less than how quickly they give the team feedback about the state of the product, and the relative maintenance cost of tests at a given level.
No Silver Bullets
The Test Pyramid should be used as part of an approach to test automation, which itself should be part of a wider test strategy. It will not make manual testing obsolete, nor would that be desirable. Automated testing and manual testing are not, as much as it may seem, contrasting approaches; they should complement each other in order to improve quality. Automation, when diligently applied, can free manual effort for more exploratory-style testing and for finding more profound issues which would be difficult, if not impossible, to find using automated tests. While many teams profess to have replaced manual testing entirely with automation, and continue to deliver fine software, it is hard to imagine that the introduction of a talented manual tester would not improve quality further.
The Test Pyramid is a model that has endured for some time. Although it has attracted its fair share of criticism, I believe it still offers value to teams new to automated testing, and serves as a useful high-level heuristic for planning automation strategies. That said, as always, there are good practices in context but no best practices, and care must be taken to adapt the model to the context of your team and project to get the most out of it: blind adherence will pay no dividends. As with many techniques, as the team matures it will become less necessary to refer to the model, and teams may move on to more complex models or in a different direction altogether.