If you’re on a development team, especially a web development team, chances are you’ve seen something that looks like this:
This is an amalgamation of Martin Fowler’s and Google’s testing pyramids put together by Kent C. Dodds (love that guy). These testing pyramids are meant to outline the differences between unit, integration and end-to-end (E2E) tests and teach where your priorities should be between them.
But these testing pyramids are leading us astray! In this article I’ll tackle why, echoing Kent C. Dodds’ article on the matter and taking it a bit further. There’s a better way — so put on your heretic hat and let’s find out the truth.
Before we can get too deep, we need to align ourselves on the definitions of some testing terminology. It’s important for teams to get these straight so that they can speak the same language among themselves, and also so that when seeking out educational resources, teams can speak the same language as the wider community.
1. Static testing
Static testing is the idea that you can “test” your code without even running it, simply by reading it. This isn’t often thought of as being in the same category as other tests, but with the advent of linters like ESLint and other similar tools, developers (especially front-end web developers) truly are able to identify and crush bugs before anything is even run.
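As a quick sketch of what this means in practice, here’s a hypothetical function with a bug that a linter flags without anything being run. The function name is made up for illustration; the ESLint rules mentioned in the comments (`eqeqeq`, `no-undef`) are real ones:

```javascript
// Hypothetical example: a loose comparison that ESLint's "eqeqeq" rule
// would flag before this code ever runs.
function isEmptyInput(value) {
  // Bug: `==` coerces types, so 0 and '' compare equal here.
  return value == ''; // eslint(eqeqeq): expected '===' instead of '=='
}

// ESLint's "no-undef" rule similarly catches typo'd identifiers
// (e.g. `totol` instead of `total`) without running the code.

console.log(isEmptyInput(0)); // coerced to true, probably not intended
```

A type checker or linter surfaces both issues at edit time, which is exactly the “crushing bugs before anything runs” idea above.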
2. Unit testing
At its core, it’s defined as testing a unit of code in isolation from others. This is tricky because both “unit” and “isolation” are vague. How much of the code do you have to mock before it’s considered a unit? Devs at Twitter avoid the term “unit test” as much as possible because of this.
But for now, let’s consider unit tests to be what first comes to mind: the ones that test a function and more-or-less nothing else, or code that has everything besides itself mocked out.
3. Integration testing
Integration testing refers to testing the behaviour between multiple units. This is slightly vague as well, because that could mean anything from a single React component using another one all the way to AWS SQS objects being properly populated in response to AWS SNS messages.
That’s a very wide breadth of tests!
API test suites would be another example of integration tests of a web application, though it could definitely be argued that they’re end-to-end tests (spoiler alert: it doesn’t really matter too much in the end). It’s really important to note that the standard definition of integration testing does not specify whether a full startup of the application is required: it can be either!
4. Functional testing
This is unfortunately probably the most misused term. Functional testing refers to testing anything against its functional requirements. This means that all unit tests, integration tests and end-to-end tests are functional tests! Examples of non-functional tests would be security testing (e.g. ZAP scans) and performance testing (e.g. using JMeter).
5. End-to-end (E2E) testing
End-to-end testing has a pretty strict definition: these tests need to simulate a real user using the software as a real user would. In the context of a web application, end-to-end tests need to open a browser and click around the application just like a user would (e.g. Selenium-based tests).
Michael Pollan, Guillermo Rauch, Kent C. Dodds
If you’ve never heard of Michael Pollan, I recommend that you remedy that as soon as you can. He’s a fantastic food author and just an all-around swell guy. When he’s asked for his thoughts on the ideal diet, he responds with simply:
Eat food, not too much, mostly plants.
I love many things about this. It’s an extremely streamlined answer: he’s cut out all the details that might confuse someone and leaves only what will really make a difference for people. He argues that people have been eating forever and have stayed generally healthy, and all without knowing the finer details of vitamins or carbs-fat-protein ratios.
Being a big fan of his, I was delighted to see that Guillermo Rauch, founder of ZEIT and creator of Socket.io, put out a tweet on testing, spinning off of Michael Pollan’s streamlined quote:
This line is equally full of nuggets of wisdom and is largely unpacked in Kent C.’s article on this tweet (though I don’t know if he realized that it’s inspired by Michael Pollan!). I’ll unpack and extend some of Kent C.’s thoughts here as well but for now let’s keep this in mind as we continue.
Why Test At All?
What is the purpose of having tests? It almost seems like an axiom of software development but we shouldn’t take its purpose for granted. At the end of the day, as a product team we want to be confident in our final product. There are many ways that different teams can add confidence to a product, and one of the biggest ways a software development team can add confidence is by adding automated tests.
Assertion 1: The purpose of tests is to give us confidence in our product.
Debunking The Pyramid
Let’s take another look at the pyramid.
One of the worst things about this pyramid is that it’s actually technically true. Yes, as you go down the pyramid, tests are faster, and yes, as you go down the pyramid, they’re “cheaper” to run — and by “cheaper” we mean cost of writing and cost of maintenance, and in some cases, the actual dollar cost of running them on an AWS EC2 instance, for example.
This diagram also has a very strong implication that further down the pyramid is better, and that there should always be more unit tests than integration, and more integration than E2E.
But we’re becoming heretics, remember. We’re going to find out what the real truth is!
Outcomes Of The Pyramid
There is one particular statement that I’ve heard perpetuated as a result of this testing pyramid scheme:
“If you can write a unit test instead of an end-to-end test, write a unit test. Otherwise, write an end-to-end test.”
I think it’s fair to hear this said as a natural outcome of using the pyramid. There’s value and danger in this statement:
- Danger: it bears the implication that unit tests can give you the same confidence that end-to-end tests can
- Value: it stops developers from writing end-to-end tests when they should’ve written a unit or integration test
But writing E2E tests instead of unit tests is not a real issue.
I’m convinced that the vast majority of software developers do not suffer from this problem. When was the last time you wrote a full end-to-end test just to check that a button received a prop correctly? Given how much more effort is involved in writing anything beyond a unit test, I really do believe this is a non-issue.
That unfortunately leaves us with just the danger and none of the value. Subscribing to this model breeds a tendency in code review: if a change has unit tests but no end-to-end tests, we give it a thumbs up and it’s merged.
Let’s pretend for a second I work at Google on the Google Drive team. Let’s pretend further that I’m working on the “New folder” functionality:
Let’s say that these are all React components: the dropdown, each menu item, and the dialog that comes up when you click “New folder”, and that I have unit tests for all of them, even one for checking that the “New folder” button brings up a dialog.
Let’s suppose my manager approaches me and asks me a hard-hitting question:
“Are you confident that this feature works on production right now?”
I can say that I am fully confident that the components are perfect, and even that the flow works on its own.
But I can’t say I’m 100% confident that it’s actually working in front of users. What if another developer has gone in and accidentally mixed up some prop names in a refactor? What if another developer changed up how the API calls are handled without updating this spot? What if the “My Drive” button doesn’t even show up in some cases? These things could be caught in unit and integration tests, but there’s no guarantee. E2E tests are required in this scenario to achieve high confidence.
This may seem obvious, but it’s common enough in real development teams that we need to intentionally fix this perception of unit tests.
Assertion 2: We almost never have a problem writing integration/E2E tests instead of unit tests — we have a problem thinking that unit tests can give the same confidence that integrations/E2E tests can give.
Pyramid Out, Trophy In
So we’ve come to the conclusion that the testing pyramid is missing the very essential dimension of confidence, and also leads us to warped perceptions about the importance of unit tests vs other tests. So let’s throw out the testing pyramid and consider the testing trophy:
This diagram and the idea of the testing trophy as a model that includes confidence as a dimension for evaluating what kind of tests to write is all Kent C.’s creation.
This new shape asserts that the best trade-off between speed, cost and confidence is to go for integration tests: not too slow, not too expensive, and they give high confidence. An awesome example of this is again an API test suite. They usually run quite speedily but touch a wide breadth of code. This diagram also emphasizes static tests, as linting becomes more powerful every day. Unit and end-to-end tests are still important, but effort should be focused on integration tests.
But even though this is a huge improvement in that we are now recognizing confidence in our priorities, I can’t help but feel that it winds us back into a similar situation: instead of emphasizing unit tests, we’re now emphasizing integration tests.
Consider the following two examples:
This is a new super cool math library that I just published on npm. It has a function called addButInReverse which takes in two numbers and adds them, but backwards (very high-level mathematics). There’s nothing I need beyond these unit tests: I have full confidence in this library.
Continuing on the Google Drive example, this is a unit test that tests that file upload works correctly. I can’t be confident that this feature works as a result of this unit test passing alone. I need the integration service tests, API tests and E2E tests to know for certain.
So we can see that in one case, unit tests alone were sufficient and there was no need for integration tests, unlike what Kent C.’s trophy model suggests. In the other case, however, I need tests from top to bottom, also unlike what the trophy model suggests. So the trophy model as it stands can’t quite be the catch-all general model that teams can use for all types of projects.
Assertion 3: Every task should be shipped with the tests that result in high confidence — what type they are is unimportant
The Confidence Trophy
I’d like to present a tweaked version of the testing trophy: the “confidence trophy”. This view of tests is almost completely agnostic to the type of test because the idea is based on:
- We need the tests that make us confident in the feature, whatever they are
- Our earlier observation that developers almost never write a test that’s more expensive than it needs to be
It requires that we be honest with ourselves when we ask ourselves, “Do my tests make me confident in my work?”
At the top I have replaced E2E with “Overly Confident”. In Kent C.’s model, E2E was expressing the need for tests that go the extra mile to verify behaviour. I would say that E2E tests aren’t necessarily going the extra mile, but rather the exact mile that you need in order to be confident.
Instead I’d say that the extra mile would be expressed by more permutations than usual due to how important those areas are. For Google Drive’s team, where they have mostly sanity E2E tests for their flows, they might try to be overly confident that file uploads work as expected and test many edge cases in their E2E tests, even though that sort of testing is traditionally better suited to unit or integration tests.
I’ve replaced integration tests with “High Confidence”. This is to address the fact that integration tests are not necessarily the only tests that give high confidence, but that actually a handful of different types of tests may be required to be fully confident in any given feature. These could be unit, integration, end-to-end, or even non-functional tests, like performance tests.
I kept unit the same because almost all code can and should be unit tested. Some tasks need more than unit tests, some don’t, but almost all need unit tests. I also kept static the same because I like that linting etc. are being considered part of the testing stack: there are very powerful ESLint rules out there!
I want to highlight one way that the testing pyramid bleeds into this model. If you write integration tests, you probably should have unit tests. If you have end-to-end tests, you probably should have integration tests and unit tests. The entire collection of them is what brings high confidence.
If I were to pen my own spun-off version of Pollan’s and Rauch’s quote that embraces this model, it’d be:
Write tests, not just unit, mostly ones that give high confidence.
Definitely a little wordy, but it gets the job done!
No model is without its shortcomings. This one requires a development team to know what tests are required for high confidence and to be honest with themselves about it, even if those tests aren’t super fun to write. Those are definitely very, very real challenges!
This model requires a high degree of professional and technical maturity, as well as strong leadership on the team regarding what tests give high confidence for each project.
To give some sort of starting point for direction, I would say that for any project, the tests that ultimately give high confidence are ones that use the project in the way it will be used outside of the development team. For a component library, that would be mostly integration tests. For a web application, that would be end-to-end tests.
Can’t stress enough that this idea is an extension of Kent C. Dodds’ article and uses his trophy diagram. Well worth the read, as well as the rest of his content!
- The purpose of tests is to give us confidence in our product
- We almost never have a problem writing integration/E2E tests instead of unit tests — we have a problem thinking that unit tests can give the same confidence that integrations/E2E tests can give
- Every task should be shipped with the tests that result in high confidence; what type of test they are is unimportant
Further, if you are considering using the confidence trophy model, keep the following in mind:
- Be honest with yourself: do you have high confidence that your work functions correctly in front of users given the tests you’ve written?
- No need to go overboard everywhere, but it’s worth being overly confident in critical areas