Testing is Good. Pyramids are Bad. Ice Cream Cones are the Worst

Ice cream cones are the worst. Not all ice cream cones. Just the ones that model a testing approach for software development. Several figures in the pantheon of software engineering leaders, most notably Martin Fowler, have pointed to this anti-pattern as they explain a saner approach to a sustainable, high-quality set of development practices.

Fowler and others within the engineering, QA and DevOps communities offer a testing pyramid as the alternative to the clearly horrible ice cream cone. In working with teams, I’ve come to a new conclusion: pyramids suck too (just not as badly as ice cream cones).

Why Ice Cream Cones Are Bad

Back in 2011, when the savants at ThoughtWorks were codifying continuous delivery, a number of obsessive engineers developed what has become dogma inside QA communities: the Testing Pyramid.

Along with the testing pyramid came its anti-pattern: the Testing Ice Cream Cone. The ice cream cone is what comes out of a shop that places the majority of its QA effort and labor into manual test definition and execution. Over time, as the feature set supported by the software grows, the required testing labor grows linearly, if not exponentially, right along with the features. How fast depends on how features interact with each other: a new feature might have to be tested in the context of other features.

Testing Pyramid vs Testing Ice Cream Cone

Unless an organization is willing to provide the labor required to continue the manual testing approach, it will have to change approaches or depend upon providence and good fortune to avoid bugs and outages.
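
To put rough numbers on that labor growth, here is a minimal sketch. It assumes one manual check per feature or feature combination, which is a simplification of my own, and `manual_checks` is a hypothetical helper rather than anything from a real tool.

```python
from math import comb


def manual_checks(features: int, interaction_order: int = 1) -> int:
    """Count manual checks per release when every combination of up to
    `interaction_order` features has to be exercised together.

    Assumption: one check per combination. Real test plans are messier.
    """
    return sum(comb(features, k) for k in range(1, interaction_order + 1))


if __name__ == "__main__":
    for n in (10, 20, 40):
        alone = manual_checks(n)           # each feature by itself: linear growth
        pairs = manual_checks(n, 2)        # plus every pair: quadratic growth
        everything = manual_checks(n, n)   # every combination: 2**n - 1
        print(f"{n} features: {alone} / {pairs} / {everything}")
```

Under these assumptions, going from 10 to 20 features doubles the checks if features never interact, roughly quadruples them if pairs must be tested, and multiplies them by about a thousand if every combination matters.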

Clearly, this ice cream sucks, even though it makes everyone want to scream. Product screams about long lead times, low quality and unpredictable results. Engineering screams about QA not catching defects. QA screams about not having enough resources to do the job well. We all do indeed scream about ice cream.

Why Pyramids Aren’t Much Better

The pyramid promises a different result. Shops that follow this pattern place the majority of their QA effort and labor into automated test definition and creation. Over time, the body of tests grows, and while the cost of maintaining the suite does increase, the rate of increase is more palatable because both quality and speed can be preserved: test execution stays at the level of minutes or hours, depending upon how you parallelize the execution across testing infrastructure.
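
As a back-of-the-envelope illustration of how parallelization keeps execution time in that range, here is a small sketch. It assumes independent tests of similar duration spread evenly across workers and ignores setup overhead; the function name and the figures are made up for the example.

```python
from math import ceil


def suite_wall_clock_minutes(test_count: int, avg_seconds_per_test: float, workers: int) -> float:
    """Rough wall-clock estimate for an automated suite.

    Assumptions: tests are independent, take about the same time each,
    and are split evenly across `workers` with no startup overhead.
    """
    if workers < 1:
        raise ValueError("workers must be at least 1")
    tests_per_worker = ceil(test_count / workers)
    return tests_per_worker * avg_seconds_per_test / 60


if __name__ == "__main__":
    for workers in (1, 10, 50):
        minutes = suite_wall_clock_minutes(6000, 3.0, workers)
        print(f"{workers} workers: about {minutes:.0f} minutes")
```

With these illustrative numbers, a suite that takes five hours on one machine finishes in about half an hour on ten workers, which is the kind of trade the pyramid is counting on.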

Sounds great, doesn’t it? What could possibly go wrong? Here’s the deal: practitioners are humans. These humans very often see the pyramid and apply it like a physical-world concept: first construct the bottom layer of unit tests, then construct the next layer up, and so on until you get to the top. In other words, “don’t do anything in layer two until layer one is complete”.

While this application could conceivably work for a new code base if it were followed from inception, it could never hope to be effective when applied to a legacy application with a large code base and a team already running after the ice cream truck. Almost the entire team of engineers and QA staff, if not all of it, would have to be dedicated to unit tests before any integration tests could be written, never mind tests that don’t fully align with the feature-based functional paths. For example:

  • negative tests to find bugs off the happy path of the main use case
  • security tests to find and eliminate security holes, ensure an appropriately secure application and prevent a data breach
  • performance tests to find bottlenecks that kill user engagement and create possible outages
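
To make the first item in that list concrete, here is a minimal sketch of a happy-path test next to a negative test. `parse_quantity` is a hypothetical order-form helper invented for this example, and the tests are plain functions of the kind a runner such as pytest would collect.

```python
def parse_quantity(raw: str) -> int:
    """Hypothetical helper: accept a whole number from 1 to 99."""
    value = int(raw.strip())  # int() raises ValueError for non-numeric input
    if not 1 <= value <= 99:
        raise ValueError(f"quantity out of range: {value}")
    return value


def test_happy_path():
    # The main use case: a sensible quantity, with stray whitespace.
    assert parse_quantity(" 3 ") == 3


def test_rejects_bad_input():
    # Negative test: everything off the happy path must be refused.
    for raw in ("", "abc", "0", "-1", "100", "2.5"):
        try:
            parse_quantity(raw)
        except ValueError:
            continue
        raise AssertionError(f"accepted invalid quantity {raw!r}")
```

Nothing about the negative test depends on the unit-test layer being "finished" first, which is the point: it can be written the day the bug risk is identified.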

There may be less screaming in this mode, but the sounds coming out of this model surely aren’t cries of joy.

You Can Catch More Bugs With Honeycombs

So, if ice cream cones always suck, and pyramids suck for transforming teams, what’s a responsible development staff to do? Maybe it’s time to take a page out of the UX playbook and apply Peter Morville’s classic honeycomb model, which represents the facets of user experience. Each facet is required to achieve the end goal of a well-designed experience, but no facet supersedes the others in importance or precedence.

Peter Morville’s UX Honeycomb

The downside of the honeycomb model is that you lose what is actually the intended message of the pyramid: unit tests should exist in the greatest number, there should be fewer integration tests than unit tests, and so on up the stack. Manual exploratory tests should be the fewest in number, not only because their variable cost to execute is highest but, more importantly, because they provide the least value in isolating problems in the code base.
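
That intended message is about proportions, not construction order, and it can be written down as a simple check. The sketch below is illustrative only: the four layer names (in particular `end_to_end`) and the `suite_shape` function are my own assumptions, not part of any published model.

```python
def suite_shape(unit: int, integration: int, end_to_end: int, manual: int) -> str:
    """Classify a test suite by how its counts change going up the stack.

    Assumption: four layers, ordered from cheapest and most isolating
    (unit) to most expensive and least isolating (manual exploratory).
    """
    counts = [unit, integration, end_to_end, manual]
    steps = list(zip(counts, counts[1:]))
    if all(lower > upper for lower, upper in steps):
        return "pyramid"          # each layer smaller than the one below it
    if all(lower < upper for lower, upper in steps):
        return "ice cream cone"   # each layer larger than the one below it
    return "mixed"


if __name__ == "__main__":
    print(suite_shape(1200, 300, 40, 10))
    print(suite_shape(50, 120, 400, 900))
    print(suite_shape(500, 100, 300, 20))
```

A team in transformation will usually sit in the "mixed" bucket for a while, and the check says nothing about which layer to work on first.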

With all of the above context in mind, I’ve drafted (with contributions from Chue Her at Cox Automotive) what could be a model that helps in-transformation teams understand the importance of a holistic approach to testing that:

  • gives more context to the different types of tests in software/product engineering
  • does not pit one type of test vs another in a linear-adversarial model
  • communicates the guidance on the relative levels of magnitude for each category of test
  • includes the concept of value-driven testing to allow people to orient their transformation efforts on high value areas

The Testing Honeycomb

What remains to be seen is whether the community at large sees any attempt to refine current dogma as vinegar (even if it is shaped like a honeycomb).