On Untestable Software

Your testing problem is really a testability problem

Blake Norrish
Slalom Build
16 min read · Apr 21, 2022


Here is a statement I strongly believe to be true:

There can be multiple orders of magnitude of difference in the difficulty of validating software systems of equivalent complexity.

This statement is not meant as hyperbole. While there is no way to precisely measure validation difficulty—and comparing the relative complexity of two unrelated systems is highly subjective—I still hold that the overall difference in effort can be multiple orders of magnitude. If it takes one hour to validate in one system, it could take over one hundred hours in another. This belief comes from observing and participating in the development and testing of hundreds of systems across dozens of companies over the last 22 years.

How could this possibly be? Yes, software can be poorly designed and poorly implemented, but surely this would result in only a 50% increase in difficulty, or 100% at most, no? How is it possible to implement software so poorly that it leads to a multiple orders of magnitude difference in the difficulty?

Having dropped the hook of the article, I wish I could introduce the Three Design Patterns Guaranteed to Make Your Systems More Testable! and head right into an explanation of each. Unfortunately, our problem is not that simple. If creating testable software systems was as easy as using a design pattern, we’d all already be enjoying highly testable software. And I wouldn’t need to write this article.

Instead, I’m going to try to convince you that this massive difference in difficulty actually exists and why it matters. I’m then going to talk about how quality assurance, the whole discipline of how we test software, is insufficient and can actually make the problem worse. This will lead to a simple but different view on quality that naturally leads to more testable systems and more predictable, effective software delivery.

The Smells of Untestable Software

In order to convince you of this massive difference in testability, I could describe in detail two software systems on opposite ends of the testability scale. I would have to describe the overall architecture, layers of system design, the implementation of every component within each system, the tech stack used, all the interactions across all the boundaries with other systems, the data model, the persistence strategy, and probably a lot more as well. After this, I would have to go into detail on the validation strategies that could be used on each, how test data management would work, how and where automation would be written for different types of tests, and where it could be run in a CI pipeline. I could then show how nuances of the architecture, design, implementation, and tech stack decisions come together to create a frustrating (or straightforward) validation experience. Seeing these two equivalent systems side by side—one horrendously untestable, one delightfully testable—would hopefully convince you of the order-of-magnitude difference in testability.

Each of these descriptions could easily be dozens of dense pages, and unfortunately (or fortunately) this is a blog article, not a book.

Thus, I have to go with a more condensed strategy: I’m going to describe a variety of symptoms of hard-to-test software, the frustrations and challenges that you too may have dealt with, and hope that stepping back to consider them in aggregate convinces you that some software just might be significantly more challenging to test. I hope that, by reflecting on these symptoms and your experience with them, I can persuade you that they are not natural challenges everyone deals with as part of normal software delivery, but pernicious smells indicating something is horribly wrong: that you were dealing with innately untestable software.

All of these are real examples from real teams building real software. In no particular order, here are some infamous symptoms of hard-to-test software:

Test Data Setup and Maintenance Nightmares: This is when setting up, cleaning, and otherwise managing test data is inordinately complex and consumes a significant amount of overall validation time. Test data is a huge category that includes every piece of state a test depends on: specific rows and values in a database, specific users set up with specific permissions, specific items with specific attributes in stock, configuration values, etc. Managing all this test data is a big part of all testing, but deterministically and efficiently controlling the data can be hugely challenging or outright impossible in hard-to-test systems.

Unfortunately, in even otherwise well-architected systems, it can be incredibly challenging for testers to manage system state and test data as needed for testing. For whatever reason, the system resists this type of very necessary manipulation.

How many times have you spent hours, or even days, trying to set up data to reproduce just one gnarly bug, or force one rare but critical edge case? Taking days to test one thing in one system, when that same test could have taken hours in another, is an example of the order-of-magnitude difference I am talking about. Frustratingly hard data management is just one cause of this, and it is a common symptom of hard-to-test systems.
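To make the contrast concrete, here is a minimal sketch (in Python, with invented names like Order, OrderBuilder, and cancel_order, none of them from any particular system) of what manageable test data looks like: each test builds exactly the state it needs, deterministically and in isolation, instead of hunting for pre-seeded rows in a shared database.

```python
# Hypothetical sketch: tests own their data instead of depending on shared, pre-seeded state.
from dataclasses import dataclass, field
from typing import List
import uuid

import pytest


@dataclass
class Order:
    order_id: str
    status: str
    items: List[str] = field(default_factory=list)


def cancel_order(order: Order) -> Order:
    # Toy stand-in for the behavior under test.
    if order.status == "SHIPPED":
        raise ValueError("shipped orders cannot be cancelled")
    return Order(order.order_id, "CANCELLED", order.items)


class OrderBuilder:
    """Fluent builder so each test creates its own isolated, deterministic data."""

    def __init__(self) -> None:
        self._status = "PENDING"
        self._items: List[str] = []

    def with_status(self, status: str) -> "OrderBuilder":
        self._status = status
        return self

    def with_item(self, sku: str) -> "OrderBuilder":
        self._items.append(sku)
        return self

    def build(self) -> Order:
        # Unique ids avoid collisions even when an environment is shared.
        return Order(str(uuid.uuid4()), self._status, self._items)


def test_cancelling_a_shipped_order_is_rejected():
    # The gnarly edge case is set up in one line, not reconstructed by hand in a shared DB.
    order = OrderBuilder().with_status("SHIPPED").with_item("SKU-123").build()
    with pytest.raises(ValueError):
        cancel_order(order)
```

In a testable system, that SHIPPED order takes one line to construct; in a hard-to-test one, producing the equivalent state might mean days of coaxing a shared environment into shape.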

Validation by Side Effect: This symptom occurs when tests cannot directly determine their outcome, but must rely on less trustworthy, less explicit side effects of what they are trying to validate. For example, if the system is in state ABC and we perform actions XYZ, then it should lead to result R. Unfortunately, we cannot directly observe R, so we have to look for side effects of R, hints that R occurred, but not actually R itself.

Validation by side effect is often found in systems with poor observability. However, just because a system is architecturally observable does not mean it is observable from a validation point of view, and won’t suffer a validation-by-side-effect problem. For example, many systems are composed of components that are individually highly observable, but tests need to correlate events between these components, and there is no mechanism to do this. While the individual components are observable, the system as a whole is not, and this severely impacts testability.

A much simpler but more common example of validation by side effect is when a test must scrape system logs to read the state of a system. Testable systems do not require tests to scrape log files to validate behavior; a check that is simple and dependable in one system becomes horrendously tedious and flaky in another.
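Here is a small, purely illustrative sketch of the difference, using an invented PaymentService and log format. The first test validates by side effect, scraping a log line; the second validates the actual outcome the test cares about.

```python
# Hypothetical contrast: validating via a side effect (log scraping) vs. observing the result directly.
import re


class PaymentService:
    """Toy stand-in for the system under test."""

    def __init__(self) -> None:
        self._statuses = {}
        self.log_lines = []

    def charge(self, payment_id: str, amount: int) -> None:
        self._statuses[payment_id] = "CAPTURED"
        self.log_lines.append(f"INFO charge ok id={payment_id} amt={amount}")

    def status(self, payment_id: str) -> str:
        # A directly observable outcome: what a testable system exposes.
        return self._statuses[payment_id]


def test_charge_by_log_scraping():
    # Validation by side effect: brittle, and it breaks whenever the log format changes.
    svc = PaymentService()
    svc.charge("p-1", 500)
    assert any(re.search(r"charge ok id=p-1", line) for line in svc.log_lines)


def test_charge_by_direct_observation():
    # Validation against the result the test actually cares about.
    svc = PaymentService()
    svc.charge("p-1", 500)
    assert svc.status("p-1") == "CAPTURED"
```

The log-scraping version may pass today, but it couples the test to an incidental detail (the log format) rather than to the behavior itself, which is exactly what makes it tedious and flaky.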

The Need for Artificial Test Surfaces: This occurs when custom functionality must be introduced into production code solely to support testing. For example, service endpoints, data injection/collection mechanisms, or configuration options. This code is artificial in that the only reason it exists is to support testing; it is only necessary because the actual system is so challenging to test. While adding small test hooks is not necessarily always bad, being forced to develop significant artificial test surfaces just to support testing is a symptom of hard-to-test software.
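To illustrate what such a surface looks like, here is a hypothetical sketch: a reset hook shipped in production code, guarded by an environment flag, that exists only so automated tests can get the system into a known state. The names (CheckoutService, ENABLE_TEST_HOOKS) are assumptions for the example, not a recommendation.

```python
# Hypothetical sketch of an "artificial test surface": production code that exists only for tests.
import os


class CheckoutService:
    def __init__(self) -> None:
        self._carts = {}

    def add_to_cart(self, user_id: str, sku: str) -> None:
        self._carts.setdefault(user_id, []).append(sku)

    def _test_only_reset(self) -> None:
        # Shipped in production solely so automated tests can get a clean slate.
        if os.environ.get("ENABLE_TEST_HOOKS") != "1":
            raise RuntimeError("test hooks are disabled outside test environments")
        self._carts.clear()


def test_add_to_cart_starts_from_a_clean_slate(monkeypatch):
    monkeypatch.setenv("ENABLE_TEST_HOOKS", "1")
    svc = CheckoutService()
    svc.add_to_cart("user-1", "SKU-9")
    svc._test_only_reset()
    assert svc._carts == {}
```

A single hook like this is tolerable; needing dozens of them just to make basic validation possible is the smell.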

A Misshapen Test Pyramid: This occurs when the test suite is forced by the nature of the system to overly rely on one type of test for most validation. A common form of this problem is a disproportionate number of UI-driven E2E tests relative to other smaller, subcutaneous tests, usually called an “inverted test pyramid.”

This is not because automators want E2E UI tests, but because something about the system being tested is forcing validation through the top-level UI. It could be poor API design, proprietary transport protocols, poor class design, overly complex data models, or dozens of other reasons. The effect is the same. Like looking through a keyhole into a large room, the tests can only access the system through one small lens; they are blocked by the nature of the system from more appropriate, more effective interaction points.

Having to deploy, set up, and ensure the validity of a large (possibly shared!) test environment to validate behavior—because all your tests are end-to-end tests, when that behavior actually only requires a small subset of the system—is wasteful and is a common symptom of hard to test software.
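For contrast, here is an illustrative subcutaneous test written against an invented service-layer class (DiscountService): the same business rule that an inverted pyramid would force through a browser-driven checkout flow, validated with no deployed environment, no UI, and no shared test data.

```python
# Hypothetical sketch of a "subcutaneous" test that exercises a business rule below the UI.
from dataclasses import dataclass
from typing import Optional


@dataclass
class Cart:
    subtotal_cents: int
    coupon: Optional[str] = None


class DiscountService:
    """Toy stand-in for the layer a keyhole-style UI test cannot reach directly."""

    def total(self, cart: Cart) -> int:
        if cart.coupon == "SAVE10":
            return int(cart.subtotal_cents * 0.9)
        return cart.subtotal_cents


def test_save10_coupon_applies_ten_percent_discount():
    # No browser, no deployed environment, no shared data: milliseconds, not minutes.
    assert DiscountService().total(Cart(subtotal_cents=1000, coupon="SAVE10")) == 900


def test_unknown_coupon_is_ignored():
    assert DiscountService().total(Cart(subtotal_cents=1000, coupon="BOGUS")) == 1000
```

When the system's design blocks this kind of access, every one of these millisecond checks gets promoted into a slow, environment-hungry E2E test.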

Test Environment Scarcity / Resource Contention: This occurs when the underlying architecture or other aspect of the system makes it challenging or impossible to replicate that system into arbitrary test environments or deploy new environments quickly.

For example, many systems might contain instances of enterprise ERP or CRM software from vendors like SAP or Oracle. These components represent core functionality of the overall system, but they are notorious for limiting instances (due to licensing or custom hardware requirements), and thus force a multi-tenant approach between environments (i.e., sharing instances of the ERP system across every test environment). While we might not be able to remove the reliance on these types of third-party components, anything that prevents the cheap and quick deployment of a new environment decreases testability.

In addition, sharing component instances between environments introduces all sorts of new test data management challenges, which is another negative impact on testability.

Time Dependencies: This is when tests or automation are somehow tied to real-world clock times within a system. For example, think of a system that batch processes some data at specific times of the day, and that processing event cannot be triggered artificially, or if triggered, would have unwanted impacts on other areas of the system. Thus, any test of this processing functionality has to ensure it is synchronized with real-world time.

This might seem like a contrived example and obviously bad architecture, but it is quite common for large, enterprise software systems to interact with non-automated processes, or processes that are otherwise tied to real-world time, and these interactions significantly impede testability.
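One common way to break this kind of dependency, sketched below with invented names (NightlyBatchJob, run_if_due), is to inject the clock and expose an explicit trigger, so a test can simulate two in the morning instead of waiting for it. This is an assumed illustration of the technique, not the design of any particular system.

```python
# Hypothetical sketch: inject the clock and make the batch trigger explicit so tests
# never have to wait for (or depend on) real-world time.
from datetime import datetime, timezone
from typing import Callable, List


class NightlyBatchJob:
    def __init__(self, now: Callable[[], datetime] = lambda: datetime.now(timezone.utc)) -> None:
        self._now = now          # injected clock instead of a hard-coded datetime.now()
        self.processed: List[str] = []

    def run_if_due(self, pending: List[str]) -> bool:
        # Production schedules this call; tests invoke it directly with any clock they like.
        if self._now().hour != 2:
            return False
        self.processed.extend(pending)
        return True


def test_batch_runs_at_two_am_without_waiting_for_two_am():
    fake_two_am = lambda: datetime(2024, 1, 1, 2, 0, tzinfo=timezone.utc)
    job = NightlyBatchJob(now=fake_two_am)
    assert job.run_if_due(["record-1"]) is True
    assert job.processed == ["record-1"]


def test_batch_does_not_run_at_noon():
    fake_noon = lambda: datetime(2024, 1, 1, 12, 0, tzinfo=timezone.utc)
    assert NightlyBatchJob(now=fake_noon).run_if_due(["record-1"]) is False
```

The production scheduler still fires the job at 2 a.m.; the tests simply no longer care what time it actually is.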

I hope those of you with experience validating modern software are nodding in agreement right now, thinking of all the times you have found yourself banging your head into the proverbial wall, trying to set up or validate some piece of the system and thinking this should be easy! Or found yourself spending hours and hours debugging a test failure only to rerun the test, have it pass, and have no idea what changed. You understand firsthand the challenges of hard-to-test software.

Many quality professionals will accept these situations simply because they are so common. They assume it is necessary, or at least normal, to have to deal with unmanageable test data, to work with limited or shared test environments, to validate using side effects, or to require custom test hooks. They have only ever experienced hard-to-test software, so it has become expected, just something to accept and endure. They have never had the pleasure of working on truly testable systems, and so have nothing to compare it to.

However, the root cause of these symptoms is not the tester or the test approach; it is not because they used the wrong tool or they weren’t smart enough. The root cause was that the system they were testing was hard to test. They were given a nasty problem, and were forced to deal with it. They assumed it was their job, and they did the best they could.

These are a few of the probably hundreds of symptoms of hard-to-test architectures. Each one might seem like a mere pain or a challenge, but in aggregate, and in combination with the many other ways software can be hard to test, they can create an order-of-magnitude difference. Hard-to-test software is truly a nightmare!

Why Testability Matters

Let’s remind ourselves why this problem even matters.

If software delivery could be divided up into 999 parts development and one part validation, this problem would not matter. Even in the extreme case that the 1 part of validation grew to 20 because of the untestable nature of the system, it would still be dwarfed by the cost to develop the software in the first place. We would be much better off spending our energy trying to optimize the 999 parts of development, and would be fully justified in ignoring the 1 (or 20) parts of validation. It just doesn't represent a significant proportion of the overall problem.

I hope I do not have to convince you that the 999–1 ratio is wrong. While splitting software creation into “development” and “validation” is incredibly simplistic, and a million variables impact the result, I would argue that the correct ratio between effort-to-build and effort-to-validate is probably closer to 1:1. However, the exact value doesn’t really matter. All that matters is that you agree that some non-trivial portion of the overall effort to deliver software is validation.

Thus, it should be obvious that a multiple-order-of-magnitude difference in the complexity of that non-trivial validation effort is… problematic. If development/validation were 50/50 and the cost of the validation half blew up to 500, you effectively could no longer deliver software.

An increase in difficulty of that size would mean that all efforts to efficiently deliver software could be undone by the massive effort to validate it. It would mean that all the energy expended to ensure teams are productive could be meaningless given the enormously negative velocity impact of validation. It would mean that if we do not control the testability of our software, all deadlines would be guesses and our estimation techniques utterly useless.

If you are in any way responsible for delivering software, this should terrify you. The testability of your software could be your undoing. This is why the testability of software matters.

If we as an industry do not address the huge impact of the validation problem on software delivery, and instead just sweep it under the rug or dump it onto an unwitting group of people with “quality” in their title, we will have failed in our mission to efficiently and predictably deliver high-quality software. It is not enough to be great at building software; we must be great at building testable software.

Quality Assurance is Insufficient

Quality Assurance is the discipline of how to test software. It is a deep, broad, and evolving field. It is also incredibly challenging, requiring analytical skills, critical thinking, problem solving, creativity, and strong engineering skills. Plus a whole gamut of soft skills: communication, influence, conflict resolution, etc. I have nothing but respect for quality assurance professionals.

Unfortunately, quality assurance, as traditionally practiced, is insufficient in dealing with the validation challenges of modern software. Why? Because quality assurance defines the scope of its problem as starting with the software to be tested. It says: given this software, how can we most efficiently and effectively test? How do we gain confidence that this software will have value to the customer? To quality assurance, software (whether in a spec, or implemented) is the input to the problem.

This is a myopic approach to thinking about software quality. It too narrowly focuses on given this software, how do I test it. With the huge difference in the testability of software we saw earlier, the important problem in software quality is actually how do I minimize the challenge of testing this software. If the software is hard-to-test, no quality assurance strategy—no matter how well thought out, insightful, or creative—will be effective. No amount of quality assurance can save you from an order-of-magnitude testability problem.

In other words, quality assurance can guide us in how to efficiently and effectively test a piece of software, but it is impotent when the software isn’t actually testable.

Some software quality professionals might push back on this, claiming that concepts like shift-left, continuous testing, and building-quality-in incorporate the same idea. Unfortunately, all of these focus on quality, not testability. They say things like: testing should happen early, or often, or all the time, so we can find defects early (even in the spec!) or avoid bugs to begin with. While they are all commendable strategies and are very much a part of healthy test approaches, none of them create testable software, none of them help avoid the order-of-magnitude validation problem to begin with.

Unfortunately, not only can quality assurance be insufficient for validating modern software, but sometimes the good intentions of QA individuals, and their nature as problem solvers, actually make the situation worse.

Quality assurance professionals are smart people who enjoy solving hard problems. Successfully solving easy problems is great, but does not win you the admiration of your peers nor give you that ‘I accomplished something awesome’ dopamine hit. Solving hard problems does. Put a hard validation or automation problem in front of a QA and most likely they will roll up their sleeves and dig into it. Thus, many quality teams, rather than push back on hard-to-test software, enthusiastically accept it. It represents a hard problem for them to show their value, to prove their mettle.

This tendency is actually quite common, but it is a pathology that leads to ineffective software delivery. When quality assurance professionals make herculean efforts to validate software despite its untestability, they hide the root problem. This prevents the rest of the team from seeing and realizing that the software itself is wrong, that it was built ignorant of the challenges of testability.

I am not trying to belittle or disparage people in the field of quality assurance. This is an incredibly challenging and valuable role. I am, however, trying to call out that amazing quality assurance is not sufficient in modern software development. No quality assurance process, regardless of how optimized and streamlined, can overcome the orders-of-magnitude problem of hard-to-test software.

Testable Architecture and Quality Engineering

Given the huge challenge of testability, it is critical that we build software that is testable. We must develop and evolve the software’s validation strategy with the software, and leverage that strategy to guide architectural, design, and implementation decisions. The validation strategy is an input to the software, not something created when the software is finished, and testability should be considered a first-order requirement of software design, as critical as any other.

Implementing software to be testable is a critical thread of quality engineering.

Quality Engineering is the active and intentional creation of testable systems. It is the understanding that validating a system can be such a complex, challenging, and time-consuming endeavor that any effort to avoid, mitigate, or lessen the problem is enormously valuable to overall effective and efficient software delivery. Quality Engineering takes all the expertise of quality assurance and its obsession with the problem of How do we validate software? ... and applies it to the problem of How do we build software to be more easily validated in the first place?

Unfortunately, it is common within industry to overuse the term engineer due to its high marketing value when recruiting, and many have done this with quality engineering.

In our definition, quality engineering is not the ex post facto application of test and automation expertise once a system has been designed and built. It is not “hey, that’s a cool system you have there, let me use my engineering expertise to figure out how to test and automate it.” It is not teaching quality assurance professionals to code, calling them quality engineers, and calling what they do quality engineering. All of this is great, but none of it defines quality engineering. None of it solves the order-of-magnitude problem. In order to do quality engineering, you need to intentionally and explicitly use quality assurance expertise to build testable systems.

Do you need a role called quality engineer to do quality engineering? Of course not. Quality engineering is a philosophy and an approach to software development that is completely orthogonal to titles. It simply says: validation is a huge part of software delivery and quality assurance is complex and challenging; you need to deploy those experts not only to validate software, but also to build software that is validatable. Those experts could be called developers, QAs, quality engineers, or anything else that resonates with you. The key is the skillset and the application of that skillset to the correct problem, not the title.

Conclusion

There is nothing in the above article that proves some systems are multiple orders of magnitude more difficult to test than others; no data, no studies, no empirical evidence. I can’t give you that, and I probably wouldn’t trust anyone who gave it to me.

What I can give you is my experience of over two decades helping companies build complex software systems. And overwhelmingly what I have seen is not companies struggling to build unbuildable systems, but struggling to validate those systems once built.

The most common project pathology in large implementation efforts is a system that moves predictably and on-target to kinda-done, but then hits an invisible wall and slows to a crawl. Delivery dates get pushed back, then pushed back again. Frustrated executives start showing up in previously team-only meetings to reiterate deadlines that everyone already knows and demand daily updates on dubious metrics like daily test cases executed. The collective ire of the organization descends on those perceived to be responsible for the inability to finish validation. What’s taking so long? We designed this system, we built this system, why can’t you test it? How hard can it be!

Peek behind the curtains of these teams and you will undoubtedly find software that is hard to test—and herculean, last-ditch, emergency measures to overcome it. You will see people struggling with test data, unable to set up scenarios or permutations, and building complex tools and processes to manage it. You will see teams struggling to build deterministic automation, making huge investments to do so but getting stymied by unstable environments. Given the impracticality of anything else, you will find teams forced to overly rely on UI-driven E2E test automation, with suites in the thousands that can only be run overnight and require the entire next day to debug. (Of course, only after rerunning failed tests three times.) You will find all this, and probably a lot more.

Unfortunately, many people do see this, and come to the damaging and horribly wrong conclusion that the team has a testing problem.

This situation is not the result of bad testing practices or unqualified testers. It’s the result of building something with the catastrophically naïve assumption that all systems are testable. This is not the case, and these projects are now in the unenviable situation where getting the system fully working will take more effort than it took to get it to kinda-working. For these types of systems, validating the software will be significantly more difficult than building it in the first place—and might not be possible at all.

If you want testing to be efficient, you need testable systems, which means you must build systems to be testable. You must make it a requirement that the validation approach of the software is considered and incorporated into the architecture, design, and implementation of the software itself. Involve people with experience and expertise in all things software validation (regardless of their title) into these processes. Curate a culture of quality engineering in your organization that seeks to minimize the problem of validation and avoid the order of magnitude problem of untestable software to begin with.
