Despite decades of dreaming and tinkering, machines that can test our software for us remain science fiction.
I wasn’t quite sure what my new boss meant but I was almost sure that I disagreed with what he said. It was glib. It was facile. It ignored nearly a century of quality engineering plus recent advances in capability maturity.
Jerry McDonald, VP of More Things than Could Fit on a Business Card, became my boss after a reorganization left me staring up through the hole where we once had a software test manager. I mention Jerry’s real name because 1) it lends a certain irony to his advice and 2) he deserves public recognition for telling me something near the beginning of my career that, as it turned out, I was too convinced of my own wisdom to learn from any teacher outside the school of hard knocks. A large and growing generation of software development managers still has not learned it.
The advice? “You can’t manage software testing like a McDonald’s.”
If I hold any criticism against McDonald’s (the restaurant chain, not Jerry’s shop), you won’t find it here. They have exceptionally good reasons for doing what they do. Masterfully applying core “Big Q” Quality techniques like Statistical Process Control, they defined fast-food production precisely and spent decades squeezing out every last drop of variation. No variation means a predictable product on a predictable schedule with no waste. Customers loved it. Other companies tried to emulate it. It was only a matter of time before someone transplanted the model into software product development.
The time came just as I was entering the field. A couple of years before my encounter with Jerry, I showed up for my new "software quality assurance engineer" job and found two objects on my desk that defined the company's expectations for me. The first was a copy of Watts Humphrey's Managing the Software Process. The second was a big box containing installation media and instruction manuals for a test automation suite called "SQA Robot" (later acquired by Rational Software and rebranded "Rational Robot," then inherited by IBM and rationally abandoned). The book, a close forerunner of the Capability Maturity Model, made it clear that my job was to reduce software testing to a repeatable process that could then be defined and optimized. SQA Robot stood by, waiting for me to feed it tests in the form of executable code. And why not? If testing can be reduced to a repeatable process, then a robot should be able to repeat it. Our shop was a McDonald's and SQA Robot was the burger belt.
This was the late 1990s. The Internet had just entered the mainstream. Only a few years earlier, getting our hands on a particular technical book required a whole day of visiting random brick-and-mortar stores or waiting months for the interlibrary loan gears to turn. Now we could click a mouse, provide a credit card, and expect exactly what we wanted in the mail practically overnight. There were even rumors that some people were able to order pizza online for same-day delivery. Everything seemed possible — even robots that tested our software for us.
Under its shiny hood, SQA Robot was ornery and temperamental but that only appealed to my cowboy nature and motivated me to break it like an unruly mustang. After I got it somewhat under control and exercising our software product nightly, I discovered that I could teach it new tricks by writing my own libraries in C++ and bolting them on. As one of our developers had quipped in a slightly different context, I had the source code and therefore could do anything. I was oh, so proud of myself.
With our testing mostly automated or semi-automated, we had the repeatable process required by Watts Humphrey. I could tell you exactly how long testing would take, so the only variability in the release schedule was how many bugs we would find and need to fix. I looked forward to expansion, refinement, and continuous optimization. Who knew? Maybe we could even go for CMM Level 5 certification.
If this story were fiction, here is the point where I would tell you about the failure that I spent the previous two paragraphs setting up. Reality wasn't so tidy. As the dot-com bubble threatened to burst, the company went through multiple rebranding and reorganization spasms. For reasons that continue to elude me, the only casualty in the software test department was our manager. Aside from his advice that I was on the wrong track, my new boss, the Vice President, mostly left me alone until my retention agreement ran out and I rode off into the sunset, still proud of my automation success and convinced of my business process wisdom.
Robbed of the failure I needed and so richly deserved, I allowed myself to be carried along with industry trends, now as a front-line software developer. The bubble burst and the herd was culled. Working in smaller shops (mostly without professional testers), each of us learned to play multiple roles. Programming languages and tools better suited for the kind of work we actually did made coding errors more difficult to introduce and easier to detect. A critical mass of us started to treat unit testing as a core discipline and built it into design, coding, and integration. These trends improved product quality and decreased the perceived need for professional software testers. If our code is already protected by the language and covered by unit tests, why should we hire more humans to do testing work that has already been done?
Ponder that question in a quiet moment and the outline of Jerry McDonald's wisdom gradually appears. Allow the thought to persist and you will soon see it everywhere. Just last week, an online medical history survey asked me whether I had any known allergies and allowed me to select responses from a list of conditions like hepatitis. I have no reason to doubt that the developers of this software product used only the safest languages and tools or that they had 100% unit test coverage. In fact, I have trouble conceiving of how such protections, excellent as they are, could have detected this obvious defect.
You can't test software like a McDonald's because, unlike on a burger belt, variation is intrinsic to software product development. The whole point of building software is to do something different. If you want a copy of what already exists, you can easily make one with byte-for-byte accuracy. Of course, there is value in detecting unintentional side effects as we make intentional changes. Such detection ("regression testing") is an important component of any software testing program and a good candidate for systematization or automation, but let's not confuse the simplest and best-defined part with the complicated and messy whole.
In a 2018 EconTalk interview with Russ Roberts, Rodney Brooks of MIT Robotics fame noted that those of us unfamiliar with the machine learning problem space frequently overestimate the technology because we mistake performance for competence. For example, we may observe an image recognition machine correctly identifying a cat in a photograph and then jump to the conclusion that the machine has a concept of "cat" and can do things like diagnosing a hairball or predicting whether Fluffy would be capable of paddling a canoe. According to Brooks, such "general intelligence" machines may arrive some day but are likely decades or centuries away. Recent advances in performance, largely driven by increases in computing power, have only aggravated our natural tendency to mistake performance for competence. This mistake leads us to overestimate the current technology and then underestimate the difficulty of advancing to the next level.
Today, we can build machines that perform specialized tasks like informing us quickly when an algorithm produces an unexpected result for some range of inputs. When we pair such a machine with a thoughtfully designed software development process that delivers this information into the right hands at the right time, we can reduce our product delivery cycle time dramatically by avoiding order-of-magnitude penalties from stale information. That's a huge benefit, but let's not mistake performance for competence. We have only leveraged realistic technology to solve a narrow problem. We have not built a machine that knows how to test software.
What would a competent testing machine do that realistic technology cannot? Let's focus on just one aspect of software testing. The primary function of most automation today is to tell us whether a result is unexpected. We program the machine to check for an unexpected result because that comparison is straightforward to perform, but it is only a cheap proxy for what we really want to know: does this result threaten the value of the product? That more meaningful question lies far beyond the grasp of present-day technology, because a result can match our expectations exactly and still threaten the product. Maybe the requirement was miscommunicated. Maybe it was communicated perfectly but turned out to be an awful idea that could offend key stakeholders or land the company in legal trouble.
I doubt it was their intent, but Managing the Software Process and Rational Robot set me up to hear Jerry's contrary voice in the back of my head. Eventually I reached the conclusion that repeatable checks can be valuable when deployed wisely, but executing routines does not come close to a complete software testing regimen. Now that I'm the one wearing the "test manager" hat, I've applied the lesson by complementing our substantial automation program with professional software testers. Freed from the routine checking that machines perform for them, our testers unleash the power of their human minds. They behave like investigative journalists and scientists. They ask the important questions and design experiments to discover answers as they interact with the product. They dig beneath explicit requirements to learn who wants what and why, and they become experts on the complicated and messy path from human needs to executable code. They sort through its ambiguities, contradictions, miscommunication, and office politics so they can predict who will be bugged by what, and by how much. No machine this side of Planet Vorlon can deliver that kind of information, and it's the kind that tells us the most about our product risk.
If you are one of the many software development managers who have stopped hiring "manual testers" or encouraged your existing staff to pick up automation instead, I hope that you will reconsider. While I agree that our future will be faster and more mechanized, and I am entirely on board with our industry's movement toward continuous delivery along automated pipelines, let's not mistake our realistic technology for testing machines. At present, there is no such thing as a competent testing machine. We still need human testers because our most significant bugs originate from the complicated human mess upstream from the code. They cannot be reduced to a simple mismatch that we can expect a realistic machine to detect. Today, only humans will find our most important bugs. As an organization that develops and releases software products, we control only two variables: which humans will find our bugs and under what circumstances.
Including human testers does not entail abandoning the continuous delivery dream or throttling our pipeline to match a human pace. The old paradigm where human testers are inspectors holding clipboards at the end of an assembly line is a relic from the past. Let’s bury it and instead find creative ways to leverage the skill and expertise of human testers at every phase of product development. Let’s shift testing all the way leftward into discovery and ideation. Then, let’s shift it all the way rightward into the live Production systems that our customers actually experience. We don’t ship our software on floppy disks anymore. It no longer takes us months to react to defects, so let’s take some risks and depend on our professional testers to help us choose cheap fixes over expensive prevention. This is what continuous delivery looks like for non-critical systems. Let’s embrace it.
Jerry was right. We can't test our software like a McDonald's, but only because the model represents an unrealistic extreme that attempts to reduce an intensely human endeavor to a repeatable process. By the same token, we should not jump to the opposite extreme and reject systematization and mechanization entirely. Like testing, management can't be reduced to a flowchart with deterministic branches and well-defined decision thresholds. All realistic options are available to us, be they machines, professional testers, bounty hunters, interns, customers, or bodies off the street. Unlike a McDonald's franchise, our shop did not come with a blueprint for success. Where would be the fun in that?
As a child in the early 1980s, Mike Duskis discovered that hacking video games was more fun than playing them. Then he hacked his way into a software development career spanning industries from children’s entertainment to safety-critical medical devices. He is currently Test Manager at CyberGRX in Denver, Colorado, USA.