Waiter, There’s a Database in My Unit Test!
Distinctions between unit, integration, and system tests drive silly interview questions and serious design decisions
How many kinds of testing are there? What is the difference between a fake and a mock? When indenting code, should you use spaces or tabs? What are the three characteristics that distinguish an integration test from a system test?
Ask a veteran software developer about these interview questions and you might get an eye roll, a rant, or even a meditation. If you’re a new developer interviewing for your first job and you hear a question like one of these, I have some veteran advice for you: resist the temptation to search your memory for the matching textbook page. Instead, identify the nearest exit and make a mad dash. Feigning a sudden onset of salmonella poisoning is optional but worth some style points for a performance that fits the occasion. Any boss who believes that the question has a sensible answer would also make your life miserable with incessant demands based on arbitrary rules that may or may not be appropriate for any given case.
Search the archives of Charles Babbage all you’d like. You’ll find no Law of Coding which proclaims that all programmers everywhere shall indent with spaces. On the other hand, it would be wrong to conclude that tabs and spaces are equivalent and that the choice between them makes no difference. The interview questions are silly only because they lack context. Add context and the words take on sharper meaning with important consequences.
Real Life Isn’t a Quiz Show, but Technical Terms Still Matter
In “The Dog Ate My Unit Tests,” I confessed that my transformation from haphazard hacker to disciplined developer began when my boss caught me embarrassed and confused about basic terminology. As part of my penance, I joined a study group that met every week at the office of a pavement company on the edge of civilization to plow through ASQ’s “Certified Software Quality Engineer Body of Knowledge” (CSQE BOK). A decade later, the massive three-ring binder, overflowing with paper from the Quality Council of Indiana, still gathers dust on my shelf. It’s full of technical terms that everybody uses but with a smorgasbord of definitions. You’ll find variety even inside the binder. BOK III.A contains definitions of unit, integration, and system testing from the “V Model.” BOK VI.B.4 contains an amalgamated testing taxonomy that defines the same terms differently. There is overlap and inconsistency throughout. I wouldn’t recommend this material over most forms of entertainment, but it’s worth a chuckle or two if you’re stuck in the remote office of a pavement company late at night.
Little has changed. Martin Fowler pointed out last year that we still can’t agree on a precise definition of integration test but took a shot at explaining the concept anyway because it’s important. The borders with unit test on one side and system test on the other may be fuzzy, but they create significant consequences for how we design and structure our software. Those decisions, in turn, create significant consequences for the success of our projects and the well-being of our colleagues and other stakeholders.
I recently had a chat over lunch with a developer who was cursing his fate for inheriting a component that fused business logic and data persistence so tightly that teasing them apart would require the finesse of a brain surgeon. Sadly for him, the surgery theater awaited because his next task was to replace the data source. Sadder still, the operation would also require the luck of a two-time lottery winner because he had nothing to warn him when his changes violated some established (but probably undocumented) expectation.
Of course, the component was covered by a testing program — but one that had an impact only after the component was integrated into the entire system and tested as a whole. This level of testing did my friend no good. He needed instant feedback so he could make changes with confidence. He only had testing that took too long and was performed at such a high level that it could easily miss the low-level failures that interested him.
Using more technical terminology, he needed unit tests but had only system tests. We’ve been unable to agree about the borders in the middle, but we can agree about the extremes. Even if we stick to the extremes, we can drive useful discussions about the level at which we should test. We can also trace tales of woe to testing at the wrong level.
For purposes of this article, I’m going to tighten the definitions somewhat so we can discuss the spectrum. You don’t need to draw the lines in the same places I do, but I encourage you to draw them somewhere and to have good reasons for your choices. The lines will form a mental model that will help you think clearly about where you’re testing, when, and how.
As you’ll see when we get around to the subject of unit tests, drawing the lines in a particular way can have tremendous impact on design decisions. A disciplined approach to unit testing could have avoided my friend’s predicament entirely by forcing the business logic apart from the data persistence very early in the design process.
Regardless of the Level, We Test to Discover and Evaluate
One of my favorite scenes from ancient literature appears in Plato’s dialogue “Meno.” Feigning complete ignorance, as usual, Socrates torments poor Meno with demands for a coherent definition of the word virtue. Meno offers a list of virtues, some applicable to women, some to men, some to rulers, some to slaves. Socrates replies with the zinger:
How fortunate I am, Meno! When I ask you for one virtue, you present me with a swarm of them.
If we talk about unit tests, integration tests, and system tests but can’t explain the core concept that makes all of them tests, we reveal, as Meno did, that we haven’t thought the definition all the way through and probably can’t explain what we are doing.
Many people assume that testing software is like performing a twelve-point safety inspection on a car. Does the brake light turn on when I touch the brake pedal? Check. Does the battery provide at least twelve volts? Check. Is the engine oil within the marked range on the dipstick? Check.
Of course, we perform well-defined checks as part of our testing, but like Meno’s list, they don’t define the core. Someone had to design the checklist and (we hope) had a reason for including each item and specifying each measurement technique. Why check the battery but not the fuel injectors? Why trust the dipstick instead of draining the oil and weighing it? Follow the checks backward to their design and you will find a human mind tasked with learning about important system characteristics while navigating real-world constraints.
James Bach and Michael Bolton have spent many years refining some of the clearest thinking on the subject of testing. They capture its core concept this way:
Testing is the process of evaluating a product by learning about it through exploration and experimentation, which includes to some degree: questioning, study, modeling, observation, inference, etc.
Testing is so deeply entrenched in the human experience that most of us don’t even realize we’re doing it most of the time. If you consider taking a different route to work, signing up for a news feed, or switching brands of facial tissue, what questions do you ask? How do you go about answering them? How much time and money are you willing to spend on gathering information before you make a decision?
As you learn, you may develop rules that help you to determine whether a particular option is worth considering. If you’re so inclined, you might even write a program that uses the rules to perform an evaluation or complete a checklist for you, but the process of discovery, designing appropriate rules, and re-evaluating based on new information is all yours as a thinking human being.
Bach and Bolton refer to evaluations that you can reduce to an algorithm as “checking.” Checking is part of testing. It’s often the most visible part and therefore easy to confuse with the whole. This raises the danger that the process of testing will become neglected and then forgotten. It leads to the mistaken idea that testing can be automated, an idea that has become embedded in our language and tooling.
I’ll be forthright about this. I’m following through on a promise I made to Michael Bolton last week about upholding the distinction. I didn’t raise my right hand, and he didn’t give me a badge, but I do feel deputized. Consequently, you’ll read me deviating from more common terminology for the sake of being more precise: automated check instead of automated test, for example. If you find this distracting, I apologize but only half-way. Maybe the distraction will make you pause and consider the difference. Maybe you’ll become a deputy too.
Unit Testing Keeps the Code Fit and Flexible
A few months ago on an ad-hoc panel about API testing at the Software Quality Association of Denver (SQuAD), someone asked Todd Bradley and me an unrealistic but thought-provoking question. If we were forced to abandon two of the three testing layers (unit, integration, system), which one would we keep? Despite the audience of professional software testers, neither of us hesitated to give the answer that would put all of them (and me too) out of work: all the layers are important, but there is only one that we would be paralyzed without. Without unit testing, we would have no unit checks. Without unit checks guarding our backs, we would be afraid to fix the myriad of bugs that professional testers were practically guaranteed to find at the system level.
Automated unit checks are not an optional safety feature that you can tack on to your product after it has been baked. Instead, they are the proteins that keep your dough from crumbling as you knead it. They enable you to make changes boldly without fearing a mess. Unit checks document your learning and protect the design that emerges from the process of unit testing.
What is a unit?
Here is one definition from the CSQE BOK (VII.B.4):
A unit is the smallest testable piece of software that can be compiled, assembled, linked, or put under the control of a test harness or driver. A unit is usually the work of one programmer and may consist of several hundred, or fewer, lines of source code.
My eyes bulge out at “several hundred… lines of code,” but the definition does say “or fewer.” That’s an upper limit and (I hope) not meant to be taken precisely. The key concept isn’t the line count. It’s the idea of a small black box that serves a narrow and well-defined purpose. A cookie jar is a reasonably scoped unit. A web browser is not.
The intimate relationship between unit testing and design
All code that does something purposeful has a design. The design may be sloppy, haphazard, or worse, but it’s still a design. Design comes from questioning, study, modeling, observation, inference, etc. That list of activities should look familiar. I pulled it straight from the definition of testing offered by Bach and Bolton. I’m far from the first to have noticed the relationship between unit testing and code design. Expert coders like Allen Holub and Mohamed Taman wax poetic on the subject and advocate using testing to drive explicit checking and design. Other coders may use implicit testing to drive implicit design, but all design is based on some kind of testing.
It seems to happen less frequently these days, but I still find people arguing over whether unit checking should be the responsibility of professional testers or the developer who wrote the unit. If you’re asking this question, you don’t see (or perhaps choose to ignore) the interplay between testing, design, and checking.
I do not intend to draw a bright yellow police boundary around units and say that professional testers aren’t permitted to cross over and test them. I see no reason to stop them if they are following a line of inquiry or simply curious, but formal unit checking flows from the same process that produces the design. In many cases, the checks are the design.
Test design influences product design
Remember the developer who learned that his next task was to tease business logic apart from data persistence and responded by cursing his fate, the day he was born, and the whole host of heaven? He should have cursed the way his shop defined unit instead.
If we took “smallest testable piece of software” seriously and actually made our small pieces of software testable, we wouldn’t dream of writing a “unit” check that provisions a database for the purpose of verifying business logic. Business logic would be in one testable unit all by itself, covered by unit checks that know nothing about data persistence. Data persistence logic would be in a separate unit covered by unit checks that know nothing about business logic. If we’re concerned about how the two units interact, we can write a separate set of integration checks.
A second-year computer science student would immediately recognize this design as nothing more than adherence to separation of concerns. In the mess we call real life, basic principles get overlooked or ignored every day. Sometimes nothing bad happens. Sometimes we see bizarre intermittent failures in the field. Sometimes a developer shakes their fist at the sky and then reports to their project manager that they’re about to lose a week.
Get disciplined about building executable unit checks from the very beginning and you will force your design to follow good principles. There is no such thing as a unit check that mixes business and data logic, or a unit check for a unit that can’t be tested.
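To make the separation concrete, here is a minimal sketch in Python. Every name in it (Order, discounted_total, OrderStore, InMemoryOrderStore, quote) and the discount rule itself are hypothetical, invented purely for illustration. The point is only that the business rule can be checked without any storage in sight.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Protocol


@dataclass(frozen=True)
class Order:
    order_id: str
    subtotal_cents: int
    loyalty_years: int


def discounted_total(order: Order) -> int:
    """Business logic only: 5% off per loyalty year, capped at 25%."""
    percent_off = min(order.loyalty_years * 5, 25)
    return order.subtotal_cents * (100 - percent_off) // 100


class OrderStore(Protocol):
    """Persistence contract; a real implementation might wrap a database."""

    def load(self, order_id: str) -> Order: ...


class InMemoryOrderStore:
    """Test double that satisfies OrderStore without any database."""

    def __init__(self, orders: dict[str, Order]) -> None:
        self._orders = orders

    def load(self, order_id: str) -> Order:
        return self._orders[order_id]


def quote(store: OrderStore, order_id: str) -> int:
    """Thin glue between the two units; a candidate for an integration check."""
    return discounted_total(store.load(order_id))


# Unit check for the business logic: no store, no database, one point.
assert discounted_total(Order("A1", 10_000, 3)) == 8_500
```

If the data source changes, only the class behind OrderStore needs replacing. The check on discounted_total never notices.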
The value of unit testing
Because unit testing is so tied up with design, it’s going to happen whether or not you call it out as an activity. The question, then, isn’t whether you perform unit testing but whether you do so explicitly, treat it like an intentional discipline, and produce executable unit checks. This brings us back to the answer that Todd and I gave at SQuAD. If you’re working for a mad bureaucrat who forces you to test at only one layer, then you should embrace discipline, build your executable unit checks, and cling to them for dear life. Even if you’re not working for a mad bureaucrat, embracing discipline can only help you.
A good unit check reliably signals failure when the unit fails to meet an expectation. It’s a humble job with tremendous implications for the maintainability of your codebase. If you’re confident that a unit check will protect you when you accidentally break an interface on which other units depend, then you are free to refactor, optimize, or extend at will. Conversely, if you lack such confidence, then any change creates a risk that some unidentified dependent unit will fail. Your only mitigation is to test the integrated system and pray that your coverage is adequate.
Characteristics of a good unit check
- It is 100% automated, requiring no human intervention or interpretation.
- It runs very quickly, completing in a few milliseconds.
- It isolates the unit from the rest of the world, replacing dependencies with test doubles.
- It makes no assumptions about state (configuration, presence or absence of data, etc.).
- It proves exactly one point.
This is just a rough summary of unit testing to serve as a contrast with system and integration testing. The subject of unit testing, even while mostly reduced to writing checks, easily fills entire large books. It may be the most important discipline a programmer can learn.
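The characteristics listed above might look like this in practice. This is a sketch using Python's standard unittest module. The unit (is_expired) and its test double (FixedClock) are hypothetical names made up for the example.

```python
import unittest
from datetime import datetime, timedelta, timezone


class FixedClock:
    """Test double: always reports the same instant."""

    def __init__(self, now: datetime) -> None:
        self._now = now

    def now(self) -> datetime:
        return self._now


def is_expired(issued_at: datetime, ttl: timedelta, clock) -> bool:
    """Unit under check: a token expires once its time-to-live has elapsed."""
    return clock.now() >= issued_at + ttl


class IsExpiredCheck(unittest.TestCase):
    def setUp(self) -> None:
        # Each check builds its own state instead of assuming any.
        self.issued = datetime(2024, 1, 1, tzinfo=timezone.utc)
        self.ttl = timedelta(hours=1)

    def test_expired_exactly_at_ttl_boundary(self) -> None:
        clock = FixedClock(self.issued + self.ttl)
        self.assertTrue(is_expired(self.issued, self.ttl, clock))

    def test_not_expired_one_second_before_boundary(self) -> None:
        clock = FixedClock(self.issued + self.ttl - timedelta(seconds=1))
        self.assertFalse(is_expired(self.issued, self.ttl, clock))
```

Running it with python -m unittest needs no human interpretation, no real clock, and no configuration. Each method proves one point.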
System Testing Detects Emergent Misbehavior
What is a system?
If a unit is the smallest piece of a software product that can be tested, the system is the largest. CSQE BOK (VII.B.4) defines system testing this way:
A system is a big component. System testing is aimed at revealing bugs that cannot be attributed to components, as such, but to the inconsistencies between components, or to the planned interactions of components, and other objects.
The system is a big component. How big? I’m familiar with one medical device company, which designs its products in a way that specifically includes a human operator in mitigation strategies for various adverse events with critical safety impacts. In this case, the system consists of hardware, software, and at least two human beings: operator and patient. A test that excludes any of these elements runs the risk of missing emergent failure modes like this one: during a long procedure, the patient’s wife arrives with a large bag of cookies, which he consumes in its entirety, causing his plasma lipids to surge and the software to “detect” a sensor malfunction (true story — watch what you eat).
Complexity and emergence
The key concept that distinguishes systems from units is “emergence.” Units are sufficiently small and well defined that we can map their behavior comprehensively. Given preconditions x when condition y, we expect behavior z. Systems, by contrast, are subject to complexity. They behave in ways that are impossible to predict, even with complete knowledge about their units. We can’t control all relevant preconditions, nor can we fully describe the conditions. Both grow exponentially as interactions with the system grow linearly, and we lack confidence that we have the means to discover all of them, even with infinite time.
To take a simple example, build a program that picks two random words from a static list (even one as big as the Oxford English Dictionary). The set of possible outputs is massive but simple. You will always see a pair of words. Extend the program to run the second word through a web search engine, visit the first page in the results, and output the first word on the page. Now you have complexity, not merely size. The behavior of your program will depend on factors that you don’t understand, let alone control. Those factors will interact with one another to produce a mind-boggling range of possible behavior. Expect to encounter combinatorial effects from upstream states, formats, and latency.
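Here is a rough Python sketch of the two programs. The function names are mine, and fetch_top_result is a hypothetical stand-in for "search the web and fetch the first result." The sketch deliberately avoids any real search API and uses a fake so it runs offline.

```python
import random
from typing import Callable, Sequence


def pick_pair(words: Sequence[str], rng: random.Random) -> tuple[str, str]:
    """Large but simple: the output is always two words from the list."""
    return rng.choice(words), rng.choice(words)


def first_word_from_search(
    words: Sequence[str],
    rng: random.Random,
    fetch_top_result: Callable[[str], str],
) -> str:
    """Complex: the output now depends on whatever the outside world returns."""
    _, second = pick_pair(words, rng)
    page_text = fetch_top_result(second)  # hypothetical: search, then fetch page
    tokens = page_text.split()
    return tokens[0] if tokens else ""


def fake_fetch(query: str) -> str:
    """Stand-in for the live web, so the sketch runs offline."""
    return f"Results for {query}"


words = ["apple", "binder", "cookie"]
pair = pick_pair(words, random.Random(0))
first = first_word_from_search(words, random.Random(0), fake_fetch)
```

With pick_pair, the output space is exactly len(words) ** 2 pairs, however large the list grows. Swap fake_fetch for the real web and no such statement is possible. Even the empty-page branch is a guess about what might come back.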
The challenge of system testing
As is the case with unit testing, an entire discipline surrounds system testing, a great deal of it dedicated to managing scope to provide reasonable confidence within a reasonable budget. Unlike unit testing, system testing is never “finished.” Unit testing can produce a set of checks that cover all relevant preconditions and possible conditions. System testing can’t even define the full range of what is relevant or possible with any kind of precision. Systems produce surprises from an effectively infinite variety of causes. Testing them is a game of probabilities and risk management.
Unit testing provides a straightforward path to automated unit checks. Like everything else that concerns systems, system checking requires a more nuanced approach. Checking becomes more effective with increased stability. Some aspects of the system will be more stable than others. A check on the sum of two numbers breaks down when one of those numbers turns out to be an error message or an image of a gorilla some of the time. Human checkers tend to be more effective than mechanical ones when actual behavior breaks through the boundary of the unexpected and into the bizarre. It’s a good idea to have human testers close at hand to ask important questions and use the surprising behavior to learn more about the system.
In some cases, the most effective approach to system checking will be semi-automated with machines providing output for a human tester to inspect. Fully automated system checks can provide tremendous value as fast feedback mechanisms, but they bring their own complications. Designing checks that produce more signal than noise requires technical skill and effort that tend to surprise even project managers who have been surprised before.
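One way to picture a semi-automated check is a function that refuses to issue a verdict when its inputs fall outside what it was designed to judge. The sketch below is only an illustration. The name check_sum and the three outcome labels are assumptions of mine, not an established convention.

```python
import math
from numbers import Real


def check_sum(a: object, b: object, reported_total: object) -> str:
    """Return 'pass', 'fail', or 'inspect' (hand the case to a human)."""
    values = (a, b, reported_total)
    # bool counts as a Real in Python, so exclude it explicitly.
    if not all(isinstance(v, Real) and not isinstance(v, bool) for v in values):
        return "inspect"  # an error message or a gorilla: not ours to judge
    return "pass" if math.isclose(a + b, reported_total) else "fail"
```

For example, check_sum(2, 3, 5) yields 'pass', check_sum(2, 3, 6) yields 'fail', and check_sum(2, "HTTP 503", 5) yields 'inspect'. That last case leaves the question with a tester rather than burying it in a stack trace.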
Characteristics of a good system check
- It is performed against the fully built actual system or a close proxy.
- It weighs all other considerations against total execution time (which can get very expensive) and finds ways to economize.
- It provides sufficient logging and other evidence for a developer to understand failures without needing to repeat the check.
That last point is particularly important. Due to the complex nature of complete systems, checks are not repeatable. We may perform exactly the same set of actions, but we can’t guarantee that the underlying conditions are the same. Systems are ripe for intermittent failures and we human beings are notoriously poor at characterizing low-frequency events. We have a much better chance of understanding them if we catch them in the act or monitor their manifestation over time.
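As a sketch of what "sufficient evidence" could mean, the Python helper below wraps a check and returns a record of when it ran, where, under what observed conditions, and what happened. The helper name run_with_evidence and the record fields are hypothetical. A real system would capture far more context than this.

```python
import json
import platform
import traceback
from datetime import datetime, timezone
from typing import Any, Callable


def run_with_evidence(name: str, check: Callable[[], Any], context: dict) -> dict:
    """Run one check and return a record detailed enough to study afterward."""
    record = {
        "check": name,
        "started_at": datetime.now(timezone.utc).isoformat(),
        "host": platform.node(),
        "context": context,  # whatever conditions we were able to observe
    }
    try:
        record["observed"] = check()
        record["outcome"] = "pass"
    except AssertionError as exc:
        record["outcome"] = "fail"
        record["detail"] = str(exc)
    except Exception:
        record["outcome"] = "error"
        record["detail"] = traceback.format_exc()
    record["finished_at"] = datetime.now(timezone.utc).isoformat()
    return record


def homepage_loads() -> int:
    status = 200  # stand-in for a real observation of the system
    assert status == 200, f"unexpected status {status}"
    return status


evidence = run_with_evidence("homepage_loads", homepage_loads, {"build": "example"})
print(json.dumps(evidence, indent=2, default=str))
```

The record separates a failed expectation ('fail') from a check that blew up for unrelated reasons ('error'). That distinction is often the first thing you want to know about an intermittent failure.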
Integration Testing Provides Early Warnings About Broken Relationships
What is integration?
Back to our friend, CSQE BOK (VII.B.4):
Integration is a process by which hardware and software components are aggregated to create larger components. Integration testing is testing done to show that, even though the components were individually satisfactory as demonstrated by successful passage of component tests, the combination of components are incorrect or inconsistent.
How does integration testing differ from system testing? Which components are we talking about? Maybe this is too much information about the continuous circus in my mind, but I read about aggregating components from BOK, think of the sound a chicken makes, and immediately replay the old “parts is parts” schtick from Wendy’s.
A potential sweet spot for checking
Don’t blame the CSQE BOK. The truth is that nobody can draw the precise boundary between an integration test and a system test. That’s perfectly fine. We don’t need to. On one end of the spectrum, a good unit check isolates its unit from the rest of the world by faking all of the dependencies. On the other, a good system check covers the entire fully built system. An integration check sits somewhere in between. It covers some carefully defined set of components with everything else absent or faked.
The boundary gets tricky to define because, ultimately, “the system” includes the entire universe and raises deep philosophical implications (butterfly wings and all that), which we don’t need to contemplate for our testing program. For our purposes, it’s good enough to say that we should design integration checks when we would like to evaluate how two or more components behave together, isolated from the remainder of the system. Narrowing the focus in this way affords more control or faster feedback than a system check can provide.
For example, suppose your product contains an analytics engine that performs some unfathomable calculations in real time based on subway passenger counts and clearing prices from the wheat futures market, both of which it receives from in-house components that monitor the appropriate data feeds. A unit check would cover just the analytics engine with the data components faked. A system check would cover the entire system, including the live data feeds. An integration check might include the analytics engine and both in-house monitoring components but fake the data feeds. This enables you to focus on the relationships between the in-house components while eliminating non-determinism from the live feeds. It also allows you to introduce fault injection and learn how the integrated components will behave when the feeds do unexpected things.
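A toy version of that integration check might look like the following Python sketch. All class and function names are hypothetical. The "unfathomable calculation" is reduced to a multiplication, and a feed is assumed to be nothing more than a callable that yields lines of text.

```python
from typing import Callable, Iterable, Optional

Feed = Callable[[], Iterable[str]]  # assumption: a feed yields raw text lines


class PassengerMonitor:
    """In-house component: extracts the latest passenger count from a feed."""

    def __init__(self, read_feed: Feed) -> None:
        self._read_feed = read_feed

    def latest_count(self) -> Optional[int]:
        latest = None
        for line in self._read_feed():
            try:
                latest = int(line)
            except ValueError:
                continue  # ignore lines that are not counts
        return latest


class WheatMonitor:
    """In-house component: extracts the latest clearing price from a feed."""

    def __init__(self, read_feed: Feed) -> None:
        self._read_feed = read_feed

    def latest_price(self) -> Optional[float]:
        latest = None
        for line in self._read_feed():
            try:
                latest = float(line)
            except ValueError:
                continue  # ignore lines that are not prices
        return latest


class AnalyticsEngine:
    """Placeholder for the real calculation: count times price."""

    def __init__(self, passengers: PassengerMonitor, wheat: WheatMonitor) -> None:
        self._passengers = passengers
        self._wheat = wheat

    def signal(self) -> Optional[float]:
        count = self._passengers.latest_count()
        price = self._wheat.latest_price()
        if count is None or price is None:
            return None  # refuse to compute on missing inputs
        return count * price


def fake_feed(*lines: str) -> Feed:
    """Test double for a live data feed."""
    return lambda: list(lines)


def build(passenger_lines: Iterable[str], wheat_lines: Iterable[str]) -> AnalyticsEngine:
    """Wire the real in-house components to faked feeds."""
    return AnalyticsEngine(
        PassengerMonitor(fake_feed(*passenger_lines)),
        WheatMonitor(fake_feed(*wheat_lines)),
    )


# Integration check: real in-house components, faked feeds.
assert build(["100", "120"], ["5.5"]).signal() == 660.0

# Fault injection: the passenger feed returns an error page instead of counts.
assert build(["ERROR 503"], ["5.5"]).signal() is None
```

The second assertion is where fault injection earns its keep. It documents what the integrated components do when a feed misbehaves, a situation that is hard to arrange on demand against a live feed.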
Characteristics of a good integration check
- It verifies that two or more components interact as expected.
- It is faster, cheaper, or more effective than checking at the system level.
- It provides sufficient logging and other evidence for a developer to understand failures without needing to repeat the check.
You may have noticed that the last bullet point is repeated verbatim from the characteristics of a good system check. As we expand beyond the unit level and perform checks on an increasing number of integrated components, the potential for nondeterministic behavior grows exponentially and with it the probability that a check will fail intermittently. If you can’t replicate a failure reliably, the next best thing is a precise record of what happened, where, and when.
A place for everything and everything in its place
Testing is a human process of evaluating a product by learning about it. It may include checking, a well-defined evaluation that could be reduced to an algorithm and executed by a machine.
A good software testing program will include checks placed strategically at various levels of the technology stack. They may involve a mix of mechanical and human checkers.
Unit checks ensure that your smallest components continue to behave as specified despite changes to their implementation. They are so easy to run and so blazingly fast that you don’t think twice about executing all of them from your development workstation every time you make even the smallest change to your code. They require so little configuration or dependency management (ideally none) that you don’t think twice about running them from your continuous integration pipeline.
Integration checks target the interactions between specific subsets of components where you have reason to be concerned that one component may change in a way that does not violate an explicit contract (which should be covered by a unit check) but nonetheless could cause another component to fail or otherwise behave strangely. In an ideal world, integration checks would be fully automated and run just as quickly and simply as unit checks. In the real world, you may need to compromise and introduce actual dependencies like databases or message queues. These dependencies will carry complications and delays which you would never tolerate in a unit check. As a consequence, you might choose to run your integration checks in parallel with your main CI pipeline instead of accepting the longer execution time.
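One simple way to keep slower integration checks out of the fast path is to gate them behind a switch, as in this sketch using Python's unittest.skipUnless. The environment variable name RUN_INTEGRATION is an assumption of mine, and an in-memory SQLite database stands in for a real dependency such as a database server or message queue.

```python
import os
import sqlite3
import unittest

# Hypothetical switch; a separate CI job for integration checks would set it.
RUN_INTEGRATION = os.environ.get("RUN_INTEGRATION") == "1"


def total(amounts):
    """Business logic under unit check."""
    return sum(amounts)


class TotalUnitCheck(unittest.TestCase):
    def test_total_adds_amounts(self):
        self.assertEqual(total([2, 3]), 5)


@unittest.skipUnless(RUN_INTEGRATION, "set RUN_INTEGRATION=1 to include")
class TotalFromDatabaseIntegrationCheck(unittest.TestCase):
    def setUp(self):
        # sqlite3 in memory stands in for a real database dependency.
        self.conn = sqlite3.connect(":memory:")
        self.conn.execute("CREATE TABLE payments (amount INTEGER)")
        self.conn.executemany("INSERT INTO payments VALUES (?)", [(2,), (3,)])

    def tearDown(self):
        self.conn.close()

    def test_total_of_stored_amounts(self):
        rows = self.conn.execute("SELECT amount FROM payments").fetchall()
        self.assertEqual(total(amount for (amount,) in rows), 5)
```

On a workstation, python -m unittest runs only the unit check and reports the other as skipped. A parallel pipeline stage can set the variable and pay the extra cost without slowing anyone down.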
System checks inform you about the unpredictable behavior that can emerge from your complex system. They are expensive, slow, and necessarily incomplete, but you perform them anyway because you are humble. Despite having good unit and integration checks, you recognize how much you don’t know and would like to get ahead of your high-level product risks before a customer steps in one. Automation at the system level poses significant challenges that do not exist at the lower levels and may or may not be worth the effort.
You can use these distinctions to invent silly interview questions, or you can use them to guide your thinking about what, where, and how much. If you’re constantly being surprised by spooky behavior, look at your system and integration checking. If you’re afraid to make changes because you might break something, look at your unit checking. If you review your unit checks and discover a database in the setup, you have reason to suspect the unit. Maybe it has fused two kinds of logic. Better revisit that design before you catch an earful over lunch from the poor developer who inherits it from you.