Why Clinicians Let Their Computers Make Mistakes

We tend to trust our computers a lot. Perhaps too much, as one hospital nurse learned the hard way.

This is part 3 of The Overdose.

Brooke Levitt had been on the nursing staff at UCSF for about 10 months when Pablo Garcia was admitted for his colonoscopy. Levitt is in her mid-twenties, with an open face, a ready smile, and an upbeat Southern California vibe that makes her a favorite of kids and their parents. She couldn’t have been more thrilled to land a job at the renowned academic medical center straight out of nursing school. She was assigned to the pediatric intensive care unit (PICU), and she loved the work because “you’re constantly on your feet, critically thinking, working with a team of physicians and pharmacists, and you’re always at the bedside.” After six months of the standard probationary period, Levitt was now fully credentialed, and she prided herself on knowing the PICU system inside and out.

On July 26, 2013, Levitt was assigned a night shift, not in her usual ICU, but on a unit that was short-staffed, the general pediatrics floor. In the parlance of the hospital, she was a “floater,” and it was only the second time she had floated outside the PICU since starting her job.

The system of floating is governed by a kind of lottery — every nurse, except the most senior, is eligible. “I don’t want to float,” Levitt later told me, “because I don’t know the unit; I don’t know the nurses. Most people don’t like it.” But when your number comes up, you have no choice.

Pablo Garcia was Levitt’s second patient that afternoon. She gave him several of his medications, including multiple cups of the bowel-purging GoLYTELY liquid. Then she came to the order for the 38½ Septras in the computer — a shockingly high dose — and, sure enough, she found all the pills in Pablo’s medication drawer. “I remember going to his drawer and I saw a whole set of rings of medications, which had come over from the robot. And there were about eight packets of it on one ring. And I was like, wow, that’s a lot of Septra. . . . It was an alarming number.”

She’d given Septra before, in the ICU, but always in liquid or intravenous form, never pills. Her first thought was that perhaps the pills came in a different (and more diluted) concentration. That might explain why there were so many.

Since the Paleolithic Era, we humans have concocted explanations for stuff we don’t quite understand: tides, seasons, gravity, death. The idea that the Septra might have been diluted was the first of many rationalizations that Levitt would formulate to explain the unusual dose and to justify her decision to administer it. At first glance it might seem crazy for her to have done so, but the decisions she made that night were entirely consistent with patterns of error seen in medicine and other complex industries.

What is new for medicine is the degree to which very expensive, state-of-the-art technology designed to prevent human mistakes not only helped give rise to the Septra error, but also failed to stop it, despite functioning exactly as it was programmed.

The human lapses that occurred after the computerized ordering system and pill-dispensing robots did their jobs perfectly well are a textbook case of English psychologist James Reason’s “Swiss cheese model” of error. Reason’s model holds that all complex organizations harbor many “latent errors,” unsafe conditions that are, in essence, mistakes waiting to happen. They’re like a forest carpeted with dry underbrush, just waiting for a match or a lightning strike.

Still, there are legions of errors every day in complex organizations that don’t lead to major accidents. Why? Reason found that these organizations have built-in protections that block glitches from causing nuclear meltdowns, or plane crashes, or train derailments. Unfortunately, all these protective layers have holes, which he likened to the holes in slices of Swiss cheese.

On most days, errors are caught in time, much as you remember to grab your house keys right before you lock yourself out. Those errors that evade the first layer of protection are caught by the second. Or the third. When a terrible “organizational accident” occurs — say, a space shuttle crash or a September 11–like intelligence breakdown — post hoc analysis virtually always reveals that the root cause was the failure of multiple layers, a grim yet perfect alignment of the holes in the metaphorical slices of Swiss cheese. Reason’s model reminds us that most errors are caused by good, competent people who are trying to do the right thing, and that bolstering the system — shrinking the holes in the Swiss cheese or adding overlapping layers — is generally far more productive than trying to purge the system of human error, an impossibility.
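To see why adding layers or shrinking holes pays off, it can help to put rough numbers on the model. The sketch below is only an illustration: the miss rates are made up, and it assumes each layer fails independently of the others, which real safety systems do not always manage.

```python
from math import prod


def chance_error_gets_through(miss_rates):
    """Probability that one error slips past every layer.

    Assumes the layers fail independently of one another, so the
    individual miss rates can simply be multiplied together.
    """
    return prod(miss_rates)


# Hypothetical miss rates: each layer fails to catch 1 error in 10.
three_layers = [0.1, 0.1, 0.1]

baseline = chance_error_gets_through(three_layers)
extra_layer = chance_error_gets_through(three_layers + [0.1])
smaller_hole = chance_error_gets_through([0.05, 0.1, 0.1])

print(f"three layers:        {baseline:.4%}")
print(f"add a fourth layer:  {extra_layer:.4%}")
print(f"shrink one hole:     {smaller_hole:.4%}")
```

Under these assumptions, roughly one error in a thousand makes it through three layers, and either fix cuts that figure further. If the layers are not independent (say, each one trusts that the others have already done the checking), the multiplication no longer holds and the real risk is higher.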

A 1999 report by the Institute of Medicine launched the patient safety movement with the headline-grabbing estimate that nearly 100,000 patients a year in the United States die of medical mistakes — the equivalent of a jumbo jet crashing every day. Tens of thousands of these deaths were from medication errors. For these, computerization was touted as the most promising fix, since it can plug holes such as illegible handwriting, mistaken calculations (for example, adding a zero to a calculated dose creates a tenfold overdose, which can be fatal if the drug is insulin or a narcotic), and failure to check drug allergies before administering a medication. More sophisticated computer systems go even further, building in alerts to guide doctors to the correct medication for a given condition, signaling that our dose is too high or low, or reminding us to check a patient’s renal function before prescribing certain medications that affect the kidneys.

But even as computer systems shrink the holes in certain layers of the metaphorical Swiss cheese, they can also create new holes. As Pablo Garcia’s case illustrates, many of the new holes in the Swiss cheese weren’t caused by the computer doing something wrong, per se. They were caused by the complex, and under-appreciated, challenges that can arise when real humans — busy, stressed humans with all of our cognitive biases — come up against new technologies that alter the work in subtle ways that can create new hazards.

Pablo Garcia’s hospital ward doubled as UCSF’s pediatric research center, where patients on clinical trials frequently receive unusual medications. Brooke Levitt, still a bit baffled by the number of Septra pills, now wondered whether that explained the peculiar dose — perhaps Pablo was on some sort of research protocol. She thought about asking her only colleague on the floor, the charge nurse, but she knew that the charge nurse was busy seeing her own patients and delivering their medications.

Of course, Levitt now beats herself up for not tapping her colleague on the shoulder. But it’s not that surprising that she failed to do so. Studies have found that one important cause of errors is interruptions, so clinicians at UCSF and elsewhere have been counseled to avoid them, particularly when their colleagues are performing critical and exacting tasks like giving children potentially dangerous medications.

In some hospitals, nurses now mix or collect their medications wearing vests that say “Don’t Interrupt Me,” or stand inside a “Do Not Interrupt” zone marked off with red tape.

But there was probably something else, something more subtle and cultural, at play. Today, many healthcare organizations study the Toyota Production System, which is widely admired as a model for safe and defect-free manufacturing. One element of the TPS is known as “Stop the Line.” On Toyota’s busy assembly line, it is every frontline worker’s right — responsibility, really — to stop the line if he thinks something may be amiss. The assembly line worker does this by pulling a red rope that runs alongside the entire line.

When a Toyota worker pulls the cord for a missing bolt or a misaligned part, a senior manager scrambles to determine what might be wrong and how to fix it. Whether on the floor of an automobile manufacturing plant or a pediatrics ward, the central question in safety is whether a worker will “stop the line” — not just when she’s sure something is wrong but, more important, when she’s not sure it’s right.

Safe organizations actively nurture a culture in which the answer to that second question is always yes — even for junior employees who are working in unfamiliar surroundings and unsure of their own skills. Seen in this light, Levitt’s decision to talk herself out of her Spidey sense about the Septra dose represents one nurse’s failure in only the narrowest of ways. More disturbing, it points to a failure of organizational culture.

Levitt’s description of her mindset offers evidence of problems in this culture, problems that are far from unique to UCSF. “When I was counting all the pills and seeing them fill half a cup, my first thought was, that’s a lot of pills. Obviously it didn’t alarm me enough to call someone. But it was more than just a nagging sensation.”

Why didn’t she heed it? Another factor was her rush to complete her tasks on an unfamiliar floor. The computer helps create the time pressure: a little pop-up flag on the Epic screen lets nurses know when a medication is more than 30 minutes overdue, an annoying electronic finger poke that might make sense for medications that are ultra-time-sensitive, but not for Septra pills. She also didn’t want to bother the busy charge nurse, and she “didn’t want to sound dumb.”

As is so often the case with medical mistakes, the human inclination to say, “It must be right” can be powerful, especially for someone so low in the organizational hierarchy, for whom a decision to stop the line feels risky.

Finally, the decision to stop the line sometimes hinges on how much effort it takes to resolve one’s uncertainty. Remember that Levitt was usually assigned to the pediatric ICU, where nurses, doctors and pharmacists still generally work side by side, hovering over desperately ill babies. “I’m so used to just asking a resident on the spot, ‘Is this the dose you really want?’” she said. But on the wards, where the pace is slower and the children are not as critically ill, the doctors have all but disappeared. They are now off in their electronic silos, working away on their computers, no longer around to answer a “Hey, is this right?” question, the kind of question that is often all that stands between a patient and a terrible mistake.

But there’s another major reason Levitt didn’t call anyone for help. She trusted something she believed was even more infallible than any of her colleagues: the hospital’s computerized bar-coding system. The system — not unlike the one used in supermarkets and stores everywhere — allows a nurse to scan a medication before she gives it to be sure it’s the right medicine, at the right dose, for the right patient.

In a seminal 1983 article, Lisanne Bainbridge, a psychologist at University College London, described what she called the “irony of automation.” “The more advanced a control system is,” she wrote, “so the more crucial may be the contribution of the human operator.” In a famous 1995 case, the cruise ship Royal Majesty ran aground off the coast of Nantucket Island after a GPS-based navigation system failed due to a frayed electrical connection. The crew members trusted their automated system so much that they ignored a half-dozen visual clues during the more than 30 hours that preceded the ship’s grounding, when the Royal Majesty was 17 miles off course.

In a dramatic study illustrating the hazards of overreliance on automation, Kathleen Mosier, an industrial and organizational psychologist at San Francisco State University, observed experienced commercial pilots in a flight simulator. The pilots were confronted with a warning light that pointed to an engine fire, although several other indicators signified that this warning was exceedingly likely to be a false alarm. All 21 of the pilots who saw the warning decided to shut down the intact engine, a dangerous move. In subsequent interviews, two-thirds of these pilots who saw the engine fire warning described seeing at least one other indicator on their display that confirmed the fire. In fact, there had been no such additional warning. Mosier called this phenomenon “phantom memory.”

Computer engineers and psychologists have worked hard to understand and manage the thorny problem of automation complacency. Even aviation, which has paid so much attention to thoughtful cockpit automation, is rethinking its approach after several high-profile accidents, most notably the crash of Air France 447 off the coast of Brazil in 2009, that reflect problems at the machine–pilot interface. In that tragedy, a failure of the plane’s speed sensors threw off many of the Airbus A330’s automated cockpit systems, and a junior pilot found himself flying a plane that he was, in essence, unfamiliar with. His incorrect response to the plane’s stall — pulling the nose up when he should have pointed it down to regain airspeed — ultimately doomed the 228 people on board. Two major thrusts of aviation’s new approach are to train pilots to fly the plane even when the automation fails, and to prompt them to switch off the autopilot at regular intervals to ensure that they remain engaged and alert.

But the enemies are more than just human skill loss and complacency. It really is a matter of trust: humans have a bias toward trusting the computers, often more than they trust other humans, including themselves.

This bias grows over time as the computers demonstrate their value and their accuracy (in other words, their trustworthiness), as they usually do. Today’s computers, with all their humanlike characteristics such as speech and the ability to answer questions or to anticipate our needs (think about how Google finishes your thoughts while you’re typing in a search query), engender even more trust, sometimes beyond what they deserve.

An increasing focus of human factors engineers and psychologists has been on building machines that are transparent about how trustworthy their results are. In its 2011 defeat of the reigning Jeopardy champions, the I.B.M. computer Watson signaled its degree of certainty with its answers. Before he passed away last month, George Mason University psychologist Raja Parasuraman was working on a type of computer Trust-o-Meter, in which the machine might have a green, yellow or red light, depending on how trustworthy it thinks its result is.

But that might not have bailed out Levitt, since the bar-coding machine probably felt pretty darn sure that it was prompting her to deliver the correct dose: 38½ pills. So we are left struggling with how to train people to trust when they should, but to heed Reagan’s admonition to “trust but verify” when circumstances dictate. The FAA is now pushing airlines to build scenarios into their simulator training that promote the development of “appropriately calibrated trust.” Medicine clearly needs to tackle its version of the same problem.

In Levitt’s case, the decision to put her faith in the bar-coding system was not born of blind trust; since it had been installed a year earlier, the system had saved her, as it had all the nurses at UCSF, many times. Unlike the doctors’ and pharmacists’ prescribing alerts and the ICU cardiac monitors, with their high false positive rates, the nurses usually found their bar-code alerts to be correct and clinically meaningful. In fact, under the old paper-based process, the drug administration phase was often the scariest part of the medication ecosystem, since once the nurse believed he had the right medicine, there were no more barriers standing between him and an error — sometimes a fatal one.

Months after the error, I asked Levitt what she thought of Epic’s bar-coding system. “I thought it was very efficient and safer,” she said. “If you scan the wrong medication, it would instantly have this alert that said, ‘This is the wrong medication; there’s not an admissible order for this medication.’ So I would know, oops, I scanned the wrong one. It saved me.”

Levitt trusted not just the bar-coding system, but UCSF’s entire system of medication safety. Such trust can itself be another hole in the Swiss cheese. While a safety system might look robust from the outside — with many independent checks — many errors pick up a perverse kind of momentum as they breach successive layers of protection. That is, toward the end of a complex process, people assume that, for a puzzling order to have gotten this far, it must have been okayed by the people and systems upstream. “I know that a doctor writes the prescription,” Levitt said. “The pharmacist always checks it... then it comes to me. And so I thought, it’s supposed to be like a triple-check system where I’m the last check. I trusted the other two checks.”

Levitt took the rings laden with medications to Pablo’s bedside. She scanned the first packet (each packet contained one tablet), and the bar-code machine indicated that this was only a fraction of the correct dose — the scanner was programmed to look for 38½ pills, not one. So she scanned each of the pills, one by one, like a supermarket checkout clerk processing more than three dozen identical grocery items.
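The behavior described above can be sketched in a few lines of code. This is a simplified illustration, not Epic's actual logic: the `Order` class, the `check_scans` function, and the status messages are all hypothetical, and because the account does not say how the half tablet was scanned, the sketch simply rounds the order up to whole packets.

```python
import math
from dataclasses import dataclass
from fractions import Fraction


@dataclass
class Order:
    patient_id: str
    drug: str
    tablets: Fraction  # ordered dose, counted in tablets


def check_scans(order, patient_id, scanned_drugs):
    """Compare what the nurse scanned against what was ordered.

    Every test here asks "does this match the order?" None of them
    asks whether the order itself is a reasonable dose.
    """
    if patient_id != order.patient_id:
        return "wrong patient"
    if any(drug != order.drug for drug in scanned_drugs):
        return "wrong medication"
    # One tablet per packet; round a fractional order up to whole packets.
    packets_needed = math.ceil(order.tablets)
    if len(scanned_drugs) < packets_needed:
        return "partial dose"
    return "dose matches order"


# Hypothetical patient ID; the dose of 38 1/2 tablets comes from the order.
order = Order(patient_id="patient-1", drug="Septra", tablets=Fraction(77, 2))

print(check_scans(order, "patient-1", ["Septra"]))       # one packet scanned
print(check_scans(order, "patient-1", ["Septra"] * 39))  # every packet scanned
```

In this sketch, a single scan comes back as a partial dose and the full ring of packets comes back as a match. The check can only be as good as the order it is checking against.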

Yet even after the bar-code system signaled its final approval, Levitt’s nagging sense that something might be wrong had not completely vanished. She turned to her young patient to ask him what he thought.

Pablo was accustomed to taking unusual medications, so he said that the Septra dose seemed okay. She handed the pills to her patient and he began to swallow them.

About six hours later, the teenager blacked out, his arms and legs began jerking, and he stopped breathing.

This is excerpted from The Digital Doctor: Hope, Hype, and Harm at the Dawn of Medicine’s Computer Age, by Robert Wachter. McGraw-Hill, 2015.

Illustrated by Lisk Feng