Fail-stopped is not fail-safe: autonomous systems and trolley problems

davidad
Nov 10, 2017


Yesterday, the first self-driving passenger shuttle on public roads began operations, and within the first hour it was involved in an accident. Of course, this type of incident has “political implications” (in the strict sense that choices about what to allow on public roads are naturally subject to public governance). In spite of my critical tone here, I don’t want to be misunderstood as making or supporting claims about public regulation of self-driving technology. (To be fully explicit, my personal beliefs are that the leading self-driving technologies are probably already safer than typical human drivers, and that widespread deployment of such technologies will eventually save hundreds of lives per day which—in today’s world—are lost in car accidents.)

With all that said… in yesterday’s incident, the self-driving vehicle detected an impending collision with a tractor-trailer and, as designed, came to a halt; the collision occurred immediately thereafter.

The fail-stopped assumption

This story illustrates a concern I have long had about the standard approach to self-driving safety: the assumption that, in an adverse or unexpected situation, if the system can automatically transition into a stable, motionless position without causing damage, then any potential accident has successfully been avoided. This could be stated more concisely as “fail-safe means fail-stopped.” There are two kinds of reasons that I imagine support this assumption:

  1. The way our justice system assigns responsibility in automobile accidents (roughly) follows the rule that a stopped vehicle is not “at fault.” So if your vehicle can transition into a stopped position before anything unsafe happens, then hey, it’s not your “fault” if something unsafe happens immediately following. Aside from the benefits of avoiding a traffic citation and the accompanying civil liability, it may also be reasonable to expect that public opinion of a system’s “safety” will follow this legal notion of “fault” (at least to some extent).
  2. In many safety-critical autonomous systems (perhaps a majority), the fail-stopped assumption holds true. For example, consider a 2-ton industrial robot arm: if it can safely stop, as long as it is in a stable and stationary configuration, it’s not going to hurt anyone. Or consider a nuclear power plant: under unexpected conditions, if the plant’s control system can automatically bring it into a stable shut-down state, then any potential nuclear accident has been avoided.
An “emergency stop” button, as commonly used in industrial control systems, demonstrates the implicit assumption that the correct action in any emergency is simply to stop.
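To make the assumption concrete, here is a minimal sketch, in Python, of the kind of supervisory logic the fail-stopped assumption encourages. Everything in it is hypothetical (the type and function names are mine, not any vendor’s software); the point is only the shape of the policy: every anomaly, including a predicted collision, maps to a single recovery behavior, a controlled stop.

```python
from dataclasses import dataclass

@dataclass
class WorldModel:
    collision_predicted: bool   # e.g. an object closing fast along the planned path
    sensors_healthy: bool
    localization_ok: bool

def fail_stopped_supervisor(model: WorldModel) -> str:
    """Hypothetical supervisory policy: every anomalous condition maps to the
    same recovery behavior -- bring the vehicle to a controlled stop."""
    anomalous = (
        model.collision_predicted
        or not model.sensors_healthy
        or not model.localization_ok
    )
    if anomalous:
        # The assumption under discussion: once stopped, the system is "safe".
        return "CONTROLLED_STOP"
    return "CONTINUE_NOMINAL_PLAN"
```

Structurally, this is an emergency-stop button in software: the set of recovery behaviors has exactly one element, and nothing in it distinguishes dangers that originate inside the system from dangers bearing down on it from outside.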

However, note the absurdity of applying this principle in other, non-industrial but safety-critical contexts. For example, if under adverse conditions (such as a power failure), a medical ventilator swiftly transitions into a stable, halted configuration…that is exactly the opposite of the safety property we would want from such a device.

I think the difference in our intuitions about whether fail-stopped constitutes “safety” for different kinds of systems reduces to a question about whether the danger (from which we desire safety) comes from inside the system or from outside it. In the case of a ventilator, the danger comes from the combination of a patient’s diseased tissues and the ambient oxygen concentration and atmospheric pressure—all of which are outside the machine. The machine, then, has an active duty to control the oxygen concentration and pressure in the patient’s airways: if it abandons this duty, danger shall prevail. In contrast, the danger around a 2-ton robot arm comes from the robot arm itself. The environment, a factory floor, is “default safe”: if the machines cease to move, it’s just a big ol’ room. Similarly, the danger around a nuclear power plant comes from the nuclear reactor itself, etc.

In a self-driving car, danger comes from both the car (which can exert large destructive forces) and the ambient environment (because public roads typically contain other vehicles, which can also exert very large forces). Unlike sitting near a halted robot arm on a factory floor, sitting near (or in) a halted car in the middle of a public road is not “default safe”. I’d also point out that the distinction seems to be unrelated to whether or not people sit inside the machine: autonomous vehicles on a closed course (such as the underground or aboveground “people movers” that commonly operate between airport terminals) and roller-coasters seem to essentially satisfy the criterion that “a stable motionless position is safe.” Occasionally a roller-coaster emergency-halts while upside-down, which makes the riders extraordinarily uncomfortable, but it seems fair to say that they are not in danger, as long as the harness system is properly engineered. (To fit the latter caveat into this framework, one could say that the configuration of the roller-coaster is not “stable” if riders may fall from their seats, i.e. with gravity unopposed by contact forces from harnesses/restraints.)
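The inside-versus-outside criterion can be phrased as a small decision rule. The sketch below is my own framing, with illustrative field names rather than any standard taxonomy: fail-stopped counts as fail-safe only when the halted configuration is stable and no remaining source of danger lies outside the system.

```python
from dataclasses import dataclass

@dataclass
class SystemContext:
    system_exerts_danger: bool       # the machine itself can apply destructive force
    environment_exerts_danger: bool  # even when halted, the surroundings can still do harm
    stable_when_halted: bool         # e.g. restraints oppose gravity on a roller-coaster

def fail_stopped_is_fail_safe(ctx: SystemContext) -> bool:
    """Fail-stopped counts as fail-safe only if, once the system is halted
    and stable, no source of danger remains."""
    return ctx.stable_when_halted and not ctx.environment_exerts_danger

# Rough classifications of the examples discussed above:
robot_arm  = SystemContext(True,  False, True)   # factory floor: "default safe"
ventilator = SystemContext(False, True,  True)   # the patient still needs airflow
road_car   = SystemContext(True,  True,  True)   # other vehicles keep moving
print(fail_stopped_is_fail_safe(robot_arm))      # True
print(fail_stopped_is_fail_safe(ventilator))     # False
print(fail_stopped_is_fail_safe(road_car))       # False
```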

Trolley problems

Developers of self-driving technology are often asked about so-called “trolley problems,” ethical questions about situations in which an accident is inevitable but some choice is available regarding who gets harmed.

In the original, highly contrived (and gruesome) “trolley problem”, you must choose whether to divert the runaway tram, killing one person to save five others.

The response is invariably some polite form of “that’s a ridiculous question, which we have never seriously considered.” It seems very likely that, in fact, none of the leading self-driving systems have anything like a “scoring system,” or loss function, to judge the social-welfare implications of a given collision scenario. If a collision seems unavoidable, the algorithm’s goal is simply to come to a halt before the collision happens. At that point, well, if a stopped car isn’t safe, that is someone else’s fault.

A particular, less fanciful version of the “trolley problem” question is sometimes asked: does a self-driving car prioritize the safety of its owners/occupants over the safety of others, or vice versa? The response is the same, because the “safety” of individual humans is abstracted away from the problem statement of the self-driving safety engineers, hidden behind the fail-stopped assumption. But the effect of this assumption is a tacit preference for the car to get itself impacted rather than to impact something else. Whether this results in the car’s occupants being injured more often than other parties is very hard to say, since a collision obviously causes damage to both sides.

But this passive approach to safety is certainly different from that of the typical human (who will actively avoid danger to themselves), and unlike other such differences (lightning-fast reaction time, perceptual acuity, lack of distraction or impairment by socially consumed substances), it’s not an obviously good one. It’s also not the approach that popular culture has prepared us to expect from a “robot”, which would be to take whatever actions minimize harm, coldly calculated according to a set of moral values (either its owner’s, its creator’s, the government’s, its own, or that of robot-kind). One could argue that Waymo’s or the NHTSA’s “moral values” dictate that being crashed into is a better social outcome than crashing into something, but I doubt any of their spokespeople would agree. Instead, they’d probably gesture at the concept of “fault.” In sum, the implicit ethics of fail-stopped safety is deontological rather than consequentialist.

In fact, the standard deontological responses to trolley problems in philosophy are in alignment with the fail-stopped approach to safety: in a no-win scenario, both mandate a retreat to “no action”—after all, they say, if you’re not exerting any forces at all, you can’t be doing anything unethical. In response, consequentialists say that the idea of a privileged “don’t do anything” action is an illusion.
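To make the contrast concrete, here is a toy sketch of the two stances as planners. It is entirely hypothetical (no production system is claimed to look like this, and the harm function is left abstract): the rule-based planner privileges stopping as the response to any emergency, while the loss-minimizing planner treats stopping as just one more candidate action to be scored.

```python
ACTIONS = ["stop", "swerve_left", "swerve_right", "accelerate", "reverse"]

def deontological_planner(belief: dict) -> str:
    """Rule-based: in any emergency, retreat to the privileged 'no action'
    (here, stopping), regardless of predicted consequences."""
    if belief.get("emergency", False):
        return "stop"
    return "continue"

def consequentialist_planner(belief: dict, expected_harm) -> str:
    """Loss-minimizing: choose whichever action minimizes a harm score
    computed from the (possibly wrong) belief state. Stopping gets no
    special treatment -- it is just one candidate among many."""
    return min(ACTIONS, key=lambda action: expected_harm(belief, action))
```

The second planner is only as good as the belief state and the harm model it is handed, which is where the robustness argument in the next section comes from.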

What to do?

Making claims about which is the true ethical framework is well beyond the scope of this writing. But I hope to convince people, perhaps especially those involved in the development of self-driving technology, that the fail-stopped assumption is highly suspect on public roads. It may seem like a solid engineering principle that obviates the need for ethical considerations, but in a context where danger can come from both inside and outside a system, it actually results in an implicit fundamental choice of ethical framework. And in situations like yesterday’s—where there is a clear and present choice between almost certainly being collided with while stopped, or maybe being collided with while taking evasive action—such ethics are called upon, and these situations are far more common than the highly contrived “swerve left or swerve right” situations stated in classical trolley problems.

To be fair, choosing deontology in this context does have several advantages:

  1. It has far fewer free variables to consider than any consequentialist theory. Those free variables it does have can be determined by consulting traffic law (since the law is also deontological).
  2. As it is in accordance with the law, it can be easily defended.
  3. Maybe a consequentialist approach would make the public uneasy; maybe it would remind them more of the “robots” depicted in dystopian fiction; maybe a majority of people in many countries have deontological beliefs themselves.
  4. Finally, and perhaps most compellingly, deontological frameworks tend to be much less sensitive to error than consequentialist frameworks. That is, if a self-driving car’s model of reality comes to represent a state that is both highly incorrect and unexpected (due to anything from faulty hardware to adversarial examples to deliberate corruption), a deontological algorithm will predictably come to a halt, whereas a consequentialist algorithm could take arbitrary radical action (which would, if the machine’s model of reality were correct, be the “safest” action in that imagined situation). The sketch after this list illustrates the asymmetry.
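Continuing the toy planner sketch from the previous section (still hypothetical code, with an assumed expected_harm function that naively trusts the belief state), the asymmetry described in point 4 shows up as soon as both planners are handed a corrupted belief:

```python
def expected_harm(belief, action):
    # Toy harm model that trusts the belief completely: if the belief claims the
    # road ahead is clear, accelerating scores as perfectly harmless.
    if belief.get("road_ahead_clear", False) and action == "accelerate":
        return 0.0
    return 0.5 if action == "stop" else 1.0

corrupted_belief = {"emergency": True, "road_ahead_clear": True}  # wrong about the road

print(deontological_planner(corrupted_belief))                    # stop
print(consequentialist_planner(corrupted_belief, expected_harm))  # accelerate
```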

However, I do think it is worth making this choice deliberately. Witnesses to yesterday’s accident made statements like “the shuttle didn’t have the ability to back up out of the way,” revealing a tacit expectation that the system designers’ goal would, of course, not have been simply to halt in every unexpected situation. Perhaps, without inviting the full complexity of a consequentialist theory of ethics, a more refined set of rules could be constructed that would at least cover relatively common situations, like “getting out of the way” when another vehicle is approaching relatively slowly and no other vehicles are around. While fail-stopped is a working definition of safety in many engineering situations, it should not be applied blindly.
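As a hedged illustration of what such a refined rule set might look like (the rule ordering, field names, and threshold are assumptions made for the sake of the example, not a proposal), one could keep the deontological structure but place a small number of prioritized recovery behaviors ahead of the default stop:

```python
def refined_fallback(belief: dict) -> str:
    """An ordered list of rules, tried in priority order. Still deontological in
    spirit -- no harm scoring -- but stopping is no longer the only recovery behavior."""
    if not belief.get("emergency", False):
        return "continue"
    # Rule 1: if a slow-moving vehicle is closing in and the space behind is clear,
    # back out of its path instead of waiting to be hit.
    if (belief.get("threat_closing_speed_mps", 0.0) < 3.0
            and belief.get("rear_clear", False)):
        return "reverse_out_of_path"
    # Rule 2: otherwise, fall back to the familiar behavior.
    return "stop"
```

Because the recovery behaviors remain a small, enumerated set, an incorrect belief can at worst trigger the wrong rule, so much of the predictability that motivates the deontological choice is arguably retained.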

