AI and the “Trolley Problem” Problem

The striking ascent of self-driving cars, from the stuff of sci-fi to a car dealership near you, offers one of the most transformative examples of the impact of artificial intelligence on society. Cars are a technology that almost everyone in developed societies will use at some point in their lives (whether as driver or passenger), and so the prospect that everything from pick-up to pull-up will be fully automated will likely prove one of the most fundamental transformations of this adolescent century.

The adoption of earlier inventions like email and smartphones was hastened by their seeming similarity to existing technologies like (snail) mail and (analogue) phones, and the fact that self-driving cars seem to be simply “cars that can drive themselves” will no doubt serve a similar purpose. (This does leave open the question of how we’ll describe conventional cars in future: “pedal cars”?)

Yet just as with email and smartphones, looking “under the hood” of self-driving cars reveals a much more sophisticated set of functions and potential abilities than those of their conventional forebears. Yes, self-driving cars will drive themselves, but they’ll likely be able to do so much more as well. They might “intelligently” decide when to arrive at your house for your morning commute, based on the contents of your calendar that day. They might know (from your fridge, or your glucose monitor) which groceries you’re low on, and autonomously go fetch some (just in case the thirty-minute wait for an Amazon drone proves too onerous). In a similar sense, even the first iteration of the iPhone, which didn’t yet have an App Store, nonetheless had extraordinary potential as a multimedia device due in part to its connectivity.

As well as these exciting new possibilities, however, self-driving cars will also have to go out of their way not to do something: kill people. 35,000 people died on American roads last year, probably more than were killed by guns, and certainly more than were killed by phones of any sort (though cars and smartphones have proved an especially dangerous combination).

The attempts to fully automate such a lethal technology have given not only inventors but also regulators, academics and journalists much to ponder, to a far greater extent than with earlier consumer technology breakthroughs. By far, the question receiving the most prominent discussion is the so-called “trolley problem”. This thought experiment is a longstanding ethical paradox. Borrowing Wikipedia’s summary, the problem states:

There is a runaway trolley barreling down the railway tracks. Ahead, on the tracks, there are five people tied up and unable to move. The trolley is headed straight for them. You are standing some distance off in the train yard, next to a lever. If you pull this lever, the trolley will switch to a different set of tracks. However, you notice that there is one person on the side track. You have two options:
Do nothing, and the trolley kills the five people on the main track.
Pull the lever, diverting the trolley onto the side track where it will kill one person.
Which is the most ethical choice?

This problem gets to the heart of some of the oldest debates in moral philosophy, not least the divide between consequentialist and utilitarian approaches — which seek to optimise the “greatest good for the greatest number” and emphasise the impact of someone’s actions — and deontological ethics, which hold that participation in some action might always be wrong, with proportionately less regard to the consequences.

The problem has been extended to include a related case in which, instead of flipping a switch, the only way to save the five people is to push a nearby man off a bridge (since he is large enough to stop the trolley in its tracks). Research suggests that this alternative scenario causes many people to change their mind: many people are comfortable flipping the switch but not shoving the man. Though this implies a moral distinction between these two acts (despite their identical effect: sacrificing one life to save five), research suggests that the divergent attitudes ultimately result from a neurological distinction, as different parts of subjects’ brains were observed as controlling the decisions in the different cases.

Creating a moral motorist

In many ways it is easy to see why the trolley problem has become the canonical example in thinking about self-driving cars. Given the death toll that human-operated cars deliver every year, it is safe to assume that cars will continue to be at least somewhat hazardous, even with the vast improvements in efficiency that automation might bring. It follows, then, that we should think about the road-based equivalent of this track-based trolley problem as a matter of urgency, deciding whether and how to code societal values into autonomous vehicles.

Indeed, scientists at MIT’s Media Lab have launched an impressive attempt to experiment with just this question. The Moral Machine platform invites users to judge a series of hypothetical scenarios, making difficult decisions about the direction an out-of-control car should swerve. Once a user has answered a series of questions, the survey ranks that user’s implied “preferences” with almost disturbing granularity, in terms of gender, age, wealth, health, and much else.

The Moral Machine effort is laudable, and its significance self-evident. It serves to underline that the rise of automated technology, and specifically artificial intelligence, may have the unintended positive effect of encouraging society to be more open and explicit about its values. Code (whether in the form of law or technological architecture, per Lessig) necessitates clarity, at least of a sort. Enabling a machine to make decisions demands declarations of the more fundamental values on which those decisions should rest.

And yet, the prevalence of the trolley problem in the context of self-driving cars is itself, in a sense, problematic. At the most basic level, we are talking about self-driving cars on roads, not trolleys on a track, which opens up a far wider array of possibilities for evasive action, including skidding in any direction, rather than just taking one course or another, as in the trolley example.

Since the trolley problem is purely hypothetical, it can be easily adapted for other contexts; the Moral Machine project, of course, adapts it for the road. But the basic simplicity that makes the problem so popular (the trolley or car can swerve left or right, so our decision is both binary and binding) is also what makes it problematic in terms of reasoning about AI.

By asking ourselves what we would do when faced with such an ethically thorny issue, we risk ascribing to AI a “thought process” that it doesn’t really have. The trolley problem is an ethical paradox, which forces us to reflect on our own values and biases. Though the fictitious problem involves the subject making a quick decision, the exercise is useful precisely because it shows how hard making such a decision would be in practice. Paradoxes that are easily solvable are not worthy of the name.

Machines are less prone to introspection. A self-driving car would be able to execute a “decision” in milliseconds, but its decision-making process is unlikely to operate much like our own. True, self-driving cars take in a wealth of data from the surrounding environment using cameras and radar, much as human drivers do using eyes and ears, and true, neural networks (which are designed to mimic the human brain) can be used to help recognise objects and even predict pedestrians’ movement.

But these similarities are analogical, not biological. To the extent that an ethical preference (e.g., swerve into the dog to avoid the group of schoolchildren) can be coded into self-driving cars, the decision-making process would not be “rational” in a way we would understand. No ethical considerations would be in play during the making of the decision; rather, the decision would result from a set of pre-existing preferences implemented by coders — perhaps informed, as in the Moral Machine project, by the wider public.
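To make the point concrete, here is a minimal sketch of what a coded “ethical preference” might look like. Everything in it is a hypothetical illustration: the weights, the category names and the function names are assumptions invented for this example, not taken from any real vehicle or from the Moral Machine project. The “decision” is nothing more than a lookup and a comparison over values fixed in advance.

```python
# Hypothetical harm weights, fixed by designers before the car ever drives.
# The numbers are illustrative assumptions, not values from any real system.
HARM_WEIGHTS = {"child": 10.0, "adult": 8.0, "dog": 1.0}


def maneuver_cost(obstacles):
    """Sum the pre-set weights of everything a maneuver would hit."""
    return sum(HARM_WEIGHTS[kind] for kind in obstacles)


def choose_maneuver(options):
    """Return the name of the maneuver with the lowest pre-weighted cost."""
    return min(options, key=lambda name: maneuver_cost(options[name]))


# Each candidate maneuver maps to the list of things it would strike.
options = {
    "stay_in_lane": ["child", "child", "child"],
    "swerve_left": ["dog"],
}
print(choose_maneuver(options))  # swerve_left
```

Nothing resembling deliberation happens at the moment of the crash: the outcome was settled when someone chose the numbers in the table.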

This is not to say that a decision with ethical consequences which is processed by a machine would necessarily be worse than one made by a human in a typical crash situation. In fact, it might well be better: the self-driving car would be drawing on pre-existing preferences to swerve this way or that, and so hypothetically, if these preferences were developed equitably, ethically, and with input from a representative group of people, they might be more reflective of a collective human will than a split-second decision taken by an individual person, “programmed” with all her specific biases and particular life experiences.

This hope is almost certainly naive, as technologies very often reflect the intentions and even the characteristics of their designers: consider, as an extreme example, facial recognition software, which can fail to recognise darker faces as effectively as lighter ones.

Regardless of the answer, the question is the wrong one to ask.

If, as the available data suggest, self-driving cars are much less likely to be involved in fatal accidents, only the steeliest utilitarian would argue that means we should worry less about these situations. But it does suggest that the trolley problem is the wrong lens through which to think about self-driving cars and the artificial intelligence which will power them. Not only does the trolley problem give the wrong impression of how automated decision-making works, but it limits our focus to a very specific set of hypothetical possibilities, potentially distracting us from far weightier challenges.

Road rage against the machine

The most important of these is the possibility of deliberate manipulation. At a time when organisations ranging from banks to political parties to cable TV networks can be hacked seemingly at will, the potential for foul play in the software used to safely get people from A to B seems alarmingly plausible. Alluding to the fact that these hacks tend to have a financial motivation attached, the influential academic and writer Zeynep Tufekci recently posed a far more pertinent and problematic question:

In contrast to the trolley problem, Tufekci’s hypothetical far better reflects the actual, near-future dangers of self-driving car technology. Though self-driving cars may on occasion face versions of the trolley problem in the wild, the far greater danger is that their systems might be wilfully hacked by greedy humans.

Nor does hacking need to target self-driving systems directly. A new paper by researchers from four American universities shows that self-driving cars can be quite trivially manipulated by their surrounding environment. By subtly altering road signs using stickers, which might look like random vandalism, the researchers were able to fool autonomous vehicles into erroneously perceiving a stop sign as a speed limit sign.
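The general idea behind this kind of attack can be sketched with a toy model. The example below is not the researchers’ method, and it is far simpler than any real vision system: it assumes a made-up linear classifier over four “pixels”, with invented weights, and shows how a small, deliberately chosen nudge to each input can flip the label even though no single value changes much.

```python
# Toy linear "sign classifier": a positive score means "stop", otherwise
# "speed_limit". Weights and pixel values are invented for illustration only.
WEIGHTS = [0.9, -0.4, 0.7, -0.8]


def score(pixels):
    """Weighted sum of the pixel values."""
    return sum(w * p for w, p in zip(WEIGHTS, pixels))


def classify(pixels):
    return "stop" if score(pixels) > 0 else "speed_limit"


def add_stickers(pixels, budget):
    """Shift each pixel by at most `budget`, in the direction that lowers the score."""
    return [p - budget * (1 if w > 0 else -1) for w, p in zip(WEIGHTS, pixels)]


clean = [0.5, 0.5, 0.5, 0.5]
tampered = add_stickers(clean, budget=0.1)

print(classify(clean))     # stop
print(classify(tampered))  # speed_limit
```

Each input moves by only 0.1, yet the combined effect is enough to push the score across the decision boundary, which is why such tampering can pass for ordinary vandalism.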

It’s easy to see how the consequences of these sorts of attacks could be devastating, without the need to even target the vehicle’s software directly. While specific fixes will surely be developed, these problems — we might call them the “cliff-edge problem” and the “stop sign problem” — are in reality far more dangerous to the safe operation of self-driving cars than the traditional trolley problem.

As a technology, self-driving cars present enormous opportunity for safe, productive travel. The fact that they are, on aggregate, much safer than human-operated vehicles shouldn’t preclude discussions about how to “teach” such machines to handle ethically complicated situations, including the values they should draw on when these arise. (And these reflections might prove as useful for our own ethical conduct as for that of the cars themselves.)

But in contrast to consumer devices like the telephone, the technology that self-driving car designers hope to transform is already extremely dangerous — due in large part to everyday negligence and incompetence on the part of human drivers. As the “cliff-edge” and “stop sign” problems demonstrate, the core danger of self-driving cars is not what happens when we improve on human incompetence, but instead what happens when we enable human malevolence.