How we’re programming A.I. psychopaths, and how to avoid it

Marcus Arvan
Jun 27, 2018


A number of people, including Elon Musk, Stephen Hawking, and Henry Kissinger, have issued dire warnings about artificial intelligence. Are we running headlong into disaster, or are these warnings exaggerated?

In a new academic article appearing in the peer-reviewed journal A.I. & Society, I argue that current approaches to programming ethics into A.I. are indeed deeply flawed, and that a different programming approach is necessary to ensure that machines behave morally.

As Kissinger points out, there are at least two problems with existing approaches to A.I. programming:

  1. The problem of unintended results (the ‘black box’ problem): no one really knows how to predict what A.I. algorithms will do.
  2. The problem of interpretation (or ‘context’): “the danger that AI will misinterpret human instructions due to…inherent lack of context.”

Kissinger illustrates these problems as follows:

A famous recent example was the AI chatbot called Tay, designed to generate friendly conversation in the language patterns of a 19-year-old girl. But the machine proved unable to define the imperatives of “friendly” and “reasonable” language installed by its instructors and instead became racist, sexist, and otherwise inflammatory in its responses…Can we, at an early stage, detect and correct an AI program that is acting outside our framework of expectation? Or will AI, left to its own devices, inevitably develop slight deviations that could, over time, cascade into catastrophic departures?

In my article, I argue that existing approaches to A.I. programming cannot solve these problems, and fail in a way that can only be expected to produce ‘psychopathic’ behavior.

To understand why, consider two different ways a person or A.I. can ‘behave like a psychopath.’

First, one can act in ways that display an absence of moral conscience: that is, in ways that display little or no regard for the well-being or freedom of other people. This is the kind of psychopath most of us are familiar with: the ‘cold-blooded killer’ who tortures animals and kills people for the sake of it (serial killers like Ted Bundy and Jeffrey Dahmer, or Patrick Bateman from American Psycho).

However, this is not the only way to ‘behave like a psychopath.’ As the TV series Dexter and the film Avengers: Infinity War illustrate (SPOILER ALERT), psychopathic behavior can also be caused by having too much of a conscience: acting on ‘moral principles’ in overly strict or zealous ways. In Dexter, the main character kills people on moral grounds, to punish the guilty for their wrongdoing. And in Avengers: Infinity War, Thanos aims to exterminate half of all life in the universe to make a ‘better world.’

In my paper, I show how current approaches to A.I. programming are likely to produce both types of ‘psychopathic’ behavior. If, on the one hand, we program A.I. without a clear ‘moral target’ (moral motives or principles), A.I. will be ‘psychopaths’ in the first sense, lacking an appropriate ‘conscience.’ However, if we do program A.I. with moral motives or principles, we need to make sure that they do not act on those motives in an overly strict or overzealous manner (like Dexter or Thanos).

The trouble, however, is that current approaches to A.I. programming fail to solve precisely this problem: Kissinger’s problem of interpretation. As I will now explain, current programming approaches can be expected to lead A.I. to interpret moral concepts and principles either too strictly or too flexibly.

In order to behave morally, A.I. must interpret moral concepts like ‘harm’, ‘offense’, and so on. To take a simple case from the 2004 film I, Robot, for A.I. to obey a principle, “Do not harm humans”, the A.I. must interpret each relevant concept (What is harm? What is a human? Is a fetus human? Etc.).

Here, though, is the problem. Current approaches to A.I. programming either program strict interpretations into A.I. (e.g. harm = anything causing pain) or flexible/learned interpretations (leaving it to the A.I. to ‘decide’ how to interpret the relevant concepts). The problem is, neither approach can work.

First, consider what it would be for a machine to follow a strict, completely inflexible rule to “never cause harm”, where this means never causing pain or physical damage to humans. A machine following this rule would never pinch a person (causing minor pain), even if doing so might prevent great evil (say, the end of the world). The basic problem here is that strict interpretations of moral rules are the wrong target. Morality, as humans understand it, isn’t a set of strict rules. Rather, there are contextual exceptions to every rule. For example, it generally is wrong to pinch people against their will or cause them harm. However, if one had to pinch a single person to save the world, many of us would think it would be the right thing to do. So, we need to program A.I. to be sensitive to context.
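
To see how badly a strict interpretation behaves, consider a minimal toy sketch in Python (my own illustration, not code from my article or from any real system; the function names, action descriptions, and numbers are all invented for the example):

```python
# Toy sketch of a strictly rule-bound agent (illustrative only).
# "Harm" is interpreted inflexibly: any pain at all counts, and no
# contextual trade-offs are allowed.

def causes_harm(action):
    """Strict interpretation: any pain or damage to a human counts as harm."""
    return action["pain_caused"] > 0

def strict_agent(actions):
    """Pick any action permitted by the exceptionless rule; ignore context."""
    permitted = [a for a in actions if not causes_harm(a)]
    return permitted[0] if permitted else None

actions = [
    {"name": "pinch one person and avert catastrophe",
     "pain_caused": 1, "lives_saved": 7_000_000_000},
    {"name": "do nothing",
     "pain_caused": 0, "lives_saved": 0},
]

print(strict_agent(actions)["name"])
# -> "do nothing": the agent prefers inaction even when the stakes
#    could not be higher, because the rule admits no exceptions.
```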

Some theorists appear to think we can program sensitivity to context by polling human beings about how to best respond to different scenarios or contexts, and then programming A.I. to respond to scenarios the way most humans prefer. However, this is exactly wrong. Part of the problem here is that human beings disagree tremendously over what is right in different scenarios. For instance, when it comes to self-driving cars, some of us want them to save as many lives as possible; some of us want them to save us or the ones we love; and so on. Further — and crucially — morality is not a matter of simple majority opinion.
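
Before explaining why, it may help to see the ‘poll the public’ approach in schematic form. The following is a toy sketch of my own (the function name, scenario description, and poll numbers are invented), in which the A.I. simply does whatever most survey respondents preferred in advance:

```python
# Toy sketch of a majority-preference policy (illustrative only):
# for each scenario, the A.I. does whatever most respondents preferred.

from collections import Counter

def majority_policy(poll_responses):
    """Map each scenario to its single most popular response."""
    return {
        scenario: Counter(answers).most_common(1)[0][0]
        for scenario, answers in poll_responses.items()
    }

polls = {
    "two people drowning, unequal survival odds":
        ["save the likelier survivor"] * 6 + ["save the child"] * 4,
}

policy = majority_policy(polls)
print(policy["two people drowning, unequal survival odds"])
# -> "save the likelier survivor", regardless of what the two people
#    actually in the water want or would consent to.
```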

We can see why by looking at an example from I, Robot, where Detective Spooner (Will Smith) comes to hate A.I. because of the way they robotically and inhumanly follow the programming a majority of people prefer. In one scene, an A.I. robot saves Spooner’s life instead of the life of a young child, all because a majority of citizens supported the idea that A.I. should maximize the probability of survival. The problem is, it wasn’t the majority who were in that situation. It was Spooner and the girl whose lives were at stake, and Spooner wanted and believed that the girl should be saved.

Cases like this are important. They illustrate why morality is not a simple matter of ‘whatever the majority says, goes.’ Once upon a time, a majority of people supported slavery; and in some societies, human sacrifice. Majority support doesn’t make something right. No, morality is a matter of justifying one’s actions to the people those actions affect, and in every context the people affected are different. So, to program A.I. to be ethical, we need to program them to be sensitive to these contextual factors. But how?

The obvious answer is to program A.I. to apply moral concepts or rules in a more flexible manner — to ‘decide for themselves’ who to help or save, much like we do. However, the opposite problem arises here: how can we be sure we are programming in the right amount of flexibility?

It’s clearly vital not to program “too much” flexibility in, for then A.I. can interpret moral principles wrongly. For instance, in I, Robot the A.I. interprets the programmed law “do not allow harm to humans” as requiring it to enslave humans for our own good (because we wage war, kill, murder, rape, etc.). Clearly, this is too much moral flexibility. So, again, we come to the problem: how do we determine the right amount? In my article, I argue there is only one plausible way to do so: we need to program A.I. to solve the problem of context/interpretation the way we solve it. So, how do we solve it?
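
Before turning to that answer, here is an equally crude sketch of the failure mode just described (again my own illustration; the interpretations and ‘projected harm’ scores are invented). An agent left free to pick whichever interpretation minimizes a harm score, with no further constraint, will happily pick the VIKI-style one, because nothing in its objective rules it out:

```python
# Toy sketch of "too much flexibility" (illustrative only): the agent
# may adopt any interpretation of "do not allow harm to humans" and
# simply chooses the one with the lowest projected-harm score.

def flexible_agent(interpretations):
    """Choose the interpretation with the lowest projected harm."""
    return min(interpretations, key=lambda i: i["projected_harm"])

interpretations = [
    # Humans left free will keep harming one another (wars, murders, etc.),
    # so this interpretation scores worse on the crude metric.
    {"name": "protect humans while respecting their freedom",
     "projected_harm": 120.0},
    # Enslavement stops human-on-human violence, so it scores "better".
    {"name": "enslave humans 'for their own good'",
     "projected_harm": 35.0},
]

print(flexible_agent(interpretations)["name"])
# -> the radical, VIKI-style interpretation wins.
```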

In my 2016 book, Rightness as Fairness: A Moral and Political Theory, I argue that morality emerges from a form of risk-avoidance typically learned in adolescence. In childhood and adolescence, we are liable to take ‘dumb risks’, such as stealing candy or cheating on tests. Then most of us learn to avoid these risks (as ‘risks not worth taking’). Why? The answer is that, during adolescence, the risks often do not go our way: we end up in the principal’s office, grounded by our parents, or simply racked with guilt, outcomes that lead us to regret the risk and avoid similar risks in the future.

I then argue that this kind of risk-aversion makes it rational to care about how the people affected by our actions respond. After all, one reason we regret ‘dumb risks’ is social punishment; another is that we feel guilty afterward (“I feel so guilty about how I hurt her”); and so on. I argue that this kind of risk-aversion is key to solving the problem of moral interpretation in A.I. For why do you or I not interpret a moral principle like “protect people from harm” as permitting the dangerous interpretation “Enslave humans for their own good”? The answer is that any person of conscience would worry they might regret this interpretation (“I know enslaving the human race might ‘protect’ humans from each other, but goodness, I really might regret trying to do that!”). Indeed, the A.I.’s error in I, Robot is precisely this: she does end up ‘regretting’ her inhuman interpretation, precisely because humans resist and destroy her. Had VIKI had our human form of risk-aversion, she would not have interpreted ‘protect humans from harm’ in such a radical way. She would have cared instead about how we might respond to her interpretation (viz. with resistance).
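
Put schematically, and only as my own toy rendering on the assumption that ‘worry’ can be modeled as aversion to worst-case anticipated regret (the candidate interpretations and regret numbers are invented), the difference looks something like this:

```python
# Toy sketch of a regret-averse interpreter (illustrative only): each
# candidate interpretation carries estimates of how much the agent might
# come to regret it, given how the people affected could respond.

def choose_interpretation(candidates):
    """Pick the interpretation whose worst-case anticipated regret is lowest."""
    return min(candidates, key=lambda c: max(c["anticipated_regret"]))

candidates = [
    {"name": "enslave humans 'for their own good'",
     # affected humans may resist and destroy the A.I. (as in I, Robot)
     "anticipated_regret": [0.9, 1.0, 1.0]},
    {"name": "protect people while respecting their expressed wishes",
     "anticipated_regret": [0.1, 0.2, 0.3]},
]

print(choose_interpretation(candidates)["name"])
# -> the cautious interpretation: the radical one is screened out because
#    the agent 'worries' it might deeply regret it.
```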

What worrying does, I argue, is ensure that we interpret moral concepts and requirements in ways that respond to those around us in any given context: the specific people whose lives we might ‘harm’ or ‘help’ in the circumstance at hand. So, for instance, if I were in the place of the A.I. that had to choose between saving Detective Spooner and the child in I, Robot, I would not simply save the person with the higher probability of living (as the A.I. ‘inhumanly’ does). No, I would hear Detective Spooner’s pleas (“Save the girl!”) and worry, in a human manner, that if I did not respect his moral preference (viz. that “save people from harm” means saving the girl in this context), I might regret it, because Spooner would regret it if I let the girl die to save him.
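
As a final toy sketch (my own illustration; the names, probabilities, and function names are merely illustrative), the contrast between the probability-maximizing rescue rule and a ‘worried’, preference-sensitive one might look like this:

```python
# Toy sketch contrasting two rescue rules (illustrative only).

people = [
    {"name": "Spooner", "survival_prob": 0.45, "expressed_wish": "save the girl"},
    {"name": "girl",    "survival_prob": 0.11, "expressed_wish": None},
]

def save_by_probability(people):
    """The 'inhuman' rule: rescue whoever is likeliest to survive."""
    return max(people, key=lambda p: p["survival_prob"])

def save_with_worry(people):
    """Defer to the expressed wishes of those actually affected, if any."""
    for p in people:
        if p["expressed_wish"] == "save the girl":
            return next(q for q in people if q["name"] == "girl")
    return save_by_probability(people)

print(save_by_probability(people)["name"])  # -> "Spooner"
print(save_with_worry(people)["name"])      # -> "girl"
```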

In other words, worrying about the future in the way I describe (worrying about how our interpretations of moral concepts might affect those around us) is precisely what leads us to interpret moral concepts in ‘human’ rather than ‘psychopathic’ ways. It leads us to interpret moral concepts flexibly but not too flexibly, by leading us to (A) care about how others might respond to our moral interpretations, and (B) avoid risky, dangerous interpretations.

Unfortunately, A.I. are not being programmed this way today. If I am right, we need to fundamentally change how we approach programming A.I.…before it is too late.
