An AI’s Guide To Utilitarianism

Benefitting humankind: A how-to guide for superintelligent AI

Jonathan Davis
Geek Culture
10 min read · Jun 22, 2021



In his seminal paper, Human-Compatible Artificial Intelligence, Stuart Russell describes “superintelligent AI” as an existential risk to humankind. One of the reasons, Russell explains, is the difficulty of defining a set of objectives for a machine more intelligent than ourselves that results in “beneficial outcomes for humans”.

In 1960, the American mathematician and philosopher Norbert Wiener wrote,

“If we use, to achieve our purposes, a mechanical agency with whose operation we cannot efficiently interfere . . . we had better be quite sure that the purpose put into the machine is the purpose which we really desire.”

In Douglas Adams’s The Hitchhiker's Guide to the Galaxy, the supercomputer Deep Thought works for seven and a half million years to calculate that 42 is the answer to the ultimate question of life, the universe and everything. In this case, poor instruction led only to a colossal waste of time. But imagine what damage a misinstructed superintelligent AI could do given enough autonomy.

The challenge is, how do we know what something more intelligent than ourselves is going to do? How is it going to act upon the purpose that we give it?

In order to create such AI responsibly, we must consider not only what the right (or ethical) thing for it to do is, so that the AI benefits humanity, but also how we should programme, or teach, this to what is essentially an algorithm, albeit one with the ability to outthink humans.

Defining Utilitarianism

The distinction between beneficial outcomes and desired purpose, used by Russell and Wiener respectively in the above quotes, is very important.

In any introduction to ethics, you will generally be taught that there are two key schools of ethical thought:

  • Consequentialist — whether an act produces the desired outcomes
  • Deontological — whether an act is, in and of itself, moral

Ensuring beneficial outcomes could therefore be considered a consequentialist approach. Fulfilling some desired purpose could also be consequentialist, if the purpose is to cause beneficial outcomes, but it could equally be deontological if the purpose is to perform moral acts.


At first glance, deontological ethics may seem the simpler approach to implement for AI. Unlike consequentialist ethics, it does not require a unique decision for every situation where the outcome differs: the rules could simply be hardcoded for each act.

Indeed, many philosophers argue that deontological frameworks, such as Kantian ethics, provide AI with a “practical philosophy that is relevant and applicable to achieving moral conduct”. However, when we consider that this involves explicitly considering the morality of every possible act an AI could perform, it becomes much less practical.

An alternative would be to allow the AI to decide for itself whether or not an action is moral. However, even authors who have discussed deontological frameworks for ethical AI favourably have stated,

“At the current stage of research, the application of ethical principles remains a task for the human programmer”.

This being the case, perhaps consequentialism is a more practical approach. Instead of having to define a rigorous set of rules, the AI only needs to apply some measurement to the possible outcomes. This is analogous to a standard machine learning framework, in which an algorithm is given a target variable and a metric by which to measure its performance at predicting it.
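
As a purely illustrative sketch (the actions, utility scores and function names below are invented for this example, not taken from any real system), a consequentialist decision procedure looks a lot like model selection against a metric: score every candidate action by the predicted utility of its outcome and pick the best.

```python
# Illustrative sketch only: a consequentialist "policy" that scores candidate
# actions by the predicted utility of their outcomes and picks the best one.
# The actions and scores are hypothetical placeholders.

def choose_action(candidate_actions, predict_outcome_utility):
    """Return the action whose predicted outcome has the highest utility."""
    return max(candidate_actions, key=predict_outcome_utility)

# Toy usage: the numbers stand in for whatever metric the AI has learned.
toy_utilities = {"make tea": 0.4, "tidy the kitchen": 0.7, "do nothing": 0.1}
best = choose_action(toy_utilities, lambda action: toy_utilities[action])
print(best)  # -> "tidy the kitchen"
```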

However, as with deontology, this raises many practical questions. The first is: what counts as a positive outcome? In this article, we will consider utilitarianism, the most popular consequentialist theory, often summed up by the mantra “The greatest happiness for the greatest number”.

The greatest happiness…

The 18th-century philosopher Jeremy Bentham is most commonly considered the father of utilitarianism. In his book, An Introduction to the Principles of Morals and Legislation, he sets out the principle of utility, which he defines as,

“…that principle which approves or disapproves of every action whatsoever according to the tendency it appears to have to augment or diminish the happiness of the party whose interest is in question…”

However, the word “happiness” is somewhat obscure for an AI that must see the world in precise, quantifiable terms. So we must start by asking the following two questions:

  1. What is happiness?
  2. How do we measure happiness?

What is happiness?

In the 1956 edition of his book, The Open Society and Its Enemies, Karl Popper put forward the idea of minimising human suffering instead of maximising pleasure, an approach that has come to be known as “negative utilitarianism”.

In his response to Popper, R. N. Smart famously argued that a ruler with the means of painlessly killing his subjects would be duty-bound to do so to minimise possible future suffering.

Although minimising suffering is not necessarily equivalent to maximising happiness, this illustrates the dangers of providing AI with an ill-defined target outcome. Given the task of minimising suffering, a superintelligent AI might go on to find the most painless way of wiping out the human race.
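
To make the danger concrete, here is a deliberately crude toy example (all names and numbers are invented): an optimiser told only to minimise total suffering will prefer the outcome in which no one is left to suffer.

```python
# Toy illustration of a badly specified objective; everything here is invented.
# Each candidate outcome maps the people who exist in it to their suffering.
outcomes = {
    "comfort everyone":   {"Ann": 2, "Bo": 1, "Cy": 3},
    "do nothing":         {"Ann": 5, "Bo": 4, "Cy": 6},
    "eliminate everyone": {},  # no people left, so no suffering to count
}

def total_suffering(outcome):
    return sum(outcome.values())

# A naive "minimise suffering" objective prefers the catastrophic outcome.
print(min(outcomes, key=lambda name: total_suffering(outcomes[name])))
# -> "eliminate everyone"
```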

But even in the more conventional sense, happiness is difficult to define. However, according to the Cambridge Dictionary, to measure is “to define the exact size or amount of something”. We do not necessarily need to define what something is in order to measure how much of it there is. So perhaps we can set aside the philosophical question of “what is happiness?” and instead define a measurable target outcome.

How do we measure happiness?

In classical economics, humans are considered to be rational agents: they will pursue the act with the optimal expected outcome. If we assume that, generally, humans want to be happy, then we can conclude that the optimal expected outcome is the one that, overall, makes us happiest.

Therefore, by observing humans, a superintelligent AI could gather data to learn what acts make us happy. This is similar to the way any other machine learning model learns from historical data. Given enough data, the AI would learn what acts it should do to maximise happiness.

Having seen that Sally chose the salmon last time she went to meet a friend for dinner at a steak house, an AI could learn that avoiding eating meat makes Sally happy and perhaps when doing her shopping it should avoid the meat and poultry aisle.
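
A minimal sketch of this idea, using an invented observation log: treat each recorded choice as evidence for the chosen option over the alternatives on offer, and turn the counts into a crude preference score.

```python
from collections import Counter

# Hypothetical observation log: (options on offer, the option Sally chose).
observations = [
    ({"steak", "salmon"}, "salmon"),
    ({"chicken", "veggie burger"}, "veggie burger"),
    ({"salmon", "lamb"}, "salmon"),
]

times_chosen = Counter(choice for _, choice in observations)
times_offered = Counter(option for options, _ in observations for option in options)

# Crude "revealed preference" score: how often an option is picked when offered.
preference = {o: times_chosen[o] / times_offered[o] for o in times_offered}
print(sorted(preference.items(), key=lambda kv: -kv[1]))
# Salmon and the veggie burger come out on top; steak and lamb score zero.
```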

Of course, there are plenty of assumptions involved here. The field of behavioural economics describes cognitive biases: systematic errors humans make when processing information. These undermine the rational agent assumption.

As well as this, it is a significant assumption that the optimal expected outcome is the one that causes the greatest happiness. For example, someone might prioritise the economic implications of an outcome, but many would argue that “money can’t buy happiness”.


But can we expect AI to know us better than we know ourselves? If we ourselves do not always follow the path of greatest happiness, can we expect AI to find it? After all, it can only learn from our example, through the historical data it gathers by observation!

One alternative is to let the AI decide for itself how to maximise our happiness. We are, by the definition of superintelligent AI, discussing an entity that is more intelligent than us. Perhaps it will find a way to measure our happiness that we haven’t thought of? However, the danger here is that providing this level of autonomy could lead to something similar to the catastrophic results of negative utilitarianism. We just don’t know what the AI will do!

…For the Greatest Number

We can see that creating an AI to maximise utility is not a simple task. However, let’s assume the existence of an AI that can quantitatively and accurately predict the change in the utility of an individual caused by an act. We’ll call it Alan.
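
For concreteness, here is the entirely hypothetical interface we will assume Alan exposes in the scenarios below; the numbers it returns are just made-up lookup values, since producing them for real is exactly the hard part discussed above.

```python
# Hypothetical interface for the thought experiment: Alan predicts the change
# in one individual's utility caused by a given act. In these sketches the
# predictions are simply invented numbers stored in a lookup table.

ToyPredictions = dict[tuple[str, str], float]  # (person, act) -> utility change

def predict_utility_change(predictions: ToyPredictions, person: str, act: str) -> float:
    """Look up the predicted change in `person`'s utility if `act` is performed."""
    return predictions.get((person, act), 0.0)
```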

Personal AI

Alan is loaded onto a high-tech robot and given full autonomy in order to serve its owner, Sally. Alan is fantastic around the house and always chooses the right acts to maximise Sally’s utility. Then one day, Sally decides to take Alan out to help with the shopping…

Whilst in the supermarket, Alan spots a child staring longingly at a toy. He takes out Sally’s purse and hands a $20 note to the child. Sally looks disapprovingly at Alan, who explains that the child’s increase in utility after receiving the toy outweighs her decrease in utility at losing $20.


A purist utilitarian might suggest that this is the moral thing to do if it increases overall utility. However, why would someone want to own an AI that makes decisions they disagree with? This highlights the complexity of moving from utility calculations involving one individual to those involving two, especially when ownership is involved.

Alan could be programmed to prioritise the utility of his owner, Sally, over that of other humans by always multiplying her utility by some constant factor. However, what should this factor be? After their visit to the shops, Sally and Alan head home…

Alan spots an oncoming car about to hit an individual who stole Sally’s phone last week and nearly put her in hospital! On top of this, in order to save them from the oncoming traffic, Alan would have to drop the two dozen eggs he is carrying for the meringue Sally is serving at her dinner party this evening.


In this case, most people would suggest that Alan save the individual from severe injury and pain. However, depending on the extent to which Sally is prioritised, her increase in utility (multiplied by some prioritising factor) may outweigh the decrease in utility of the individual in the path of the car.
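
A toy calculation (all utility numbers invented) shows how the prioritising factor can flip Alan's decision:

```python
# Invented predicted utility changes for each act, per person affected.
acts = {
    "save the individual": {"Sally": -2.0, "individual": 50.0},  # eggs ruined
    "protect the eggs":    {"Sally": 2.0,  "individual": -50.0}, # individual hit
}

def weighted_total(changes, owner="Sally", owner_weight=1.0):
    """Sum utility changes, multiplying the owner's by a prioritising factor."""
    return sum((owner_weight if person == owner else 1.0) * delta
               for person, delta in changes.items())

for owner_weight in (1.0, 30.0):
    best = max(acts, key=lambda act: weighted_total(acts[act], owner_weight=owner_weight))
    print(f"owner weight {owner_weight}: {best}")
# With a weight of 1, Alan saves the individual; at 30, Sally's eggs win out.
```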

Therefore, in a world where multiple humans interact with, and own, superintelligent AIs, significant thought would need to go into defining the extent to which an AI should prioritise its owner.

Public AI

We have seen that there are several issues with privately owned superintelligent AI. The AI may make decisions that cause a significant decrease in overall utility, such as deciding not to save someone from oncoming traffic. And at the other extreme, why would anyone invest in their own superintelligent AI if it will make decisions that benefit others more than them? This is similar to the free rider problem in economics.

One solution is the creation of government-funded AI robots which roam, ownerless, around the world maximising utility. However, there are still complications with this solution if we tweak the above scenario…

Sally has sadistic tendencies. The utility she would gain from watching her thief get hit by a car would outweigh the decrease in utility caused by the pain and suffering of that individual.

In this case, even without any preferential treatment, the outcome that maximises utility seems not to be an ethical one. As in the case of Sally’s $20 note, a purist utilitarian could argue that this is the moral course of action. But one could just as easily argue that Sally’s sadistic tendencies mean it is not fair or just to compare their two utilities.
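
With invented numbers, the problem is easy to reproduce: because Sally's sadistic enjoyment counts like any other utility, a naive aggregate tells Alan that standing by is optimal.

```python
# Invented numbers, measured relative to Alan standing by and letting the
# collision happen: pulling the thief to safety helps the thief but costs
# Sally the enjoyment she would have taken from the spectacle.
pull_thief_to_safety = {"Sally": -60.0, "thief": 50.0}

if sum(pull_thief_to_safety.values()) > 0:
    print("Alan intervenes")
else:
    print("Alan stands by")  # the naive aggregate favours passivity
```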

This outcome may not sit right. However, one could convince oneself that Alan hasn’t actually done anything wrong. He has merely been passive, avoiding an act that would result in a slightly lower overall utility.

In response to this, consider the following scenario…

Sally enjoys watching the suffering of others. Alan can maximise overall utility by kidnapping people and torturing them in front of her.

In this case, it is difficult to argue that a purely utilitarian approach has led to the optimum outcome. Although this is an extreme situation, it raises serious doubts about the efficacy of a utilitarian approach to AI decision-making in a world where people’s utilities are at odds with each other.

Final Thoughts

Consequentialist theory is extremely popular, as people often judge the merit of their actions by their consequences: how they, and other people, feel and experience their lives. As an optimum outcome, happiness is extremely appealing, which explains the popularity of utilitarianism.


However, we have seen that creating an AI that can operate within a utilitarian framework is a significant challenge. The nature of digital computation requires that utility be measured quantitatively, without the involvement of any human intuition or moral judgement. But utility is hard to define, let alone measure. And then there are the practicalities of bringing utilitarian AI into the real world, where utility calculations must take into account all of humanity!

Although these discussions have permeated utilitarian thought for centuries, they become even more prominent when we imagine the potential benefits and harms that superintelligent AI could bring to the human race.

If we are going to create superintelligent AI responsibly, we may need to build a new moral framework that not only addresses these complexities but can also be applied to what is essentially an algorithm, accepting only ones and zeros!
