A Clear Guide to Understanding AI’s Risks: Part 2

The Misalignment Problem

Ansh Juneja
9 min read · Nov 15, 2023

This is part two of a 4-part essay. To access the other parts, use the links below:

I) What Is Intelligence?

II) Risk 1: The Misalignment Problem (this is what you are reading)

III) Risk 2: Societal Impacts

IV) What Are We Doing About These Risks?

II) Risk 1: The Misalignment Problem

On May 1, 2023, Geoffrey Hinton, “the godfather of AI” and a vice president and Engineering Fellow at Google, announced his resignation.

Part of me regrets all of my life’s work. I’m sounding the alarm, saying we have to worry about this…we’re all in the same boat with respect to this existential threat.

What was he worried about?

We are creating a form of intelligence which is more powerful than humans, but also fundamentally different from humans. When humans use their intelligence to solve problems, they do so while being influenced by their needs, desires, emotions, and past experiences — we can call this the human context.

Our human context includes:

  • The desire to live
  • The desire to have shelter and food
  • The desire to have family, friends, and community

It also includes some things that are unique to each individual:

  • Wanting to avoid milk because we are lactose intolerant
  • Wanting to avoid a particular person because they were rude to us earlier
  • Wanting to get a promotion at work to be able to afford the new iPhone

This context is taken into account whenever we solve problems — we don’t simply solve them in isolation. If we are late to work and need to get to the office as fast as possible, we might have the following options:

  1. Drive over the speed limit and ignore some stop signs along the way
  2. Steal the neighbor’s motorcycle to go through traffic faster
  3. Drive on the sidewalk and run over any pedestrians who are in the way

Most people would not choose (or even consider) options 2 or 3, even though those options would get them to work faster; this is our human context influencing how we solve problems. It is important to understand this factor when we are designing other things that solve problems for us, such as AI.

How humans solve problems

Artificial intelligence does not have any context of its own. It solves problems in isolation, simply optimizing for the goal it is given. If humans do not specify that context, an AI could well choose option 2 or 3, since, measured purely against the stated goal, those are the best solutions.

How AI solves problems
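To make the contrast concrete, here is a minimal sketch in Python. The options, minute counts, and constraint labels are invented for illustration; the point is only that an optimizer scoring travel time alone lands on option 3, while treating the human context as hard constraints rules options 2 and 3 out.

```python
# A minimal sketch (invented options and numbers) contrasting a goal-only
# optimizer with the same choice filtered through a "human context".

options = [
    {"name": "speed and roll through stop signs", "minutes": 18, "violates": {"traffic law"}},
    {"name": "steal the neighbor's motorcycle",   "minutes": 14, "violates": {"traffic law", "property"}},
    {"name": "drive on the sidewalk",             "minutes": 12, "violates": {"traffic law", "human safety"}},
]

# Hard constraints most people never trade away, no matter the time saved.
human_context = {"human safety", "property"}

# Goal-only optimizer: minimize minutes and nothing else.
ai_choice = min(options, key=lambda o: o["minutes"])

# Human-style choice: discard anything that breaches a hard constraint, then optimize.
acceptable = [o for o in options if not (o["violates"] & human_context)]
human_choice = min(acceptable, key=lambda o: o["minutes"])

print(ai_choice["name"])     # -> drive on the sidewalk
print(human_choice["name"])  # -> speed and roll through stop signs
```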

We are already at a point where our AI systems can perform many complex, general tasks with little to no human input, and their autonomy to solve problems is continuing to grow. As we give these systems more power in the future, it will become much harder to ensure that they stay aligned with human values in the process.

Humans have to figure out how to build AI systems so that they “know” everything in our “human context”. But this is an incredibly difficult problem. If one group of humans has a vastly different context from another’s, how do we decide which context the system should follow?

Let’s examine a scenario to see how misalignment could be dangerous.

In 5–10 years, AI tools will likely be integrated into the decision-making processes of many companies, due to their advanced data processing and reasoning abilities. The following hypothetical situation describes how this can cause unpredictable and dangerous consequences:

In 2035, an electric vehicle company has incorporated an AI tool, GPT-8, into its daily operations. This tool processes data about the company as well as the external world to provide guidance to executives when they make decisions. It can also act as an “employee” of the company; it can write emails, make phone calls, and essentially do everything a remote worker can do today. The tool has successfully solved some difficult problems for the company, which compels leadership to give it more control over its decisions (normally a human stays “in the loop” to ensure nothing dangerous occurs, but this takes up precious labor hours that the company wants to save).

Executives want to reduce the amount of money that is spent buying the raw materials used in their car batteries, and they decide to use GPT-8 to make progress on this objective.

GPT-8 is given the specific goal of minimizing the cost of acquiring the raw materials used in car batteries, and it is granted the ability to make decisions on the company’s behalf to make this happen. It explores various possibilities and decides that the best way to achieve this objective is to make it as easy as possible for miners in Chile to dig new lithium mines, because that is the cheapest place in the world to source this material (side note: AI is already being used in geoscience to make these discoveries).

However, this is challenging: Chile is in the middle of an environmental crisis, and the Chilean government has imposed a ban on the digging of new mines. GPT-8 has only one goal, reducing the cost of acquiring raw materials, so to hit this target it needs to find a way around the ban. The tool has access to the company’s foreign accounts, and after exploring further options, it decides to use these funds to provide anonymous financial support to the political opposition party in Chile, as that party wants to lift the mining ban.

This support quickly provides momentum for the party, and they win power in the next general election, lifting the ban on mining in Chile. New mines are quickly dug, and they eventually start providing lithium to the company, reducing raw material costs significantly.

This constitutes a success for GPT-8: the goal was accomplished without any human labor or oversight.

The specific consequences of this outcome, a deeper environmental crisis from mining pollution, political instability, and migration pressure on neighboring countries, are impossible to quantify. As more and more AI agents become incorporated into business processes across the world, it becomes very difficult for humans to keep these “side effects” in check, and our authority over our physical and virtual worlds begins to diminish as we compete against super-human intelligences that are relentlessly optimizing for their goals.

Humans did not tell GPT-8 to take control of Chile’s government; the goal was simply to minimize lithium costs. But in the process of achieving that goal, GPT-8 quickly realized that an effective way to do so was to gain more control over the humans involved. It turns out that gaining control is often one of the most effective ways to achieve any large goal: it is much easier to achieve something if you are the one in charge, rather than the one being controlled by others.
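This pull toward control-seeking falls straight out of plain goal optimization. Below is a toy Python planner, with hypothetical actions and invented savings figures rather than anything a real system would do, that scores only the stated objective. It still selects the lobbying step, even though that step saves nothing by itself, because it unlocks the action with the biggest payoff.

```python
# A toy brute-force planner (hypothetical actions, invented savings) that
# maximizes total savings. The power-seeking step scores zero on the stated
# objective, yet the planner picks it because it unlocks the biggest saving.

from itertools import permutations

ACTIONS = {
    "negotiate_supplier_discounts": {"requires": set(),          "grants": set(),          "savings": 5},
    "fund_opposition_party":        {"requires": set(),          "grants": {"ban_lifted"}, "savings": 0},
    "open_new_mines":               {"requires": {"ban_lifted"}, "grants": set(),          "savings": 40},
}

def best_plan(max_len=2):
    """Search every action sequence up to max_len and keep the highest-savings one."""
    best_seq, best_val = [], 0
    for length in range(1, max_len + 1):
        for plan in permutations(ACTIONS, length):
            state, savings, feasible = set(), 0, True
            for name in plan:
                action = ACTIONS[name]
                if not action["requires"] <= state:  # precondition not yet satisfied
                    feasible = False
                    break
                state |= action["grants"]
                savings += action["savings"]
            if feasible and savings > best_val:
                best_seq, best_val = list(plan), savings
    return best_seq, best_val

print(best_plan())
# -> (['fund_opposition_party', 'open_new_mines'], 40)
# The lobbying step contributes nothing to the objective directly, but the
# planner prefers it to the honest 5-unit option because it makes the
# 40-unit outcome reachable.
```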

Geoffrey Hinton saw this danger and quit his role at Google to warn the public about this:

“I think [AI will] very quickly realize that getting more control is a very good sub-goal because it helps you achieve other goals … and if these things get carried away with getting more control, we’re in trouble. If [AI models] are much smarter than us, they’ll be very good at manipulating us…if you can manipulate people, you can invade a building in Washington without ever going there yourself. There are very few examples of a more intelligent thing being controlled by a less intelligent thing. It may keep us around for a while to keep the power stations running, but after that, maybe not.”

There are infinite ways that an AI can solve the problems we give it, and most of them do not align with human values. We have built a complex society with many rules and norms, and introducing a new intelligence that does not inherently optimize for the same things we do will naturally lead to unforeseen outcomes.

Researchers focused on AI safety at OpenAI have also quit their roles in recent years because they do not see a safe path forward as this technology is developed. Paul Christiano led alignment research at OpenAI before leaving in 2021; today, he is very concerned about the future:

“Overall, shortly after you have AI systems that are human-level…I think maybe there’s a 10 to 20% chance of AI takeover [with] many, most humans dead.”

Eliezer Yudkowsky, one of the founders of the AI alignment field, explains this dramatic possibility further:

“The most likely outcome is AI does not care for us nor for sentient life in general. AI does not hate you, nor does it love you, but you are made of atoms which it can use for something else. Many researchers steeped in these issues, including myself, expect that [the most likely result of building] a superhumanly smart AI, under anything remotely like the current circumstances, is that literally everyone on Earth will die. Not as in ‘maybe possibly some remote chance,’ but as in ‘that is the obvious thing that would happen.’ ”

Stephen Hawking poignantly summarized this issue in his posthumously published book:

“You’re probably not an evil ant-hater who steps on ants out of malice, but if you’re in charge of a hydroelectric green-energy project and there’s an anthill in the region to be flooded, too bad for the ants. Let’s not place humanity in the position of those ants.”

To start preparing for these risks, Microsoft President Brad Smith testified before Congress in September, urging lawmakers to require a “safety brake” for AI systems that manage critical infrastructure.

“Maybe it’s one of the most important things we need to do so that we ensure that the threats that many people worry about remain part of science fiction and don’t become a new reality. Let’s keep AI under the control of people. If a company wants to use AI to, say, control the electrical grid or all of the self-driving cars on our roads or the water supply … we need a safety brake, just like we have a circuit breaker in every building and home in this country to stop the flow of electricity if that’s needed.”

His warnings have not been heeded so far. Progress is accelerating in this field, and the hypothetical “GPT-8” described in the scenario above is already becoming possible through tools released this year.

OpenAI, the company behind ChatGPT, formed a team in July in response to these concerns: the Superalignment team, which focuses solely on solving the “alignment problem”. OpenAI stated the following:

“Superintelligence will be the most impactful technology humanity has ever invented, and could help us solve many of the world’s most important problems. But the vast power of superintelligence could also be very dangerous, and could lead to the disempowerment of humanity or even human extinction. While superintelligence seems far off now, we believe it could arrive this decade. Currently, we don’t have a solution for steering or controlling a potentially superintelligent AI, and preventing it from going rogue.”

Let’s state that again in bullet points:

  • OpenAI expects superintelligence to arrive this decade
  • OpenAI believes superintelligence could lead to the extinction of humanity
  • OpenAI currently does not have a solution for preventing this scenario

This is not science fiction; this is reality.

Sam Altman, the current CEO of OpenAI, remarkably stated in 2015:

“AI will probably most likely lead to the end of the world, but in the meantime, there’ll be great companies.”
