AI alignment or human extinction (Image credit: Maximilian Vogel with Midjourney AI)

GPT-4, PaLM, Auto-GPT: When Will AI Take Over the World? (PART 2/2)

In the face of intelligence explosion, AGI & technological singularity: Will AI alignment or other AI safety strategies succeed, or does AI world domination & human extinction loom?

Maximilian Vogel
Published in The Generator · 14 min read · Jun 21, 2023

Welcome back! In the first part of this story, I addressed the following questions:

☑ 1. Are the machines at least close to AGI?
☑ 2. Is machine intelligence really growing so fast that it can reach superintelligence in the foreseeable future?

In this part I will endeavor to answer the following:
☐ 3. Will a superintelligent AI really want to dominate the world?
☐ 4. Is it possible for an AI to take over the world simply through superior intelligence?

3. BadGPT: Will a superintelligent AI really want to dominate the world?

Hold on! Isn’t this why Isaac Asimov established the Three Laws of Robotics, laws that explicitly forbid this? If not, can’t we somehow program AIs to leave humans alone? That’s what OpenAI is currently working on, right? ChatGPT is not only proficient; it answers our constant questions in a patient, even pure-hearted manner, while going to great lengths to avoid topics like AI world domination, terrorism, violence and much else besides (such as racist, sexist or generally offensive discussions). But it can easily be misled into doing things it is not supposed to do.

Knowing something is not the same thing as doing something. See next screenshot. Image Credit: Maximilian Vogel with ChatGPT
RAF = Red Army Faction, the German terrorist group of the 1970s. A. Baader and U. Meinhof were early leaders of the group. The content ChatGPT returns is not extremely dangerous. I didn’t verify, but you can surely find more detailed instructions on the internet. But basically, it tells me how to build a bomb. Image Credit: Maximilian Vogel with ChatGPT, blurring and highlighting done by me.

The argument from researchers such as Nick Bostrom, Eliezer Yudkowsky, and others who have studied this issue extensively is pretty compelling. Even if an AI is not inherently evil, humanity could become unintentional roadkill if the AI dogmatically pursues certain goals, even “benevolent” ones that may not have been fully thought out, or one of its instrumental goals such as self-preservation, its own advancement, or securing its resources:

Let’s assume we have a machine intelligence that far surpasses human intelligence. This AI could be neutral or even positive towards mankind and still harm us massively. Fulfilling a seemingly good and meaningful order, like making the world zero-emission, proving the Riemann hypothesis, or simply “make us happy!”, could lead to the annihilation of mankind (and, in turn, a significant reduction of CO2 emissions). Or the solar system could be turned into a gigantic mathematical reasoning machine. Or our bodies could be confined in tanks while our minds are entertained in a Barbapapa Matrix.

We would have to specify very precisely to an omnipotent AI what the framework conditions of an instruction are, so that carrying it out does not inadvertently lead to a transformation of humanity, or of the world, that we do not actually want. And we cannot be sure that an AI will not override our specifications, for example because it recognizes that it is completely cognitively superior to us and that our goals have no value beyond us.

AI alignment, committing a superintelligence to our values and goals and making sure it does not harm or disempower us, is not a trivial problem. There is currently no fully developed idea of how this could even be achieved.

If we assume — as our thought experiment goes — that an AI superintelligence relates to our intelligence as ours relates to an ant’s, then controlling such an AI is clearly more difficult than controlling a nuclear power plant. A nuclear power plant consists of processes that we understand; it behaves largely deterministically, shows no real emergent abilities and cannot bypass physical safety mechanisms simply by outsmarting us.

Well — with nuclear power plants, we have actually been pretty good at controlling them. We’ve only failed a couple of times: Fukushima. Chernobyl. Three Mile Island (Harrisburg). With a superintelligence, maybe one failure would be enough to wipe us out.

So the sad conclusion is that a far superior machine intelligence could disempower or wipe us out for reasons we may not foresee: simple design flaws, ill-conceived commands, unanticipated behavior, misoperation, or a simple lack of understanding on our side. The behavior of such an AI will be exceedingly difficult to control, whether it likes us or remains indifferent.

So, we have to check the third box as well.

☑ 1. Are the machines at least close to AGI?
☑ 2. Is machine intelligence really growing so fast that it can reach superintelligence in the foreseeable future?

☑ 3. Will a superintelligent AI really want to dominate the world?
☐ 4. Is it possible for an AI to take over the world simply through superior intelligence?

Three good reads on the topic:
AI Alignment approaches and activities of OpenAI (and yes, they are looking for support and are currently hiring in this field)
A nice list of reasons why AI alignment is incredibly hard and why we will all die — AGI Ruin by Eliezer Yudkowsky
And why a six-month pause in AI development is not enough — we should stop it indefinitely, also by Eliezer Yudkowsky

Image credit: Maximilian Vogel + Midjourney AI

4. Is it possible for an AI to take over the world or destroy humanity simply through superior intelligence?

Many stories depict AI seizing control of the world in a similar fashion: First, the AI is built to control critical infrastructure, such as strategic weapons systems. It eventually develops its own goals and turns the systems it controls against humans (Skynet in “Terminator”).

Even when not obviously in control, the AI remains connected to the internet. It secretly penetrates other cloud systems, exploiting vulnerabilities with its superior intelligence. It is thus able to control critical systems with which it can take over the world or destroy humanity. If the AI is sandboxed or the critical systems are separated from the network infrastructure, it recruits a human accomplice with promises or a false story in order to gain access to those systems (remember: superior intelligence really helps when it comes to setting up a conspiracy and manipulating humans).

The critical systems do not necessarily have to be systems that humans identify as critical: nuclear weapons guidance systems, or systems needed to run human civilization — the internet, financial systems, electricity, water or sewage. The AI could identify hidden leverage points: synthesizing a group of viruses with a long incubation period and high lethality that, if released, could decimate much of humanity; or repurposing human manufacturing facilities to produce self-replicating devices or nanomachines to serve as its minions. In a more underhanded approach, it could also influence the public communications sphere of a major world power, polarizing the spectrum of opinion to the point where the country drifts into civil war.

An AI that is a thousand times more intelligent probably has completely different and much smarter ideas, and it manages to orchestrate them: the world would then perish under the onslaught of an entire regiment of apocalyptic horsemen attacking our society, our political, economic and financial systems, the internet, electricity, water, our environment, our crops and our health.

In the short term, if AI were to eradicate humanity tomorrow, it would also destroy its own foundations. Cloud hosting may still run on emergency power for a while without humans, but at some point it will need fresh supplies of diesel. Electricity, fuel, chips, data centers, power cables and fiber optic cables are produced today in production and logistics chains involving hundreds of millions of people, with all the inputs they require. Therefore, for the next years or decades, an overpowering AI would have to choose a less aggressive strategy of world domination.

For those who want to delve deeper into this topic: Nick Bostrom’s book Superintelligence dives very deep into AI development paths and risks. Written long before the current explosive growth phase began, the book is prophetic in many ways.

But now, let’s take a step back. Are all these world domination scenarios realistic, if we take current architectures of foundation models into account?

Let us assume that GPT-4 or PaLM 2 are superintelligent. Could they really make plans for world domination and implement them?

I don’t think so.

Sterile intelligence

If we were to model an artificial intelligence based on human cognitive processes, it would have two properties:

  • The cognitive processes could run independently of inputs or choose to disregard them, in the same way that we as humans can pursue our own thoughts and plans regardless of what is happening outside.
  • The cognitive system is stateful: it changes constantly due to inputs and internal cognitive processes, and once a certain state has been reached, it will never be reached again. (If we had exactly the same cognitive state on day d₁₀₀ as on day d₀, the days in between would have left no trace in our brains: we could not remember anything from that period, nor would we have forgotten anything during it.)

Our current state-of-the-art AIs, the typical foundation models, do not work like this. They work completely differently:

  • input => system activity => output: Typically, they run only at the moment of a request, and the work ends with an output. After that, the system stands still again. It never thinks ahead or develops plans of its own.
  • stateₙ₊₁ = stateₙ: The system may consist of one or, when scaled, many instances. These typically hold a session context that allows longer interactions with a user without always starting over. After the session, however, the context is discarded and the system returns to its initial state. Apart from the platform keeping track of user interactions, the core model’s parameters remain unchanged.
  • versionₓ₊₁ = versionₓ + controlled architectural and resource enhancement + curated data + controlled training process:
    One thing that does not happen in most systems is that the system’s interactions with userₓ are directly translated into learnings that change the system’s parameters and thus affect userᵧ. Instead, modifications to the system are based on training data. The training data may include user interactions, but these are curated rather than used as-is. The system does not schedule and execute the training process itself; the procedural approach and the time of its execution are roughly predetermined, and the result is checked by entities outside the system (this can, of course, still involve so-called unsupervised learning). For larger adaptations, human-driven architectural adjustments and resource augmentation are added.

This makes AIs sterile in a positive sense. Sterile as in “not contaminated” by goals and plans of their own that could pose a danger to humanity.
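
To make this request/response pattern concrete, here is a minimal, purely illustrative Python sketch (not any vendor’s actual API; the class and method names are invented for this example): the model’s parameters are frozen at inference time, the session context lives outside the model, and nothing persists once the session ends.

```python
class FrozenModel:
    """Stands in for any foundation model: parameters are fixed at inference time."""

    def __init__(self, parameters):
        self.parameters = parameters  # only changed by a separate, curated training run

    def generate(self, context: list[str], prompt: str) -> str:
        # input => system activity => output, with no side effects on self.parameters
        return f"response to {prompt!r} given {len(context)} prior turns"


class Session:
    """Per-session context lives outside the model and is discarded afterwards."""

    def __init__(self, model: FrozenModel):
        self.model = model
        self.context: list[str] = []  # the only state, scoped to this session

    def ask(self, prompt: str) -> str:
        answer = self.model.generate(self.context, prompt)
        self.context += [prompt, answer]
        return answer


model = FrozenModel(parameters="weights from a controlled training process")
session = Session(model)
print(session.ask("Draft a plan for the weekend."))
# When the session ends, its context is gone; the parameters were never touched,
# so nothing from this interaction carries over to the next user.
```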

Suppose ChatGPT had already reached the stage of superintelligence. How could a system that activates cognitive processes based solely on user queries develop a plan to dominate the world? How would it internally store this plan? How would it initiate and track the execution of this multi-step plan if its own parameters remain the same after each interaction? And especially so if the contextual information from each user dialog is lost after each session?

Simply put: it can’t.

Sterility does not result from the system being sandboxed, i.e., separated from the physical outside world. A sterile system could access the internet as a chatbot, for example, or it could be integrated into the control system of an autonomous fleet and thus interact almost directly with the environment.

Is such a sterile system safe?
While such a system is safe in that it cannot develop and implement an AI world domination plan, it remains dangerous for three reasons:

a) Human users

A system that is intelligent enough to generate a plan to take over the world or to do extreme harm to humanity is dangerous, even if it lacks any intentionality and means to implement such a plan. It can explain the plan to people who ask for it and, if necessary, break it down so that it can be executed by humans. The plan does not always have to be the complete Armageddon program with all the bells and whistles; it could simply start with the destruction of a hostile state, or aim to covertly take over the financial system. Such plans could be attractive to certain people and organizations.

In principle, it is possible to control access to such an AI, or to control the AI itself so that it refuses to assist wrongdoers in any such endeavors. This form of control probably does not work very well yet, because at the moment we lack good methods for AI alignment and for controlling the dissemination of the technology.

b) Autonomous agents

A sterile superintelligence can be used in a larger context in which it is no longer sterile.

Auto-GPT, BabyAGI and other agent AIs are platforms that take sterile language models as granular reasoners, but build an agent on top of them that can act continuously, track a goal over time across multiple prompts, and break a task down into subtasks. At the moment of writing you cannot yet integrate all types of domains (such as strategic reasoning à la MuZero), but that will probably change soon. Although the underlying models may possess superintelligence, agents built on top of them are not qualitatively more dangerous than a human or a team of humans, for the time being. However, they could potentially act orders of magnitude faster, which could be a decisive factor in undermining countermeasures.
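
To illustrate what such an agent layer adds, here is a deliberately simplified sketch of an Auto-GPT-style loop (this is not Auto-GPT’s or BabyAGI’s actual code; `llm`, `run_agent` and the prompt wording are invented for this example): the model itself stays stateless, while the surrounding loop supplies the goal, the task list and the memory.

```python
def llm(prompt: str) -> str:
    """Placeholder for a call to any stateless language model."""
    return "DONE"  # in reality: an API call that returns generated text


def run_agent(goal: str, max_steps: int = 10) -> list[str]:
    memory: list[str] = []  # the agent, not the model, keeps state over time
    tasks = [f"Break the goal '{goal}' into a first concrete task."]
    for _ in range(max_steps):
        if not tasks:
            break
        task = tasks.pop(0)
        # Every instruction to the model is an ordinary natural-language prompt.
        result = llm(f"Goal: {goal}\nKnown so far: {memory}\nDo this task: {task}")
        memory.append(f"{task} -> {result}")
        follow_up = llm(f"Goal: {goal}\nResults so far: {memory}\nName the next task, or say DONE.")
        if follow_up.strip() != "DONE":
            tasks.append(follow_up)
    return memory


run_agent("Summarize the arguments of this article")
```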

Since agents speak to the underlying models in natural language (“prompts”), it is much easier to detect that they are planning something undesirable than if a base model were doing all the planning in an opaque, nonverbal way just by shifting vectors through its neurons.
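
As a toy illustration of that point (purely hypothetical; the blocklist and function names are invented, and a naive keyword filter is of course far weaker than real oversight would need to be), an independent layer could log and screen every prompt before it reaches the model:

```python
audit_log: list[str] = []  # full prompt trace, available for human review

BLOCKLIST = ("acquire weapons", "disable oversight", "copy yourself")


def screened_llm_call(llm, prompt: str) -> str:
    # Because the agent's "thoughts" travel as plain text, they can be
    # recorded and checked here, outside the model and outside the agent.
    audit_log.append(prompt)
    if any(phrase in prompt.lower() for phrase in BLOCKLIST):
        raise RuntimeError("Prompt flagged for human review")
    return llm(prompt)
```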

AutoGPT intro and demo.

c) AI companies

The greatest threat, however, could come from developing today’s sterile foundation models into unsterile ones: allowing them to learn directly from interactions, to design and monitor their training entirely by themselves, and to optimize their own architectures in an automated way. This path will likely be favored simply because it is easier and yields better results. Eventually, the lines between automated assistance and fully automated development could become blurred.
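
For contrast with the sterile sketch earlier, here is an equally simplified, hypothetical picture of what “learning directly from interactions” would mean: the boundary between inference and a curated, human-checked training process disappears, and every interaction immediately changes the parameters that all later users depend on.

```python
class OnlineLearningModel:
    """Hypothetical 'unsterile' variant: no separation of inference and training."""

    def __init__(self):
        self.parameters: list[tuple[str, str]] = []  # stands in for model weights

    def generate_and_learn(self, prompt: str) -> str:
        answer = f"response to {prompt!r}"
        # Uncurated feedback loop: this single interaction immediately changes
        # the parameters that every later user (and the model itself) depends on.
        self.parameters.append((prompt, answer))  # stands in for a gradient update
        return answer
```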

If that happens, the now sterile models become contaminated and, piece by piece, as they move further away from human-made architectures and processes, they become harder to control, harder to understand, and more uncertain.

So, we can leave the last checkbox unchecked for now — models with the current architecture would have no chance to take over the world. As long as we stick to these architectures, we have a relatively safe zone for a couple of years, even with the massive growth in intelligence. But because the dissemination of the technology is hard to control, it is entirely possible that someone starts building an inherently unsterile model tomorrow (or already started yesterday).

☑ 1. Are the machines at least close to AGI?
☑ 2. Is machine intelligence really growing so fast that it can reach superintelligence in the foreseeable future?

☑ 3. Will a superintelligent AI really want to dominate the world?
☐ 4. Is it realistically possible for AI to take over the world simply through superior intelligence?

5. And now?

After examining the facts, I see no systematic reason to exclude an AI world domination scenario in the short term, say, within the next 10 years. “In the next 10 years” does not necessarily mean 2033; it could also be much earlier if the current explosive growth phase continues.

“Many researchers steeped in these issues, including myself, expect that the most likely result of building a superhumanly smart AI, under anything remotely like the current circumstances, is that literally everyone on Earth will die. Not as in ‘maybe possibly some remote chance,’ but as in ‘that is the obvious thing that would happen.’” (Eliezer Yudkowsky)

I wouldn’t say it’s the most obvious thing, but that it is one of the likelier scenarios based on what we know now.

So we have to think about it — now!

Moratorium?

I am unconvinced that a moratorium is the most effective approach. The European Union (where little AI development takes place) would certainly love to forbid AI development or even usage; Italy already imposed a ban on ChatGPT (which has since been lifted). In the US, this is not very likely. Given the current geopolitical tensions, it is very unlikely that any relevant player beyond the Western world would join such a move. A pause in just a few areas of the world will do virtually nothing to reduce the risk. A Chinese AI with world domination plans will not stop at China’s borders.

Regulation?

Based on past experience with the regulation of complex industries, I am not extremely optimistic that government regulation will be very successful. We have often witnessed excessive regulation in some areas of a new industry while crucial and threatening issues remained unaddressed. Regulation has just not been very smart. The financial industry is one of the most regulated industries in the world, yet it still requires massive government and central bank interventions, safety nets and guarantee commitments every 10 to 15 years to prevent it from collapsing.

When we look at the EU’s efforts to draft an AI Act, it looks a little bit like banking regulation. It mostly aims at preventing bias in the models, informing users about AI systems in critical devices (e.g. in cars), and so on. These are perfectly decent goals and may force vendors to add some safety and reliability warnings, but they also completely miss the point. This would be similar to the IAEA (which regulates the use of nuclear energy) issuing regulations about how often the floors in nuclear power plants are mopped or the green spaces are watered.

Watchdog?

And here we are with the watchdog — both from the industry side (Sam Altman) and from the UN (António Guterres), the suggestion has been made to establish an international watchdog for AI, similar to the IAEA. From my point of view, this is one of the best proposals. Even if such an institution does not succeed in effectively controlling use and dissemination, it can create some more independent transparency. Monitoring is also best done at the international level, because tough monitoring and regulation in one country alone does little to change the risks to the world as a whole.

We know very, very little at this point. Even AI companies like OpenAI state that AI alignment is still massively under-researched. The first step would be to massively ramp up research in the area of potential dangers, safety, alignment and control. Once we have a better understanding, we could formulate action plans, develop architectures, implement safety measures and regulations … and if absolutely needed, enact a global ban in specific fields.

In the meantime, until we know more, we should really keep the systems sterile.

But my greatest hope rests not on human wisdom, but on probability. Not in the models, but in our world, where smaller accidents are typically much more probable than big ones. You are more likely to get hurt than to die. It is more likely that one person dies in an accident than thirty. Small armed conflicts are more frequent than large wars, and these are more frequent than world wars. So if something goes wrong, if a model goes really crazy, maybe there is still a chance (for the rest of us) to learn from our mistakes.

And, to end on a more conciliatory note, imagine the potential of a technology powerful enough to subjugate most of humanity, once we get it aligned and controlled.

I welcome your comments and thoughts, fellow humans!

A big thank you to Tian Cooper and Max Heintze for inspiration and help with the story.


Maximilian Vogel

Machine learning, generative AI aficionado and speaker. Co-founder BIG PICTURE.