An Introduction to AI Safety: AGI and Superintelligence

Neil Shaabi
Warwick Artificial Intelligence
9 min read · Jan 18, 2022

--

As the prevalence and competencies of AI-powered technology continue to grow at an unprecedented rate, it is becoming increasingly important to ensure that AI is deployed in ways that empower rather than undermine humanity. This challenge motivates research in AI safety, central to which is the concern that we may build artificial agents far more intelligent than humans, whose goals are misaligned with our own. It is widely believed that our dominance over other species can be attributed to certain cognitive abilities comprising human intelligence [1,2]. If we create a system with a level of intelligence that transcends ours, humans will be unable to control its actions and, by extension, the future of our species. For reasons such as this, many prominent researchers regard the development of a machine superintelligence as a serious existential risk [1–4].

Classifying AI

Before we can examine this argument more closely, it is necessary to clarify what constitutes intelligence. Though no precise definition of the term has been universally agreed upon, Legg and Hutter’s interpretation of intelligence as a measure of “an agent’s ability to achieve goals in a wide range of environments” [5, p.402] features prominently in the existing literature. This definition captures the capacity for generalised problem-solving that distinguishes humans from other species and machines (thus far). To illustrate, humans can reuse a set of general cognitive skills to pursue tasks wildly different from those in our ancestral environment, such as space exploration and vaccine development, whereas animals and current AI systems are constrained to the domains for which they were specifically optimised, whether by evolution or by their designers. What this definition lacks, however, is any consideration of the resources an agent requires to achieve its goals. Yudkowsky [6] addresses this limitation by characterising intelligence as “efficient cross-domain optimisation”.
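
For readers who want the formal version, Legg and Hutter make their definition precise in [5] as a “universal intelligence” measure: an agent’s expected performance summed over all computable environments, weighted so that simpler environments count for more. Roughly, in their notation:

```latex
% Legg and Hutter's universal intelligence measure [5]. \pi is the
% agent's policy, E the set of computable environments, V_\mu^\pi the
% expected value the agent achieves in environment \mu, and K(\mu) the
% Kolmogorov complexity of \mu (simpler environments weigh more).
\Upsilon(\pi) := \sum_{\mu \in E} 2^{-K(\mu)} \, V_\mu^\pi
```

On this measure, an agent is more intelligent the better it performs across many environments rather than in any single one, which is exactly the cross-domain quality discussed above.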

Using this working definition, AI can be divided into three distinct categories with varying degrees of intelligence relative to humans:

Artificial Narrow Intelligence, composed of agents with the ability to learn only a narrow range of tasks, and thus a subhuman level of intelligence. All AI systems in current use are of this kind, including those that exhibit vastly superhuman performance in the domains they have been optimised for. This is exemplified by IBM’s chess AI, Deep Blue; despite having famously defeated Garry Kasparov, the reigning world champion at the time, Deep Blue is considered unintelligent because it “knows a tremendous amount about an incredibly narrow area” [7] — it lacks the capacity for cross-domain optimisation attributed to intelligent agents.

Artificial General Intelligence (AGI), which consists of agents with the potential to perform any arbitrary task that a human being can, by transferring what they learn across diverse domains. Consequently, such agents are thought to manifest human-level intelligence. Though the prospect of developing AGI remains a matter of debate, a number of tests have been proposed that seek to operationalise its definition. One is the Turing Test devised by Alan Turing [8], which a machine passes if a human evaluator, after posing a series of questions to both the machine and another human, is unable to reliably distinguish between the text-based responses produced by the two participants. Given the many criticisms [9] of the Turing Test as a criterion for intelligence, researchers tend to favour a more robust definition, such as Nils Nilsson’s Employment Test [10]. According to this test, a machine that displays human-level intelligence must be able to perform economically important jobs with the same proficiency as humans employed in those jobs.
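
To make the Turing Test’s pass criterion concrete, here is a minimal sketch of the protocol in Python. Everything in it is an illustrative assumption made for this post: the canned respondents, the single question, and the “within 10 points of chance” pass threshold are inventions for demonstration, not part of Turing’s paper [8].

```python
import random

def turing_test(evaluator, machine, human, questions, trials=100):
    """Blind pairwise protocol: each round, the evaluator sees two
    answers to the same question in random order and guesses which
    came from the machine. The machine passes if the evaluator's
    accuracy is statistically indistinguishable from chance."""
    correct = 0
    for _ in range(trials):
        question = random.choice(questions)
        answers = [("machine", machine(question)), ("human", human(question))]
        random.shuffle(answers)          # hide which answer is which
        guess = evaluator(question, answers[0][1], answers[1][1])  # 0 or 1
        if answers[guess][0] == "machine":
            correct += 1
    # Pass threshold (an assumption made here for illustration):
    # within 10 percentage points of the 50% expected under guessing.
    return abs(correct / trials - 0.5) < 0.1

# Toy stand-ins: both respondents give canned text, so the evaluator
# can only guess at random and the machine will almost always pass.
respond = lambda q: "That is an interesting question."
guess_randomly = lambda q, a, b: random.randint(0, 1)
print(turing_test(guess_randomly, respond, respond, ["Can machines think?"]))
```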

Artificial Superintelligence, defined by philosopher Nick Bostrom as “any intellect that greatly exceeds the cognitive performance of humans in virtually all domains of interest” [4, p.22]. Like AGI, a superintelligent system remains purely theoretical; nevertheless, the consequences of its emergence are sufficiently grave that considerable effort is being made to understand how we might create one without engendering a global catastrophe.

Evidence for Superintelligent Machines

Given the effort being invested today in mitigating the risks posed by a future superintelligence, it is natural to ask why we should believe that one can exist at all. Generally, arguments in favour of this possibility point to various constraints on human minds that are not nearly as limiting for machines. Some of the most significant factors [11] are detailed below, although this list is by no means exhaustive.

  • Speed. Owing to our physiology, human axons carry spike signals at up to 75 metres per second, whereas signals in a computer propagate at close to the speed of light, roughly four million times faster. This difference is so great that it would allow a computer to “do as much thinking in minutes or hours as a human can in years or decades” [12, p.5].
  • Size. Brain size has been identified as a contributing factor to the difference between human and animal intelligence. While the human brain was constrained by evolution to be small enough to pass through a birth canal, there is no reason why neural networks cannot be built as large as buildings or cities, with storage capacities and processing power orders of magnitude beyond our own.
  • Editability. Beyond practising certain lifestyle habits and enrolling in training programmes, there is little we can do to improve our own intelligence. With digital minds, on the other hand, we can experimentally modify billions of parameters that affect the overall performance of a neural network. Achieving the same kind of optimisation in humans would amount to performing precise, repeatable neurosurgery, which is simply not feasible.
  • Duplicability. Biological minds can only be reproduced slowly, and each new brain recalls none of the information acquired by its parents during their lifetimes. A digital mind, by contrast, can be replicated in minutes or hours, with the guarantee that every instance contains all of the knowledge and skills learned by its predecessor.
  • Rationality. Humans are far from rational beings: we hold inconsistent goals and frequently take suboptimal actions that fail to advance them. Machines, by contrast, can be designed to act far more rationally, allowing them to pursue their goals with greater efficiency. Indeed, the use of Bayesian decision networks to facilitate rational decision-making is already a well-established paradigm in AI design [2]; a minimal sketch of this decision rule follows the list.
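
To make the rationality point concrete, the sketch below shows the expected-utility rule at the core of Bayesian decision networks: weight the utility of each action by the agent’s beliefs about the world, and pick the action with the highest expected utility. A full decision network would add conditional dependencies between variables, omitted here for brevity, and all the numbers are invented for illustration.

```python
# Minimal expected-utility maximiser, the decision rule at the core of
# Bayesian decision networks. All probabilities and utilities here are
# invented purely for illustration.
beliefs = {"rain": 0.3, "dry": 0.7}             # P(state)
utility = {                                     # U[action][state]
    "take umbrella":  {"rain": 5,   "dry": 2},
    "leave umbrella": {"rain": -10, "dry": 4},
}

def expected_utility(action):
    """Sum of utility in each state, weighted by belief in that state."""
    return sum(p * utility[action][state] for state, p in beliefs.items())

best = max(utility, key=expected_utility)
print(best, expected_utility(best))  # take umbrella 2.9
```

The point is not the toy numbers but the consistency: unlike a human, an agent built this way never knowingly takes an action with lower expected utility under its own beliefs.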

Given these advantages, it is hardly surprising that computers have already exceeded our competencies in various areas, such as arithmetic and gaming. With time, it is more than likely that this list will expand to include the general cognitive abilities that account for human intelligence. While AGI timelines are highly uncertain, what cannot be doubted is that humans are far from the highest attainable intellect; as summarised by Muehlhauser [2], in terms of the intelligence scale, “there is plenty of room above us.”

From Current AI to Superintelligence

Having established a reasonable basis for the existence of a smarter-than-human AI, the question then arises: how might we actually achieve superintelligence? There are several theories [4] that address this, including ones involving whole brain emulation and brain-computer interfaces. However, those that emphasise the power of AI with less human involvement are generally regarded as more probable (and dangerous). The transition from today’s narrow AI to AGI is expected to be driven by the same factors currently responsible for progress in the field: improved hardware, algorithms and training data. It is in the subsequent transition from AGI to superintelligence, though, that experts predict significant progress will be made in an extraordinarily short time frame. This claim is supported by the following two scenarios, which rely on the duplicability and editability of AI, respectively.

A well-known advantage of digital minds that would allow an AGI to evolve into a superintelligence is the ability to be easily duplicated. While designing the first AGI will require extensive research, an entire population of arbitrarily many AGIs can then be created at a fraction of that cost. Because every copy would be generally intelligent, they can be expected to follow the same approach that humans do, decomposing large, complex tasks into smaller tasks that are easier to learn. This group of agents would benefit greatly from sharing new knowledge with each other (for instance, by synchronising their databases every so often), allowing them to solve significantly harder problems than the original can alone; two heads are better than one. They would also avoid the sort of coordination problems that limit the efficacy of human groups, since they would surpass us in the skills that promote effective coordination and would pursue a common goal. Therefore, a superintelligence could emerge in the form of a large group of lesser intellects, known as a “collective superintelligence” [4, p.54], rather than a single entity.
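
As a toy illustration of why synchronisation matters, consider copies of an agent that each learn a different piece of a problem and then merge what they know, so that every copy ends up with the union of everything any copy learned. The Agent class below is a deliberately simplistic invention for this post, not a model of any real system.

```python
# Toy model of a "collective superintelligence": each copy learns a
# different piece of a problem, and a periodic sync gives every copy
# the union of all knowledge. Purely illustrative, not a real system.
class Agent:
    def __init__(self):
        self.knowledge = set()

    def learn(self, facts):
        self.knowledge |= set(facts)

def synchronise(agents):
    """Merge every agent's knowledge and broadcast it back to all."""
    shared = set().union(*(a.knowledge for a in agents))
    for a in agents:
        a.knowledge = set(shared)

copies = [Agent() for _ in range(100)]
for i, agent in enumerate(copies):
    agent.learn({f"subproblem-{i}"})    # each copy tackles its own piece

synchronise(copies)
print(len(copies[0].knowledge))         # 100: every copy knows all pieces
```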

Another widely discussed possibility is that an artificial agent could iteratively improve itself until it surpasses human cognition. This is a process of recursive self-improvement, a prerequisite for which is an AI that can interpret and rewrite its own source code without any human assistance — what Yudkowsky [13] calls a seed AI. Importantly, a seed AI need not possess general intelligence to begin with; initial progress may be achieved even if its capabilities are subhuman in most domains, provided they are sufficiently advanced in areas relevant to AI research, such as computing and mathematics. Eventually, however, general intelligence is required for a seed AI to continuously bootstrap its cognitive performance and rapidly attain superintelligence — an event described by I. J. Good [14] as an intelligence explosion. In contrast to the previous scenario, this superintelligence would take the form of a single decision-making agency, or “singleton” [4, p.78], which would likely pose an even greater threat to humanity.
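
One crude way to see why such a process could be explosive is a toy model in which each round of self-modification improves capability in proportion to the agent’s current capability, so progress compounds. The 10% gain per cycle below is an arbitrary illustrative assumption, not a figure taken from [13] or [14].

```python
# Toy model of recursive self-improvement: each cycle's gain is
# proportional to current capability, so progress compounds. The 10%
# gain per cycle is an arbitrary assumption for illustration.
capability = 1.0         # arbitrary units; 1.0 = the seed AI's start
gain_per_cycle = 0.10    # each self-rewrite improves capability by 10%

for cycle in range(1, 101):
    capability *= 1 + gain_per_cycle
    if cycle % 25 == 0:
        print(f"cycle {cycle:3d}: capability {capability:10.1f}")
```

Even with modest per-step gains, compounding yields a roughly 13,780-fold increase after 100 cycles; under the stronger assumption that the gain per cycle itself grows with capability, growth becomes faster than exponential, which is closer to the scenario Good had in mind.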

Thus far, we have discussed what AGI and superintelligence are, why we should believe that they can exist and how they might be created. Forecasting their time of arrival is a more difficult task, although recent surveys indicate that we should expect AGI to appear in the next 45–120 years [15]. The fact that we may need more time than this to solve the problem of aligning AI with humanity’s best interests testifies to the urgency of early research in AI safety. In order to appreciate current approaches to this issue, it is important to understand why advanced AI may be misaligned in the first place. This will be the focus of the following section in this series.

References

  1. Soares N. Four Background Claims. Machine Intelligence Research Institute; 2015. Available from: https://intelligence.org/2015/07/24/four-background-claims/ [Accessed 27 December 2021].
  2. Muehlhauser L. Facing the Intelligence Explosion. Machine Intelligence Research Institute; 2013. Available from: https://intelligenceexplosion.com [Accessed 27 December 2021].
  3. Russell S, Norvig P. Artificial Intelligence: A Modern Approach. 3rd Ed. New Jersey: Prentice Hall; 2010. Available from: https://cs.calvin.edu/courses/cs/344/kvlinden/resources/AIMA-3rd-edition.pdf [Accessed 27 December 2021].
  4. Bostrom N. Superintelligence: Paths, Dangers, Strategies. Oxford: Oxford University Press; 2014. Available from: https://media.archiware.ir/LifeBits/AI-Books/Superintelligence.pdf [Accessed 27 December 2021].
  5. Legg S, Hutter M. Universal Intelligence: A Definition of Machine Intelligence. Minds & Machines. 2007;17(4): 391–444. Available from: https://doi.org/10.1007/s11023-007-9079-x.
  6. Yudkowsky E. Efficient Cross-Domain Optimization. LessWrong; 2008. Available from: https://www.lesswrong.com/posts/yLeEPFnnB9wE7KLx2/efficient-cross-domain-optimization [Accessed 28 December 2021].
  7. McDermott D. How Intelligent is Deep Blue? New York Times; 1997. Available from: https://www.nyu.edu/gsas/dept/philo/courses/mindsandmachines/Papers/mcdermott.html [Accessed 28 December 2021].
  8. Turing AM. Computing machinery and intelligence. Mind. 1950;59(236): 433–460. Available from: https://doi.org/10.1093/mind/LIX.236.433.
  9. Hayes PJ, Ford KM. Turing Test Considered Harmful. In: Proceedings of the Fourteenth International Joint Conference on Artificial Intelligence. Montreal: Morgan Kaufmann; 1995. p.972–997. Available from: http://ijcai.org/Proceedings/95-1/Papers/125.pdf [Accessed 28 December 2021].
  10. Nilsson NJ. Human-Level Artificial Intelligence? Be Serious! AI Magazine. 2005;26(4): 68–75. Available from: https://doi.org/10.1609/aimag.v26i4.1850.
  11. Muehlhauser L, Salamon A. Intelligence Explosion: Evidence and Import. In: Eden A, Søraker J, Moor JH, Steinhart E. (eds.) Singularity Hypotheses: A Scientific and Philosophical Assessment. Berlin: Springer; 2012. p.15–40. Available from: https://link.springer.com/content/pdf/10.1007%2F978-3-642-32560-1.pdf [Accessed 29 December 2021].
  12. Ngo R. AGI safety from first principles. AI Alignment Forum; 2020. Available from: https://www.alignmentforum.org/s/mzgtmmTKKn5MuCzFJ [Accessed 29 December 2021].
  13. Yudkowsky E. Levels of Organization in General Intelligence. In: Goertzel B, Pennachin C. (eds.) Artificial General Intelligence. Berlin: Springer; 2007. p.389–501. Available from: https://intelligence.org/files/LOGI.pdf [Accessed 30 December 2021].
  14. Good IJ. Speculations Concerning the First Ultraintelligent Machine. Advances in Computers. 1966;6: 31–88. Available from: https://doi.org/10.1016/S0065-2458(08)60418-0.
  15. Grace K, Salvatier J, Dafoe A, Zhang B, Evans O. When Will AI Exceed Human Performance? Evidence from AI Experts. Journal of Artificial Intelligence Research. 2018;62: 729–754. Available from: https://doi.org/10.1613/jair.1.11222.
