The Road to AGI

Srijith Rajamohan, Ph.D.
Sage Ai
May 2, 2023

Introduction

This blog post aims to discuss, very succinctly, the philosophies behind the various paths of AI research leading to the holy grail of Artificial General Intelligence (AGI). The capabilities within AI can be broadly divided into three stages: Artificial Narrow Intelligence (ANI), Artificial General Intelligence (AGI), and finally, Artificial Super Intelligence (ASI) [1]. Despite the incredible recent advances in LLMs (Large Language Models), such as ChatGPT, that give the illusion of causal reasoning and intelligence, we are still firmly entrenched within the realm of ANI, with ambitions to create generalizable intelligent agents. So, what does AGI entail? The definition of AGI varies, but it is loosely defined as the ability to perform any intellectual task that a human can do. At a minimum, an agent exhibiting AGI should be able to perform reasoning and planning [2].

Can deep learning accomplish this task? In particular, can deep reinforcement learning, with its ability to plan, teach an agent to possess the required level of intelligence? If so, what would a reward system look like that incentivizes an agent to learn about the world? What is hybrid AI, and do we have the tools available to achieve it today?

What is Intelligence?

While human intellectual potential serves as a benchmark against which all artificial agents are measured, human-like behavior is not equivalent to intelligence. Human behavior tends to have a high degree of variance and is prone to errors and irrationality, making it difficult to assert that all human behavior is intelligent. The Turing test, although often cited and misused, is intended to determine whether an agent (such as ChatGPT) has achieved human-level intelligence. Furthermore, the question of whether sentience or consciousness is a necessary condition for intelligence is still debated. That question will not be addressed in this blog post, but Dr. David Chalmers explores it in a related talk [3], discussing whether LLMs are sentient and what it means to be sentient.

So how do we define intelligence? As mentioned above, the ability to reason is a necessary but not sufficient condition for intelligence. Human intelligence has been roughly classified into two types: crystallized intelligence and fluid intelligence [4,6]. Crystallized intelligence refers to accumulated knowledge, whereas fluid intelligence refers to the ability to process new information, apply it, and solve new problems. I hypothesize that an agent with aspirations of AGI must attain some notion of fluid intelligence. We can further break down intelligence into more tangible qualities. According to the report [5], which is not without controversy, and [2], an intelligent agent must possess the following capabilities:

  1. The ability to reason.
  2. The ability to perform planning to accomplish a certain task.
  3. Problem solving.
  4. Abstract thinking.
  5. Comprehension of complex ideas.
  6. The ability to learn quickly and learn from experience.

Planning is fundamentally what separates humans from other animals and other ‘intelligent’ agents. In other words, planning means decomposing a complex problem into a sequence of steps, which often requires the capacity for abstract thinking. However, doing this well requires the ability to learn from experience through inductive reasoning, concept formation, and relationship building.

Symbolic AI vs. Deep Learning (DL)

The debate between symbol-based learning using propositional (think boolean logic)/higher-order logic and connectionist learning (neural networks) dates back several decades. However, recent advances in data-driven deep learning approaches have reignited this conversation. The war of words between proponents of both approaches, Dr. Yann LeCun (DL) and Dr. Gary Marcus (Symbolic AI), has been played out publicly.

Symbolic AI manipulates symbols to formulate logic and arrive at conclusions. Symbols can be a string of characters, such as a word, that has meaning embedded within it. Symbols allow us to express and exchange ideas. Chaining together symbols to communicate in this manner is called symbolic manipulation. This involves reasoning and is hence easier to formulate and interpret. For this reason, symbolic AI is also referred to as rules-based AI or an expert system. If you are interested, please see [7] for a quick summary of the propositional logic needed to build symbolic reasoning systems. This is different from data-driven approaches, such as those used by deep learning and traditional ML. Deep learning depends on large amounts of data, although recent advances in zero-shot and few-shot learning have helped mitigate this issue, at least for the specific problems a network is designed to solve.
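
To make the distinction concrete, here is a minimal, hypothetical Python sketch (using scikit-learn) contrasting a hand-written rule-based classifier with a learned, data-driven one; the rules and the tiny dataset are invented purely for illustration, not taken from any real system.

# Symbolic/rule-based: the knowledge is written down by a human as explicit rules.
def rule_based_sentiment(text: str) -> str:
    positive = {"good", "great", "excellent"}
    negative = {"bad", "awful", "terrible"}
    words = set(text.lower().split())
    if words & positive and not (words & negative):
        return "positive"
    if words & negative:
        return "negative"
    return "neutral"

# Data-driven: the same mapping is learned from labeled examples instead of being coded by hand.
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.linear_model import LogisticRegression

texts = ["great movie", "terrible plot", "excellent acting", "awful pacing"]
labels = ["positive", "negative", "positive", "negative"]

vectorizer = CountVectorizer()
model = LogisticRegression().fit(vectorizer.fit_transform(texts), labels)

print(rule_based_sentiment("a great film"))                      # rule fires
print(model.predict(vectorizer.transform(["a great film"]))[0])  # pattern learned from data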

Recently, the debate has shifted from whether symbol manipulation is necessary to whether symbols can be learned. Deep learning extracts patterns from data; in supervised deep learning, the model learns associations between inputs and outputs. In symbolic AI, these associations are imparted to the system through rules. Dr. LeCun argues [8] that deep learning allows models to learn reasoning when real-world reasoning scenarios are provided as training data.

Quoting the essay in [8]: “Since this is where Symbolic AI shines, Marcus recommends simply combining the two: Inserting a hard-coded symbolic manipulation module on top of a pattern-completion DL module. This is attractive since the two methods complement each other well, so it seems plausible a “hybrid” system with modules working in different ways would provide the best of both worlds. And it seems like common sense, since everyone working in DL agrees that symbolic manipulation is a necessary feature for creating human-like AI.

But the debate turns on whether symbolic manipulation needs to be built into the system, where the symbols and capacity for manipulating are designed by humans and installed as a module that manipulates discrete symbols and is consequently non-differentiable — and thus incompatible with DL. Underlying this is the assumption that neural networks can’t do symbolic manipulation — and, with it, a deeper assumption about how symbolic reasoning works in the brain.”

Dr. LeCun has been a vocal critic of autoregressive LLMs, which could explain the softening of his stance on symbol learning. He cites examples of language models such as GPT-3 and LaMDA that show some ability to engage in ‘common sense reasoning’ but fail to do so consistently. For example, when I asked an LLM to suggest ways of implementing a hybrid AI system, the response I received was as follows:

The most common hybrid AI approach is the combination of rule-based AI and machine learning. Rule-based AI involves creating a set of rules and logic to solve a problem. ML, on the other hand, involves training a machine learning algorithm on a large dataset to learn patterns and make predictions.

Another approach towards hybrid AI is the combination of symbolic AI and connectionist AI. Symbolic AI involves representing knowledge using symbols and logic. Connectionist AI, on the other hand, involves creating neural network(s) that can learn from data.

Although the LLM presents these as two distinct approaches, they are essentially the same idea: a combination of a logic-based system and a machine learning/pattern-learning system. This raises the question that has been repeated ad nauseam: is this intelligence or the illusion of intelligence?

However, not everyone is convinced that symbol learning needs to be a separate entity. Dr. Geoffrey Hinton argues [9] that neural networks do not need to have symbolic reasoning ingrained into them, but rather can learn symbol-manipulating behaviors [10] without an explicit symbol-learning module. And there seems to be some merit to this opinion: in [2], the improvements in reasoning and abstract thinking of GPT-4 over ChatGPT are remarkable.

Fig 1. Reasoning and abstract thinking capabilities of ChatGPT vs. GPT-4 [2]

Fundamentally, the two sides seem to be at an impasse over whether symbol manipulation can be learned by connectionist architectures. Even Sam Altman, the CEO of OpenAI, which gave us ChatGPT and GPT-4, claims that we are at the point of diminishing returns with large models and that we can’t scale our way to AGI [11].

How does Intelligence arise?

While the debate about structure and representation in learning continues, another question has garnered renewed interest: how can an intelligent agent learn from experience? This question lies at the heart of (Deep) Reinforcement Learning (RL), in which an RL agent learns the appropriate course of action to take by interacting with its environment. Each action results in a reward from the environment and the agent moving to a new state. A trajectory is defined as the sequence of states an agent traverses from start to finish. Critically, the best action at any state is the one that maximizes the return over the course of a trajectory, not the one that provides the maximum reward in that state.
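
As a minimal sketch of these definitions (the rewards and discount factor below are made up for illustration), the return of a trajectory is the discounted sum of its rewards, and the best action is the one with the highest return rather than the highest immediate reward:

# Discounted return of a trajectory: G = r_0 + gamma*r_1 + gamma^2*r_2 + ...
def discounted_return(rewards, gamma=0.99):
    g = 0.0
    for r in reversed(rewards):
        g = r + gamma * g
    return g

# Two hypothetical trajectories from the same state:
# action A gives a big immediate reward but a poor outcome later,
# action B gives a small immediate reward but a better long-term outcome.
traj_a = [10.0, 0.0, 0.0, -20.0]
traj_b = [1.0, 1.0, 1.0, 10.0]

print(discounted_return(traj_a))  # lower return despite the larger first reward
print(discounted_return(traj_b))  # higher return, so B is the better action here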

Language models (LLMs), like ChatGPT, were trained using RL aided by human feedback, to help dictate the selection of tokens that make up language. The popularity of ChatGPT has led to the development of new tools such as LangChain [18], which allow us to incorporate disparate sources of knowledge to determine the ideal action given a particular state. This is not very different from how humans act. When faced with a situation where we lack sufficient knowledge, we turn to a body of knowledge to guide our decision-making. Projects such as BabyAGI [19] attempt to tackle the planning problem, which is already a step in the right direction towards achieving artificial general intelligence (AGI).
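
The following is a deliberately simplified, hypothetical sketch of that pattern in plain Python (it does not use LangChain's or BabyAGI's actual APIs): an agent consults an external body of knowledge before choosing its next action.

# Hypothetical knowledge base: maps a situation to relevant guidance.
KNOWLEDGE = {
    "flight cancelled": "rebook on the next available flight",
    "server down": "fail over to the standby region",
}

def retrieve(state: str) -> str:
    # Look up guidance relevant to the current state (a stand-in for a
    # retriever over documents, a vector store, or a search tool).
    return KNOWLEDGE.get(state, "no guidance found; explore cautiously")

def decide(state: str) -> str:
    guidance = retrieve(state)
    # A real agent would pass the state and the retrieved guidance to an LLM
    # or planner; here we simply act on the retrieved advice.
    return f"state={state!r} -> action: {guidance}"

print(decide("server down"))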

Rewards play a crucial role in how well an agent learns from experience; an entire sub-field of reward engineering is dedicated to designing rewards that teach an agent the desired behavior. One could argue that RLHF (Reinforcement Learning with Human Feedback) is an extreme case of reward engineering, where the rewards themselves are learned from human feedback during the training process. This leads us to a recent (and controversial) paper on this topic, “Reward is enough,” by Silver et al. [12].
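
Before turning to that paper, here is a small illustration of what “learning the reward itself” from preferences can look like: a toy sketch in numpy (not the actual RLHF pipeline used for ChatGPT) that fits a linear reward model to pairwise preferences with a Bradley-Terry-style loss, so that preferred responses receive higher reward.

import numpy as np

rng = np.random.default_rng(0)

# Toy feature vectors for responses; each preference pair records that the
# first response was preferred by a human over the second.
dim = 4
pairs = [(rng.normal(size=dim), rng.normal(size=dim)) for _ in range(200)]
true_w = np.array([1.0, -2.0, 0.5, 0.0])          # hidden "human taste"
prefs = [(a, b) if a @ true_w > b @ true_w else (b, a) for a, b in pairs]

# Reward model: r(x) = w . x, trained so preferred responses score higher.
w = np.zeros(dim)
lr = 0.1
for _ in range(500):
    grad = np.zeros(dim)
    for chosen, rejected in prefs:
        # Bradley-Terry style loss: -log(sigmoid(r_chosen - r_rejected))
        p = 1.0 / (1.0 + np.exp(-(w @ chosen - w @ rejected)))
        grad += (p - 1.0) * (chosen - rejected)
    w -= lr * grad / len(prefs)

agree = np.mean([w @ c > w @ r for c, r in prefs])
print(f"reward model agrees with the preferences on {agree:.0%} of pairs")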

Reward is enough

In the paper “Reward is enough” [12], the authors suggest that general algorithms, rather than problem-specific algorithms, should be formulated. These general algorithms should not rely on prior expert knowledge; instead, the experiences and rewards encountered along the way will result in acquired intelligence that allows the agent to reach various goals. This is vague enough to be hard to dispute but also easy to misunderstand. As Simon Ouellette [13] points out, this paper is less about the evolution of human intelligence and more about the proposed direction for future AI research.

To break this down, the core of the hypothesis is that, given a nebulous task such as ‘get wealthy’, an agent starting from an initial state will transition through various states by making a series of decisions. Initially, these decisions are likely to be suboptimal, but through trial and error the agent learns better actions, and the accumulated experience supports inductive reasoning about cause and effect as well as concept formation. Over time, given a new state, the agent can draw on this body of knowledge, performing a kind of ‘transfer learning’ to choose a good action without having to sample and traverse many trajectories.
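
A compact way to see this trial-and-error loop is tabular Q-learning with an epsilon-greedy policy; the toy chain environment below is invented purely for illustration.

import random

random.seed(0)

# Toy chain environment: states 0..4, actions 0 (left) and 1 (right).
# Reaching state 4 ends the episode with reward 1; all other steps give 0.
N_STATES, GOAL = 5, 4

def step(state, action):
    next_state = max(0, state - 1) if action == 0 else min(GOAL, state + 1)
    reward = 1.0 if next_state == GOAL else 0.0
    return next_state, reward, next_state == GOAL

Q = [[0.0, 0.0] for _ in range(N_STATES)]
alpha, gamma, epsilon = 0.5, 0.9, 0.1

for _ in range(200):                       # episodes of trial and error
    state, done = 0, False
    while not done:
        # epsilon-greedy: explore occasionally (or when indifferent), otherwise exploit
        if random.random() < epsilon or Q[state][0] == Q[state][1]:
            action = random.randrange(2)
        else:
            action = Q[state].index(max(Q[state]))
        next_state, reward, done = step(state, action)
        Q[state][action] += alpha * (reward + gamma * max(Q[next_state]) - Q[state][action])
        state = next_state

print([q.index(max(q)) for q in Q[:GOAL]])  # learned policy: move right at every state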

Criticism of ‘Reward is enough’

The idea of tabula rasa, or a blank slate, has received significant criticism regarding its relation to the development and evolution of human intelligence. Humans and animals are born with innate abilities necessary for survival, and are not entirely a blank slate shaped by experiences. This is a valid argument, and it directly challenges the “no prior knowledge necessary” hypothesis. Another flaw with the reward hypothesis is that learning through pure exploratory sampling is not very efficient. Even with exploration strategies such as epsilon-greedy and its variants, the vastness of the space that an agent must traverse and learn through trial and error can make the problem intractable. Although reward alone could potentially lead to intelligence given infinite time and resources, it is rarely a pragmatic solution.

In “The Archimedean Trap: Why traditional reinforcement learning will probably not yield AGI” [14], Dr. Samuel Allen Alexander argues that scalar rewards based on real numbers are insufficient to attain AGI because most real-world scenarios follow a non-Archimedean reward structure. In other words, we cannot quantify the value of a particular state in finite terms: relative to ordinary rewards, a reward would need to be infinitely negative (to avoid a state at all costs) or infinitely large (to attain a state unconditionally) to represent the relative values of states. For example, during the course of medical treatment, the state of death must be avoided at all costs. How does one quantify this with a real-numbered reward? One could hypothesize that a finitely large value could suffice in a finite-horizon RL problem, but Alexander is not convinced. He does concede that non-traditional RL techniques, such as preference-based reward systems, could potentially sidestep this deficit. Recent work on RLHF (Reinforcement Learning with Human Feedback) and the success of ChatGPT suggest that some of these reward-engineering issues can be sidestepped by training a model to learn a reward from relative preferences provided as feedback.
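
A small numeric sketch of the Archimedean issue (the numbers are invented for illustration): whatever finite penalty we assign to an unacceptable state, enough small positive rewards accumulated elsewhere can eventually outweigh it, so a return-maximizing agent may still “accept” that state.

DEATH_PENALTY = -1e6      # any finite penalty we pick for the unacceptable state
SMALL_REWARD = 1.0        # routine per-step reward elsewhere

# Trajectory A: safe and short, never risks the bad state.
return_a = 10 * SMALL_REWARD

# Trajectory B: accumulates small rewards long enough, then hits the bad state.
steps_needed = int(-DEATH_PENALTY / SMALL_REWARD) + 11
return_b = steps_needed * SMALL_REWARD + DEATH_PENALTY

# With real-valued (Archimedean) rewards, B eventually scores higher than A,
# even though it ends in a state we wanted to rule out at all costs.
print(return_a, return_b, return_b > return_a)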

In “Scalar reward is not enough: A response to Silver, Singh, Precup and Sutton (2021)” [15], Vamplew et al. argue that a scalar reward is insufficient for learning all forms of intelligence necessary to achieve a goal. They also address a concern raised by others: while this approach could lead to the evolution of intelligence, it may still be undesirable as the sole criterion because it lacks safeguards to limit unethical behavior. Just because you can do something does not mean you should. In maximizing an objective like “get wealthy,” how can a scalar reward account for ethical considerations and disincentivize immoral actions?
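
As an illustrative sketch (not taken from [15]), here is one way a vector-valued reward with an ethical constraint differs from collapsing everything into a single scalar; the actions and numbers are invented.

# Each candidate action yields (wealth_gain, ethics_score).
actions = {
    "honest_trade": (5.0, 0.0),
    "fraud":        (9.0, -10.0),
}

def scalarized(reward, weight=0.1):
    # Collapsing to one number: a large enough wealth term can swamp the ethics term.
    wealth, ethics = reward
    return wealth + weight * ethics

def constrained_best(options, ethics_floor=-1.0):
    # Multi-objective view: discard actions that violate the ethical constraint,
    # then maximize wealth among what remains.
    allowed = {a: r for a, r in options.items() if r[1] >= ethics_floor}
    return max(allowed, key=lambda a: allowed[a][0])

print(max(actions, key=lambda a: scalarized(actions[a])))  # picks "fraud" (9 - 1 = 8 > 5)
print(constrained_best(actions))                           # picks "honest_trade"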

A hybrid approach?

Is a hybrid approach really the way forward towards achieving true AGI? There have been various advances in fusing together the techniques of symbolic learning and deep learning. Any such hybrid system should have reasoning abilities as well as the ability to assess similarity. Dr. Gary Marcus believes that combining neurosymbolic learning with the exploratory-exploitative nature of deep reinforcement learning could get us there [16].

This could potentially address the fundamental challenges of reasoning and transferable learning. The rigidity of the symbolic approach has been criticized, as has the inability of deep learning to reason. Symbolic systems struggle with heuristic and fuzzy relationships, which is exactly where deep learning excels. Deep learning, on the other hand, suffers from a lack of explainability and reasoning capabilities.

However, this debate is moot according to Dr. Marvin Minsky [17], who believed that AI should not be constrained to a particular type of system and that true intelligence stems from a diverse set of components. It has been postulated that even the human brain consists of a collection of specialized agencies, each of which performs a narrow set of functions very well. They do not necessarily need to cooperate on every new challenge, but they do need to exploit each other’s expertise. For further reading on the topic of symbolic vs. connectionist approaches, you can refer to [17].

One could argue that many industrial applications, particularly those with regulatory standards, already utilize a hybrid AI approach in principle, where business rules are combined with learned models.

What could such a hybrid system look like?

The simplest form of neurosymbolic AI is based on propositional logic or zeroth order logic using logical connective operators. Propositional logic is also more commonly referred to as boolean logic. First-order logic extends propositional logic by allowing us to establish relationships between objects.
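
For instance, here is a tiny Python sketch (with invented examples) of the difference: propositional logic combines whole statements with connectives, while first-order logic lets us talk about objects, relations, and quantifiers.

# Propositional (zeroth-order) logic: whole statements combined with connectives.
it_is_raining = True
i_have_umbrella = False
i_get_wet = it_is_raining and not i_have_umbrella   # a rule built from connectives
print(i_get_wet)                                     # True

# First-order logic adds objects, relations, and quantifiers.
people = {"Jane", "John", "Jonah"}
parent = {("Jonah", "Jane"), ("Jane", "John")}       # (parent, child) pairs

# "Every person has at most one listed parent" (a universally quantified statement).
print(all(sum(1 for p, c in parent if c == person) <= 1 for person in people))

# "There exists someone who is a grandparent" (an existentially quantified statement).
print(any((a, b) in parent and (b, c) in parent for a in people for b in people for c in people))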

Scallop [20] is a framework that attempts to bridge the gap between logical/symbolic reasoning and deep learning. What distinguishes Scallop from prior symbolic reasoning packages is its focus on approximate solutions for efficiency, rather than exact probabilistic reasoning. A Python binding is available so that Scallop can be imported as a module. Let’s look at an example of Scallop performing relation extraction from a passage. Concepts are encoded as rules, as shown below.

# fact - Jane is mother to John
rel mother = {("Jane", "John")}

# fact - Jonah is father to Jane
rel father = {("Jonah", "Jane")}

# rule - a is the grandfather of b if a is the father of some c who is the mother of b
rel grandfather(a, b) = father(a, c) and mother(c, b)

rel person = {"Jane", "John", "Jonah"}
# rule - a has no children if a is a person who appears as neither a father nor a mother
rel has_no_children(a) = person(a) and ~father(a, _) and ~mother(a, _)
Fig 2. Relationship extraction [20]
Fig 3. Inferring relationships from text [20]
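
Since the passage mentions the Python binding, here is a hedged sketch of how the same facts and rule might be driven from Python via the scallopy package; the exact API shown here (ScallopContext, add_relation, add_facts, add_rule, relation) is based on Scallop's documentation and may differ across versions.

import scallopy

# Create a Scallop context and declare the relations with their types.
ctx = scallopy.ScallopContext()
ctx.add_relation("mother", (str, str))
ctx.add_relation("father", (str, str))

# Facts: Jane is mother to John, Jonah is father to Jane.
ctx.add_facts("mother", [("Jane", "John")])
ctx.add_facts("father", [("Jonah", "Jane")])

# Rule: a is the grandfather of b if a is the father of c and c is the mother of b.
ctx.add_rule("grandfather(a, b) = father(a, c) and mother(c, b)")

ctx.run()
print(list(ctx.relation("grandfather")))  # expected: [("Jonah", "John")]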

The possibilities are virtually endless with such a system, but is this the right path to general intelligence? We have come quite far with deep learning, and specifically with Generative AI, in vision and language. But do we need a paradigm shift to get to AGI? Let me know what you think…

References

  1. Rise of the machines: artificial intelligence and the clinical laboratory
  2. Sparks of Artificial General Intelligence: Early experiments with GPT-4
  3. David Chalmers “Are Large Language Models Sentient?”
  4. https://en.wikipedia.org/wiki/Fluid_and_crystallized_intelligence
  5. Gottfredson, L. “Mainstream science on intelligence: an editorial with 52 signatories, history, and bibliography”
  6. Theory of fluid and crystallized intelligence: A critical experiment
  7. Notes on Propositional Logic
  8. https://www.noemamag.com/what-ai-can-tell-us-about-intelligence/
  9. AI pioneer Geoff Hinton: “Deep learning is going to be able to do everything”
  10. Symbolic Behaviour in Artificial Intelligence
  11. https://www.wired.com/story/openai-ceo-sam-altman-the-age-of-giant-ai-models-is-already-over/
  12. Reward is enough
  13. https://www.linkedin.com/pulse/reward-enough-efficient-simon-ouellette/
  14. The Archimedean trap: Why traditional reinforcement learning will probably not yield AGI
  15. Scalar reward is not enough: A response to Silver, Singh, Precup and Sutton (2021)
  16. https://www.noemamag.com/deep-learning-alone-isnt-getting-us-to-human-like-ai/
  17. https://web.media.mit.edu/~minsky/papers/SymbolicVs.Connectionist.html
  18. https://github.com/hwchase17/langchain
  19. https://github.com/yoheinakajima/babyagi
  20. https://drive.google.com/file/d/17En24U05P9FG4V9LmJ4tMqrVNzHh6atx/view
