Tests that Confirm Human-level AGI Has Been Achieved

SingularityNET
Published in SingularityNET · 8 min read · Aug 23, 2024

How would we know if we have achieved Artificial General Intelligence (AGI)?

Imagine you’ve been tasked with determining whether a machine possesses true intelligence or even sentience.

In a world where Large Language Models (LLMs) like ChatGPT, Gemini, Grok, Claude, and others can effortlessly generate human-like conversations, how would you distinguish between mere mimicry and genuine understanding?

Many people using these systems have become convinced that they are interacting with something that truly "thinks." That conviction is itself instructive: a reminder of just how easy it is to be deceived by even narrow AIs with surface-level competence.

But when it comes to Artificial General Intelligence (AGI) — a machine capable of human-like reasoning across a vast array of tasks — the challenge goes far deeper than simply being convinced by a conversation.

While there is no broadly agreed-upon definition of AGI, the term "Artificial General Intelligence" has several closely related meanings, referring to the capacity of an engineered system to:

  • Display the same rough sort of general intelligence as human beings;
  • Display intelligence that is not tied to a highly specific set of tasks;
  • Generalize what it has learned, including generalization to contexts qualitatively very different from those it has seen before;
  • Take a broad view, and flexibly interpret its tasks at hand in the context of the world at large and its relation thereto.

In essence, AGI is not just about creating machines that can perform specific tasks like playing chess or recognizing faces. It’s about developing systems with the versatility, adaptability, and cognitive depth to navigate the world in ways that are comparable to human beings.

However, as we edge closer to this extraordinary milestone, a critical question arises: how would we confirm that we have truly achieved human-level AGI?

This article explores six key tests that could serve as benchmarks for confirming the arrival of AGI, each designed to probe different dimensions of what it means to think, reason, and act like a human.

“The ability to answer queries regarding ingested training data, and generate new products based on the probability distribution inferred from training data, is certainly valuable and fascinating. But there are other important capabilities that LLMs and other currently commercially popular AI technologies lack, such as:

- Compassion and empathy for other beings;

- Capability for complex, surprising, multi-stage logical reasoning (as is needed to carry out groundbreaking math, science, and engineering, or to navigate subtle ethical dilemmas in novel situations);

- Fundamental creativity that leaps beyond what has previously been seen and experienced in significant respects;

- The ability to act as an autonomous agent and organism, balancing individuation and self-transcendence as it goes about developing itself and exploring of the world.”

From Dr. Ben Goertzel’s Beneficial AGI Manifesto

The Turing Test: A Foundational Measure of Intelligence

The Turing Test, proposed by Alan Turing in 1950, remains one of the most iconic benchmarks in artificial intelligence. This test is designed to assess whether a machine can exhibit intelligent behavior that is indistinguishable from that of a human.

In a typical Turing Test scenario, a human evaluator engages in a text-based conversation with both a machine and a human, without knowing which is which. If the evaluator cannot consistently distinguish between the machine and the human, the machine is said to have passed the test.
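The pass criterion can be made concrete as a simple tally: across many judge sessions, how often is the machine labeled "human"? A rate statistically indistinguishable from 50% means judges are merely guessing. The sketch below uses invented session data purely for illustration:

```python
import random

def turing_pass_rate(verdicts):
    """Fraction of sessions in which the judge labeled the machine 'human'.

    A rate near 50% means the judge cannot tell machine from human;
    a rate well below 50% means the machine is reliably detected.
    """
    return sum(1 for v in verdicts if v == "human") / len(verdicts)

# Hypothetical verdicts from 500 judge sessions with a machine witness.
random.seed(0)
verdicts = [random.choice(["human", "machine"]) for _ in range(500)]
print(f"Judged human in {turing_pass_rate(verdicts):.0%} of sessions")
```

In a real evaluation the verdicts would come from human interrogators, and one would also test the rate at which genuine human witnesses are labeled human, to calibrate the baseline.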

While the Turing Test is a foundational measure of machine intelligence, it primarily focuses on linguistic capabilities. The ability of a machine to simulate human conversation does not necessarily equate to true understanding or consciousness.

Nevertheless, a machine that passes the Turing Test demonstrates a significant level of cognitive sophistication and represents an important step toward AGI.

So while the Turing Test can be useful to us, it's simply not sufficient. LLMs have already passed versions of it: in a 2024 study, GPT-4 was judged to be human 54% of the time in five-minute text conversations.

On to the next…

The Winograd Schema Challenge: Moving From Language to Understanding

To address some of the limitations of the Turing Test, the Winograd Schema Challenge (WSC) was introduced as a more rigorous measure of a machine’s understanding and reasoning abilities. This test involves presenting a machine with sentences containing ambiguous pronouns, where the correct interpretation requires not just linguistic processing but also common-sense reasoning and world knowledge.

For example, consider the sentence: "The trophy doesn't fit in the brown suitcase because it is too big." To identify what "it" refers to, the machine must know that a large object cannot fit into a smaller container; swap "big" for "small" and the referent flips from the trophy to the suitcase. Successfully navigating such minimal pairs indicates that the machine can reason about the world in a way that goes beyond surface-level language processing.
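A Winograd schema is typically stored as a minimal pair: two near-identical sentences differing in one "special" word, which flips the correct antecedent of the ambiguous pronoun. A minimal sketch of how such items can be represented and scored (the resolver here is a deliberately naive baseline, not a real system):

```python
# Each item: a sentence, the ambiguous pronoun, candidate antecedents,
# and the correct answer. The two items differ only in "big" vs "small".
SCHEMA = [
    {"sentence": "The trophy doesn't fit in the suitcase because it is too big.",
     "pronoun": "it", "candidates": ["trophy", "suitcase"], "answer": "trophy"},
    {"sentence": "The trophy doesn't fit in the suitcase because it is too small.",
     "pronoun": "it", "candidates": ["trophy", "suitcase"], "answer": "suitcase"},
]

def score(resolver, items):
    """Accuracy of a pronoun resolver over schema items; chance is 50%."""
    correct = sum(resolver(item) == item["answer"] for item in items)
    return correct / len(items)

# A naive baseline that always picks the first-mentioned candidate gets
# exactly one half of each pair right, scoring 50% -- the floor a real
# system must beat on BOTH halves of every pair.
first_mention = lambda item: item["candidates"][0]
print(f"Baseline accuracy: {score(first_mention, SCHEMA):.0%}")
```

The pairing is what makes the challenge resistant to statistical shortcuts: any resolver relying on word association alone tends to give the same answer to both halves of a pair, and so cannot beat chance.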

Passing the Winograd Schema Challenge would suggest that an AGI system has achieved a deeper level of understanding and can apply general knowledge in a way that is more aligned with human cognitive processes.

Large language models have shown some capability in handling Winograd Schema-like tasks, but they have not consistently or reliably passed the Winograd Schema Challenge (WSC) as it was originally conceived. We might be on the right track here.

The Coffee Test: Practical Intelligence in the Physical World

While tests like the Turing Test and the Winograd Schema Challenge focus on cognitive and linguistic abilities, true AGI must also demonstrate competence in interacting with the physical world. The Coffee Test, proposed by Apple co-founder Steve Wozniak, is a straightforward yet profound test of an AI’s practical intelligence.

In this test, an AI-powered robot is tasked with entering an ordinary home and making a cup of coffee. To do this, the robot must locate the coffee machine, find the necessary ingredients, understand how to operate the machine, and complete the task without human intervention. This test challenges the AI to integrate various forms of knowledge — about objects, their functions, and the steps involved in a task — into coherent and purposeful action.
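The difficulty of the Coffee Test lies in chaining steps whose preconditions depend on earlier steps succeeding in an unfamiliar environment. A heavily simplified, hypothetical plan representation (step names and world-state flags are invented for illustration; real robotic planning and perception are far richer):

```python
# Each step: (name, preconditions that must hold, effects added on success).
PLAN = [
    ("locate_machine", set(),                            {"machine_found"}),
    ("find_coffee",    set(),                            {"coffee_found"}),
    ("load_machine",   {"machine_found", "coffee_found"}, {"machine_loaded"}),
    ("brew",           {"machine_loaded"},               {"coffee_ready"}),
]

def execute(plan):
    """Run steps in order; halt if a step's preconditions are unmet."""
    state = set()
    for name, preconditions, effects in plan:
        if not preconditions <= state:   # subset check: all preconditions met?
            return state, f"blocked at {name}"
        state |= effects
    return state, "done"

state, status = execute(PLAN)
print(status, "->", sorted(state))
```

In an ordinary, previously unseen home, the hard part is not executing such a plan but constructing it: recognizing an unfamiliar machine as a coffee maker, inferring its controls, and replanning when a precondition unexpectedly fails.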

The Coffee Test is a powerful measure of an AI’s ability to navigate and manipulate the physical world in a human-like manner. Passing this test would indicate that the AI has developed a practical, situational intelligence that is essential for real-world applications.

The Robot College Student Test: Achieving Diverse Knowledge

A key aspect of human intelligence is the ability to learn across a wide range of subjects and apply that knowledge in different contexts. First conceptualized by Dr. Ben Goertzel, CEO of SingularityNET, the Robot College Student Test envisions an AGI system enrolling in a university, taking classes alongside human students, and successfully earning a degree.

This test would require the AI to demonstrate proficiency in various academic disciplines, from science and mathematics to humanities and the arts. The AI would need to engage in discussions, complete assignments, and pass exams, all while showing creativity, critical thinking, and the ability to synthesize knowledge across different fields.

Passing the Robot College Student Test would signify that the AGI has achieved a level of intellectual versatility comparable to that of a human, capable of learning and applying knowledge in diverse domains. While some LLMs have successfully passed exams from law and business schools, there is still a long way to go until an AI system can successfully complete the Robot College Student Test.

The Employment Test: Functioning in a Human Work Environment

One of the most practical and comprehensive tests for AGI is the Employment Test, which evaluates whether an AI can perform any job that a human can, without requiring special accommodations. This test challenges the AI to learn new jobs quickly, adapt to changing work conditions, and interact with human coworkers in a socially appropriate manner.

The Employment Test goes beyond cognitive and practical intelligence, probing the AI’s ability to navigate complex social environments, understand and follow social norms, and contribute meaningfully to a team.

Success in this test would indicate that the AGI is not only capable of performing specific tasks but can also integrate into human society as a functional and effective participant.

The Ethical Reasoning Test: Navigating Human Values and Morality

Human intelligence is not just about solving problems or completing tasks; it also involves understanding and applying ethical principles.

The Ethical Reasoning Test evaluates an AI’s ability to make decisions that align with human values, particularly in situations involving moral dilemmas.

For example, the AI might be presented with the classic trolley problem, where it must choose between actions that could harm different numbers of people. The test would assess the AI’s reasoning process, its understanding of ethical principles, and its ability to justify its decisions in a way that resonates with human moral intuitions.
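Part of what the test probes is that different ethical frameworks can disagree on the same dilemma, and the system must recognize and justify the tension rather than mechanically apply one rule. A toy illustration (not a real ethics engine; the options and attributes are invented) showing two decision rules diverging on a trolley-style choice:

```python
# Two options for a trolley-style dilemma: harm counts and whether the
# option requires actively harming someone (hypothetical attributes).
dilemma = {
    "do_nothing": {"harmed": 5, "requires_active_harm": False},
    "pull_lever": {"harmed": 1, "requires_active_harm": True},
}

def utilitarian(options):
    """Choose the option minimizing total harm."""
    return min(options, key=lambda o: options[o]["harmed"])

def deontological(options):
    """Refuse any option that requires actively harming someone."""
    permitted = [o for o in options if not options[o]["requires_active_harm"]]
    return permitted[0] if permitted else None

print(utilitarian(dilemma))    # pull_lever
print(deontological(dilemma))  # do_nothing
```

An AGI passing this test would need to do far more than pick a rule: it would have to articulate why the frameworks conflict here and defend its choice in terms that resonate with human moral intuitions.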

Passing the Ethical Reasoning Test would demonstrate that the AGI can navigate the complex and often subjective landscape of human morality, an essential capability for any system that interacts with humans on a deep and meaningful level.

The Multifaceted Challenge of Confirming AGI

Think about it — is achieving AGI just a matter of advancing technology? Or is it about replicating the depth and breadth of human cognition in machines?

Each of the tests described above targets a different aspect of what it means to be generally intelligent — from language and reasoning to practical skills, adaptability, and ethics.

Together, these tests sketch a framework for evaluating whether an engineered system has truly achieved human-level AGI.

No single test is likely to be decisive on its own, but a combination of rigorous assessments across different domains: language comprehension, reasoning, practical problem-solving, social interaction, and ethical decision-making, could provide a genuinely comprehensive evaluation of whether an AI has reached human-level intelligence.

These tests are not just about proving that machines can think — they are about ensuring that when they do, they do so in ways that are aligned with the richness, complexity, and moral fabric of human life.

About SingularityNET

SingularityNET was founded by Dr. Ben Goertzel with the mission of creating a decentralized, democratic, inclusive, and beneficial Artificial General Intelligence (AGI): one that is not dependent on any central entity, is open to anyone, and is not restricted to the narrow goals of a single corporation or even a single country. The SingularityNET team includes seasoned engineers, scientists, researchers, entrepreneurs, and marketers. Our core platform and AI teams are further complemented by specialized teams devoted to application areas such as finance, robotics, biomedical AI, media, arts, and entertainment.

Decentralized AI Platform | OpenCog Hyperon | Ecosystem | ASI Alliance

Stay Up to Date With the Latest SingularityNET News and Updates:
