OpenAI o3: a step closer to AGI
A major breakthrough in the field of artificial intelligence: OpenAI, with its brand-new model “o3,” announced on December 20, 2024, has pushed known boundaries by delivering a spectacular performance on the ARC-AGI test, one of the most demanding benchmarks for assessing the adaptive capabilities of AI. While this may seem trivial to non-experts, it is a signal of a new era where anything — and even more — is possible. Let me guide you through a detailed breakdown of the subject and the surrounding concepts!
From ChatGPT to o3: the path to intelligence
Since the release of ChatGPT in November 2022, the capabilities of generative artificial intelligence models have made astonishing progress. ChatGPT, the first to popularize conversational AIs, paved the way for an unprecedented technological revolution. In the months that followed, the number of large language models (LLMs) surged, transforming the technological landscape at a dizzying pace. In just two years, these models found applications in diverse fields such as healthcare, education, and e-commerce.
By the end of 2024, it is estimated that more than 200 active LLMs exist worldwide, developed by major players like OpenAI, Google DeepMind, and Anthropic, as well as through open-source initiatives. These models, while varied in architecture and use, share a common goal: to push the limits of what artificial intelligence can achieve. If we include smaller models available on platforms like Hugging Face, over 1.5 million variants have been cataloged, reflecting global enthusiasm for this technology.
This exponential growth illustrates the incredible potential of LLMs to address complex problems and adapt to diverse environments. For instance, in the medical field, these models generate automated summaries of patient records or assist professionals in diagnostics. In education, they provide scalable, personalized learning solutions.
However, this expansion raises critical questions. Ethical, environmental, and societal challenges abound: what is the energy cost of these technologies? Who controls access to them? And, most importantly, how can we ensure they benefit the greatest number of people? While we navigate these questions, one thing is clear: the journey from ChatGPT to o3 signals a remarkable leap forward, but it is only the beginning of an even deeper transformation.
ARC-AGI: a benchmark for adaptive intelligence
To understand the challenges faced by current AI models, one must examine the importance of benchmarks, particularly ARC-AGI. These tests play a central role in evaluating model capabilities, measuring their aptitude for adapting to new situations. For non-specialists, a benchmark is akin to a standardized test for students, designed to compare performances. Just as a school test reveals strengths and weaknesses, ARC-AGI presents unique challenges to AIs to test their ability to solve unfamiliar problems. In this sense, it is an essential step toward creating artificial intelligence capable of rivaling human capabilities.
ARC-AGI, recognized as the benchmark in the field, tests AI systems’ ability to adapt to novel tasks. Unlike more traditional benchmarks, it presents problems that are complex yet intuitive for humans, such as identifying hidden patterns or completing logical sequences. These problems require AI to demonstrate abstract and adaptive understanding, beyond simple recognition of learned patterns. This level of demand measures AI’s adaptive intelligence, a critical quality for advancing toward general intelligence.
The ARC-AGI (Abstraction and Reasoning Corpus for Artificial General Intelligence) test evaluates an AI’s capacity to generalize and solve unfamiliar problems. Unlike saturated benchmarks, it focuses on tasks that are challenging for machines but trivial for humans. Until o3, GPT models had made limited progress on this test: GPT-3 scored 0% in 2020, and GPT-4o reached only 5% in 2024.
A typical ARC-AGI task might involve identifying missing visual patterns in a grid or uncovering complex logical relationships from examples. For humans, these problems rely on natural intuition and rapid generalization. For AI, this demands a complex combination of abstract reasoning, data input analysis, and generating tailored responses. These skills, crucial for applications like autonomous driving or robotics, underscore why benchmarks like ARC-AGI are fundamental to AI’s evolution.
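To make the idea of a grid task concrete, here is a minimal sketch of what an ARC-style puzzle can look like in code. The grids, the `mirror` rule, and the `solves` helper are all invented for illustration; only the general layout (a few training input/output pairs plus a test input, with cells stored as small integers) loosely follows the public ARC dataset.

```python
# Toy ARC-style task. Each grid is a list of rows; each cell is an integer colour code.
# The hidden rule (an assumption chosen for this example) is "mirror every row left to right".
Grid = list[list[int]]

def mirror(grid: Grid) -> Grid:
    """Candidate rule: reverse each row of the grid."""
    return [list(reversed(row)) for row in grid]

task = {
    "train": [
        {"input": [[1, 0, 0], [2, 2, 0]], "output": [[0, 0, 1], [0, 2, 2]]},
        {"input": [[3, 0], [0, 4]], "output": [[0, 3], [4, 0]]},
    ],
    "test": [{"input": [[5, 0, 6]], "output": [[6, 0, 5]]}],
}

def solves(rule, pairs) -> bool:
    """A rule counts only if it reproduces every output grid exactly."""
    return all(rule(pair["input"]) == pair["output"] for pair in pairs)

print(solves(mirror, task["train"]))        # does the rule explain the examples?
print(mirror(task["test"][0]["input"]))     # prediction for the unseen test grid
```

A human spots the mirroring rule from two examples almost instantly. The difficulty for a model is that every task uses a different hidden rule, so nothing can simply be memorized.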
The o3 model has changed the game, achieving a score of 75.7% on the semi-private evaluation set in its high-efficiency configuration (within the ARC Prize’s $10,000 compute limit) and an impressive 87.5% in a high-compute configuration that uses far more computational resources. Experts consider this a turning point in the progression toward generalizable AI systems.
For purists, a caveat is necessary to temper this excitement. While ARC-AGI has become a key benchmark, it is not universally accepted as a measure of general intelligence. Other benchmarks that test different capabilities, such as SuperGLUE and MMLU (Massive Multitask Language Understanding), are also widely used. SuperGLUE includes tasks such as coreference resolution and multiple-choice question answering; a detailed description and resources are available on its official site. MMLU evaluates models on 57 academic subjects, ranging from mathematics to philosophy, law, and medicine.
A qualitative leap in AI models
This result illustrates a spectacular qualitative leap in artificial intelligence capabilities. For the first time, a model like o3 has demonstrated the ability to adapt to tasks it had never encountered before, combining pre-existing knowledge to create new solutions. By contrast, previous models, such as GPT-4o, struggled to reach 5% on the ARC-AGI test in 2024, highlighting their limitations in generalizing beyond training data. This breakthrough distinguishes o3 as a significant step toward more flexible and general AI.
o3’s ability appears to rely on a strategy of active, adaptive search. Inspired by techniques like AlphaZero’s Monte Carlo tree search, this method enables the AI to explore a vast space of potential solutions, refining its responses over time. Imagine a chess player planning several moves ahead across millions of possible scenarios: this is roughly how o3 is thought to operate when searching the space of natural-language programs.
A key concept here is the chain of thought. In this approach, the AI breaks each problem down into a series of logical steps. Each step is tested, adjusted, and combined with the others to produce a coherent final response. This gives o3 a unique ability to reason adaptively and generate dynamic solutions, even when previous data alone is insufficient.
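The general idea of searching for a solution step by step and checking it against examples can be sketched in a few lines. To be clear, this is not how o3 works internally, which OpenAI has not published. It is a toy brute-force search over a handful of hypothetical grid operations (`identity`, `mirror`, `flip`, `transpose`), all invented for this illustration, that keeps the first sequence of steps consistent with every training example.

```python
from itertools import product

# Hypothetical building blocks: each takes a grid (list of rows) and returns a new grid.
def identity(g):  return [row[:] for row in g]
def mirror(g):    return [row[::-1] for row in g]          # reverse each row
def flip(g):      return [row[:] for row in g[::-1]]       # reverse the row order
def transpose(g): return [list(col) for col in zip(*g)]    # swap rows and columns

PRIMITIVES = {"identity": identity, "mirror": mirror, "flip": flip, "transpose": transpose}

def run(program, grid):
    """Apply a sequence of named steps to a grid, one after another."""
    for name in program:
        grid = PRIMITIVES[name](grid)
    return grid

def search(train_pairs, max_depth=2):
    """Try shorter programs first; return the first one that explains every example."""
    for depth in range(1, max_depth + 1):
        for program in product(PRIMITIVES, repeat=depth):
            if all(run(program, x) == y for x, y in train_pairs):
                return program
    return None  # nothing in this tiny search space fits

train_pairs = [
    ([[1, 2], [3, 4]], [[4, 3], [2, 1]]),  # a 180-degree rotation
    ([[5, 0, 0]], [[0, 0, 5]]),
]
program = search(train_pairs)
print(program)                      # the sequence of steps that was found
print(run(program, [[7, 8, 9]]))    # applying it to a new, unseen grid
```

No single operation explains the first example, so the search has to combine two steps. A real system faces an astronomically larger space of candidate steps, which is why guiding the search intelligently, rather than enumerating everything, matters so much.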
This is more than a mere technical advance. This ability to adapt transforms how AI can be applied to real-world problems. Take healthcare as an example: a model like o3 could generate personalized treatment protocols for rare diseases by analyzing patient data. In robotics, the same capability could allow systems to navigate unknown environments, improvising solutions in real-time.
By paving the way for dynamic, generalized intelligence, o3 does not merely meet benchmarks. It demonstrates that achieving a level of intelligence once considered exclusively human is possible, laying the foundation for applications far beyond current limits.
Costs and efficiency: a critical lens
Beyond o3’s impressive technical performance, it is crucial to evaluate the broader impact of these advances on society and the planet. While models like o3 embody the extraordinary potential of emerging technologies, they cannot be divorced from their energy and social costs, raising pressing questions about sustainability and accessibility.
The intensive computational configurations required by o3 consume massive amounts of energy, often sourced from non-renewable resources. Each task performed by the model contributes to a carbon footprint that, when scaled globally, could conflict with ambitious climate goals set by governments and businesses. This reality calls for a proactive approach, including investment in more sustainable infrastructures powered by renewable energy and the development of AI architectures that are resource-efficient.
But it doesn’t stop there. The production of necessary hardware, such as high-performance processors, entails collateral impacts, including mining activities and electronic waste management. These aspects demand reflection on the complete lifecycle of AI technologies, beyond their operational phase.
Socially, the development of models like o3 raises questions of equity. Who truly benefits from this technology? Current costs render its use inaccessible to the majority of developing countries, exacerbating existing technological disparities. This digital divide could evolve into an economic chasm, where only the wealthiest nations and companies can fully exploit the benefits of advanced AI.
Moreover, the growing automation of tasks traditionally performed by humans raises employment concerns. While models like o3 solve unprecedentedly complex problems, they also risk redefining required skills in the labor market. More standardized jobs could disappear, creating urgency for massive investments in reskilling and developing competencies suited to this new technological reality.
These challenges should not be seen as obstacles but as opportunities to design AI that is truly inclusive. By integrating principles of corporate social responsibility (CSR), we can imagine a future where AI benefits are shared equitably without compromising environmental balance. This requires public policies promoting inclusive access to AI technologies and international collaboration to ensure equitable dissemination of technological advances. Moreover, implementing an ecological tax on energy-intensive models could fund sustainable solutions. There, I’ve said it!
Toward ARC-AGI-2 and beyond
The next step, eagerly anticipated, is the launch of ARC-AGI-2 in 2025. Designed to be even more demanding, this benchmark will redefine industry standards. It will identify the current limits of AI models while highlighting areas requiring new approaches or paradigms. Its role goes beyond technical diagnostics: it will serve as a catalyst, steering research toward truly adaptive solutions capable of generalizing skills.
ARC-AGI-2 also presents a unique opportunity to foster international collaboration. Bringing researchers together to tackle unresolved problems could accelerate technological development and create a shared vision of AGI-related challenges. This type of benchmark embodies the idea that no nation or organization can progress alone in such a transformative domain.
In parallel, practical applications of these advances remain central. How can an AI capable of such adaptability directly benefit healthcare, education, or the environment? Imagine an adaptive AI revolutionizing medicine by personalizing treatments based on each patient’s unique needs or providing real-time responses in unpredictable situations like natural disasters. Although this potential has yet to materialize fully, it is already redefining our ambitions.
General artificial intelligence, or AGI, could indeed arrive by 2025. It is now a possibility, not just fiction.
To stay updated on these developments, visit the official ARC Prize 2025 website.
Conclusion
OpenAI’s development of o3 marks a decisive advance, not only technologically but also in the quest to rethink what intelligence truly means. This model, with its ability to tackle novel problems and create dynamic solutions, does not just expand the realm of possibilities. It redefines humanity’s ambitions for adaptive technology.
Yet o3 is not an endpoint. It is a milestone, a stepping stone in a journey still laden with questions. The model’s current limitations — its reliance on explicit frameworks, high energy demands, and inability to match human cognitive fluidity — remind us that each breakthrough is an invitation to reflection. Reflection on what we aim to achieve and the consequences we are willing to accept.
The story of o3 extends beyond its technical features. It symbolizes a turning point where technologies are no longer mere tools but partners in solving global challenges. From personalized medical treatments to solutions for environmental sustainability, these AIs offer us the opportunity to rethink systems on a large scale.
It is now up to us, collectively, to ensure these advances serve a greater purpose. Integrating ethics, guaranteeing equitable access, and promoting responsible use are necessary conditions for this transformation to be more than a technological feat — a step toward a sustainable, just, and inclusive future. Because while artificial intelligence can amplify our capabilities, it is ultimately our humanity that must guide its path.
— —
[Article created on December 21, 2024, by Jérémy Lamri with support from Claude 3.5 Sonnet, Perplexity, GPT4o, and o1 for structuring and enriching, and GPT4o for illustration. The writing and most ideas in this article are primarily mine.]