The Hunt for AGI and the Next Big Leap in AI
The Nobel Prizes for physics and chemistry are etched in the annals of history, representing a long-standing tradition of recognizing contributions to the fundamental sciences. Who would have thought that they would both be awarded, in the same year, to AI researchers? This unprecedented event in 2024 is a fitting recognition of the immense impact of AI on science and technology across all verticals of the global economy. AI winters are ancient history and unlikely to return in the foreseeable future. The opportunities ahead are simply too great, and AGI will surely be the biggest prize of them all. This article explores the next big leap in AI and the road toward AGI, concluding with predictions for the year ahead.
Digital Autonomous Workers (DAWs) — A paradigm shift
DAWs are AI agents that independently process data within their application environment, make decisions, and take appropriate actions. They may leverage a combination of foundation models, computer vision and language processing. DAWs are goal-oriented, context-aware, and adaptable. Perhaps the most advanced DAWs are those operating driverless cars. In 2024, Waymo* launched a multimodal model for autonomous driving that utilized Gemini’s knowledge to learn complex road scenarios and its chain-of-thought reasoning to enhance decision-making.
Across other verticals, AI agents are performing a variety of workflows including customer service, sales and marketing, data entry, and content creation. For example, they can respond to inquiries, process transactions and report back to human supervisors. AI agents can interact with SaaS platforms and reduce the need for human intervention. In a multi-agent environment, a team of agents can collaborate, with each tackling a specific task, all under the supervision of an orchestration agent. Lindy AI, Relevance AI and Spell.so are examples of startups providing no-code agents, where users can create and/or customize autonomous agents for a variety of tasks. Regrello* developed an end-to-end platform that provides the flexibility to create agents for any enterprise workflow in manufacturing and supply chain operations.
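As a toy illustration of the orchestration pattern described above, here is a minimal sketch in Python. All names and handlers are hypothetical, not any vendor’s API: an orchestrator routes each incoming task to the specialist agent responsible for it and collects the results.

```python
from dataclasses import dataclass

@dataclass
class Task:
    kind: str       # which specialty this task needs
    payload: str    # the work item itself

class Agent:
    """A specialist agent that handles one kind of task."""
    def __init__(self, kind, handler):
        self.kind = kind
        self.handler = handler

    def run(self, task: Task) -> str:
        return self.handler(task.payload)

class Orchestrator:
    """Routes tasks to the right specialist and gathers results."""
    def __init__(self, agents):
        self.registry = {a.kind: a for a in agents}

    def dispatch(self, tasks):
        return [self.registry[t.kind].run(t) for t in tasks]

agents = [
    Agent("support", lambda q: f"ticket opened for: {q}"),
    Agent("billing", lambda q: f"invoice generated for: {q}"),
]
orchestrator = Orchestrator(agents)
results = orchestrator.dispatch([
    Task("support", "login issue"),
    Task("billing", "order #1042"),
])
print(results)
```

In production systems the handlers would be LLM calls or tool invocations rather than lambdas, but the routing-and-supervision structure is the same.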
The healthcare sector may be the biggest beneficiary of DAWs. Almost one third of all data generated globally comes from healthcare, and yet the vast majority of it (estimates run as high as 97% [1]) is untapped by AI. AI agents will advance robotic surgery, analyze patient data and digital scans, integrate with wearables to monitor patients, and alert providers in real time. AI agents will eventually drive most workflows in health information systems. Hippocratic AI has developed a suite of AI agents that perform tasks across a wide variety of healthcare workflows, from hospital pre-op and discharge to clinical trials and pharmacy. Hippocratic AI agents proactively reach out to patients and, in some cases, provide critical advice.
DAWs at scale will redefine the nature of work and the economy of work. We will witness a massive increase in productivity on a scale not seen since the introduction of the Internet. DAWs will give individuals and small businesses tools that today are available only to larger corporations. Many jobs will be displaced, for sure, but history has taught us that many more will be created. Just consider the massive technological change of the seven decades following 1950: over 100 million new jobs were created during that same period [2]. This is the promise of technology, and it has held true for centuries!
Every sector of the economy will be impacted, especially the services sector. Services make up roughly 70% of the United States GDP. DAWs will capture an increasing percentage of the services market and catapult the AI economy by orders of magnitude, creating tremendous investment opportunities.
To be clear, we are still in the early stages of DAW development. As we move closer and closer to AGI, the field will continue to progress, especially in areas like multilevel complex planning, reasoning, insight and decision-making.
The hunt
Will the real AGI please stand up? AGI is perhaps one of the most misused terms today. The term was initially coined by Mark Gubrud in 1997 as: “AI systems that rival or surpass the human brain complexity and speed, that can acquire, manipulate and reason with general knowledge, and that are usable in essentially any phase of industrial or military operations where a human intelligence would otherwise be needed.” The term AGI was further popularized by a 2005 book with AGI as its title [Ben Goertzel and Cassio Pennachin].
What is needed: the ability to perform multilevel planning; social and emotional intelligence; adaptation; explainability; level-3 (counterfactual) reasoning on Judea Pearl’s ladder of causation; perception (understanding of the surrounding 3D environment); solving types of problems never seen before; language understanding; and more.
Are we close? Moravec’s paradox essentially states that what is easy for humans (perception, motion and motor control) is difficult for machines, and vice versa. The human brain has over 100 billion neurons, with each connecting to up to 10,000 others. The human brain computes efficiently and effectively on noisy, real-world, unstructured data. The human mind is conscious. It loves, it hates, it feels fear, it abstracts, it reasons, and it is curious. All this while consuming around 20 watts of power, about what it takes to power a low-wattage LED bulb. No present LLM is even close to that level of complexity or efficiency.
Though LLMs are doing amazing things today, it is debatable whether they alone can get us to AGI. They will likely play an important part in the journey, but they need to improve greatly, especially in areas such as reasoning and planning. In a 2024 paper, Apple researchers argue that “current LLMs are not capable of genuine logical reasoning; instead, they attempt to replicate the reasoning steps observed in their training data” [3]. Moreover, LLMs cannot yet effectively perform tasks involving complex planning. Some even argue that the performance and quality gains from scaling model size are beginning to plateau.
Although AGI has not yet arrived and may still be far away, there is tremendous progress underway. Let’s take a look at some of the most notable innovations in 2024.
Paving the road
A worm of inspiration: In October, Liquid AI, an MIT spinoff, launched its Liquid Foundation Models (LFMs), a new type of foundation model built from first principles. LFMs are inspired by the brain architecture of a tiny worm, Caenorhabditis elegans. With only 302 neurons and around 8,000 connections, this worm has built a reputation as a powerful tool for drug discovery because of its small size, short generation time and conservation of cellular processes; many of its molecular and cellular pathways are similar to those in humans. As a result, the C. elegans nervous system has been completely mapped. Its neurons can react differently to the same inputs at different times, depending on surrounding conditions. The neural interactions involved were modeled with a differential equation, first posed in 1907, that had no known closed-form solution. The co-founders of Liquid AI developed a simple and elegant solution to this equation, inspiring a new type of artificial neural network with a time-dependent activation function that more closely emulates biological mechanisms. The results were remarkable: much smaller, more efficient models that particularly excel at processing real-world, time-dependent data. It took only 19 neurons to build a model capable of flying an autonomous drone.
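A hedged sketch of the core idea behind liquid time-constant neurons: the neuron’s effective time constant depends on its input, so the same input can produce different responses depending on the neuron’s state. The constants, the sigmoid gate, and the Euler integrator below are illustrative choices, not Liquid AI’s actual model.

```python
import math

def ltc_step(x, u, dt=0.01, tau=1.0, A=1.0, w=2.0, b=0.0):
    """One Euler step of a liquid time-constant (LTC) style neuron.

    The input-dependent gate f(u) modulates both the decay rate and
    the drive toward the resting level A, making the effective time
    constant a function of the input.
    """
    f = 1.0 / (1.0 + math.exp(-(w * u + b)))  # sigmoid gate on the input
    dx = -(1.0 / tau + f) * x + f * A          # LTC ODE right-hand side
    return x + dt * dx                          # explicit Euler update

x = 0.0
for t in range(200):
    u = 1.0 if t < 100 else 0.0  # step input that switches off halfway
    x = ltc_step(x, u)
print(round(x, 3))
```

Note how the state relaxes toward a different equilibrium once the input switches off; in a trained network, the weights of many such neurons are learned from data.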
Unlocking the “Why”: AI that can efficiently and reliably extract causality from data will be one of the crucial vehicles on the road toward AGI, with massive applications across verticals, especially healthcare. Aitia Bio is bringing causal AI to drug discovery: through digital twins, it builds hypotheses to answer “does A cause B” questions. Today’s LLMs are getting better at “thinking” but still fall short of System 2 thinking in Kahneman’s framework, the relatively slower, more deliberate and conscious thinking used to gain insight and solve complex problems. Causal AI will provide a higher degree of explainability and can mitigate the spurious correlations that lead to bias. In 2024, researchers proposed novel ways to integrate causal models, including, for example, causal attention mechanisms, into LLMs [4, 5]. The idea is to infuse causality at various processing stages of an LLM. causaLens is grounding LLMs (i.e., adding domain-specific data) by combining historical data with domain expertise and employing causal discovery algorithms to extract cause-and-effect relationships.
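To make the “does A cause B” question concrete, here is a toy structural causal model (invented purely for illustration) in which a hidden confounder makes A and B correlate even though A has no causal effect on B. Intervening on A, Pearl’s do-operator, exposes the difference between the observational and interventional distributions.

```python
import random

random.seed(0)

def observe(n=10000):
    """Sample from the observational distribution: C drives both A and B."""
    rows = []
    for _ in range(n):
        c = random.gauss(0, 1)         # hidden confounder
        a = c + random.gauss(0, 0.1)   # A is caused by C
        b = c + random.gauss(0, 0.1)   # B is caused by C, not by A
        rows.append((a, b))
    return rows

def intervene(a_value, n=10000):
    """Sample under do(A = a_value): the C -> A edge is cut."""
    rows = []
    for _ in range(n):
        c = random.gauss(0, 1)
        b = c + random.gauss(0, 0.1)
        rows.append((a_value, b))
    return rows

def mean_b(rows):
    return sum(b for _, b in rows) / len(rows)

# Observationally, high A predicts high B (through the confounder C)...
obs = observe()
high_a = [r for r in obs if r[0] > 1.0]
print(round(mean_b(high_a), 2))        # clearly above zero

# ...but under do(A = 1), B's mean stays near zero: A does not cause B.
print(round(mean_b(intervene(1.0)), 2))
```

Causal discovery systems tackle the much harder inverse problem, inferring which of these graphs generated real data, but the correlation-versus-intervention gap shown here is the question they are answering.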
Spatial Intelligence: LLMs are great at what they do: language intelligence. However, the field of AI is much broader. Applications such as humanoid robots and autonomous cars have a physical presence and thus must be able to perceive the world around them, learn through interaction and make decisions. In addition to language intelligence, these applications need spatial intelligence: the ability to understand the spatial relationships between objects in an environment, for example, that a ball placed on an incline will roll down. In 2024, Fei-Fei Li launched World Labs, a startup developing a new kind of foundation model, the Large World Model (LWM), that can reason about the 3D world. If successful, this model will be a breakthrough for many applications, from robotics to manufacturing.
Sense, learn, move, repeat: There is a well-known theory in cognitive science called embodied cognition. It stipulates that intelligence is deeply tied to our interaction with the physical world. The human brain learns by receiving sensory information from touch, smell, taste, sight and sound, and it builds 3D models of the physical world by associating observations with their locations in a moving frame of reference. We build knowledge by sensing and moving through the physical world. This sensorimotor learning system is described in a new framework called the Thousand Brains Theory of Intelligence. Numenta, a Silicon Valley startup, is building the next generation of intelligent machines based on this theory. The goal is to produce machines that engage and adapt in the physical world without supervision.
Data Center Explosion
The explosion in data center buildout is music to the ears of the great semiconductor industry, which continues to innovate and scale integrated-circuit chips to meet today’s compute demands and drive tomorrow’s innovations. Thanks to these innovations, the cost of inference in dollars per token fell substantially in 2024. Unfortunately, there are significant challenges to address. Bigger, faster chips are great, but their energy requirements are growing to unacceptable levels. IDC projects AI workload demand to surge at a CAGR of 40.5%, reaching 146.2 terawatt-hours by 2027, and global data center electricity consumption (including AI workloads) to reach 857 terawatt-hours in 2028, more than double the 2023 figure [6].
While startups such as Crusoe* develop very interesting solutions on the energy supply side, including renewables, the root cause of the problem should be addressed. LLMs are becoming monstrous in size, consuming tremendous amounts of compute. Most of the compute involves multiplying very large numbers of very large matrices. The number of FLOPS required per generated token during inference is typically estimated as a small multiple of the number of parameters in the model (roughly two FLOPS per parameter: one multiply and one add per weight). With trillion-parameter models already here, the number of FLOPS per token is skyrocketing.
A single Nvidia B200 consumes over 1 kilowatt of power. It has 208 billion transistors and boasts a computational power of up to 20 petaflops. Training a single 1.8-trillion-parameter model on 4,000 such GPUs would consume 4 megawatts, down from 15 megawatts for the earlier Hopper generation but still very significant. Future data centers are expected to contain hundreds of thousands and even millions of GPUs each. They will require gigawatts of power. One gigawatt can power up to 750,000 homes! This is a challenge that cannot go unchecked. Part of the solution will come from a new generation of ML models that are much smaller and more efficient. On the compute side, we need a completely new paradigm. Lightmatter, a Bay Area startup, has developed a compute technology that leverages the speed and efficiency of light. The startup developed the world’s fastest photonic interconnect technology with exceptional energy efficiency. There remain some challenges at the systems-integration level, but these engineering challenges are workable. The future of AI compute may very well be photonic!
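The arithmetic above can be checked with a quick back-of-the-envelope calculation. The two-FLOPS-per-parameter rule is a common approximation rather than a figure from this article; the hardware numbers are the ones quoted above.

```python
# Rule of thumb: inference needs roughly 2 FLOPs per parameter per
# generated token (one multiply and one add per weight).
params = 1.8e12                  # 1.8-trillion-parameter model
flops_per_token = 2 * params
print(f"{flops_per_token:.1e} FLOPs per token")

gpu_peak_flops = 20e15           # B200: up to 20 petaflops
gpu_power_w = 1000               # over 1 kW per GPU
n_gpus = 4000

# Theoretical ceiling on generation rate for a single GPU at peak.
tokens_per_s = gpu_peak_flops / flops_per_token
print(f"~{tokens_per_s:.0f} tokens/s peak per GPU")

# Cluster power draw: 4,000 GPUs at ~1 kW each.
cluster_power_mw = n_gpus * gpu_power_w / 1e6
print(f"{cluster_power_mw:.0f} MW for {n_gpus} GPUs")

homes_per_gw = 750_000
print(f"1 GW powers up to {homes_per_gw:,} homes")
```

Real deployments fall well short of the peak-FLOPS ceiling due to memory bandwidth and utilization limits, which only sharpens the efficiency argument.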
Notable Contributions and the Year Ahead
Foundation models galore: In 2024, Stanford researchers launched Evo, a foundation model for DNA. Evo was trained on the genomes of 2.7 million bacteria and 80,000 microbes using a context window of 131,000 base pairs. It captures how information flows from DNA to RNA (Central Dogma) and can project how changes in the nucleotide sequences can alter the behavior and evolution of the organism [7].
Also in 2024, the startup Covariant introduced RFM-1, an 8-billion-parameter transformer model for robots, trained on physical, real-world multimodal data. This allows the model to deal with complex dynamics and constraints in the physical world.
Infinite context sequences in 2025: Models will continue to grow in parameter counts and context lengths, but we will also witness the rise of smaller, more efficient models, especially for edge applications. Longer context sequences give a model a larger amount of query-relevant information and can thus substantially improve response quality. LLMs today can have context windows of over a million tokens, and some are predicting infinite context lengths in 2025. Is this technically possible? Not in the literal sense, but in 2024 Google developed Infini-attention, a powerful approach for scaling to extremely long contexts. To reduce the amount of computation (which traditionally grows with the square of the context length), Infini-attention employs the concept of compressive memory: a compressed summary of the past kept alongside the most recent text. It integrates local and long-range attention into a single attention mechanism. This allows the model to “forget to forget,” enabling much more advanced systems that remain context-aware as they process virtually infinite input sequences with linear compute complexity.
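A hedged sketch of the compressive-memory idea, a simplified reading of the mechanism rather than Google’s actual implementation: past key/value pairs are folded into a fixed-size matrix plus a normalizer vector, so the state stays constant no matter how long the input stream grows. Real Infini-attention combines this with local attention inside a transformer layer; this toy keeps only the memory update and retrieval math.

```python
import math

D = 4  # tiny head dimension for illustration

def sigma(v):
    # ELU + 1 keeps activations positive so normalization is well behaved.
    return [x + 1 if x > 0 else math.exp(x) for x in v]

def init_memory():
    M = [[0.0] * D for _ in range(D)]  # compressed key/value summary
    z = [0.0] * D                       # normalization term
    return M, z

def update(M, z, k, v):
    """Fold one (key, value) pair into memory: M += sigma(k)^T v."""
    sk = sigma(k)
    for i in range(D):
        z[i] += sk[i]
        for j in range(D):
            M[i][j] += sk[i] * v[j]

def retrieve(M, z, q):
    """Read from memory for query q: sigma(q) M / (sigma(q) . z)."""
    sq = sigma(q)
    denom = sum(sq[i] * z[i] for i in range(D)) or 1.0
    return [sum(sq[i] * M[i][j] for i in range(D)) / denom
            for j in range(D)]

M, z = init_memory()
# Stream an arbitrarily long sequence of (key, value) pairs...
for t in range(20_000):
    k = [math.sin(t + i) for i in range(D)]
    v = [float((t + i) % 3) for i in range(D)]
    update(M, z, k, v)

# ...yet the state is still just a DxD matrix and a D-vector.
out = retrieve(M, z, [0.5] * D)
print(len(M), len(M[0]), len(z), len(out))
```

The quadratic cost of full attention is avoided because each token touches a fixed-size state, giving the linear compute scaling described above.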
It is going to be another great year for AI:
· Foundation models for everything — new more powerful foundation models will emerge across sectors including healthcare, retail, finance, manufacturing, legal, defense, transportation and more.
· Training on real-world physical data will accelerate, forging a new dimension in machine intelligence.
· Causal AI algorithms will be incorporated into multimodal models, improving their causal reasoning capabilities.
· Autonomous agents will begin to take over the enterprise, customer-facing and back-office routine services.
· The development of smaller, more efficient models for edge applications will accelerate. These models are already approaching their much larger peers in quality.
· Though Nvidia GPUs will continue their dominance, inference accelerators such as SambaNova and Groq will steadily grow their share of AI workloads.
· Sakana AI will make significant progress towards developing an autonomous AI scientist that can do research and potentially innovate and propose real solutions to difficult challenges. While the full scale of this may be further down the line, it is fascinating that we can even contemplate it today!
It will be fun to watch!
References
[2] https://time.com/7178872/agents-unlimited-age/
[3] https://arxiv.org/pdf/2410.05229
[4] https://arxiv.org/abs/2403.09606
[5] https://openreview.net/forum?id=TgeVptDYAt
[6] IDC Research Report, September 2024
[7] https://www.science.org/stoken/author-tokens/ST-2260/full
*Note: Mubadala Capital Ventures is an investor in Regrello, Waymo and Crusoe.