Jacob Grow
Mar 8, 2024

Boundaries of Large Language Models and Path Forward for AI

Recently, artificial intelligence and large language models (LLMs) like ChatGPT have stirred both awe and scrutiny. While these systems can mimic human-like text generation, prominent figures in AI research, including Yann LeCun, Meta's Chief AI Scientist, have shed light on their inherent limitations, pointing us toward the next frontier in AI innovation.

The Promise and Pitfalls of Current AI and LLMs

LLMs have revolutionized natural language processing, offering unprecedented capabilities in generating coherent, contextually appropriate text. Yet these models have significant shortcomings, particularly in areas crucial for genuine intelligence: reasoning, memory, understanding of the physical world, and planning.

Reasoning and Planning

Yann LeCun has highlighted the significant gap between the capabilities of LLMs and the complex reasoning and planning seen in humans and animals. According to LeCun, current LLMs like ChatGPT cannot perform these functions at a human-like level, primarily because they operate by identifying patterns in text rather than truly understanding the content (https://news.northeastern.edu/2023/05/26/large-language-models-ai-godfather/).

Researchers have found that transformer-based models such as BERT (Bidirectional Encoder Representations from Transformers), built on the same architecture that underpins today's LLMs, struggle with examples drawn from distributions different from their training data, revealing a reliance on statistical features rather than true reasoning capabilities. One study introduced "SimpleLogic" problems to strictly test reasoning abilities, emphasizing how hard it is to teach models to reason when they tend instead to learn the statistical patterns inherent in their data.
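This failure mode is easy to reproduce in miniature. The sketch below is a toy illustration, not the SimpleLogic benchmark itself, and the synthetic data and feature names are invented for the example: a simple classifier trained where a spurious statistical feature happens to track the label scores well in-distribution, then collapses once the test distribution breaks that correlation.

```python
# Toy illustration: a classifier that latches onto a spurious statistical
# feature generalizes poorly once the test distribution breaks that
# correlation. (Not the SimpleLogic benchmark; data is synthetic.)
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)

def make_split(n, spurious_corr):
    """Each example: [true_signal, spurious_feature]. The label depends
    only on true_signal; spurious_feature agrees with the label with
    probability `spurious_corr`."""
    y = rng.integers(0, 2, n)
    signal = y + rng.normal(0, 2.0, n)  # weak, noisy genuine signal
    spurious = np.where(rng.random(n) < spurious_corr, y, 1 - y)
    return np.column_stack([signal, spurious]), y

X_train, y_train = make_split(5000, spurious_corr=0.95)  # shortcut present
X_test,  y_test  = make_split(5000, spurious_corr=0.50)  # shortcut broken

clf = LogisticRegression().fit(X_train, y_train)
print("train accuracy:", clf.score(X_train, y_train))  # high: shortcut works
print("test  accuracy:", clf.score(X_test,  y_test))   # drops sharply
```

The classifier never learned the task; it learned a regularity of the training distribution, which is essentially the charge the SimpleLogic study levels at BERT-style models.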

Memory and Understanding

Another critical limitation of LLMs is their struggle with both short-term and long-term memory, which severely restricts their ability to maintain context over long interactions or to build on previous knowledge. Coupled with a shallow understanding of the content they generate, this means that while LLMs can produce text that appears plausible, they often fail to grasp the underlying concepts or contexts (https://www.noemamag.com/ai-and-the-limits-of-language/).
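One way to see the memory problem concretely is the fixed context window: everything the model "remembers" about a conversation must fit in a bounded prompt, and older turns are simply dropped. The sketch below is a simplified illustration, assuming a whitespace "tokenizer" and a deliberately tiny window; real models use far larger windows, but the cliff is the same.

```python
# Minimal sketch of why long interactions lose context: chat-style LLMs
# see only a fixed token window, so older turns are silently dropped.
# MAX_TOKENS and the whitespace tokenizer are simplifying assumptions.
MAX_TOKENS = 50  # real windows are much larger, but still finite

def n_tokens(msg: str) -> int:
    return len(msg.split())  # crude stand-in for a real tokenizer

def build_prompt(history: list[str], new_msg: str) -> list[str]:
    """Keep the most recent turns that fit the window; drop the rest."""
    kept, budget = [], MAX_TOKENS - n_tokens(new_msg)
    for turn in reversed(history):
        if n_tokens(turn) > budget:
            break  # everything older than this turn is forgotten
        kept.append(turn)
        budget -= n_tokens(turn)
    return list(reversed(kept)) + [new_msg]

history = [f"turn {i}: " + "word " * 10 for i in range(20)]
prompt = build_prompt(history, "What did I say in turn 0?")
print(prompt[0])  # not "turn 0": the early fact fell out of the window
```

A fact established in turn 0 is unrecoverable by turn 20, not because the model "forgot" in any human sense, but because that text was never presented to it at all.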

The Auto-regressive Nature and Its Limitations

LLMs rely on auto-regressive mechanisms to predict subsequent words in a sentence, a process that, while impressive, falls short of demonstrating a deep comprehension of language or context. This mechanism allows LLMs to generate plausible continuations of a given text but does not equate to an actual understanding of the information being processed (https://futurist.com/2023/02/13/metas-yann-lecun-thoughts-large-language-models-llms/).

LLMs are autoregressive because they predict future values based on past values, much as an autoregressive model in finance might predict a stock's future price from its past performance. Such models can prove inaccurate under changing contextual or environmental conditions, such as when genuinely new concepts appear.
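To make the mechanism concrete, here is a minimal sketch of autoregressive generation. A toy bigram model stands in for the transformer (an assumption made for brevity); the conditioning principle, predicting each token only from the tokens before it, is the same.

```python
# Minimal sketch of autoregression: each next token is sampled from a
# distribution conditioned only on previously generated tokens. A toy
# bigram model stands in for the transformer; the principle is identical.
from collections import Counter, defaultdict
import random

corpus = "the cat sat on the mat and the cat ran".split()
counts = defaultdict(Counter)
for prev, nxt in zip(corpus, corpus[1:]):
    counts[prev][nxt] += 1  # estimate P(next word | previous word)

def generate(start: str, length: int, seed: int = 0) -> list[str]:
    random.seed(seed)
    out = [start]
    for _ in range(length):
        options = counts.get(out[-1])
        if not options:  # no continuation ever observed: model is stuck
            break
        words, freqs = zip(*options.items())
        out.append(random.choices(words, weights=freqs)[0])
    return out

print(" ".join(generate("the", 8)))
```

The output is locally fluent, yet nothing in the loop represents meaning; there are only co-occurrence statistics, which is precisely the distinction LeCun draws between plausible continuation and understanding.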

Towards a Future of More Intelligent AI

LeCun envisions a future of AI built on architectures that can understand abstract representations of the world, moving beyond the capabilities of current language models. The aim is to create systems capable of making predictions and planning based on a more profound understanding of the real world (https://the-decoder.com/why-large-ai-language-models-dont-lead-to-human-like-ai/).

Shifting Focus from Language to World Models

LeCun and his colleagues argue that the path to human-like AI lies beyond refining language models. They suggest that a significant portion of human and animal knowledge is non-linguistic and cannot be fully captured by language models, no matter the extent of training. This perspective highlights the necessity for innovations that can learn and interpret the complexities of the real world, rather than just processing text (https://the-decoder.com/why-large-ai-language-models-dont-lead-to-human-like-ai/).

He proposes a modular cognitive architecture aimed at overcoming these models' limitations, allowing machines to learn about the world, reason, and plan actions that optimize a set of objectives. Central to this architecture is a predictive world model, which employs a Hierarchical Joint Embedding Predictive Architecture (H-JEPA) trained through self-supervised learning. This model is designed to learn abstract representations that are both informative and predictable, suggesting a path towards more autonomous machine intelligence.
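To give a flavor of the core idea, here is a heavily simplified, non-hierarchical sketch in PyTorch. The encoders, dimensions, and synthetic data are all assumptions for illustration; the actual H-JEPA adds hierarchy, an EMA-updated target encoder, and explicit collapse-prevention regularizers. The defining move is that prediction error is measured between learned representations, never between raw inputs.

```python
# Heavily simplified, non-hierarchical sketch of the JEPA idea: predict
# the *representation* of a target from the representation of its context.
# All shapes, encoders, and data here are illustrative assumptions.
import torch
import torch.nn as nn

DIM = 32  # toy embedding size (assumption, not from the paper)

context_encoder = nn.Sequential(nn.Linear(64, DIM), nn.ReLU(), nn.Linear(DIM, DIM))
target_encoder  = nn.Sequential(nn.Linear(64, DIM), nn.ReLU(), nn.Linear(DIM, DIM))
predictor       = nn.Linear(DIM, DIM)  # context embedding -> predicted target embedding

opt = torch.optim.Adam(
    list(context_encoder.parameters()) + list(predictor.parameters()), lr=1e-3
)

for step in range(100):
    x = torch.randn(16, 64)            # "context" (e.g., current observation)
    y = x + 0.1 * torch.randn(16, 64)  # "target"  (e.g., next observation)

    s_x = context_encoder(x)
    with torch.no_grad():              # target branch is frozen here, a crude
        s_y = target_encoder(y)        # stand-in for the paper's EMA update
    loss = ((predictor(s_x) - s_y) ** 2).mean()  # error in latent space only

    opt.zero_grad()
    loss.backward()
    opt.step()
```

Because the loss lives in latent space, the model is free to discard unpredictable detail and keep only abstract, predictable structure, which is exactly the property LeCun argues a useful world model needs.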

He elucidates this in his 2024 keynote at AAAI (the Association for the Advancement of Artificial Intelligence, https://aaai.org/), available at https://drive.google.com/file/d/1pLci-z_Q-e4Scf3CrrJRitqLliOqye8g/view, and in his position paper: https://openreview.net/pdf?id=BZ5a1r-kVsf

Summary

While large language models like ChatGPT have marked a significant milestone in AI development, their limitations underscore the need for continued innovation in the field. Insights from AI researchers like Yann LeCun emphasize the importance of developing new models and architectures that more closely mirror human cognitive processes and our world models. We may stand on the cusp of a significant AI evolution, but the quest for machines that can truly understand and interact with the world around them remains a compelling frontier on the journey towards artificial general intelligence.
