Scaling Transformers in Textual/Visual Domains: A Gateway to AGI?

Deepak Singh
4 min read · Dec 3, 2023
Image generated by AI

AGI (Artificial General Intelligence) has become a hot topic of discussion lately, with leaked conversations (referencing Q* from OpenAI) adding significant momentum to the ongoing debate. Let's delve into several pivotal aspects of this evolving discourse.

For the purpose of our discussion, AGI is ‘the ability of a computer or other machine to perform those human activities which are thought to require intelligence.’

The concept of intelligence is inherently subjective and continuously evolving, and human perspectives on it have shifted over time. When computers mastered chess and outperformed every grandmaster, we reconsidered the definition: ‘Perhaps playing chess isn’t the epitome of intelligence,’ we decided. The next stop was to conquer a game like Go (the number of potential positions in Go is astronomically high; it’s practically impossible to calculate or explore all of them).
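To give a sense of just how astronomically high that number is, here is a quick back-of-the-envelope sketch. Each of the 19×19 = 361 intersections on a Go board can be empty, black, or white, giving a crude upper bound of 3^361 configurations (most are illegal positions, but the bound already dwarfs the commonly cited ~10^80 atoms in the observable universe):

```python
# Crude upper bound on Go board configurations:
# each of the 19 x 19 = 361 intersections is empty, black, or white.
upper_bound = 3 ** 361

# How many decimal digits is that?
digits = len(str(upper_bound))
print(digits)  # 173 digits, i.e. roughly 10^172
```

Even this loose bound makes clear why brute-force enumeration, which worked well enough for chess, was hopeless for Go.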

Surprisingly, AlphaGo achieved a resounding victory of 4–1 against the Go world champion, Lee Sedol. Now, did that count as intelligence?

This continuous reevaluation of what constitutes intelligence underscores the complexity of the human mind and the challenge of replicating it within machines. Let’s explore the nuances of AGI and the evolving benchmarks of intelligence.

For this discussion, let’s stick to the kind of intelligence that involves reasoning and logic, as in humans. Now, don’t be mistaken: GPT-4 did develop some reasoning capability, as described in the paper “Sparks of AGI”. I have written an article on the same topic: The Theory of Mind, has it emerged in LLMs?

Many scientists believe that as we scale models larger, they will become increasingly intelligent. (A race is underway to scale them ever larger, with various tweaks and adjustments along the way.) Given what GPT-4 achieved, scaling with more and more data (human plus AI-generated) certainly looks promising.

On the other side of this debate are legendary researchers like Yann LeCun, who believes that language has little to do with intelligence. He argues that orangutans can’t speak a language, yet they are almost as intelligent as humans. He suggests that the way to AGI will be through improved cognitive architecture, not just by scaling transformers on text, images, audio, and so on.

Will AGI or Superintelligence destroy humanity?

When LeCun was asked whether AGI would attempt to eradicate humans, his emphatic response was a resounding “NO.” He went on to emphasize that intelligence is not synonymous with a desire for power. Citing historical examples, he noted that it has never been the intelligent individuals who sought to dominate the world; rather, it has often been the reverse.

The Dual Process Theory

Let’s delve into another very interesting topic: dual process theory (discussed in the book Thinking, Fast and Slow).

Our brains use two disparate modes of thinking:

1. Intuitive Understanding (Fast)

2. Logical Reasoning (Slow)

The brain spends only about 0.001% of its cycles on reasoning, suggesting that reasoning is a very thin layer on top of understanding. Some people argue that 20th-century research went in the wrong direction, focusing more on the reasoning part and less on the understanding part (more on this in another article, where I will discuss model-free methods).

“Doing what we do without thinking has to be the first step towards True AGI.” (Monica Anderson)

She argues that solving complex optimization problems (minimax tree search, for example) is not what constitutes AGI. Rather, the capacity for genuine intuitive understanding, complemented by reasoning, is the right step towards AGI, and the model-based methods we followed in the 21st century are not the way to get there.
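For context, here is a minimal sketch of the kind of explicit, model-based search Anderson is contrasting with intuitive understanding: classic minimax over a game tree, where the machine exhaustively evaluates every move and counter-move. The tiny tree below is a made-up example, not any specific game:

```python
def minimax(node, maximizing):
    """Exhaustively evaluate a game tree.

    Leaves are numeric scores (from the maximizer's point of view);
    internal nodes are lists of child subtrees. Players alternate:
    the maximizer picks the highest score, the minimizer the lowest.
    """
    if isinstance(node, (int, float)):  # leaf node: return its score
        return node
    scores = [minimax(child, not maximizing) for child in node]
    return max(scores) if maximizing else min(scores)

# Toy tree: the maximizer chooses a branch, then the minimizer replies.
tree = [[3, 12], [2, 9], [14, 1]]
print(minimax(tree, maximizing=True))  # 3
```

The point of the contrast: this kind of search is exact and exhaustive, which is precisely why it cannot scale to games like Go without approximation, whereas human players rely almost entirely on fast, intuitive pattern recognition.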

Monica Anderson’s perspective sheds light on a crucial aspect often overlooked in the pursuit of AGI. She posits that the emphasis on model-based methods in the 21st century might not align with the path towards True AGI. Instead, the core lies in developing intuitive understanding. In her view, the ability to navigate complex problems intuitively rather than relying solely on explicit reasoning is the hallmark of True AGI.

This concept correlates with the Dual Process Theory, underscoring the importance of intuitive understanding alongside logical reasoning. Our brains predominantly rely on intuitive understanding, allocating only a minuscule fraction of cognitive resources to logical reasoning. This underscores the significance of developing models that prioritize intuitive comprehension, mimicking the human brain’s cognitive processes more accurately.

Yann LeCun’s divergence from the mainstream belief highlights a critical debate within the AI community. His emphasis on improved cognitive architecture resonates with the idea that intelligence goes beyond mere language or computational scaling. While scaling models like transformers may enhance certain capabilities, the essence of achieving AGI may lie in the development of more sophisticated cognitive frameworks rather than sheer scaling alone.

Ultimately, the quest for AGI demands an understanding of intelligence, encompassing both reasoning and intuitive understanding. It requires a departure from conventional methods and a deeper exploration into cognitive architectures that mimic the human mind’s intricate workings. As the race towards AGI progresses, reconciling these differing viewpoints will be pivotal in shaping the future trajectory of artificial intelligence.

In conclusion, the path to Artificial General Intelligence might not rest solely on scaling models or adopting predetermined methodologies. Instead, it could be a fusion of scaling, cognitive architecture, intuitive understanding, and reasoning: a synergy that reflects the multi-dimensional facets of human intelligence. Only through the integration of these diverse perspectives can we pave the way toward a more comprehensive and closer realization of AGI.


Deepak Singh

Lead R&D Scientist | NLP and Computer Vision Specialist