The AI Arms Race

9 min read · Feb 8, 2023

“It’s a new day in search. The race starts today… We’re going to move fast.” — Satya Nadella, Microsoft CEO

In recent years, the field of artificial intelligence has seen a rapid rise in the development of large language models (LLMs). These models, based on deep learning techniques, are trained on vast amounts of data and can perform tasks such as text generation, question answering, and language translation with remarkable accuracy. The race to produce the “ultimate” large language model has resulted in what has become known as the “AI Wars.” From DeepMind’s Sparrow to Anthropic’s Claude to OpenAI’s ChatGPT, the quest for the most performant and capable model shows no sign of slowing. Major technology companies and research institutions are pouring billions of dollars into this area, eager to claim the title of leading provider of AI technology. In this article, we’ll take a closer look at the recent critical developments in the AI arms race of our time and explore what they mean for the future of artificial intelligence.

The (Chat)GPT Revolution

When ChatGPT was released in late November 2022, it took the world by storm. With a simple conversational UI, ChatGPT’s advanced language generation capabilities and ability to respond to user inputs in a human-like manner made it a popular tool for both personal and professional use. Tuned via RLHF (Reinforcement Learning from Human Feedback), ChatGPT can generate coherent responses, summarize long texts, translate between languages, and perform various other NLP tasks; its impressive skills earned it widespread recognition and praise. People were amazed by its accuracy and efficiency, leading to its rapid adoption and integration into various applications and services. Yet Yann LeCun, an early pioneer of the deep learning revolution alongside Yoshua Bengio and Geoffrey Hinton, was not completely sold: “In terms of underlying techniques, ChatGPT is not particularly innovative.” He continues:

“It’s nothing revolutionary, although that’s the way it’s perceived in the public,” said LeCun. “It’s just that, you know, it’s well put together, it’s nicely done.”

The key insight in LeCun’s words is that ChatGPT is the result of multiple pieces of ML technology, developed over the last few years, coming together to produce a feat of engineering rather than true innovation: “You have to realize, ChatGPT uses Transformer architectures that are pre-trained in this self-supervised manner,” observed LeCun. “Self-supervised learning is something I’ve been advocating for a long time, even before OpenAI existed.” Regardless of whether ChatGPT is truly innovative, OpenAI did succeed at one major goal: bringing large language models into the mainstream. In metropolitan cities, corporate workplaces, and educational institutions, ChatGPT and the GPT-N suite of models have become the closest thing to a household name in the field of AI. Industry leaders found it hard to ignore the possibilities of the generative AI landscape. Moreover, with the growing ecosystem of LLM tools, developers have been able to produce adjacent software solutions: platforms like LangChain have accelerated the development of prompt-based products alongside semantic search and clustering applications. Yet despite the massive commercial interest and market it has ignited, ChatGPT is only a single step into what the field of large-scale language models promises. In particular, several challenges remain:

Models hallucinate and lack the ability to fact-check
LLMs cannot account for real-time developments or up-to-date information
Model retrieval of context from the web is not completely accurate, nor guaranteed to draw on the best source
Current state-of-the-art models require human-in-the-loop feedback to optimally refine responses
Models are not yet sufficiently interpretable or compositional

Broadly, the vast majority of today’s leading models follow a similar paradigm: they are auto-regressive, self-supervised, dense transformer-based models. To solve the aforementioned shortcomings of modern LLMs, one must think about the bigger picture at stake: will advancing the models require incremental improvements or a fundamental reshaping of the underlying transformer architecture and AI methodologies? Part of the beauty of this impending AI arms race is that it will create a competitive, “Cold War” style spirit of innovation. If the space program that President Kennedy set in motion could put a man on the moon, certainly today’s AI research institutions and technology conglomerates can succeed at making models interpretable, externally validated, and truthful.
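To make “auto-regressive” and “self-supervised” concrete, here is a toy sketch. It is not a transformer: it assumes a simple bigram counter stands in for the model, and the tiny corpus is made up for illustration. What it shares with real LLMs is the shape of the loop: the training labels come from the text itself (each token's label is simply the next token), and generation feeds every predicted token back in as input for the next prediction.

```python
from collections import Counter, defaultdict

def train_bigram(tokens):
    """Self-supervised 'training': the label for each token is the token that follows it."""
    counts = defaultdict(Counter)
    for prev, nxt in zip(tokens, tokens[1:]):
        counts[prev][nxt] += 1
    return counts

def generate(counts, start, max_new_tokens):
    """Auto-regressive generation: each new token is predicted from the output so far."""
    out = [start]
    for _ in range(max_new_tokens):
        followers = counts.get(out[-1])
        if not followers:
            break  # no continuation was ever observed for this token
        # Greedy decoding: pick the most frequently observed next token.
        out.append(followers.most_common(1)[0][0])
    return out

# Hypothetical toy corpus, chosen so that greedy decoding has no ties.
corpus = "the race starts today the race starts now the race starts today".split()
model = train_bigram(corpus)
print(" ".join(generate(model, "the", 4)))  # prints "the race starts today the"
```

Real LLMs replace the bigram table with a transformer conditioned on the whole preceding context, and usually sample from the predicted distribution instead of always taking the top token.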

The Model Race

With advances in parallel computing and hardware alongside the latest variants of the transformer architecture, several research labs and companies have begun to produce their own large language models. However, there is a significant data and cost moat when it comes to training an LLM from scratch: models must pre-train on billions, and at the largest scales hundreds of billions, of tokens to produce meaningful results. This is possible in large part because of the transformer architecture’s ability to attend over a large context window and capture long-range dependencies. Key differences in the various approaches to LLM creation (the process of model development, training, and deployment) manifest in three key layers: model scaling, data integrations, and user interface.

Model Scaling and Compute

The model scaling layer refers to increasing the number of model parameters in hopes of capturing more complex data representations and thus producing a stronger model for inference and prompt completion. With this vision of optimizing across billions of parameters, LLMs like GLaM, LaMDA, Gopher, and Megatron-Turing have achieved state-of-the-art few-shot results on many tasks. However, Google’s Pathways Language Model (PaLM) serves as the best example of the latest efforts in model scaling. A 540-billion-parameter, dense, decoder-only Transformer model, PaLM was trained using the Pathways system: a sharded dataflow graph designed to accelerate parallel computation and provide a new way to express complex parallelism patterns. Together with architectural choices such as computing the attention (context) and feedforward layers of each block in parallel, this enables notably high training efficiency. With its massive scale in training and model size, PaLM shows new advancements in reasoning, particularly as demonstrated by chain-of-thought prompting:

Standard prompting versus chain-of-thought prompting for an example grade-school math problem. Chain-of-thought prompting decomposes the prompt for a multi-step reasoning problem into intermediate steps (highlighted in yellow), similar to how a person would approach it. (Source: Google)
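The difference between the two prompting styles in the figure can be sketched as plain string construction. This is a minimal illustration, not Google's code: the helper `build_prompt`, the exemplar dictionary keys, and the "Q:/A:" layout are assumptions made for this example, and no model is called. The only difference between the two prompts is whether the worked exemplar includes its intermediate reasoning before the final answer.

```python
def build_prompt(exemplars, question, chain_of_thought):
    """Build a few-shot prompt; optionally include each exemplar's reasoning steps."""
    parts = []
    for ex in exemplars:
        final = f"The answer is {ex['answer']}."
        # Chain-of-thought: show the intermediate steps before the final answer.
        answer = f"{ex['reasoning']} {final}" if chain_of_thought else final
        parts.append(f"Q: {ex['question']}\nA: {answer}")
    # The new question is left open for the model to complete.
    parts.append(f"Q: {question}\nA:")
    return "\n\n".join(parts)

# A hypothetical grade-school exemplar in the style of the figure.
exemplars = [{
    "question": "Roger has 5 tennis balls. He buys 2 more cans of tennis balls. "
                "Each can has 3 tennis balls. How many tennis balls does he have now?",
    "reasoning": "Roger started with 5 balls. 2 cans of 3 tennis balls each is "
                 "6 tennis balls. 5 + 6 = 11.",
    "answer": "11",
}]
question = ("The cafeteria had 23 apples. If they used 20 to make lunch and "
            "bought 6 more, how many apples do they have?")

standard_prompt = build_prompt(exemplars, question, chain_of_thought=False)
cot_prompt = build_prompt(exemplars, question, chain_of_thought=True)
print(cot_prompt)
```

Given the chain-of-thought version, a sufficiently large model tends to imitate the exemplar and write out its own intermediate steps before answering the new question.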

Data and Human Feedback Integration

From code generation to explanations, PaLM is a relative expert when it comes to problem-solving. While PaLM is meant to be a more general-purpose model, essentially Google’s analog to the line of GPT-N models, the LaMDA model, also by Google, is geared more towards conversation and dialogue-based search. The second layer of LLM differentiation, specific data integrations, powers LaMDA’s conversational responses. What is meant by specific data integrations? This refers to the various forms of data and human feedback used to train and fine-tune an LLM.

OpenAI’s series of InstructGPT models are a great example of models primed for educational and explanatory outputs, rather than typical conversational responses or vanilla problem-solving. The InstructGPT models are trained with humans in the loop and are essentially aligned with the philosophy of enabling LLMs to better follow instructions. As a byproduct, these models also gravitate towards more truthfulness and less toxicity. The bigger principle here in regard to data integrations is that of “finetuning on a small curated dataset of human demonstration.” ChatGPT is simply an extension of the InstructGPT models with further specializations for dialogue and conversation. As OpenAI suggests, “the dialogue format makes it possible for ChatGPT to answer follow-up questions, admit its mistakes, challenge incorrect premises, and reject inappropriate requests.” While ChatGPT and InstructGPT are OpenAI’s sibling models for human-in-the-loop data integration, DeepMind’s Sparrow is a similar attempt at integrating human data to improve LLM alignment. A variation on RLHF inspired by past work like SpaLM and GopherCite, Sparrow utilizes a process of adversarial probing, enabling its dialogue to be more useful and safer, or as DeepMind puts it, “aligned with human values for effective and beneficial communication.”

RLHF demonstrated for ChatGPT (Source: OpenAI)
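One piece of the RLHF recipe, training a reward model from human comparisons, can be sketched in a few lines. This is a minimal illustration, not OpenAI's implementation: it assumes a reward model has already assigned scalar scores to two candidate responses (the scores below are made up), and it computes the commonly used pairwise preference loss, -log(sigmoid(r_chosen - r_rejected)). The loss is small when the human-preferred response already scores higher and large when the ranking is reversed.

```python
import math

def pairwise_preference_loss(r_chosen, r_rejected):
    """-log(sigmoid(r_chosen - r_rejected)), computed in a numerically stable way."""
    d = r_chosen - r_rejected
    # -log(sigmoid(d)) equals log(1 + exp(-d)); branch so exp() never overflows.
    if d >= 0:
        return math.log1p(math.exp(-d))
    return -d + math.log1p(math.exp(d))

# Hypothetical reward-model scores for two responses to the same prompt.
agrees_with_human = pairwise_preference_loss(r_chosen=2.0, r_rejected=0.0)
disagrees_with_human = pairwise_preference_loss(r_chosen=0.0, r_rejected=2.0)
undecided = pairwise_preference_loss(r_chosen=1.0, r_rejected=1.0)

print(f"agrees:    {agrees_with_human:.4f}")
print(f"undecided: {undecided:.4f}")
print(f"disagrees: {disagrees_with_human:.4f}")
```

Minimizing this loss over many human comparisons pushes the reward model to score preferred responses higher; the language model is then tuned with reinforcement learning to produce responses that the reward model rates well.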

User Interface and Model-User Interactions

Beyond model scaling and human feedback integrations, LLMs continue to differentiate in regard to deployment and user interface. Part of what made ChatGPT a near-overnight phenomenon was its simple chatbot-style interface: no unnecessary tooling or features, just an isolated model with a record of previous conversations. Similarly, companies like Anthropic and Perplexity.AI have begun to tinker with unique forms of deployment. Claude, Anthropic’s rival to ChatGPT, began early user testing through an unusual medium: a Slack bot. Perplexity.AI, for its part, is one of the first companies to provide its product via an in-house Chrome extension. Large-scale generative vision models have also seen unique forms of deployment: Midjourney, a competitor to DALL-E and Stable Diffusion, has seen widespread success through Discord. While the classic medium of interaction between user and model is a search bar and conversational web interface, as seen with ChatGPT and Meta’s Galactica (now not publicly available), it is worthwhile to explore more nontrivial deployments of models and methods to accentuate model-user interaction. Perhaps, down the line, models will be able to predict future queries before they are asked, unify multi-modal outputs (combining language and vision) in a single interface, and be instantly accessible from any corner of the internet. Going beyond seamless UI experiences, OpenAI CEO Sam Altman offers an optimistic view of the innovation to come:

“The stuff that I’m excited about for these models is that it’s not like, ‘Oh, how do you replace the experience of going on the web and typing in a search query,’ but, ‘What do we do that is totally different and way cooler?’”

Google v. Microsoft

When Microsoft announced a new multibillion-dollar investment in OpenAI, reported to be a $10B commitment for a 49% stake, excitement began to rise in the AI community. With ChatGPT and the later-released ChatGPT Plus, LLMs gained a consumer-centric lens, but through a partnership with Microsoft, the GPT models could now be commercially deployed in the real world and for a widespread suite of applications. After a month of rumors following the initial announcement, on February 7th, 2023, Microsoft fired the opening salvo of Big Tech’s generative AI arms race: joining forces with OpenAI to revamp Bing and the Edge browser around a model some claim to be even stronger than ChatGPT. Nadella himself claims:

“I think that this technology is going to reshape pretty much every software category.”

Nadella also compared it to past pillars of technology like search, mobile, and cloud, but said that even more than those paradigm shifts, AI is the closest the sector has come to a “Mosaic moment,” a reference to the Mosaic browser that introduced the web to the world. At its launch keynote, Microsoft named other leaders in the space such as Jasper, Stability AI, Chatsonic, and Anthropic, but one key name was missing: Google. While Microsoft and OpenAI continued to solidify their partnership in early 2023, alarm bells rang at Google. For the first time in years, founders Sergey Brin and Larry Page reportedly made a code request, a sign of the internal “Code Red.” Google wasn’t short on action or headlines either: it debuted its own ChatGPT rival, Bard. Similar to Microsoft, Google also made its own investment in a major AI research lab, pouring $400M into Anthropic to battle the power of ChatGPT. With these alliances, the arms race continues to heat up. With its DeepMind subsidiary and Google Brain division, one would naturally expect Google to gain steam in this ongoing AI arms race. Despite the stiff competition, OpenAI has the intellectual youthfulness and underdog mentality of a startup, now paired with the scale and infrastructure of a trillion-dollar partner in Microsoft. The competition between the two alliances essentially boils down to the same set of factors discussed earlier: model scaling, integration of different data and human feedback, and UI optimization. At the 10,000-foot view, it’s a competition over who can find the transformative architectural leaps rather than incremental updates, even if it comes down to who can multiply matrices faster or drastically lower compute …

Final Thoughts

The AI revolution is here, with a competitive spirit and thirst for innovation on par with that of space exploration in the Cold War era. Spearheaded by big players like Microsoft and Google and grounded by smaller research labs like OpenAI and Anthropic, the AI arms race is bound to produce an exciting future for every layer of software: from developer tools to conversational assistants to human productivity. We may even begin to see some emergent properties as models like GPT-4 are released. Perhaps elements of machine consciousness will also begin to manifest, marking a major step towards AGI.

Written by Siddharth Sharma