The path to ChatGPT since WW II

Vinit Tople
𝐀𝐈 𝐦𝐨𝐧𝐤𝐬.𝐢𝐨
8 min read · May 9, 2024

While the world of computing has unfolded in front of our eyes, it is still breathtaking to see the sequence in which one technological breakthrough laid the foundation for the next. This article is an attempt to connect the dots through the lens of how the development in computing shaped the AI revolution we’re part of today.

A World War II Start

The first electronic computer emerged for military purposes in the 1940s during World War II, and it occupied about 1,800 square feet. In the 1950s, IBM introduced computers meant for business purposes, called mainframes — still occupying hundreds of square feet. The potential of computers and programming was evident, but their applicability was largely limited to governments and businesses due to the size and cost of these massive machines. Scientists, however, knew that programming and computers would, over time, automate many of the "well defined" and repetitive tasks. They also knew that if a task was not well defined, meaning it lacked very specific instructions and outcomes, then programming would fall short. These tasks needed something more than what programming could do — the ability to learn, form a judgment, estimate or predict. Essentially, there were tasks that needed systems that could replicate how our brains work — how they learn from experience, process the inputs they receive, apply that learning and take appropriate action. That realization led to the coining of the term "artificial intelligence" at a 1956 conference at Dartmouth College. Yes, Artificial Intelligence as a field is nearly 70 years old!

The Three Pillars of AI

For the next four decades, the 60s through the 90s, AI itself remained largely an academic field, but the computing revolution of those decades was incrementally laying the foundation for its real emergence. It was building the three pillars essential for AI to truly come to life: 1) massive amounts of data, 2) massive computing power to enable this complex learning, and 3) business and customer readiness to trust computers with their "judgment". Now, let's see how these pillars were constructed over the computing revolution of those four decades.

1960s to 80s — the advent of the Personal Computer

The 60s were about the shrinking of the computer's size — from a room-sized machine to the size of a refrigerator to, eventually, a machine that could fit on a desktop. This was enabled by the introduction of "integrated circuits" — a small chip of silicon onto which all the main electronic components of a computer could be fitted in miniaturized form, replacing versions thousands of times bulkier. This shrinking of size and cost made way for the "Personal Computing" revolution of the 70s and 80s. While size and cost made computers accessible to the average user, it was the development of user-friendly software (thank Apple and Microsoft) that made computers usable by the average user, not just businesses, scientists and engineers. Excel, PowerPoint, Word and others set off the beginning of the end of the typewriter. More importantly, they started turning physical paper into its "digitized form". In other words, the first of AI's three pillars was beginning to be constructed.

Digitization of the 90s

The digitization of paper documents continued well into the 90s, but two factors proved to be major catalysts for the emergence of the second pillar that AI needed — significantly more computing power. First was the internet, which exponentially accelerated digitization by enabling the World Wide Web and eCommerce. The second was digitization beyond documents and text — every form of media, from photos to songs to videos, was turning into digital data. Among other fields, this digitization of media opened the doors to a major industry: video gaming. It came with a major requirement, though — processing video for games needed far more computing power than processing documents or accessing the internet. This led to investment in a special processor called the "Graphics Processing Unit", built to handle heavy image and video workloads. NVIDIA stepped up to the opportunity, but little did they know at the time that this processor, meant for gaming, would set them up to be one of the shining stars of the AI revolution (they went from being a $300B company before the ChatGPT launch to $2.25 trillion in less than 18 months).

The sleeping AI giant awakens

By the 2000s, the first two pillars of AI (massive data and massive processing power) were firmly taking shape. That said, those two pillars were just technical readiness. It was the third pillar (also a result of the computing/internet revolution) which truly awakened the sleeping AI giant: the emergence of a true business and consumer need for the application of AI. These were new use cases which could not be handled with traditional programming, despite the availability of skills and resources. For example, returning relevant search results became a dire need with the proliferation of the world wide web. So did spam filtering and fraud detection. Effective product recommendations on eCommerce websites became true revenue drivers. Video game players' demand for richer and richer experiences was growing by the day. Traditional programming simply could not meet customer expectations in these use cases. Driven by the renewed business need and empowered by the technical readiness, AI finally saw the light of day. Each of these use cases (and many more) has had AI in the background over the past 20+ years.
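
To make that contrast concrete, here is a minimal, illustrative sketch of why a task like spam filtering resists hand-written rules but yields to a model that learns from labelled examples. The messages, labels and the use of scikit-learn below are purely illustrative assumptions on my part, not any production system.

```python
# A toy contrast: a hand-written rule vs. a model that learns from examples.
# All messages, labels and the "winner" rule are invented for illustration.
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB

# Hand-written rule: flag anything containing the word "winner".
def rule_based_filter(message: str) -> bool:
    return "winner" in message.lower()

# Learned filter: a simple model infers which words signal spam
# from labelled examples instead of being told explicit rules.
train_messages = [
    "You are a lucky winner, claim your prize now",
    "Limited offer, claim your free prize today",
    "Lunch at noon tomorrow?",
    "Here are the meeting notes from today",
]
train_labels = [1, 1, 0, 0]  # 1 = spam, 0 = not spam

vectorizer = CountVectorizer()
X = vectorizer.fit_transform(train_messages)
model = MultinomialNB().fit(X, train_labels)

new_message = "Claim your free prize before midnight"
print(rule_based_filter(new_message))                      # False: the rule misses it
print(model.predict(vectorizer.transform([new_message])))  # [1]: the learned model flags it
```

The point of the sketch is the workflow, not the tiny model: instead of enumerating rules for every new spam pattern, you feed the system more labelled examples and it updates its own judgment.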

(Image: how much data was generated every minute)

The AI Techniques

The 2000s also saw the two main AI techniques gain mainstream identity — Deep Learning for advanced AI applications and Machine Learning for basic (relative to Deep Learning) applications. ML and DL are both techniques to learn from a large data set, form judgment from those learnings and store them in what is called an AI model. Deep Learning's major appeal is that the technique was founded to mimic the way a human brain learns — taking in various inputs (through its senses), comprehending them through multiple layers of abstraction and matching them against prior experiences and learnings to determine the next course of action. With the human brain as the north star, the conviction in the AI community had always been strong that these models would, over time, get very close to or perhaps even outdo it. This was the decade in which that conviction was validated and reinforced by rapid applications and encouraging developments.
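
As a toy illustration of the "layers" idea, here is a minimal sketch of data flowing through a few layers, each transforming the previous layer's output into a somewhat more abstract representation. The sizes and weights below are random placeholders, not a real trained model; actual deep learning also involves adjusting those weights from data, which is omitted here.

```python
# A minimal sketch of stacked layers in a neural network, with made-up sizes
# and random (untrained) weights used purely as placeholders.
import numpy as np

rng = np.random.default_rng(0)

def layer(inputs, in_size, out_size):
    """One layer: a weighted sum of its inputs followed by a non-linearity (ReLU)."""
    weights = rng.normal(size=(in_size, out_size))
    bias = np.zeros(out_size)
    return np.maximum(0, inputs @ weights + bias)

x = rng.normal(size=(1, 8))          # raw input, e.g. pixel values or word counts
h1 = layer(x, 8, 16)                 # layer 1: low-level patterns
h2 = layer(h1, 16, 16)               # layer 2: combinations of those patterns
out = h2 @ rng.normal(size=(16, 1))  # final score, e.g. "spam or not"
print(out.shape)                     # (1, 1)
```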

AI Spreads Silently

The 2010s saw the widespread application of AI across businesses and consumers' lives. On one hand, the smartphone revolution and social media kept feeding newer and more sophisticated use cases. On the other, cloud computing (which made heavy compute far more accessible) and hardware advancements proved to be the enablers of these advanced applications. AI kept delivering, with constantly improving results. Examples include voice assistants (Alexa and Siri), face recognition on iPhones, online check deposits, self-driving cars, facial tagging on social media, MRI analysis and drug development, among a host of other AI applications.

The One Major Exception — NLU

Amid all this raging spread of AI, there was one area where there was a strong business need and customer readiness, but AI simply wasn't delivering — the field of Natural Language Understanding (NLU). AI (Deep Learning) worked for understanding short, robotic commands like "Alexa, set a reminder for…" or "I want to report a lost credit card" on a customer service phone call. However, understanding truly natural language involving long sentences, paragraphs or pages remained a pipe dream. In 2017, however, a research paper titled "Attention Is All You Need" delivered just the breakthrough that was needed. It introduced an architecture called the "transformer", which enabled understanding of long sentences, relationships between them (e.g. "it" in a sentence could be referring back to something mentioned earlier) and abstract concepts (e.g. what constitutes a "scary" scene) — all of which our brain is inherently capable of. The transformer architecture enabled the training of AI on massive amounts of data — basically the whole of the world wide web, along with all books, publications and anything else digitally available in the public domain. Just to get a sense of the scale: if a person were to read all the training data at a speed of 1,000 words per hour, it would take them 1,000 years to finish reading the data fed to this model. This unprecedented training effort led to AI models (now known as LLMs — Large Language Models) that were years ahead of their predecessors in the accuracy of their natural language understanding.
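
For the technically curious, here is a minimal sketch of the self-attention step at the heart of the transformer: every word weighs its relationship to every other word, which is the mechanism that lets a word like "it" be linked back to something mentioned earlier. The example sentence, embedding size and weights below are toy placeholders, not anything taken from a real model.

```python
# A minimal sketch of scaled dot-product self-attention, with random toy
# embeddings and weights standing in for the learned ones in a real transformer.
import numpy as np

rng = np.random.default_rng(0)

tokens = ["the", "dog", "barked", "because", "it", "was", "scared"]
d = 16                                   # embedding size (toy value)
X = rng.normal(size=(len(tokens), d))    # one vector per word

# In a real model, the query/key/value projections are learned during training.
Wq, Wk, Wv = (rng.normal(size=(d, d)) for _ in range(3))
Q, K, V = X @ Wq, X @ Wk, X @ Wv

scores = Q @ K.T / np.sqrt(d)                  # relevance of every word to every other word
weights = np.exp(scores)
weights /= weights.sum(axis=1, keepdims=True)  # softmax: each row becomes attention weights

output = weights @ V                           # each word becomes a blend of the words it attends to
print(weights[tokens.index("it")].round(2))    # how much attention "it" pays to each word
```

In a trained model those attention weights are meaningful (for instance, "it" attending strongly to "dog"); stacking many such attention layers over enormous text corpora is what produced the LLMs described above.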

LLM’s Emergent Properties

While the step-change improvement in NLU would in and of itself have been an incredible milestone, that was not all. These newly trained LLMs could do a lot more. They could perform tasks they were not even specifically trained for — the so-called "emergent properties" of these models. They could generate new content (songs, poems, articles, lyrics), understand mathematical expressions and code, perform translations, summarize content and a whole lot more. This "generative" ability came as a double-edged sword — it represented creativity when it produced accurate or helpful results and hallucination when it produced unfounded or unsupported content. Worse yet, in many cases it produced biased or even harmful content. Nevertheless, it was evident that a major and unprecedented power was trapped in these models. OpenAI recognized it and, despite the shortcomings, decided to unleash this work-in-progress product — ChatGPT — on the world. To say the decision proved to be a massive commercial success would be a gross understatement.

The Conclusion that is the Beginning

While that seems like the "conclusion" of this story, we all know it actually represents a new beginning. Not just the beginning of NLU, or of the ability to summarize or create new songs and images, or even of automating more business processes. Those are just applications of the knowledge, concepts and wisdom gained and packed into these models. This new-age AI represents the beginning of the mass commoditization of a trait considered innately human, at least until now: intelligence. How this fascinating story of computing and extreme intelligence unfolds from here on, only time will tell. It's a thought that is dizzyingly exhilarating and scary at the same time.

If you enjoyed the read, please show love through claps or by sharing/subscribing!


I'm an ex-Amazon Product Leader. Passionate about simplifying concepts for non-technical folks using stories, analogies and FAQs.