History of Generative AI

Gilles Legoux
9 min read · Apr 16, 2023

TL;DR The rise of generative AI is a revolution built on deep learning, transforming the tech world into a new one.

Generative AI is the state of the art of machine learning and has recently excelled with the emergence of high-performance AI chatbots.

Generative AI for all types of content

Let’s discover its history, from simple, specialized textual machine learning models to universal large language models at a vast scale:

  • 💬 AI ChatBots
  • 🔥 From NLP to Multimodal GAI ChatBots based on LLM
  • 🧪 LLMs Timeline: Closed-source vs. Open-source
  • 🎯 From Specialized Model to Generalized ML Model
  • 📈 Scaling Laws & Performance
  • 🏁 Technologic and Economic Race
  • 🚀 Go Further with Limited & Augmented LLMs

💬 AI ChatBots

Building generalist AI chatbots is the most complex challenge for a technology that simulates human intelligence; it is the basis of the artificial human assistant. Chatbots are experiencing dazzling success with the release of ChatGPT, a new milestone reached after Google Home and Amazon's Alexa. The technology behind these products has disrupted and considerably improved the ecosystem, although it is not perfect yet (and perhaps never will be).

Interest in chatbots has followed the interest in deep learning since 2010. Before, it was confined to a niche audience, but it exploded in November 2022 with the release of ChatGPT to the general public, ramping up to 100M Monthly Active Users (MAUs) after two months, a worldwide record (4.5x faster than TikTok), and 1.6B MAUs now. Meanwhile, machine learning and deep learning trends have kept growing in the background for ten years.

Google Trends from 2010 [source] | Similar web for chat.openai.com [source]

To give an order of magnitude, the human brain has, on average, 100 billion neurons and 100 trillion synapses. ChatGPT+ (the paying version of ChatGPT) uses deep learning models based on GPT-4 with an estimated model size approaching these values (the exact figures are kept private by OpenAI). But a human neuron/synapse is much more powerful than a deep learning neuron/synapse, and the gap remains large, especially on reasoning tasks about unknown problems that require many resources.

Human brain vs. Deep learning

🔥 From NLP to Multimodal GAI ChatBots based on LLM

Thanks to Large Language Models (LLMs) powering generative AI, chatbots keep getting closer to the objective of simulating human intelligence. NLP models have progressively migrated from simple parsing to complex processing of language structure, from shallow learning to deep learning, from a single-language vocabulary of words to a vocabulary of tokens with multi-language embeddings, and from RNNs (LSTMs) to Transformers with multi-head attention layers.

NLP timeline with Deep Learning [source]
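
To make this architectural shift concrete, here is a minimal PyTorch sketch contrasting a recurrent encoder (LSTM) with a Transformer encoder layer based on multi-head self-attention; the shapes and hyperparameters are illustrative, not those of any real LLM.

```python
# Minimal sketch (PyTorch): from recurrence to multi-head self-attention.
# All sizes are toy values for illustration only.
import torch
import torch.nn as nn

vocab_size, d_model, seq_len, batch = 1000, 64, 16, 2
tokens = torch.randint(0, vocab_size, (batch, seq_len))
x = nn.Embedding(vocab_size, d_model)(tokens)      # (batch, seq_len, d_model)

# Recurrent approach (LSTM): tokens are processed one after another.
lstm = nn.LSTM(d_model, d_model, batch_first=True)
lstm_out, _ = lstm(x)

# Transformer approach: multi-head self-attention sees all tokens at once.
encoder_layer = nn.TransformerEncoderLayer(d_model=d_model, nhead=8, batch_first=True)
attn_out = encoder_layer(x)

print(lstm_out.shape, attn_out.shape)              # both (2, 16, 64)
```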

Finally, by mixing in other types of data (such as images and audio), NLP models became multimodal generative models, where varied instructions (as input) give varied results (as output).

The Architecture of Deep Learning with Turing Award [source] | Unimodal and Multimodal Generative AI Models [source]

Deep learning is driven by experimentation, and many of its problems remain theoretically open and unproven.

🧪 LLMs Timeline: Closed-source vs. Open-source

The number of LLMs has kept increasing since 2018, with the timeline accelerating at the beginning of 2023:

Timeline of release dates of LLMs with +10B parameters [source] | The evolutionary tree of modern LLMs traces [source]

The open-source community has accumulated an estimated delay of around six months to one year, and the gap may keep widening in the future. The most recent and most performant models are publicly accessible, but they are all closed-source projects built on modified open-source projects.

However, the Hugging Face platform aims to expose models, datasets, and documentation for ML models to keep the ecosystem as open-source as possible, with the philosophy that ML can only evolve through shared and open knowledge. The economic stakes are huge, and products like ChatGPT+, Bing (with a modified ChatGPT), or Google Search (with Bard) don't share their models in order to keep an advantage over their competitors.

Open source vs. Closed source

You can use, deploy, and play with your own chatbot using open-source projects such as minGPT, Open-Assistant, or Stanford Alpaca, a GPT alternative based on a LLaMA model.
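
To show how simple it is to get started, here is a minimal sketch assuming the Hugging Face transformers library is installed; the gpt2 checkpoint is used only to keep the snippet small, and any open conversational checkpoint you have access to can replace it.

```python
# Minimal sketch: querying a small open-source language model locally with
# Hugging Face transformers. The model id is illustrative; swap in any open
# checkpoint (e.g., a LLaMA-based or Open-Assistant model you have access to).
from transformers import pipeline

generator = pipeline("text-generation", model="gpt2")
prompt = "Question: What is generative AI?\nAnswer:"
result = generator(prompt, max_new_tokens=50, do_sample=True)
print(result[0]["generated_text"])
```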

🎯 From Specialized Model to Generalized ML Model

Before, generalization did not work well: a model was trained to solve one specific task. But transfer learning and multi-task models are game changers. Instead of using a specialized model directly, you can use a generalized model, pre-trained (or re-trained on your data), on a specific task. The performance and the cost may be worse than with the specialized model, but the difference is sometimes negligible for your use case. This approach requires less ML development and can even be better in some cases.

Machine learning generic tasks [source]
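
To illustrate reusing a pre-trained generalized model on a specific task without training a specialized one, here is a minimal sketch assuming the Hugging Face transformers library; the public facebook/bart-large-mnli checkpoint is just one example among many.

```python
# Minimal sketch: a generalized pre-trained model applied directly to a
# specific task (topic classification) with no task-specific training.
from transformers import pipeline

classifier = pipeline("zero-shot-classification", model="facebook/bart-large-mnli")
text = "The new GPU cluster cut our training time from weeks to days."
labels = ["infrastructure", "sports", "cooking"]
print(classifier(text, candidate_labels=labels))   # scores each candidate label
```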

So, these multi-task generalized models can solve specialized tasks with performance that is sometimes better than specialized models, but with additional costs. They can even beat humans on exams, thanks to emergent abilities that were not forecast during training.

📈 Scaling Laws & Performance

This evolution is now a race for performance with Large Language Models, where training is very long (~ weeks/months), runs on many GPUs (~ 10K), involves a lot of parameters (~ 10⁹/10¹²), and is not cheap (~ $1-10M). The deployment infrastructure is also costly for inference (~ $0.10/query of 1K prompt tokens on average [source]) while keeping web query latency optimized (~ 100ms up to 1s).

Exponential law of LLMs M parameters/months before GPT-4 [source] | Estimated number of parameters for the versions of GPT
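
To make the serving cost concrete, here is a rough back-of-the-envelope sketch; the per-query price is the estimate quoted above, while the daily traffic figure is a purely hypothetical assumption.

```python
# Back-of-the-envelope serving cost. The price per query comes from the
# estimate above (~$0.10 per ~1K-token prompt); the daily traffic is a
# hypothetical assumption for illustration only.
cost_per_query = 0.10          # USD per query (~1K prompt tokens)
queries_per_day = 1_000_000    # hypothetical traffic
monthly_cost = cost_per_query * queries_per_day * 30
print(f"~${monthly_cost:,.0f} per month")   # ~$3,000,000 per month
```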

ChatGPT+ is based on a multimodal GPT-4 model trained with a model size of about 1 trillion parameters, a maximum context window of 32Ki tokens, a dataset size of several TiB with billions of tokens, and about ten thousand GPUs, for a total of roughly 10²⁵ FLOPs (all these values are estimates, because OpenAI keeps them private).

Estimation of FLOPs done to train GPT-4 for chatGPT+
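
A common back-of-the-envelope rule from the scaling-laws literature approximates training compute as C ≈ 6 × N × D (N = parameters, D = training tokens). Taking the rough public estimates above at face value, the rule implies a training set on the order of trillions of tokens; this is only a sanity check on estimated figures, not an official number.

```python
# Sanity check with the common approximation C ≈ 6 * N * D
# (C = training FLOPs, N = parameters, D = training tokens).
# Both inputs are the rough estimates quoted above.
flops_estimate = 1e25     # ~10**25 FLOPs (estimate)
n_params = 1e12           # ~1 trillion parameters (estimate)
implied_tokens = flops_estimate / (6 * n_params)
print(f"implied training tokens ≈ {implied_tokens:.1e}")   # ≈ 1.7e+12
```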

So, a question is raised:

“Given a certain compute budget, how do we get the best possible performance from the model by resizing the dataset and/or the number of parameters of the model?”

Empirically, the loss follows a power law in each of these three resources (compute, dataset size, and number of parameters) when none of them is a bottleneck, across several orders of magnitude. So, at such scale, increasing the performance of a model (that is to say, minimizing the test loss) requires costly resources.

Scaling Laws [source]
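
As a hedged sketch of what such a power law looks like, here is a toy example in the spirit of the published scaling laws; the constant and exponent are of a plausible order but are used purely for illustration, not as exact published fits.

```python
# Illustrative power law: test loss as a function of model size,
# L(N) ≈ (N_c / N) ** alpha. The constant and exponent below are
# placeholder values for illustration only.
def loss_from_params(n_params: float, n_c: float = 8.8e13, alpha: float = 0.076) -> float:
    return (n_c / n_params) ** alpha

for n in (1e8, 1e9, 1e10, 1e11, 1e12):
    print(f"{n:.0e} params -> loss ≈ {loss_from_params(n):.2f}")
```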

For OpenAI, model size is (almost) everything for getting better performance at scale. However, DeepMind showed that dataset size and model size have a comparable impact on performance. The experiments give conjectures and inferences with inevitable volatility and uncertainty.

Multiplicative contributions by OpenAI [source] | IsoFLOP curves by DeepMind [source]

A balance must be found between the training dataset size (number of tokens), the model size (number of parameters), and the compute resources (CPU, GPU, TPU hours).
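
As a hedged sketch of that balance, the DeepMind results are often summarized as roughly 20 training tokens per parameter; combining this rule of thumb with the C ≈ 6 × N × D approximation used above gives one way to split a compute budget. It is an approximation, not an exact prescription.

```python
import math

# Sketch of a compute-optimal split, using C ≈ 6 * N * D together with the
# rough "about 20 tokens per parameter" rule of thumb from the DeepMind results.
def compute_optimal_split(flops_budget: float, tokens_per_param: float = 20.0):
    n_params = math.sqrt(flops_budget / (6 * tokens_per_param))
    n_tokens = tokens_per_param * n_params
    return n_params, n_tokens

n, d = compute_optimal_split(1e24)
print(f"~{n:.1e} parameters, ~{d:.1e} tokens")   # ~9.1e+10 parameters, ~1.8e+12 tokens
```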

You can reduce the size of a given model thanks to knowledge distillation, or increase the dataset with data augmentation. LLMs can be few-shot learners [source] and zero-shot reasoners [source] that solve a problem from a few examples, or from none at all. The performance of LLMs is surprising; they top ML benchmarks (GLUE, SuperGLUE, BIG-bench, …) and some human tests (HumanEval). New abilities even emerge at scale in these large language models [source].
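
To illustrate few-shot learning in practice, here is a minimal sketch of few-shot prompting; the prompt format is ad hoc, and the small gpt2 checkpoint is used only so the snippet runs anywhere (much larger LLMs show far stronger few-shot behavior).

```python
# Minimal sketch of few-shot prompting: a few solved examples are placed in
# the prompt, and the model is asked to continue the pattern.
from transformers import pipeline

few_shot_prompt = (
    "Translate English to French.\n"
    "English: cat -> French: chat\n"
    "English: dog -> French: chien\n"
    "English: bird -> French:"
)
generator = pipeline("text-generation", model="gpt2")
completion = generator(few_shot_prompt, max_new_tokens=5)
print(completion[0]["generated_text"])
```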

🏁 Technologic and Economic Race

The emergence of generalized AI chatbots can disrupt the web search ecosystem, and the economic stakes are enormous (see [source]). Naturally, the two primary competitors are the current tech leaders of search, paired with AI labs: Google Brain and DeepMind under Alphabet against Microsoft and OpenAI (whose principal investor is Microsoft). Respectively, they offer products built on closed-source technology but based on modified open-source projects:

  • ChatGPT by OpenAI and Bing by Microsoft, working with a modified version of GPT using PyTorch
  • Bard by Google, with a modified version of LaMDA using TensorFlow.

Bing vs. Bard

The race is not only about product vision, model architecture, and technical framework, but also about the platform and hardware infrastructure. These cost structures will be built on Nvidia HGX A100 / H100 for Bing and ChatGPT in Microsoft Azure datacenters, while Google Bard will use Google's in-house TPUv4 / TPUv5.

TPU vs. HGX

🚀 Go Further with Limited & Augmented LLMs

LLMs have limitations such as hallucinations, where model alignment is imperfect despite corrections with Reinforcement Learning from Human Feedback (RLHF) approaches. A hacker can therefore hijack a chatbot's behavior relatively easily with jailbreaking techniques to perform malicious actions, for example to create phishing content, generate fake news, or extract private data, which raises problems of data privacy, ethics, and security.

Simulating reasoning capacity is related to the number of parameters, which is costly and unreliable; LLMs are not good at solving logical and mathematical problems. Also, when released, an LLM has not been trained on fresh data due to the long training duration (several weeks/months), so its predictions can be out of date or unaware of recent changes or news.

Limitations of LLMs & Problems of Data Privacy, Ethics, and Security

But the GAI ecosystem based on LLMs is young and continues to be improved and regulated. New projects are appearing, such as:

  • Toolformer by Meta AI, which decides which web APIs to call, when to call them, what arguments to pass, and how best to incorporate the results into future token predictions for a chatbot [source]
  • ReAct by Google AI, which resolves complex tasks by splitting them into several more manageable sub-tasks and merging the results to answer the global task, as described in this article [source] and sketched after this list
  • ChatGPT+ with plugins by OpenAI, like the Wolfram plugin, to improve the capacity to solve logical and mathematical problems [source]
  • Auto-GPT, which makes models active and autonomous instead of the passive behavior of ChatGPT [source code]
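
As a hedged sketch of the reason-and-act idea behind these projects, here is a simplified loop; the llm() placeholder, the toy calculator tool, and the Action/Observation format are illustrative, not any project's real API.

```python
# Simplified ReAct-style loop: the model alternates "thoughts" and "actions"
# (tool calls), and each observation is fed back into the prompt.
def llm(prompt: str) -> str:
    # Placeholder: in practice, call an actual LLM here.
    return "Action: calculator[2 + 2]"

def calculator(expression: str) -> str:
    return str(eval(expression))   # toy tool; never eval untrusted input

TOOLS = {"calculator": calculator}

prompt = "Question: What is 2 + 2?\n"
for _ in range(3):                          # bounded number of think/act steps
    step = llm(prompt)
    prompt += step + "\n"
    if step.startswith("Action:"):
        name, arg = step[len("Action: "):].rstrip("]").split("[", 1)
        prompt += f"Observation: {TOOLS[name](arg)}\n"
    else:
        break
print(prompt)
```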

A summary of this blog post is available here:

🔗 References

Only the leading research papers are quoted.

History of Deep Learning, LLMs, and GAI:

Scaling Laws:

Performance:

Limitation:

Augmented LLMs:

