Wake up, Neo… [Prompt Engineering]

Isaac Yimgaing
8 min read · Mar 17, 2023

--

It is November 2022. You wake up and everyone around you is talking about ChatGPT, a disruptive AI chatbot working miracles. What the hell is going on? You try it just for laughs and you realize: this chatbot is incredible, like nothing that came before. But the same question remains:

What’s going on?

To understand, let's first jump to the past to better understand the present.

A brief history of AI

Even if the history of the AI field is very long, let's focus on the key concepts that made all this happen.

Before 2010, most AI models and techniques were based on statistical models. To train a model, the engineer had to provide data formatted in a specific way (think of an Excel sheet):

  • A list of columns describing the use case (house information: number of bedrooms, surface area, town, floor, etc.)
  • The target variable that the model has to learn in order to predict it in the future (the house price)

The job of training was to uncover an abstract relationship between the input information and the target. To help the model with this task, engineers had to extract much deeper information from the original inputs (like the relationship between the number of bathrooms and bedrooms) in order to give the model more insight. Once this was done, the model could learn from the original and augmented features and predict the price of a house from the information given about it. And you know what? That still works well today.
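
To make this concrete, here is a minimal sketch of that classic tabular workflow, using scikit-learn and made-up house data (all numbers and feature choices are illustrative, not taken from this article):

```python
# Classic pre-2010-style workflow: tabular features, one hand-crafted extra
# feature, then a statistical model that learns the input -> price relationship.
import numpy as np
from sklearn.linear_model import LinearRegression

# Original inputs: [bedrooms, bathrooms, surface_m2]
X = np.array([
    [2, 1, 45],
    [3, 2, 70],
    [4, 2, 95],
    [5, 3, 120],
], dtype=float)
y = np.array([150_000, 230_000, 310_000, 400_000], dtype=float)  # target: price

# Feature engineering: add a hand-crafted feature (bathrooms per bedroom)
ratio = (X[:, 1] / X[:, 0]).reshape(-1, 1)
X_augmented = np.hstack([X, ratio])

model = LinearRegression().fit(X_augmented, y)

# Predict the price of a new house: 3 bedrooms, 2 bathrooms, 80 m2
new_house = np.array([[3, 2, 80, 2 / 3]])
print(model.predict(new_house))
```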

But something changed. In 2007, Steve Jobs, Apple's CEO, presented the iPhone, a revolutionary smartphone, and a new economy appeared. Everyone started using smartphones and therefore producing terabytes of data through installed applications. But most of this data was no longer simply numerical (in the sense of numbers) as in an Excel sheet, but more varied: text, images, audio, video, etc. Big Data was born.

Because the AI models of the time could not handle these new types of data efficiently, researchers looked for, and found, a new top-level model architecture.

The king is dead, long live the king

In 2012, Stanford University's ImageNet Large Scale Visual Recognition Challenge pitted computer science research teams from around the world against each other in an image recognition contest. That year, the winning program shattered all previous records by being the first winner built on deep learning. Building on the convolutional networks pioneered by Yann LeCun, a French researcher, the deep learning neural network (deep learning NN) is an AI architecture loosely inspired by a simplified model of the human brain.

Like a human brain, a neural network is composed of neurons interconnected by links in a network (in the simplest case, a fully connected NN). But what really changed? In the previous use case, the engineer had to extract deep information from the original inputs to help the model. There is no need to perform this task anymore.

With a NN, the network takes the original inputs and finds by itself the deep or abstract representations needed to get the best result. The learning process rests on three key concepts (a minimal sketch follows the list):

  • The deep architecture of the NN: this refers to its structure, specifically its size (number of neurons, layers, connections) and therefore the number of parameters
  • Back-propagation: the process used in artificial neural networks to compute how much each parameter contributed to the error, so the network can improve at tasks such as pattern recognition, prediction, and classification
  • Gradient descent: a common optimization algorithm used to train neural networks. Its goal is to reduce the model's error step by step during training
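
To see gradient descent in action, here is a toy sketch with a single-parameter model y = w · x and a squared-error loss; all numbers are made up for illustration:

```python
# Toy gradient descent: fit y = w * x to data generated with w = 2.
import numpy as np

x = np.array([1.0, 2.0, 3.0, 4.0])
y = np.array([2.0, 4.0, 6.0, 8.0])   # the "true" relationship is y = 2x

w = 0.0               # initial guess for the parameter
learning_rate = 0.01

for step in range(200):
    y_pred = w * x                    # forward pass: current predictions
    error = y_pred - y
    grad = np.mean(2 * error * x)     # gradient of the mean squared error w.r.t. w
    w -= learning_rate * grad         # gradient descent: step against the gradient

print(w)  # converges towards 2.0
```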

A new world opened up to us. It was now enough to transform any type of data into a sequence of numbers (an embedding vector) in order to process it. This was the beginning of a golden age of AI boosted by NNs. In less than 10 years, we built a series of solutions based on this technology: facial recognition, image detection/segmentation, autonomous driving, etc.
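
As a rough illustration of this idea (data turned into a vector of numbers that can be compared), here is a deliberately naive sketch; real embeddings are learned by neural networks, but the principle is the same:

```python
# Toy "embedding": map a text to a fixed-size vector (here, letter frequencies),
# then compare two texts with cosine similarity. Purely illustrative.
import numpy as np

def toy_embedding(text: str) -> np.ndarray:
    vec = np.zeros(26)
    for ch in text.lower():
        if "a" <= ch <= "z":
            vec[ord(ch) - ord("a")] += 1
    return vec / max(len(text), 1)

a = toy_embedding("the child runs on the beach")
b = toy_embedding("a kid is running on the sand")
cosine = a @ b / (np.linalg.norm(a) * np.linalg.norm(b))
print(cosine)  # closer to 1.0 means the two vectors are more similar
```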

Applied to textual data, this architecture helped build chatbots, machine translation, text sentiment analysis, and more. But from 2015 onward, as the technology matured, we needed a better understanding of texts, which was hard to achieve with classic NNs. One of the main problems was building a more contextual representation of the meaning of a given word.

To understand this, let's take an example: "The bomb Mbappé strikes in the round of 16 of the Champions League." The word "bomb" here has a positive meaning. How should a model handle that?

From Neural Network to Transformer

In 2017, Google researchers came out with a paper that would turn everything upside down: "Attention Is All You Need".
In this paper, they introduce a new AI architecture called the Transformer. The goal of the Transformer is to capture meaning in a more advanced, contextual way. How does it work?

Let’s say we have this image.

A child running on the beach

As humans, we can all describe it in less than 10 seconds (see the caption if you can’t😉). But wait a minute. How do we accomplish this task step by step?
Well, perhaps you would start by identifying the boy. Then you analyze his posture and conclude that he is running. Then you add the context of the beach. Finally, you reconstruct the overall meaning. In other words, you paid attention to the different elements (attention is all you need!😎).

This is the same concept that is replicated in the Transformer architecture. It is about paying attention to, or focusing attention on, different parts of the input in order to better understand it. This new AI architecture gave rise to several large models: BERT (Bidirectional Encoder Representations from Transformers) and others close to it (CamemBERT, RoBERTa, etc.).
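
For the curious, here is a minimal sketch of scaled dot-product attention, the core operation behind that idea; shapes and values are illustrative only:

```python
# Self-attention: each token's output is a weighted mix of all value vectors,
# where the weights measure how much that token "attends" to the others.
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def attention(Q, K, V):
    d_k = K.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)      # similarity between queries and keys
    weights = softmax(scores, axis=-1)   # attention weights, each row sums to 1
    return weights @ V, weights

rng = np.random.default_rng(0)
tokens = rng.normal(size=(3, 4))                      # 3 tokens, 4-dim vectors
output, weights = attention(tokens, tokens, tokens)   # Q = K = V: self-attention
print(weights.round(2))                               # who attends to whom, and how much
```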

But what does this have to do with ChatGPT? There is no "BERT" in the name ChatGPT.

Yes, you are right. The real link is the "T" for Transformer. ChatGPT is based on GPT (Generative Pre-trained Transformer).

From Transformer to LLM — ChatGPT

The Transformer is one of the latest advanced architectures, present in most current AI models. Thanks to its attention mechanism, it can be used in almost all AI use cases. And the interesting thing is that it is simply an architectural brick, like a Lego brick. We can combine these bricks in different ways to produce different types of AI models. The system basically consists of two parts: the encoder and the decoder.

The encoder helps build a hidden representation of the input that captures its meaning. For example, for a translation task (from English to French), the encoder will build a hidden representation of each word in the English input sentence.

Once the encoder has created a hidden representation of the English sentence, the decoder takes that representation and uses it to generate the corresponding French sentence, word by word.
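
Here is a minimal sketch of such an encoder-decoder Transformer at work, using the Hugging Face transformers library; the choice of the small T5 model is an assumption for illustration, and any encoder-decoder translation model would do:

```python
# English-to-French translation with an encoder-decoder Transformer (T5).
from transformers import pipeline

translator = pipeline("translation_en_to_fr", model="t5-small")
result = translator("A child is running on the beach.")
print(result[0]["translation_text"])  # something like "Un enfant court sur la plage."
```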

If we decide to stack 12 or 24 encoder transformer blocks, we build BERT. It does not have a decoder component because it is not designed for language generation tasks, such as machine translation or text completion.

If we decide to stack 12 or 36 decoder Transformer blocks, we build GPT. It is designed for generating natural language text, for tasks such as language translation and text completion (note: GPT is a decoder-only architecture; unlike the original Transformer, it does not rely on a separate encoder stack when generating new text).
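
To make the "stacking blocks" idea concrete, here is a sketch using Hugging Face configuration objects; the GPT-2 classes stand in for GPT, and the layer counts mirror BERT-base and GPT-1 (both 12 blocks). Instantiating from a config builds an untrained model of that shape:

```python
# Encoder-only stack (BERT-style) vs. decoder-only stack (GPT-style).
from transformers import BertConfig, BertModel, GPT2Config, GPT2LMHeadModel

bert = BertModel(BertConfig(num_hidden_layers=12))   # 12 Transformer encoder blocks
gpt = GPT2LMHeadModel(GPT2Config(n_layer=12))        # 12 Transformer decoder blocks

print(sum(p.numel() for p in bert.parameters()))     # rough parameter counts
print(sum(p.numel() for p in gpt.parameters()))
```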

The original GPT model, GPT-1, has 12 Transformer decoder blocks, GPT-2 up to 48, and GPT-3 up to 96. Stacking more blocks makes the model deeper; a separate but equally important number is the context window, i.e. how much text the model can process at once (about 4,096 tokens for the latest GPT-3 models, and on the order of 25,000 words for GPT-4).

Well, is that all?

No. GPT (from version 3 upward) has three other advantages:

  • AI model size (think of it as brain size): about 340 million parameters for BERT-large vs. 175 billion for GPT-3 (OpenAI has not publicly disclosed the size of GPT-4)
  • The training data used to build the model: BERT was trained on about 3.3 billion words, GPT-3 on roughly 499 billion tokens.
  • The goal: BERT is designed for text representation and GPT for text generation. Therefore, GPT-3 can easily be adapted to handle conversational use cases such as answering questions, summarizing text, translating text into other languages, generating code, and generating blog posts, stories, conversations, and other types of content (a minimal sketch follows this list). These big AI models are called LLMs (Large Language Models).
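
As a minimal sketch of what talking to such an LLM looks like in code, here is a call through the OpenAI Python library (the pre-1.0 interface, current when this article was written); the model name, key, and prompt are placeholders:

```python
# Asking a conversational LLM a question via the OpenAI API (openai < 1.0).
import openai

openai.api_key = "YOUR_API_KEY"  # placeholder, not a real key

response = openai.ChatCompletion.create(
    model="gpt-3.5-turbo",
    messages=[
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "Summarize the history of AI in two sentences."},
    ],
)
print(response["choices"][0]["message"]["content"])
```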

ChatGPT: The game changer

While GPT is neither the first nor the only LLM, OpenAI built ChatGPT on top of it, and that is what makes all the difference. ChatGPT is a simple, open chatbot interface where anyone can chat with the AI and hold all that power in their own hands. Released in November 2022, it simply shook up the established order, making the giants (Google), but also the common man, tremble (read this article if you are interested).

Season 1: The birth of a hero (ChatGPT/Microsoft x Google)

Indeed, it revealed something some of us already knew: the superiority of human intelligence, in the strict sense, may not exist. By managing to write complete theses for students, create works of art, decipher the tax form like a chartered accountant, or pass the bar exam with flying colors, ChatGPT has challenged the established order and our hours spent learning, and above all, for some, exposed our limits and therefore our non-essential character.

While it is true that these models produce impressive results, they are also capable of producing incredibly false ones and are even subject to hallucinations (AI, it turns out, is not immune to delusions either).

Neo: But Morpheus, is this the end?

Morpheus: No, Neo, because these AIs have a small weakness, the very thing that makes them so powerful and that lets you control them: the prompt.

Morpheus: Now that you know all this, you have two choices.

Take the blue pill (by clicking here).

And you will be redirected to a funny TikTok video that will make you forget everything in a few minutes.

Take the red pill (by clicking here)

and start your quest in the Matrix. There will be no turning back, Neo!

--

Isaac Yimgaing

Passionate about data and its applications, I support business teams in building intelligent use cases. Let’s connect on /in/isaac-yimgaing