What makes ChatGPT so special?

Chris Booth
6 min read · Feb 8, 2023


Welcome to Part 2 of the series:

Chatbot evolution: how enterprises can safely use the latest language models like ChatGPT

For those who have just come out from under their rock and don't know what ChatGPT is:

ChatGPT is an AI model trained by OpenAI that can engage in complex conversations.

Its conversational capabilities allow it to respond to follow-up questions, correct itself, challenge false statements, write code and decline inappropriate requests at an impressive level of fidelity and accuracy.

People are finding its performance very impressive.

It reached 1 million users in 5 days, faster than any other software platform.


In 2 months, it has garnered over 100 million users.

That's not all; it scored in the 52nd percentile on the SAT.

And passed the bar exam (C+, but still…)

If these feats are impressive, you should see what other AI models are doing.

But here's the thing.

ChatGPT isn't drastically new.

GPT3 had already been around for two and a half years beforehand.

From a technical perspective, the techniques used aren't revolutionary.

So why are we so impressed by ChatGPT, considering GPT3's output is so similar?

We won't cover everything here; frankly, some of the most significant factors in ChatGPT's skyrocketing popularity are ones we'll skip, such as how OpenAI:

  • Executed its marketing
  • Created a user-friendly interface
  • Took the risk to release the model publicly when other, larger organisations, such as Google with its LaMDA model, decided not to.
  • Scaled its infrastructure to support many users (which is incredibly impressive but not innovative).

What we care about, for now, is how they improved on their original GPT3 model.

From a technical perspective, they've improved on GPT3 in four ways:

  1. Fine-tuned the 'goal' of the model output
  2. Built a custom dataset to fine-tune both GPT3 and a reward model to achieve better performance against said goal
  3. Used reinforcement learning and human feedback to improve model output
  4. Added a content filter using their moderation API

The goal of the model output

Adding "instructions" to the text embedding model drastically improves performance.

GPT3's goal is to predict the next token (word) based on the previous tokens, including the ones from the user's prompt.
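
To make that concrete, here's a minimal sketch of "predict the next token", using the small, openly available GPT-2 model as a stand-in for GPT3 (which you can't download). The prompt is my own illustrative example.

```python
# A minimal sketch of "predict the next token", with open GPT-2 standing in for GPT3.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")

prompt = "The capital of France is"
inputs = tokenizer(prompt, return_tensors="pt")

with torch.no_grad():
    logits = model(**inputs).logits              # a score for every token at every position
next_token_probs = logits[0, -1].softmax(dim=-1)  # probability distribution over the *next* token

top = torch.topk(next_token_probs, k=5)
for prob, token_id in zip(top.values, top.indices):
    print(f"{tokenizer.decode(int(token_id))!r}  p={prob.item():.2f}")
```

That's the entire objective: score every possible next token and pick a likely one, over and over.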

ChatGPT's goal is different, something more like: "follow the user's instructions helpfully and safely". We can call this change in goal "intention".

The University of Hong Kong released a paper which goes into much more detail about using "intention" when embedding text for increased performance on most NLP tasks. You can try the embedding model yourself on Hugging Face.
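
If you want to play with that idea, the Instructor embedding model from that paper pairs every text with a natural-language instruction before embedding it. A rough sketch, assuming the InstructorEmbedding package and the hkunlp/instructor-large checkpoint (the instruction wording here is my own illustrative example):

```python
# Hedged sketch: pairing an "instruction" with each text before embedding it.
# Assumes `pip install InstructorEmbedding sentence-transformers`.
from InstructorEmbedding import INSTRUCTOR

model = INSTRUCTOR("hkunlp/instructor-large")

# Each input is an [instruction, text] pair; the instruction states the intention.
pairs = [
    ["Represent the customer complaint for classification:", "My order arrived two weeks late."],
    ["Represent the customer complaint for classification:", "The parcel never showed up."],
]
embeddings = model.encode(pairs)  # one vector per pair
print(embeddings.shape)
```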

An excellent example of fine-tuning a model to a user's intention is OpenAI's very own InstructGPT. This model gives responses that "feel better" than GPT3's, despite having only 1.3B parameters versus 175B, and it is the sibling of ChatGPT.

In short, InstructGPT (ChatGPT's sibling) was created using three steps which we'll break down into more straightforward language:

Step 1: Create a dataset with labels of how the model should respond

ChatGPT was fine-tuned on a (relatively) small amount of data.

It's first fine-tuned with supervised learning (human guidance: a human annotates how the model should have responded).

In the case of InstructGPT, they hired 40 human labellers (a practice whose ethics have been questioned) to generate the training data.

They then fine-tuned GPT3 using the new training data of 12,725 correct responses.
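
Mechanically, that supervised step boils down to ordinary next-token training on the labeller's ideal answer. A rough sketch, again with GPT-2 standing in for GPT3 and a made-up prompt/demonstration pair; this is not OpenAI's actual training code:

```python
# Step 1 sketch: supervised fine-tuning on (prompt, ideal response) pairs.
# GPT-2 stands in for GPT3; the data and hyperparameters are illustrative only.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-5)

prompt = "Explain photosynthesis to a 10-year-old.\n"
demonstration = "Plants use sunlight, water and air to make their own food."

# Teacher forcing: the model is trained to reproduce the labeller's demonstration.
# (In practice the loss is usually masked so only the response tokens count.)
inputs = tokenizer(prompt + demonstration, return_tensors="pt")
loss = model(**inputs, labels=inputs["input_ids"]).loss

loss.backward()
optimizer.step()
optimizer.zero_grad()
```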

Step 2: Create a reward model

Now we have this newly trained, supervised GPT3 (let's call it SGPT3).

You now put SGPT3 to the test and get it to output multiple attempts for a given instruction.

You give those (say four) SGPT3 outputs to a human labeller, who ranks them in terms of quality or accuracy, and you store those rankings.

This resulting ranking data (33,207 prompts and ~10x more output samples) is used to train a separate reward model.
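
Mechanically, a reward model is usually just a language model with a single-score head, trained so that the response the human ranked higher gets the higher score. Here's a minimal sketch of that pairwise idea, not OpenAI's actual code; GPT-2 again stands in for GPT3, and the example texts are made up:

```python
# Step 2 sketch: training a reward model from human rankings.
# Each ranked pair becomes a "chosen vs rejected" example, and the model is
# pushed to score the chosen response higher (a pairwise ranking loss).
import torch
import torch.nn.functional as F
from transformers import AutoModelForSequenceClassification, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("gpt2")
tokenizer.pad_token = tokenizer.eos_token
reward_model = AutoModelForSequenceClassification.from_pretrained("gpt2", num_labels=1)
reward_model.config.pad_token_id = tokenizer.pad_token_id
optimizer = torch.optim.AdamW(reward_model.parameters(), lr=1e-5)

prompt = "Write a polite out-of-office reply.\n"
chosen = "Thank you for your email. I'm away until Monday and will reply then."
rejected = "I'm on holiday, go away."

def score(text):
    inputs = tokenizer(text, return_tensors="pt")
    return reward_model(**inputs).logits[0, 0]  # a single scalar reward

# Push the chosen response's score above the rejected one's.
loss = -F.logsigmoid(score(prompt + chosen) - score(prompt + rejected))
loss.backward()
optimizer.step()
```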

Step 3: Absolute chaos.

Once done, OpenAI 'attach' a reinforcement learning algorithm called Proximal Policy Optimisation to SGPT3 (giving us PPOSGPT3).

They then fine-tune PPOSGPT3 by:

  1. Feeding a prompt to PPOSGPT3
  2. PPOSGPT3 outputs a response.
  3. The reward model generates a reward value (rating how good the output is)
  4. That reward is then used to fine-tune PPOSGPT3, depending on how good/bad the output was.
  5. Repeat 1–4 thousands and thousands of times until the reward model is 'happy' (a rough sketch of this loop follows below).
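
In code, that loop looks roughly like the sketch below. Everything in it is a hypothetical placeholder: sgpt3, reward_model, prompts and ppo_update stand in for the policy model, the Step 2 reward model, the prompt dataset and a PPO implementation (open-source libraries such as trl provide one). It is not OpenAI's actual training code.

```python
# Step 3 sketch: the RLHF loop in pseudocode.
# `sgpt3`, `reward_model`, `prompts` and `ppo_update` are hypothetical placeholders.

for step in range(100_000):                        # "thousands and thousands of times"
    prompt = prompts.sample()                      # 1. feed a prompt to PPOSGPT3
    response = sgpt3.generate(prompt)              # 2. PPOSGPT3 outputs a response
    reward = reward_model.score(prompt, response)  # 3. the reward model rates the output

    # 4. PPO nudges the model's weights towards higher-reward outputs, while a
    #    KL penalty stops it drifting too far from the original supervised model.
    ppo_update(model=sgpt3, prompt=prompt, response=response, reward=reward)
```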

The finished product:

*deep breath*

Reinforced Proximal Policy Optimised Supervised Generative Pretrained-Transformer 3!

I did my best to make that simple.

Or simply:

InstructGPT.

Content Filter

Here, we see the beginnings of the multi-module architecture, which we will dive into in two chapters' time!

OpenAI filter what ChatGPT responds with using a custom tool they built themselves: their content moderation API.

It's a new tool designed to help companies and organisations moderate user-generated content more efficiently and effectively.

This API uses machine learning to analyse text and identify content that may violate community guidelines, such as hate speech or other inappropriate content.
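
You can run the same kind of check yourself. A minimal sketch using the openai Python library as it worked at the time of writing (you'd supply your own API key, and the example input is made up):

```python
# Sketch: screening a piece of text with OpenAI's moderation endpoint.
# Assumes `pip install openai` and an API key in the OPENAI_API_KEY environment variable.
import openai

response = openai.Moderation.create(input="Some user-generated text to check.")
result = response["results"][0]

print(result["flagged"])          # True if the text violates the policy
print(result["categories"])       # which categories (hate, violence, ...) triggered
print(result["category_scores"])  # the model's confidence per category
```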

You can find the paper here if you are interested in how it works.

Despite the complex and fantastic work OpenAI have done so far in this area, there are still some concerning and hilarious "jailbreaks" to beat the filter.

The award for the strangest jailbreak goes to…

If you need deep belly laughs, check out this article full of jailbreak examples.

Why are you hating on ChatGPT and claiming that it's not safe for enterprise use?

I hope you've looked at the jailbreak article and can now see these LLMs' enormous risks.

As we've covered in the basic breakdown of how these models work in part 1, LLMs have two critical weaknesses:

  • They're algorithms designed to predict the next best word based on the previous sequences of inputs.
  • They don't understand language, and their 'knowledge' isn't based on facts, so they spout incorrect information.

Therefore:

  • LLMs are incredibly difficult to control, and it's hard to guarantee what text they will generate.
  • Human ingenuity will find a way to break your content filter.

If such a model is released into the wild, your company could suffer various damages and experience what Microsoft and Meta have: AI going rogue.

Meta followed in the footsteps of Microsoft's Tay with BlenderBot 3.

So is that it?

Is public-facing generative AI only possible for companies willing to take on the same risks as OpenAI?

To answer this question, we must first properly understand Large Language Models' weaknesses.

Which we cover in our next episode of:

Chatbot evolution: how enterprises can safely use the latest language models like ChatGPT

(Coming Feb 15th!)
