ChatGPT and the Model Behind It

BetaBots
SFU Professional Computer Science
13 min read · Feb 11, 2023

Authors: Yuxin Shao, Caixuan Wang, Daisy Xu, and Xinrui Zhang

This blog is written and maintained by students in the Master of Science in Professional Computer Science Program at Simon Fraser University as part of their course credit. To learn more about this unique program, please visit sfu.ca/computing/mpcs.

Introduction

What do you think is the most popular AI these days? ChatGPT! Most of you would probably answer right away. With its strong text-generation abilities, ChatGPT can create articles, poetry, stories, news reports, and much more.

Imagine that you want to write an essay and have no idea how to start. What would you do? Well, you could consult ChatGPT by entering a basic description of the requirements, and it can generate a sample essay for you! What is more amazing is that the essay is written in human-like text, and most of the time it is nearly impossible to tell the difference.

According to OpenAI’s study, human accuracy at detecting whether longer articles were produced by GPT-3 is about 52%, which is essentially chance level. As an application based on GPT-3.5 (a fine-tuned version of GPT-3), ChatGPT also generates text that is hard to recognize as machine-written. This makes writing easy, but it also raises a few concerns. How can people tell whether an article was written by a human or by a machine? Even for the blog you are reading right now, are you 100% sure that it was written by an actual human? (Yes, this blog is written by an actual human.)

In this blog, we will dig deeper into ChatGPT and the model behind it. We will talk about what it is, what it can do, what its underlying model is, and what its limitations are.

What is ChatGPT

ChatGPT is an advanced AI chatbot trained by OpenAI and released in November 2022. It is a fine-tuned version of GPT-3.5. As a chatbot, ChatGPT can respond to a wide range of user requests, including debugging computer programs, composing music, and writing poetry and song lyrics. It is the most popular AI product today, reaching one million users within five days of launch and 100 million users within two months.

What Can ChatGPT Do

ChatGPT was launched as a research preview so that OpenAI could learn about the capabilities of its powerful model through user feedback. According to its creator, OpenAI (2022), ChatGPT can interact with humans in a more advanced manner through “[answering] follow-up questions, [admitting] its mistakes, [challenging] incorrect premises, and [rejecting] inappropriate requests”.

Not only is ChatGPT capable of answering well-formed general questions, such as “what is quantum computing?”, but it can also be used to debug programming code, solve math problems, provide personalized product recommendations, and translate between different languages. Let us look at a few examples of ChatGPT in action to get a better sense of its powerful algorithm.

Demonstration #1: Solving a Programming Problem

Many chatbots have been released in the past, but ChatGPT is next-level. Responding to human questions seems to be what a chatbot is supposed to do, but what about writing code from the description of an algorithm?

Suppose we are trying to write a Python program that reverses the digits of an integer. For instance, if the input is “123”, then the program should output “321”. For anyone familiar with LeetCode, this is a medium-level question; its full description can be found at this link. Since the submission acceptance rate for this problem is 27.4%, quite a few programmers have had trouble solving it. Now let us take a look at how well ChatGPT can do:

Given the problem, ChatGPT returned a program in less than 20 seconds, but how good is its response? To evaluate its performance, we can submit the program to the LeetCode platform for accuracy and efficiency checks:

In terms of runtime, the solution beats 98.36% of all other submissions. In a very short amount of time, the bot produced an efficient algorithm that meets the evaluation criteria, something most human programmers would take considerably longer to do.
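For reference, a Python solution along the lines ChatGPT produces for this problem might look like the following. This is a hand-written sketch rather than ChatGPT’s actual output; it handles the sign and the 32-bit overflow check that the LeetCode version of the problem requires.

```python
def reverse(x: int) -> int:
    """Reverse the digits of a signed integer.

    Returns 0 if the reversed value falls outside the 32-bit signed range,
    as the LeetCode problem requires.
    """
    sign = -1 if x < 0 else 1
    reversed_digits = int(str(abs(x))[::-1])  # e.g. 123 -> "321" -> 321
    result = sign * reversed_digits
    # Clamp to the signed 32-bit range [-2**31, 2**31 - 1].
    if result < -2**31 or result > 2**31 - 1:
        return 0
    return result


print(reverse(123))   # 321
print(reverse(-120))  # -21
```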

Demonstration #2: Customized Product Recommendation

Have you ever been in a situation where you are purchasing a new product, and you would like to pick the most economical model from all brands that are currently in the market? There are many different websites to click through and many different product descriptions to read over to find the best option(s).

This process seems tedious, and ChatGPT can save us from that:

In under 30 seconds, ChatGPT was able to return a list of laptop models under $1500 with at least six cores and 16GB of RAM. Its thorough response also includes other product configurations, helping users pick out the most suitable product, and all we had to do was enter a single prompt.

The Model Behind

GPT-3

GPT (Generative Pre-trained Transformer) is an autoregressive language generation model developed by OpenAI, and GPT-3 is its third generation. GPT-3 is based on the transformer architecture proposed by Vaswani et al. in 2017 and generates human-like text. It shares essentially the same underlying architecture as GPT-2 but is trained on a much larger dataset, roughly 45TB of text drawn from web crawls, books, and Wikipedia. GPT-3 has about 175 billion model parameters, over 100 times as many as GPT-2.

Before talking about how GPT-3 works, we first need to know what the transformer architecture is and how it works.

Transformer

Before the transformer was proposed, encoder-decoder architectures were typically built on RNNs. When trained with gradient-based methods over long sequences, RNNs suffer from the vanishing-gradient problem, which is very difficult to get around.

The transformer avoids this problem by replacing the RNN in the encoder-decoder architecture with attention alone. It keeps a similar overall structure (see the picture below): the left block is the encoding component, which consists of a stack of N encoders, and the right block is the decoding component, which contains a stack of the same number of decoders.

Encoder

Each encoder is made up of two major layers: the multi-head self-attention layer and the feed-forward layer. The multi-head self-attention layer uses all of the input vectors to produce intermediate vectors of the same dimension, mixing information across all of the inputs. The feed-forward layer is a fully connected neural network that is applied independently to each intermediate vector produced by the self-attention layer. After going through the feed-forward layer, the new vectors are sent upward to the next encoder.
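As a rough illustration, a minimal PyTorch sketch of one encoder block might look like this. The layer sizes are arbitrary, and the residual connections and layer normalization present in the full architecture are omitted for brevity.

```python
import torch
import torch.nn as nn


class EncoderBlock(nn.Module):
    """One simplified encoder block: multi-head self-attention followed by a
    position-wise feed-forward network."""

    def __init__(self, d_model: int = 512, num_heads: int = 8, d_ff: int = 2048):
        super().__init__()
        self.self_attention = nn.MultiheadAttention(d_model, num_heads, batch_first=True)
        # Applied independently to the vector at each position.
        self.feed_forward = nn.Sequential(
            nn.Linear(d_model, d_ff), nn.ReLU(), nn.Linear(d_ff, d_model)
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # Self-attention mixes information across all input positions...
        attn_out, _ = self.self_attention(x, x, x)
        # ...while the feed-forward layer transforms each position on its own.
        return self.feed_forward(attn_out)


x = torch.randn(1, 10, 512)        # (batch, sequence length, model dimension)
print(EncoderBlock()(x).shape)     # torch.Size([1, 10, 512]) -- same shape as the input
```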

Decoder

Each decoder is made up of three major layers: the masked multi-head self-attention layer, the encoder-decoder attention layer, and the feed-forward layer. The output of the top encoder is transformed into a set of attention vectors and fed into the encoder-decoder attention layer, which helps the decoder focus on the appropriate positions of the input.

This process is repeated in each decoder block: the intermediate vectors pass through the decoder’s feed-forward layer and are sent upward to the next decoder. The output of the top decoder then goes through a linear layer and a softmax layer to produce a probability for every word in the vocabulary. We choose the word with the highest probability (score), feed it back into the bottom decoder, and repeat the process to predict the next word.
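As an illustration of this last step, a small Python sketch of how the top decoder’s output is turned into the next word might look like this; the vocabulary size and model dimension are made-up numbers.

```python
import torch
import torch.nn as nn

vocab_size, d_model = 10_000, 512            # made-up sizes for illustration
to_logits = nn.Linear(d_model, vocab_size)   # the final linear layer


def pick_next_word(top_decoder_output: torch.Tensor) -> int:
    """Map the top decoder's vector to a probability over the vocabulary
    and greedily pick the most likely word."""
    logits = to_logits(top_decoder_output)    # one score per word in the vocabulary
    probs = torch.softmax(logits, dim=-1)     # softmax turns the scores into probabilities
    return int(torch.argmax(probs))           # index of the highest-probability word


# The chosen word would then be appended to the decoder's input and the
# whole stack run again to predict the following word.
print(pick_next_word(torch.randn(d_model)))
```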

Self-Attention

Self-attention assigns a weight to each element of the input sequence indicating its importance when processing the sequence. These weights tell the model how much attention to pay to each element.

Multi-head self-attention means that we compute several sets of intermediate vectors in parallel and combine them into new intermediate vectors with the same dimension as the input vectors. This lets the model capture the relationships between input vectors from different perspectives.

In the masked multi-head self-attention layer, a mask is added so that the model can only see a constrained window of the sequence. Specifically, in the decoder, the model is only allowed to see the previous positions of the output sequence, not future positions.
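To make this concrete, here is a stripped-down Python sketch of self-attention. The learned query, key, and value projections and the multi-head split are omitted, so this only illustrates how the attention weights are formed and how the mask restricts them.

```python
import torch


def self_attention(x: torch.Tensor, masked: bool = False) -> torch.Tensor:
    """Simplified single-head self-attention: every position attends to every
    allowed position and returns a weighted mix of the input vectors."""
    d = x.size(-1)
    scores = x @ x.transpose(-2, -1) / d ** 0.5      # pairwise similarity between positions
    if masked:
        # Decoder-style mask: a position may only see itself and earlier positions.
        n = x.size(-2)
        future = torch.triu(torch.ones(n, n, dtype=torch.bool), diagonal=1)
        scores = scores.masked_fill(future, float("-inf"))
    weights = torch.softmax(scores, dim=-1)          # how much attention each position pays to each element
    return weights @ x


x = torch.randn(5, 16)                               # 5 positions, 16-dimensional vectors
print(self_attention(x, masked=True).shape)          # torch.Size([5, 16])
```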

GPT-3 Architecture

GPT-3 uses only the decoding component of the transformer. Each decoder consists of two major layers: the masked multi-head self-attention layer and the feed-forward layer. The largest GPT-3 model has 175 billion parameters, 96 decoder layers, a mask window of 2048 tokens, and 96 heads per multi-head self-attention layer. Like the transformer, GPT-3 generates the output text one token at a time, based on the input and the previously generated tokens.
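Purely as a schematic, the headline numbers and the token-by-token generation loop can be summarized as follows; the model call is a stand-in rather than a real GPT-3 interface, and the toy model at the end only exercises the loop.

```python
# Headline configuration of the largest GPT-3 model, as described above.
GPT3_CONFIG = {
    "parameters": 175_000_000_000,   # 175 billion weights
    "decoder_layers": 96,            # stacked masked self-attention + feed-forward blocks
    "attention_heads": 96,           # per multi-head self-attention layer
    "context_window": 2048,          # how many tokens the mask lets the model see
}


def generate(model, prompt_tokens, max_new_tokens):
    """Autoregressive generation: each new token is predicted from the prompt
    plus everything generated so far, then appended and fed back in."""
    tokens = list(prompt_tokens)
    for _ in range(max_new_tokens):
        context = tokens[-GPT3_CONFIG["context_window"]:]  # only the most recent 2048 tokens fit
        tokens.append(model(context))                       # the model returns the next token id
    return tokens


# Toy stand-in for the real network, just to show the loop running.
toy_model = lambda context: (context[-1] + 1) % 50_000
print(generate(toy_model, prompt_tokens=[101, 2009], max_new_tokens=3))
```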

GPT-3.5 (ChatGPT)

GPT-3.5 is a fine-tuned version of GPT-3, created by adding RLHF (reinforcement learning from human feedback) to the fine-tuning stage of the GPT-3 model.

RLHF (Reinforcement Learning from Human Feedback)

There are three main steps involved in RLHF: pre-training a language model (LM), gathering data and training a reward model (RM), and fine-tuning the language model with reinforcement learning.

In ChatGPT, we use the supervised fine-tuning (SFT) version of GPT-3 as the language model.

The goal of the RM in RLHF is that, given a sequence of text, it returns a scalar reward representing human preference. The data used to train the RM is gathered as follows. First, a set of prompts from a predefined dataset is fed to the LM, and several outputs are collected for each prompt. Second, human annotators rank the outputs for the same prompt from best to worst. Third, this annotated dataset of prompts and ranked LM outputs is used to train the RM.

For the reinforcement learning part, we first make a copy of the original LM from the first step; this copy is the policy that will be fine-tuned with PPO (Proximal Policy Optimization), a policy-gradient RL algorithm. For a given prompt sampled from the dataset, we generate text with both the original LM and the PPO model and calculate the KL divergence between the two output distributions. The reward used to update the policy is then the reward of the PPO model’s output (the score returned by the RM) minus λ multiplied by the KL divergence, which keeps the tuned model from drifting too far from the original LM.
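As a rough sketch of that reward computation, the per-sequence reward might be computed as below. The direction of the KL term and the value of λ are assumptions, since they are not pinned down above.

```python
import torch
import torch.nn.functional as F


def rlhf_reward(rm_score: torch.Tensor,
                policy_logits: torch.Tensor,      # per-token logits from the PPO-tuned model
                reference_logits: torch.Tensor,   # per-token logits from the original LM
                lam: float = 0.2) -> torch.Tensor:
    """Reward used to update the policy: reward-model score minus a KL penalty
    that discourages the tuned model from drifting too far from the original LM."""
    policy_log_probs = F.log_softmax(policy_logits, dim=-1)
    reference_log_probs = F.log_softmax(reference_logits, dim=-1)
    policy_probs = policy_log_probs.exp()
    # KL(policy || original LM), averaged over the generated positions.
    kl = (policy_probs * (policy_log_probs - reference_log_probs)).sum(dim=-1).mean()
    return rm_score - lam * kl


# Toy tensors standing in for one generated sequence of 4 tokens over a 10-word vocabulary.
print(rlhf_reward(rm_score=torch.tensor(1.3),
                  policy_logits=torch.randn(4, 10),
                  reference_logits=torch.randn(4, 10)))
```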

Other GPT-3 Empowered Applications

Using natural language processing, GPT-3 analyzes input text and generates responses that resemble how a human would answer a question. As of 2021, over 300 applications built by developers from all around the world were powered by GPT-3 (OpenAI, 2021). These applications span a variety of industries, from technology, with products like search engines and chatbots, to entertainment, such as video-editing and text-to-music tools. This section outlines two more popular generative-AI applications besides ChatGPT, from different categories, including demonstrations of how they are being used in people’s daily lives.

MusicLM — Music Generator that Creates Music from Text Description

What is it?

MusicLM is a text-to-music model created by researchers at Google, which generates songs from given text prompts. The developers claim that MusicLM “can be conditioned on both text and a melody in that it can transform whistled and hummed melodies according to the style described in a text caption” (Google Research, n.d.).

Demonstration

The model has not been released to the public for commercial or research purposes, but samples of caption and generated-music pairs can be found on its webpage at this link.

On their website, the researchers show examples of auto-generated music along with the text each piece was produced from. By reading the caption while listening to the audio, the audience can easily relate the two.

Many listeners may find it fascinating that the auto-generated music really is an artistic representation of the given caption. MusicLM marks another step forward in AI music generation because it overcomes challenges such as incorporating emotion and creating coherent yet original music from nothing but a textual description. For example, when a caption asks for “a sense of urgency”, how should the model express that emotion in a form of music that resonates with many people at once?

checklist.gg — Business App that Suggests Optimized Workflow Based on Requirements

What is it?

checklist.gg is a management tool that helps optimize project and organization workflows by creating checklists and standard operating procedures for a given process. It can easily be customized for special requirements, making it easier for people with minimal prior domain knowledge to complete certain tasks.

Demonstration

Imagine that you are a Computer Science student who developed a web game as a personal project. After completing the app, you would like to deploy the game and promote it to a broader audience. However, product marketing may not be your strong suit as a CS student, and this is where checklist.gg can help.

By passing the prompt “Product Marketing for A Web Game Checklist” to checklist.gg, it returns a sequence of steps to best promote your game, shown in the following figure.

As we can see, it lists a step-by-step guide on how to promote a web game. In the same way, checklist.gg comes in handy for people working in any industry, or for anyone who is stuck getting started on a project.

Limitations

Despite its impressive capabilities, ChatGPT still has some limitations that are important to be aware of. This section will highlight three of its known limitations.

One of its limitations is its knowledge base: it was trained on data with a cutoff in 2021, so it may not be aware of recent events or developments, and its responses may not be up to date or completely accurate. For instance, if it is asked about the head of state of the United Kingdom, it will answer Queen Elizabeth II.

Another limitation of ChatGPT is its potential to exhibit bias. The biases and viewpoints present in the texts used to train it may affect the accuracy and objectivity of its responses. For example, if a user asks a math question, it may give a wrong answer when the training data contained incorrect information.

Its third limitation is that, as an AI language model, ChatGPT does not have the ability to conduct original research or gather new information. It was not specifically trained to generate citations or guarantee credibility for the information that it provides. Thus, while ChatGPT can certainly provide information and context on a wide range of topics, it may not always be able to provide exact sources for that information. It is important to keep in mind that its responses should not be taken as authoritative or definitive, and the responses should always be independently verified if they are used for important decision-making or academic purposes.

Additionally, ChatGPT is designed primarily as a language model, and as a result, it is limited in its ability to perform certain tasks, such as image or speech recognition. It should not be relied upon as a substitute for specialized AI systems in these areas.

In conclusion, while ChatGPT is a powerful tool for generating text, it is important to be aware of its limitations and to use it responsibly. By understanding its limitations, we can better evaluate its outputs and use it effectively to augment our own knowledge and capabilities.

Conclusion

ChatGPT is a very successful language model interface. Benefiting from the large data source it was trained on, ChatGPT achieves reasonable accuracy on many different problems in the NLP field. Although it has limitations due to its restricted knowledge base, potential bias, and lack of verifiable sources, it can still help people in many ways. For instance, this article has shown several ways in which it can increase people’s productivity and make meaningful transformations between text and other forms of information.

References

Transformer:

GPT-3:

ChatGPT:

GPT-3 Applications:
