ChatGPT in a Nutshell

Jeremy Li
Jul 15, 2023

Since OpenAI released ChatGPT to the public, it has continued to surprise the world with its capabilities. For the past seven months, people have been exploring and experimenting with this intriguing artificial intelligence. While many are still learning how to use it, most of us don’t fully understand how it works. So how was ChatGPT built, how does it actually work, and why does it work so well?

Before diving into these questions, we first need to understand what this technology is. In simple terms, ChatGPT is a chatbot: you type a message and it replies in text, much like one person talking to another. In that sense it resembles Google, where you ask a question and get an answer back. However, Google and ChatGPT produce their answers in very different ways.

Google vs ChatGPT: Different Ways of Generating Responses

When you search for something on Google, the search engine looks up pages that match your query in its index, ranks them by relevance, and returns them as a sorted list. ChatGPT, by contrast, does not retrieve existing pages. Instead, it generates a new response from the patterns and knowledge it learned while training on a colossal amount of text.
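Google’s retrieve-and-rank step can be illustrated with a deliberately tiny sketch. It assumes each page is scored simply by how many query words it contains; real search engines use an index and far richer ranking signals, and the page names and contents below are made up for illustration.

```python
import re

def tokenize(text: str) -> set[str]:
    """Lowercase the text and split it into a set of words."""
    return set(re.findall(r"[a-z]+", text.lower()))

def search(query: str, pages: dict[str, str]) -> list[tuple[str, int]]:
    """Return (page, score) pairs for pages sharing words with the query, best first.

    The score is a toy stand-in for relevance: the number of query words found on the page.
    """
    query_words = tokenize(query)
    scored = [(name, len(query_words & tokenize(body))) for name, body in pages.items()]
    matches = [(name, score) for name, score in scored if score > 0]
    # Highest score first; ties broken alphabetically so the order is deterministic.
    return sorted(matches, key=lambda pair: (-pair[1], pair[0]))

# Hypothetical "library" of pages.
pages = {
    "rome.html": "The Roman Empire and the fall of Rome",
    "apples.html": "How to grow apple trees",
    "wwii.html": "World War II: a history of the world at war",
}
print(search("world history", pages))  # [('wwii.html', 2)]
```

Every result is an existing page that was found and sorted; nothing new is written. ChatGPT instead composes its answer word by word.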

Let’s use an analogy: imagine you have a question about world history and walk into a library filled with books on a wide range of topics. Google’s search engine is like a knowledgeable librarian who quickly finds the books relevant to your question. The librarian searches the library’s vast collection, retrieves the books that might contain an answer, and hands them to you. ChatGPT, on the other hand, is like a well-read person who has studied a large number of the library’s books. It cannot consult the library directly; instead, it answers from the knowledge it gained by reading and understanding a diverse range of books. For an AI chatbot, this reading-and-learning stage is called pre-training.

The Training Data: A Massive Text Collection

The dataset used to train ChatGPT is enormous! GPT-3.5, the version that is currently free for everyone, builds on GPT-3, whose Common Crawl data alone amounted to 45 terabytes of compressed text before filtering (about 570 gigabytes after filtering). To put 45 terabytes in perspective: a standard 10-minute video at 1080p resolution takes up about 240 megabytes, and a typical iPhone photo about 5 megabytes. Counting 1 terabyte as 1,024 × 1,024 megabytes, 45 terabytes would hold about 196,608 such videos or 9,437,184 iPhone photos. And because text takes far less storage than pictures or video, that much text represents a staggering amount of writing, so think BIG when considering ChatGPT’s training data.
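The video and photo comparison is simple division, sketched below. It assumes binary units (1 TB = 1,024 × 1,024 MB) and uses the article’s own per-file estimates of 240 MB per video and 5 MB per photo; the constant names are just labels for this calculation.

```python
# Reproduce the storage comparison, treating 1 TB as 1024 * 1024 MB (binary units).
TEXT_DATA_TB = 45
VIDEO_MB = 240   # one 10-minute 1080p video, per the article's estimate
PHOTO_MB = 5     # one typical iPhone photo, per the article's estimate

total_mb = TEXT_DATA_TB * 1024 * 1024
videos = total_mb // VIDEO_MB
photos = total_mb // PHOTO_MB
print(f"{total_mb:,} MB = {videos:,} videos or {photos:,} photos")
# 47,185,920 MB = 196,608 videos or 9,437,184 photos
```

With decimal units (1 TB = 1,000,000 MB) the figures drop to 187,500 videos and 9,000,000 photos; either way, the scale is the point.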

Specifically, the training data includes large text collections such as books, English Wikipedia, Common Crawl (an archive of web pages collected since 2008), and WebText2 (a collection of web pages linked from Reddit posts). During this pre-training stage, the model learns patterns and relationships in the text without being given specific question-answer pairs. It was then fine-tuned on conversational dialogues, reportedly more than 160,000 of them drawn from movies and daily life, to sharpen its chatbot abilities. This combination is how ChatGPT can produce natural, engaging conversational responses while drawing on such a broad range of knowledge.
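Why are no question-answer pairs needed? Raw text already contains its own training targets: each word is the “answer” for the words that precede it. Here is a minimal sketch of that idea, assuming whitespace-separated words instead of real tokens; the function name and the context length of 3 are illustrative choices, not how OpenAI prepares its data.

```python
def next_word_examples(text: str, context_size: int = 3) -> list[tuple[tuple[str, ...], str]]:
    """Split raw text into (context, next word) training pairs.

    Words stand in for tokens here; real models use subword tokenizers
    and much longer contexts.
    """
    words = text.split()
    examples = []
    for i in range(1, len(words)):
        # Up to `context_size` previous words form the context; words[i] is the target.
        context = tuple(words[max(0, i - context_size):i])
        examples.append((context, words[i]))
    return examples

for context, target in next_word_examples("an apple is a round fruit"):
    print(context, "->", target)
# ('an',) -> apple
# ('an', 'apple') -> is
# ('an', 'apple', 'is') -> a
# ('apple', 'is', 'a') -> round
# ('is', 'a', 'round') -> fruit
```

Any sentence from books or web pages can be turned into examples this way, which is what makes training on such enormous, unlabeled collections possible.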

How ChatGPT Generates Responses

(Figure: ChatGPT workflow)

So how does ChatGPT use this knowledge to generate responses? After pre-training, it can generate responses based on the information it has been exposed to. Its success lies in its ability to generalize from the patterns it learned in the training data. As a result, its responses are usually coherent and relevant to the context, because they follow the ways humans use words in academic writing, real conversations, and across the internet.

During training, the model learns to predict the next word in a sentence from the words that come before it. This teaches it the patterns, grammar, and relationships between meanings found in the text data. The same mechanism drives how ChatGPT generates its responses. Say you ask ChatGPT, “What is an apple?” The model first breaks your question into tokens (words or pieces of words) and treats them as the context for predicting what comes next. It then uses its training knowledge to calculate a score for every possible next token, based on the patterns it learned between words. For instance, a typical answer to this question begins “An apple is a…”, so once the response has started with “An apple”, the model would assign a high score to the token “is”. The model turns these scores into probabilities and picks the next token, usually by sampling from those probabilities rather than always taking the single most likely token. The chosen token is appended to the context, and the model repeats this score-and-pick step, one token at a time, until it produces a token marking the end of the response.
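The score, convert, pick, and append loop can be sketched with a toy model. This sketch assumes a small hand-written table of scores that depends only on the previous word; a real model computes scores for tens of thousands of tokens from the whole context using a neural network. The score values, the `<start>`/`<end>` markers, and the function names are all hypothetical.

```python
import math
import random

# Hypothetical scores for "which word comes next?", keyed by the previous word.
SCORES = {
    "<start>": {"An": 3.0, "The": 1.0},
    "An": {"apple": 4.0, "orange": 1.0},
    "The": {"apple": 2.0},
    "apple": {"is": 3.5, "was": 1.0},
    "orange": {"is": 2.0},
    "is": {"a": 3.0, "the": 0.5},
    "was": {"a": 2.0},
    "a": {"fruit.": 2.5, "tree.": 0.5},
    "the": {"fruit.": 1.0},
    "fruit.": {"<end>": 5.0},
    "tree.": {"<end>": 5.0},
}

def softmax(scores: dict[str, float]) -> dict[str, float]:
    """Turn raw scores into probabilities that sum to 1."""
    top = max(scores.values())  # subtract the max for numerical stability
    exps = {tok: math.exp(s - top) for tok, s in scores.items()}
    total = sum(exps.values())
    return {tok: e / total for tok, e in exps.items()}

def generate(greedy: bool = True, seed: int = 0, max_tokens: int = 20) -> str:
    """Predict a next word, append it, and repeat until <end> or the length limit."""
    rng = random.Random(seed)
    tokens = ["<start>"]
    for _ in range(max_tokens):
        probs = softmax(SCORES[tokens[-1]])
        if greedy:
            next_tok = max(probs, key=probs.get)  # always take the most likely word
        else:
            # Sample in proportion to probability, as chat models typically do.
            next_tok = rng.choices(list(probs), weights=list(probs.values()))[0]
        if next_tok == "<end>":
            break
        tokens.append(next_tok)
    return " ".join(tokens[1:])

print(generate(greedy=True))            # An apple is a fruit.
print(generate(greedy=False, seed=1))   # varies with the seed
```

Always taking the top token (greedy decoding) gives the same answer every time; sampling lets the output vary between runs, which is one reason ChatGPT can answer the same question in different ways.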

The Limitations of ChatGPT

It is important to note that ChatGPT does not understand what an apple is in the way humans do. It is simply predicting the most likely response based on patterns it learned during training. Consequently, its answers may not always be accurate or reflect real-world facts, because they are built solely on statistical patterns in pre-existing text. This is one reason ChatGPT sometimes gives incorrect answers, particularly in areas like mathematics. Let’s look at an interesting dialogue that showcases GPT-3.5’s flaws and mistakes:

Jeremy: If it takes 5 machines 5 minutes to make 5 devices, how long would it take 100 machines to make 100 devices?

GPT-3.5: If it takes 5 machines 5 minutes to make 5 devices, it means that each machine takes 5 minutes to make 1 device. Therefore, we can conclude that each machine can make 1 device in 1 minute. Now, if there are 100 machines, and each machine can make 1 device in 1 minute, then all 100 machines working together can make 100 devices in 1 minute. So, it would take 100 machines to make 100 devices in 1 minute.

(Figure: GPT-3.5’s response to the machines-and-devices question)

The correct answer: if 5 machines take 5 minutes to make 5 devices, each machine takes 5 minutes to make 1 device. With 100 machines working at the same time, each one still makes 1 device in 5 minutes, so it would take 5 minutes for 100 machines to make 100 devices. GPT-3.5 actually began with the right observation (each machine needs 5 minutes per device) but then contradicted it, jumping to “each machine can make 1 device in 1 minute.” Its answer came from its probabilistic model predicting plausible-sounding text, not from checking the arithmetic. The main takeaway from this dialogue: don’t blindly trust AI, because it can make mistakes too!
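One way to avoid the trap is to compute the rate per machine explicitly. The sketch below is a plain calculation; the function name and parameters are illustrative, it assumes every machine works independently at the same constant rate, and `Fraction` keeps the arithmetic exact.

```python
from fractions import Fraction

def minutes_needed(machines: int, devices: int,
                   ref_machines: int = 5, ref_minutes: int = 5,
                   ref_devices: int = 5) -> Fraction:
    """Time for `machines` to build `devices`, scaled from a reference scenario.

    Assumes all machines work in parallel at the same constant rate.
    """
    # Devices per machine per minute: 5 / (5 * 5) = 1/5.
    rate = Fraction(ref_devices, ref_machines * ref_minutes)
    return Fraction(devices) / (machines * rate)

print(minutes_needed(100, 100))  # 5
print(minutes_needed(1, 1))      # 5
print(minutes_needed(5, 10))     # 10
```

The rate of 1/5 device per machine-minute is the step GPT-3.5 got wrong when it claimed 1 device per minute.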

The Power and Potential of ChatGPT

All in all, this new AI chatbot is quite remarkable. Despite its occasional inaccuracies, ChatGPT represents a significant leap forward in the field of artificial intelligence. Its ability to generate human-like responses and engage in meaningful conversations is a testament to the power of machine learning and the potential of AI. As we continue to refine and improve these models, we can expect even more impressive capabilities in the future. The possibilities of what it could do for us are only limited by our imagination.

Thanks for reading! Feel free to follow for more posts to keep up with state-of-the-art technologies in data science.

Undergrad @ Carnegie Mellon University