How to Speak ChatGPT

A Layperson’s Guide to Understanding AI Language Model Communication

Glenn Hopper
Apr 18, 2023

Unless you’ve been hiding under a mechanical cotton loom for the past six months, you’ve probably heard about the rise of ChatGPT, a large language model (LLM) that might just be the digital equivalent of sliced bread.

ChatGPT and other AI-powered LLMs have the potential to revolutionize our personal and work lives more than the internet, smartphones, and horseless carriages combined. But before we adopt this remarkable new technology en masse, it is incumbent on us as potential users to develop at least a basic understanding of what’s going on under the hood of these massive algorithms. Without that minimal understanding, we might as well hand our decisions over to a Magic 8 Ball or a handful of deftly tossed chicken bones.

I’ve written previously and at length about what Artificial Intelligence is and isn’t, and if you have a day and a half to comb through my latest ebook on the topic, you can find that here.

In this article, I’d like to take a brief look at how we communicate with LLMs. (This is not another “10 Prompt Engineering Tips” blog. Rather, it’s an overview of what the robots are actually doing with our text prompts and what’s going on behind the scenes when you ask ChatGPT to “Tell me how to build a deck in the style of Leviticus.”)

Now that I’ve thrown out enough corny introductory content to convince you this article wasn’t written by ChatGPT, let’s dive into the core concepts: how LLMs like ChatGPT are trained, how they communicate using vectors, embeddings, and chunking, and some practical applications for leveraging LLMs in various domains.

Don’t know the difference between a GPT and a GPA?

GPA stands for Grade Point Average … or maybe Gallons Per Acre.

GPT is the airport code for Gulfport, Mississippi. But it can also mean Generative Pre-trained Transformer. The latter is probably more germane here (unless anyone wants to meet at Shaggy’s for some peel-n-eat royal reds), so let’s talk about that one.

A Generative Pre-Trained Transformer is an advanced machine learning model developed by OpenAI, designed primarily for natural language processing tasks, such as text generation, translation, summarization, and question answering. The GPT model is based on the Transformer architecture, which was introduced by Vaswani et al. in 2017.

The Transformer is kind of a big deal.

The Transformation of LLMs

Natural language processing (NLP) algorithms have been around for years, powering common tools like predictive text and web-based chatbots. But the massive leap in the power of these models started with a 2017 research paper called “Attention Is All You Need” by a team of Google researchers. The paper introduced the Transformer architecture, a new approach to NLP that has since become the basis for many state-of-the-art LLMs like GPT, BERT, and others.

Before the Transformer, NLP tasks primarily relied on recurrent neural networks (RNNs) and convolutional neural networks (CNNs) to process and understand text. However, both RNNs and CNNs have limitations in handling long-range dependencies and parallelizing computations.

The key innovation in the Transformer architecture was the self-attention mechanism, which allows models to weigh the importance of different words in a sentence or context when generating output. By focusing on the most relevant words, the self-attention mechanism enables these models to understand the relationships between words and their context more effectively than RNNs or CNNs.
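To make the idea concrete, here is a minimal sketch of scaled dot-product self-attention in Python with NumPy. The word vectors are made up, and a real Transformer would first project them through learned query, key, and value matrices; this strips the mechanism down to the weighting step.

```python
import numpy as np

def self_attention(X):
    """Toy scaled dot-product self-attention over a sequence of word vectors X (n x d)."""
    d = X.shape[1]
    # Compare every word with every other word; higher score = more relevant context.
    scores = X @ X.T / np.sqrt(d)
    # Softmax turns each row of scores into attention weights that sum to 1.
    weights = np.exp(scores)
    weights /= weights.sum(axis=1, keepdims=True)
    # Each output vector is a weighted blend of all the word vectors in the sequence.
    return weights @ X

# Three made-up 4-dimensional word vectors standing in for a short sentence.
X = np.array([[0.1, 0.3, 0.0, 0.2],
              [0.4, 0.1, 0.2, 0.0],
              [0.0, 0.2, 0.5, 0.1]])
print(self_attention(X))
```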

Training a Large Language Model

LLMs “see” the world through the lens of the text they are trained on. They learn to understand language patterns, context, and even some factual information by predicting words in sentences and refining their internal representations.

The training process involves two main steps: pre-training and fine-tuning.

Pre-training: In this phase, the model is trained on a large corpus of text data collected from the internet. The dataset includes web pages, articles, books, and more, though the model does not retain the specifics of which documents it was trained on. During this phase the model learns the structure of the language, grammar, facts about the world, reasoning abilities, and some biases present in the data. It’s important to note that ChatGPT’s knowledge is limited to the data available up until September 2021.

Pre-training uses a technique called unsupervised learning, in which the model learns by predicting the next word in a sentence. This process is called “language modeling” and helps the model gain a general understanding of the language.
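In code, the language-modeling objective boils down to scoring the model’s predicted next-token probabilities against the token that actually came next. Here is a minimal sketch in PyTorch; the token IDs and vocabulary size are invented, and random logits stand in for a real model.

```python
import torch
import torch.nn.functional as F

vocab_size = 50                               # hypothetical tiny vocabulary
token_ids = torch.tensor([7, 12, 3, 41, 9])   # a made-up tokenized sentence

# The model sees tokens [0..n-2] and is trained to predict tokens [1..n-1].
inputs, targets = token_ids[:-1], token_ids[1:]

# Stand-in for a real model: random scores over the vocabulary at each position.
logits = torch.randn(len(inputs), vocab_size)

# Cross-entropy between the predicted distributions and the actual next tokens
# is the quantity that pre-training minimizes, over billions of examples.
loss = F.cross_entropy(logits, targets)
print(loss.item())
```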

Fine-tuning: Once an LLM is pre-trained on a massive dataset, it can be fine-tuned on a more specific task or domain. This process, called transfer learning, helps the model adapt its general understanding of language to a more focused application, such as answering questions about a particular topic or generating text in a specific style.

Fine-tuning is performed using supervised learning, where the model is given input-output pairs (prompts and responses) and learns to generate appropriate responses for a given prompt. This process makes the model more controlled and safer for users, enabling it to better understand and respond to user inputs in a conversational manner.
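A supervised fine-tuning dataset is, at its simplest, a pile of prompt/response pairs. The sketch below writes a couple of invented pairs to a JSONL file; the field names and examples are illustrative, not any particular vendor’s required format.

```python
import json

# Invented prompt/response pairs; a real fine-tuning set would contain thousands.
pairs = [
    {"prompt": "Summarize our travel reimbursement policy.",
     "response": "Approved travel expenses are reimbursed within 30 days of submission."},
    {"prompt": "What is the dress code for client meetings?",
     "response": "Business casual, unless the engagement letter specifies otherwise."},
]

# One JSON object per line is the common "JSONL" convention for training data.
with open("fine_tune_data.jsonl", "w") as f:
    for pair in pairs:
        f.write(json.dumps(pair) + "\n")
```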

It’s important to note that LLMs, like ChatGPT, are not capable of learning from new information in real-time during user interactions. They are trained offline, and their knowledge is fixed at the time of their last training update. However, researchers continuously work on improving these models by retraining them with updated data, making them more accurate and useful over time.

“Think of fine tuning as retraining an already refined LLM on additional information to give it expertise in a specific field or on specific data,” said Doug Sims, who works in software development for a large media company. “Due to the size of these LLMs, fine tuning is very expensive,” Sims said. GPT-3, for example, has about 175 billion parameters, all of which must be updated repeatedly during each epoch of the training process.

Sims added some color on the recent Stanford Alpaca project, which made news for fine-tuning a copy of Meta’s 7-billion-parameter LLaMA model for a reported cost of about $600, a figure that Sims points out “didn’t include the cost of the eight Nvidia A100 GPUs, which cost about $16,000 each, five PhD students, and three faculty members for two months.”

How AI Models Understand Text

At their core, computers process information as numbers. To enable seamless communication between humans and AI models, text must be converted into numerical representations.

A Shout-Out to the Math Geeks (and to those of us who didn’t think math was that important)!

Linear algebra, a branch of mathematics dealing with vectors and matrices, plays a critical role in this conversion process, and understanding its concepts can provide valuable insights into the inner workings of AI systems.

Linear algebra is essential because it forms the foundation of many advanced algorithms that empower AI systems to learn and process vast amounts of data. In the context of AI models like ChatGPT, the text is represented as high-dimensional vectors, which can be manipulated and analyzed using linear algebra techniques. These techniques help AI models capture the relationships, patterns, and structures within the data, allowing them to generate coherent and contextually relevant responses.

By understanding the role of linear algebra in the processing of text data, we can appreciate the complexity and elegance of the algorithms that enable AI models to learn from and generate human-like language. Moreover, this knowledge provides a solid foundation for further exploration into the fascinating world of machine learning and AI, especially when it comes to interacting with large language models.

These numerical representations are called vectors. (Here’s looking at you, algebra!) Here’s what you need to know about vectors …

Tokenization: Think of this as breaking down a sentence into individual words or smaller units. Tokenization helps the model digest and process the input text more effectively.

To illustrate the process, let’s take a simple example. If you ask ChatGPT about its favorite ice cream flavor, the text of your question might be, “What is your favorite ice cream flavor?” But to the LLM, this query is tokenized into individual words or subwords before it is processed. These tokens are then converted into vectors using embeddings, which allow the model to understand the meaning and relationships between the words. The model processes the input through multiple layers to create contextualized embeddings that capture the context of the question. If the question was part of a long conversation, it might be necessary to chunk the text into smaller parts and process them separately before combining the results.

Tokenizing a Sentence

Let’s look at a real example of tokenization. If we wanted an LLM to understand the sentence, “ChatGPT is an impressive AI language model,” we would first have to tokenize it. (This is what happens under the hood when you interact with ChatGPT.)

After tokenization, the sentence might look like this:

[‘Chat’, ‘G’, ‘PT’, ‘ is’, ‘ an’, ‘ impress’, ‘ive’, ‘ AI’, ‘ language’, ‘ model’, ‘.’]

The next step is to convert these tokens into integer IDs based on GPT-3’s vocabulary. Suppose the integer IDs for these tokens are as follows:

[1203, 45, 908, 9, 47, 5862, 125, 64, 2531, 2965, 3]

Now, we’ll pass these integer IDs through the embedding layer, which maps each ID to a continuous vector representation. For simplicity, let’s assume 10-dimensional embeddings (in reality, the dimensions are far higher; GPT-3’s largest model uses 12,288-dimensional embeddings):

1203 -> [-0.234, 0.687, 0.191, -0.625, 0.542, 0.982, -0.231, 0.102, -0.493, 0.014]
45 -> [0.543, -0.932, 0.678, 0.041, -0.178, 0.213, 0.965, -0.437, 0.081, -0.326]
908 -> [0.123, 0.452, -0.732, -0.518, 0.372, -0.625, 0.418, 0.153, 0.827, 0.106]
… and so on for the other tokens.
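If you want to see real token IDs rather than made-up ones, OpenAI’s open-source tiktoken library will produce them. A quick sketch (the exact token boundaries and IDs it returns depend on the tokenizer and will not match the illustrative numbers above):

```python
import tiktoken  # pip install tiktoken

enc = tiktoken.encoding_for_model("gpt-3.5-turbo")
ids = enc.encode("ChatGPT is an impressive AI language model.")

print(ids)                             # integer IDs from the model's vocabulary
print([enc.decode([i]) for i in ids])  # the text piece each ID stands for
```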

Chunking: Sometimes we need to process long sentences or paragraphs, but these models have a limit on the maximum length of text they can handle. To work within that limit, we divide the text into smaller chunks, process them individually, and then combine the results. Breaking the data into pieces that fit the model’s context window lets the model work through the text sequentially, one chunk at a time. During chunking, it’s important that the division of the text maintains coherence and does not break the flow of information.

My favorite Chunk (from Goonies)
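Here is a deliberately simple chunking sketch. The chunk size, overlap, and whitespace splitting are arbitrary choices for illustration; production systems typically split on real token counts and on sentence or paragraph boundaries to preserve coherence.

```python
def chunk_text(text, max_words=100, overlap=20):
    """Split text into overlapping, word-count-limited chunks."""
    words = text.split()  # crude stand-in for real tokenization
    chunks, start = [], 0
    while start < len(words):
        end = start + max_words
        chunks.append(" ".join(words[start:end]))
        if end >= len(words):
            break
        start = end - overlap  # overlap keeps context from being cut mid-thought
    return chunks

long_text = "the quick brown fox jumps over the lazy dog " * 60  # ~540 words
print(len(chunk_text(long_text)))  # number of chunks produced
```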

Embeddings: Once we have the tokens, we need to convert them into numbers, or more specifically, vectors. These vectors, which can be imagined as points in a multi-dimensional space, represent the meaning and relationships between the tokens. It’s like teaching the AI model that “love” and “adore” are similar because their vectors are close together in this space.
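“Close together” has a precise meaning here: a small angle between the vectors, usually measured with cosine similarity. A toy illustration with invented three-dimensional vectors (real embeddings are learned by the model and have hundreds or thousands of dimensions):

```python
import numpy as np

def cosine_similarity(a, b):
    """1.0 means the vectors point the same way; values near 0 mean unrelated."""
    return np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b))

# Hand-written toy vectors; real embeddings come from the model, not a human.
love        = np.array([0.90, 0.80, 0.10])
adore       = np.array([0.85, 0.75, 0.20])
spreadsheet = np.array([0.10, 0.20, 0.90])

print(cosine_similarity(love, adore))        # high: similar meaning
print(cosine_similarity(love, spreadsheet))  # much lower: unrelated meaning
```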

Contextualized embeddings: AI models like GPT use an architecture called the Transformer to process these vectors further. By doing so, the model can understand the context in which words are used. For instance, the word “bank” may have a different meaning when used with “river” than when used with “money”. Contextualized embeddings allow the model to differentiate between such nuances.
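You can see contextualization at work with an off-the-shelf model. This sketch uses the Hugging Face transformers library and bert-base-uncased (my choices for illustration, not anything specific to ChatGPT) to pull out the vector for “bank” in different sentences and compare them:

```python
import torch
from transformers import AutoModel, AutoTokenizer  # pip install transformers torch

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModel.from_pretrained("bert-base-uncased")

def bank_vector(sentence):
    """Return the contextualized vector for the token 'bank' in the sentence."""
    inputs = tokenizer(sentence, return_tensors="pt")
    with torch.no_grad():
        hidden = model(**inputs).last_hidden_state[0]  # one vector per token
    tokens = tokenizer.convert_ids_to_tokens(inputs["input_ids"][0].tolist())
    return hidden[tokens.index("bank")]

river  = bank_vector("I sat on the bank of the river.")
money  = bank_vector("I deposited money at the bank.")
stream = bank_vector("She walked along the bank of the stream.")

cos = torch.nn.functional.cosine_similarity
print("river vs. money :", cos(river, money, dim=0).item())   # lower similarity
print("river vs. stream:", cos(river, stream, dim=0).item())  # higher similarity
```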

Context Window: The context window is the maximum number of tokens (words or word pieces) that an AI model can process at once. This limitation is due to the model’s architecture, which includes a fixed number of layers and self-attention mechanisms. When the input text is within this limit, the model can process it in one pass. However, if the input text is longer than the context window, it needs to be broken down into smaller chunks.

Mitchell Troyanovsky, co-founder of Basis, an AI-powered knowledge worker system for finance and accounting, explained context windows this way:

In theory if you had unlimited resources, you could have an LLM iterate over long contexts and continually summarize and check itself to get to the optimal answer. Imagine a human pulling data, writing a memo on it, and then 1,000 other humans at that company checking and revising the work. The final draft will definitely be better than the first, but the marginal value of each successive check will diminish. One of the challenges is figuring out the processes for summarizing and checking. This is certainly relevant with embeddings and any live company data being summarized.

For example, say you want an LLM to write a memo about how a firm has changed its policies on client engagement over the last couple years. Assuming you have all company documents embedded, when you search, odds are you’re not going to get the right info immediately — there are probably lots of internal documents with information on client engagement. So you need additional processes to figure out the correct search, check the results, and continue down that path. Finding the balance from a cost/time vs. answer quality perspective is key. Obviously one of the massive unlocks will be the context length increasing, allowing models to reason coherently over longer sets of resources. Another approach that is more involved but can also be more accurate is to train models directly on important documents.
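The workflow Troyanovsky describes (embed the documents, embed the question, find the closest matches, then keep refining) looks roughly like the following sketch. The documents and question are invented, and the embedding call uses OpenAI’s Embedding endpoint as the Python library exposed it in early 2023; it assumes an API key is configured.

```python
import numpy as np
import openai  # pip install openai; expects OPENAI_API_KEY in the environment

def embed(text):
    """Turn a piece of text into an embedding vector via the OpenAI API."""
    result = openai.Embedding.create(model="text-embedding-ada-002", input=text)
    return np.array(result["data"][0]["embedding"])

# Invented stand-ins for a company's embedded document store.
documents = [
    "2021 client engagement policy: partners must approve all new engagements.",
    "2023 client engagement policy: engagement approval is delegated to directors.",
    "Office holiday schedule and PTO carryover rules.",
]
doc_vectors = [embed(d) for d in documents]

# Embed the question and rank documents by cosine similarity.
query = embed("How has our client engagement policy changed over the last few years?")
scores = [np.dot(query, d) / (np.linalg.norm(query) * np.linalg.norm(d))
          for d in doc_vectors]
for score, doc in sorted(zip(scores, documents), reverse=True):
    print(round(score, 3), doc)
```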

Human vs. AI Text Processing: Key Differences

The way humans process text is vastly different from how AI models like ChatGPT handle it. Human understanding of language is grounded in cognitive development, memory, emotions, and creativity. We learn language through a gradual process, absorbing meaning by interacting with other humans and with our environment.

Photo Credit: DreamStudio.ai

LLMs, conversely, acquire language through a technique called “unsupervised learning.” During training, these models predict the next word in a sentence based on the context of the words that came before it, a process that exposes them to countless examples of language use, grammar, and world knowledge. This exposure helps them understand and generate human-like language.

But these models do have limitations. They lack the personal experiences, memory, emotions, and cognitive abilities that humans possess, and with them our rich understanding of context, figurative language, metaphor, and symbolism. None of that lives in the numerical representations that allow the models to recognize patterns and structures in text. While this approach allows them to perform complex language tasks, it’s important to recognize that their understanding and processing of text differ significantly from the human experience.

Now that we have an understanding of how GPT reads and “understands” text, let’s look at some applications of this technology.

Unlocking the Potential of GPT in the Business World

By fine-tuning GPT on domain-specific data, organizations can create customized AI solutions for various industries:

Sims said:

LLMs like ChatGPT are useful and interesting on their own because they can answer questions about known facts, summarize documents, including computer programs, and even synthesize new text and programs from known concepts. They can even use all of the tokens within their maximum context window size, including the user’s input and their own output, in generating their output. What they cannot do on their own is access external data or modify their internal knowledge representations with new information, and they would be much more useful if they could. Instead, there are several other methods which allow them to effectively access new data:

· Using a system like LangChain to allow several LLMs to interact with each other and with external APIs. One excellent example of this is San Francisco GPT, which uses GPT-3.5 to translate human-language questions into SQL queries, run them on public datasets, and return the results as text, tables, and graphics. The source code for this is on GitHub, allowing anyone to clone and customize it for their own datasets.

· One other consideration that seems inevitable is the presentation of information. You will work on creating some new information (research, findings, etc.) and then you’ll press a button and have a perfect memo/document formatted and written in your/your firm’s style. The collateral, PowerPoint, etc. will already be drafted and tailored to your firm’s brand. You’ll have custom-generated stock photos just for the collateral. If it requires marketing, you’ll have copy, etc. ready to go. So the amount of manual work in transforming information into polished output will go down dramatically. There are and will be MANY tools that do this. And this should all be happening in production for people at scale by 2024, I’d guess.
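The first bullet above, translating plain-English questions into SQL and running them against a database, is straightforward to picture in code. Here is a rough sketch of that pattern; the table schema, question, and prompt are my own inventions, and the OpenAI call uses the Python library’s ChatCompletion interface as it existed when this article was written.

```python
import sqlite3
import openai  # pip install openai; expects OPENAI_API_KEY in the environment

# A made-up schema standing in for a public dataset.
SCHEMA = "CREATE TABLE permits (id INTEGER, neighborhood TEXT, year INTEGER, cost REAL);"

def question_to_sql(question):
    """Ask the model to translate a natural-language question into a SQLite query."""
    response = openai.ChatCompletion.create(
        model="gpt-3.5-turbo",
        temperature=0,
        messages=[
            {"role": "system",
             "content": "You translate questions into SQLite SQL.\n"
                        f"Schema:\n{SCHEMA}\n"
                        "Reply with a single SQL query and nothing else."},
            {"role": "user", "content": question},
        ],
    )
    return response["choices"][0]["message"]["content"].strip()

# Run the generated SQL against a throwaway in-memory database.
conn = sqlite3.connect(":memory:")
conn.execute(SCHEMA)
conn.execute("INSERT INTO permits VALUES (1, 'Mission', 2022, 125000.0)")

sql = question_to_sql("What was the total permit cost in the Mission in 2022?")
print(sql)
print(conn.execute(sql).fetchall())
```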

Other tools that are already being built on top of the foundation of GPT include:

Domain-specific knowledge bases: GPT could be trained on data related to a specific industry or domain, such as healthcare, law, or finance, to create specialized AI experts that can offer valuable insights and assistance.

Personalized recommendation systems: By capitalizing on GPT’s understanding of user preferences or browsing history, we could develop personalized recommendation engines for products, articles, movies, or music (for example).

Company-specific AI assistants: GPT could be trained on internal documents, processes, and knowledge bases to create AI assistants that can help employees with company-specific tasks, such as answering policy-related questions, assisting with onboarding, or providing guidance on internal tools and resources.

This technology is still in its nascent stage, but is moving exponentially faster than any previous technological innovation I can recall. While I’m not ready to go all in on any particular product or platform, we can all see the potential of LLMs and the tools that will be launched from these models as a foundation.

AH-64 Apache Training Video

I firmly believe that the more powerful the tool, the more responsibility the user has to understand it. While we don’t need to take a course in mechanical engineering to use a wrench, I wouldn’t want to be in the vicinity of an internerd whose only training is a recent viewing of “Fire Birds,” starring Nicolas Cage and Tommy Lee Jones, trying to take off in an Apache helicopter.

Today’s LLMs are the Apache helicopters of digital technology.

Tomorrow’s will be Falcon rockets.

If we’re going to be using these powerful tools now and in the future, we better start learning what they are and how they work.

Otherwise, I found a deal on another productivity tool I’d like to recommend …

https://a.co/d/15DJR58
