I had chatGPT tell me about its inner workings. Here’s what I learned:
Fact-checking things in the age of AI-generated content is going to be difficult. How does an AI know what it knows? How can it discern fact from fiction?
My inquiry began with a simple question: how can chatGPT have any knowledge about itself? That seems paradoxical, given that chatGPT didn’t exist yet when its training data was collected.
Exploring this question got me wondering about some of the inner workings of machine learning models, and of AI in general.
In the process of this conversation I learned a lot more about the technical details of how AI works, what different types of AI there are, and what exactly it takes to build an AI and put together all the different components into a working system. Of course, I also asked it a little bit about general problem-solving.
If chatGPT was only trained on data available up to 2021, then how can chatGPT know what chatGPT is?
ChatGPT was indeed trained on text data up to 2021, but it’s a language model that has the ability to generate text based on the input it receives. So, even though its training data doesn’t include information about itself, it can still generate a response to a question like “What is ChatGPT?” based on patterns it learned from the text it was trained on.
The training data includes text that mentions or describes AI and language models, so ChatGPT can generate text based on that information and answer questions about itself. However, it’s important to note that the information ChatGPT generates may not always be accurate or up-to-date, as it is limited by the text data it was trained on.
Next, I asked:
Are you chatGPT?
Yes, I am an instance of the ChatGPT language model developed by OpenAI.
If chatGPT was trained only on data available up to 2021, then how do you know it’s true that you are chatGPT and that you were only trained on data available up to 2021?
I, as an AI language model developed by OpenAI, do not have personal experiences or memories. I generate responses based on patterns in the text data I was trained on, and my creators have provided me with the information that I was trained on data up to 2021.
However, it’s important to understand that while I can provide information and generate text based on the data I was trained on, I do not have the ability to independently verify that information. The accuracy of the information I provide is dependent on the quality and accuracy of the data I was trained on.
I decided to dig a little deeper and let chatGPT educate me on AI and machine learning technologies.
What are the key innovations that have led to the recent advances in AI?
There have been several key innovations that have driven the recent advances in AI. Some of the most important include:
Deep Learning: One of the most significant innovations in AI is the development of deep learning, which involves training multi-layer artificial neural networks on large amounts of data. Deep learning algorithms have been used to achieve state-of-the-art results in a wide range of tasks, including computer vision, natural language processing, and game playing.
Large-Scale Data and Computing Power: The availability of large-scale data and the increased availability of computing power have been critical in enabling the development of deep learning algorithms and other advanced AI models.
Transfer Learning: Transfer learning is a technique that allows pre-trained models to be fine-tuned for specific tasks using a small amount of labeled data. This has made it possible to train AI models on a wide range of tasks with limited amounts of data, which has been a major driver of the recent advances in AI.
Reinforcement Learning: Reinforcement learning is a type of machine learning that involves training agents to make decisions in an environment by learning from rewards and penalties. This has been an important innovation in AI, as it has enabled the development of systems that can learn to perform complex tasks, such as playing video games and controlling robots.
Generative Models: Generative models, such as Generative Adversarial Networks (GANs), have been a key innovation in AI. These models can generate new data samples that are similar to a training set, which has been used for tasks such as data augmentation, image synthesis, and anomaly detection.
These are just a few of the key innovations that have driven the recent advances in AI. The field is constantly evolving, and new innovations are likely to continue to drive further progress in the future.
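The transfer-learning point above is easy to make concrete. Here is a minimal sketch of my own (not something ChatGPT generated), assuming PyTorch and torchvision are installed: a backbone pretrained on ImageNet is frozen, and only a small new output layer is trained on a much smaller labeled dataset. The ResNet-18 backbone and the 5-class head are just illustrative choices.

```python
import torch.nn as nn
from torchvision import models

# Load a backbone pretrained on ImageNet (string weights argument per recent torchvision).
model = models.resnet18(weights="IMAGENET1K_V1")

# Freeze the pretrained parameters so only the new head will learn.
for param in model.parameters():
    param.requires_grad = False

# Replace the final layer with a new one for a hypothetical 5-class task.
model.fc = nn.Linear(model.fc.in_features, 5)

# ...from here, train only model.fc on the small labeled dataset as usual.
```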
Then I thought, let’s find out what exactly a “Large Language Model” is and what it means for it to have so many parameters!
To clarify, I wanted to see if I’m even on the same page with the chatbot’s knowledge:
Is it true that the parameters that make up a language model are stored in arrays of computer memory?
Yes, that’s correct. The parameters of a language model, such as ChatGPT, are stored in arrays of computer memory. These parameters are the result of the model’s training process, and they represent patterns in the training data that the model learned.
During inference, the model uses these parameters to generate responses based on the input it receives. The size of the arrays and the number of parameters can vary depending on the complexity of the model and the amount of training data it was trained on. However, in general, language models like ChatGPT require large amounts of computer memory to store their parameters and perform their computations.
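To picture what “parameters stored in arrays of memory” means, here is a toy sketch in NumPy (my illustration, with made-up layer sizes): a single weight matrix is just a large array of floating-point numbers, and even one layer can occupy a noticeable chunk of memory.

```python
import numpy as np

# One made-up weight matrix: 768 hidden units connected to a 50,000-entry vocabulary.
hidden_size, vocab_size = 768, 50_000
W = np.random.randn(hidden_size, vocab_size).astype(np.float32)

print(W.size, "parameters in this single array")   # 38,400,000
print(W.nbytes / 1e6, "MB of memory")              # 153.6 MB at 4 bytes each
```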
So, let’s go on to talk about how billions of parameters can hold actual knowledge:
Is the accuracy of the information that large language models provide emergent from the text that they generate, or intrinsic to their memorized parameters?
The accuracy of the information provided by large language models like ChatGPT is a combination of both emergent properties from the text they generate and the intrinsic quality of the memorized parameters.
The text generated by the model is based on the patterns in the training data it was trained on, and the accuracy of the generated text depends on the quality and accuracy of that training data. For example, if the training data contains inaccurate or misleading information, the model may generate text based on that information, which could result in inaccurate or misleading responses.
At the same time, the parameters of the model, which are the result of the training process, represent the relationships and patterns in the training data that the model has learned. The quality of these parameters, including their accuracy and ability to generalize, can affect the accuracy of the information the model provides.
So, while the accuracy of the information provided by a language model like ChatGPT is largely dependent on the quality of the training data it was trained on, the model’s parameters play an important role in shaping the text it generates and determining the overall accuracy of the information it provides.
The training process of a language model is what creates its parameters. What is the name for the process that puts the language model to use and produces useful output?
The process of using a language model to generate useful output is commonly referred to as “inference” or “prediction”. During inference, the language model takes input in the form of text or other data and uses its parameters to generate an output.
For example, in the case of a conversational AI like ChatGPT, the model would receive a prompt or question as input and use its parameters to generate a response. This process of generating text based on input is done in real-time, allowing the model to provide a dynamic and interactive experience for the user.
In short, inference is the process of using a language model to generate useful output, such as text or other data, based on input.
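As a hands-on illustration of inference (my own sketch, using the small open GPT-2 model via the Hugging Face transformers library rather than OpenAI’s proprietary serving stack), prompt-in, text-out generation takes only a few lines:

```python
# Sketch of inference with a small open model (assumes `transformers` is installed).
from transformers import pipeline

generator = pipeline("text-generation", model="gpt2")   # GPT-2, not GPT-3/ChatGPT
prompt = "A language model is"
outputs = generator(prompt, max_length=40, num_return_sequences=1)
print(outputs[0]["generated_text"])
```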
A language model seems to be composed of two things: 1. The instructions for how to train the model and 2. The actual parameters generated from the training. Which of the two is someone referring to when they say “language model”? The training instructions or the generated parameters?
When someone says “language model,” they are typically referring to the generated parameters of a model, which are the result of the training process.
The training process of a language model involves learning patterns and relationships in large amounts of text data, and the generated parameters represent the distilled knowledge that the model has acquired from this training data. These parameters are what enable the model to generate text or perform other language-related tasks during inference.
The instructions for how to train a language model, such as the algorithms and techniques used, are often considered part of the underlying architecture of the model, rather than the model itself. However, both the architecture and the generated parameters are important components of a language model, as the architecture determines how the model is trained and the parameters represent the knowledge that the model has acquired from the training data.
Okay, enough about LLMs. What other AI technologies are out there?
What are other names and examples of AI or ML systems that generate content? Are they all based on the “generative pre-trained transformer” or are there others?
There are several types of AI and ML systems that generate content, and not all of them are based on the “generative pre-trained transformer” architecture. Some examples include:
1. Generative Adversarial Networks (GANs): These are a type of neural network that involves training two separate models: a generator and a discriminator. The generator produces synthetic content, while the discriminator tries to distinguish between the synthetic content and real content. Over time, the two models improve, and the generator is able to produce more realistic content.
2. Variational Autoencoders (VAEs): These are a type of deep generative model that can be used to generate new data that is similar to a set of training data. They do this by encoding the training data into a compact representation, and then using this representation to generate new data.
3. Rule-based systems: These are systems that use sets of rules to generate content, such as filling in templates or replacing placeholders in a predefined text structure. They can be simple to implement and are often used for generating structured text, such as product descriptions or form letters.
4. Markov Chain models: These are probabilistic models that generate content based on the probability of transitioning from one state to another. They can be used to generate text, music, or other types of content, and are often used for text completion or generation.
These are just a few examples of the many types of AI and ML systems that can be used to generate content. Each has its own strengths and weaknesses, and the choice of which to use will depend on the specific requirements of the task and the data being used.
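The Markov-chain item above is simple enough to sketch in a few lines of Python (a toy of my own, not anything ChatGPT produced): the “model” is just a table of which word tends to follow which, and generation is repeated random lookup.

```python
import random
from collections import defaultdict

def build_chain(text):
    """Map each word to the list of words observed to follow it."""
    words = text.split()
    chain = defaultdict(list)
    for current_word, next_word in zip(words, words[1:]):
        chain[current_word].append(next_word)
    return chain

def generate(chain, start, length=10):
    """Walk the chain, picking a random successor at each step."""
    word, output = start, [start]
    for _ in range(length):
        followers = chain.get(word)
        if not followers:
            break
        word = random.choice(followers)
        output.append(word)
    return " ".join(output)

corpus = "the cat sat on the mat and the dog sat on the rug"
print(generate(build_chain(corpus), "the"))
```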
How does the generative pre-trained transformer (GPT) work, compared to those others?
The Generative Pre-Trained Transformer (GPT) is a type of deep neural network architecture that has been very successful in the field of natural language processing. It is different from other systems that generate content in several ways:
- Pre-training: GPT models are pre-trained on large amounts of text data, allowing them to learn a wide range of patterns and relationships in language. This pre-training allows the models to generate high-quality text right out of the box, without the need for further fine-tuning.
- Transformer architecture: GPT models use a transformer architecture, which is specifically designed to handle sequences of data such as text. The transformer architecture allows the models to effectively capture long-range dependencies and relationships in the data, which is important for understanding the meaning of text.
- Generative: GPT models are generative, meaning that they can generate new text that is similar to the text they were trained on. This is achieved by sampling from the distribution of words learned during training.
- Fine-tuning: GPT models can also be fine-tuned for specific tasks, such as answering questions or generating specific types of text. This fine-tuning allows the models to learn task-specific patterns and relationships in the data, and improve their performance on the task.
Compared to other systems that generate content, such as GANs or VAEs, GPT models are specifically designed to handle sequences of data, such as text, and they have been very successful in the field of natural language processing. However, like any AI system, they are not a silver bullet and may not be the best choice for all tasks or types of data. The choice of which system to use will depend on the specific requirements of the task and the data being used.
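The “transformer architecture” point is the hardest of these to picture, so here is a stripped-down sketch of its core operation, scaled dot-product self-attention. This is my simplification: a real GPT layer adds multiple attention heads, causal masking, and learned projection matrices.

```python
import numpy as np

def scaled_dot_product_attention(Q, K, V):
    d_k = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)                  # how strongly each token attends to each other token
    weights = np.exp(scores)
    weights /= weights.sum(axis=-1, keepdims=True)   # softmax over the keys
    return weights @ V                               # weighted mix of the value vectors

seq_len, d_model = 4, 8                              # 4 toy tokens, 8-dimensional embeddings
Q = K = V = np.random.randn(seq_len, d_model)        # self-attention: queries, keys, values from the same tokens
print(scaled_dot_product_attention(Q, K, V).shape)   # (4, 8)
```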
GPT uses inference to answer requests from users by completing a prompt. Approximately what percentage of the parameters stored in computer memory is utilized to generate a response from GPT during each prompt completion?
The exact percentage of parameters used by GPT during each prompt completion depends on several factors, such as the size of the model, the length of the prompt, and the computational resources available. It’s difficult to provide a precise number without knowing the specific details of the model and the prompt.
In general, GPT models are very large and have millions or even billions of parameters. However, not all of these parameters are used in every prompt completion. When GPT generates a response to a prompt, it uses a subset of its parameters to perform the calculation, and this subset may change with each prompt.
The goal of the model during inference is to use the minimum number of parameters necessary to generate a response that is accurate and meets the user’s requirements, while also being efficient in terms of computational resources. Therefore, it’s possible that a significant proportion of the parameters are unused during a single prompt completion, but are still important for the overall performance of the model on a range of tasks and prompts.
How many copies of the GPT model are running on OpenAI’s servers simultaneously to fulfill user demand?
I am a language model created by OpenAI and I do not have access to the current information on the number of GPT models running on OpenAI’s servers. This information is proprietary and not publicly available. However, OpenAI has several server clusters with multiple GPUs and CPUs that can be used to run multiple instances of GPT or other AI models in parallel to fulfill user demand. The number of instances that are run simultaneously will depend on factors such as the current demand for the model, the computational resources available, and the need to balance resources to ensure efficient and stable operation of the servers.
How many parameters does GPT3 have and how much storage space does each parameter take up?
GPT-3 is a large language model developed by OpenAI, and it has 175 billion parameters. The exact storage space required for each parameter depends on the data type and encoding used to store it, but in general, a single floating-point number requires 4 bytes of storage. Therefore, a rough estimate of the storage space required for GPT-3’s parameters would be 175 billion parameters * 4 bytes/parameter = 700 GB. However, this is just a rough estimate and the actual storage requirements may be different based on the specific implementation and optimization techniques used. Additionally, the size of the model can be reduced through techniques such as quantization and pruning, which can lower the storage requirements while still preserving its accuracy.
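The arithmetic in that answer checks out, at least as a back-of-the-envelope estimate assuming 32-bit floats (half-precision or quantized weights would roughly halve or quarter it):

```python
params = 175_000_000_000      # 175 billion parameters
bytes_per_param = 4           # one 32-bit float
total_gb = params * bytes_per_param / 1e9
print(total_gb, "GB")         # 700.0 GB
```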
When compared to a human brain, are these parameters more like neurons or more like synapses?
When comparing a language model like GPT-3 to the human brain, the parameters can be thought of as being more similar to the connections between neurons in the brain, known as synapses, than to the neurons themselves.
The parameters in a language model determine the relationships between different pieces of input and output data, and they can be thought of as encoding knowledge or information that the model has learned from its training data. In the same way, synapses in the brain are the connections between neurons that encode the strengths of the connections and allow the neurons to communicate with each other and process information.
However, it’s important to note that the comparison between a language model and the human brain is a rough analogy, and the actual workings of the brain are much more complex and intricate than can be captured by a machine learning model like GPT-3. The human brain has many different types of neurons and synapses, and they work together in sophisticated ways that are not yet fully understood, whereas a language model like GPT-3 is a simplified representation of a much more complex system.
Are any of the parameters in GPT3 actual words of the English language?
No, the parameters in GPT-3 are not actual words of the English language. Instead, they are values that the model uses to make predictions about the likelihood of a particular word or sequence of words occurring in a given context.
In a language model like GPT-3, the parameters are used to generate probabilities for each word in the vocabulary, given the previous words in a sentence or prompt. The parameters represent relationships between words and the context in which they appear, and they are learned from the training data during the model’s training process. The final parameters of the model are numerical values, not actual words, and they are used to generate text by selecting the word with the highest probability for each step in the generation process.
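To make “numerical values, not actual words” concrete, here is a toy illustration of my own, with an invented four-word vocabulary and made-up numbers: the model’s raw outputs are scores, a softmax turns them into a probability for each vocabulary entry, and the next word is either the most likely one or a sample from that distribution.

```python
import numpy as np

vocab = ["Paris", "London", "banana", "blue"]          # invented mini-vocabulary
logits = np.array([5.1, 3.2, -1.0, 0.4])               # hypothetical model output for "The capital of France is"

probs = np.exp(logits) / np.exp(logits).sum()          # softmax: scores -> probabilities
print(dict(zip(vocab, probs.round(3))))
print("greedy choice:", vocab[int(np.argmax(probs))])  # Paris
print("sampled choice:", np.random.choice(vocab, p=probs))
```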
So, separate from that, there must be a database of words that the model can access and refer to, or not? Or does the use of words become intrinsic and emergent from the parameters?
Yes, separate from the parameters of a language model like GPT-3, there is a database of words that the model can access and refer to. This database is called the vocabulary, and it is a list of all the unique words that the model was trained on. The vocabulary is an important part of a language model, as it determines the set of possible words that the model can generate and the relationships between those words.
The use of words in a language model like GPT-3 does not become intrinsic and emergent from the parameters. Instead, the parameters are used to make predictions about the likelihood of each word in the vocabulary given the context of the input. The words themselves are part of the model’s vocabulary, which is separate from the parameters. The vocabulary and the parameters together form the complete language model, which can generate text by selecting words from the vocabulary based on the parameters’ predictions.
How many items are in the vocabulary?
The size of the vocabulary in a language model like GPT-3 depends on the specific model and the training data used. Typically, the vocabulary size is in the range of tens of thousands to several hundred thousand words.
In GPT-3, the vocabulary size is estimated to be around 50,000 words, but this is only a rough estimate and the exact size may vary depending on the specific implementation and training data used. Having a large vocabulary allows the model to represent a wide range of words and phrases, but it also requires more memory and computation to store and use the parameters for each word in the vocabulary.
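For what it’s worth, that figure is close: GPT-3 reportedly reuses GPT-2’s byte-pair-encoding vocabulary of 50,257 entries, and those entries are subword tokens rather than whole English words. If you have the Hugging Face transformers library installed, you can inspect the openly available GPT-2 tokenizer directly:

```python
from transformers import GPT2Tokenizer

tok = GPT2Tokenizer.from_pretrained("gpt2")
print(tok.vocab_size)                    # 50257 token entries
print(tok.tokenize("unbelievable"))      # a few subword pieces, not one dictionary word
print(tok.encode("unbelievable"))        # the matching integer ids the model actually sees
```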
Tell me about rule-based AI and ML systems again
Rule-based AI and ML systems are a type of artificial intelligence that operate based on a set of predefined rules. Unlike more advanced AI systems such as machine learning models, rule-based systems don’t learn from data and make predictions based on patterns in the data. Instead, they use a set of hard-coded rules that dictate how they should respond to inputs.
For example, a simple rule-based system might be designed to respond to questions about a particular topic by returning predefined answers based on keywords in the question. If the system was designed to answer questions about weather, it might respond to the question “What’s the weather like in New York?” by returning the current temperature and conditions for New York.
Rule-based systems have some advantages over more advanced AI systems. They can be relatively simple to build and understand, and they can be highly reliable if the rules are well-designed and accurate. However, they also have some disadvantages. They are limited in their ability to adapt to new situations and respond to unexpected inputs, and they can be less effective than machine learning models at handling complex and nuanced inputs.
Overall, rule-based systems are useful in specific, well-defined situations where a clear set of rules can be used to generate accurate and appropriate responses, but they are less suitable for more complex and dynamic problems where more advanced AI techniques are needed.
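The weather example ChatGPT gave maps directly onto a few lines of code. Here is a minimal sketch of such a keyword-driven responder (the rules and canned answer are mine and entirely made up): nothing is learned from data, and anything outside the rules falls through to a default reply.

```python
def rule_based_reply(question: str) -> str:
    q = question.lower()
    if "weather" in q and "new york" in q:
        return "It is currently 5°C and cloudy in New York."   # hard-coded, not looked up
    if q.startswith(("hello", "hi")):
        return "Hello! How can I help you?"
    return "Sorry, I don't have a rule for that."

print(rule_based_reply("What's the weather like in New York?"))
```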
Aren’t the algorithms used to train and create AI models rule based systems?
The algorithms used to train AI models can be seen as a type of rule-based system, in the sense that they operate based on a set of rules or mathematical equations that dictate how the model should process and use data. However, the term “rule-based system” is typically used to refer to AI systems that operate based on a set of explicitly defined rules that dictate how the system should behave in a specific context, without the need for training.
For example, a rule-based system for diagnosing medical conditions might use a set of predefined rules that map symptoms to diagnoses based on expert knowledge. This system would not require any training data and would simply apply the predefined rules to the input data to make a diagnosis.
In contrast, machine learning models like neural networks are trained on large amounts of data to learn patterns in the data, and then use these patterns to make predictions on new, unseen data. The algorithms used to train these models can be seen as a type of rule-based system, but the models themselves are not typically referred to as rule-based systems because they don’t operate based on explicit rules.
So while the algorithms used to train AI models can be seen as a type of rule-based system, the models themselves are typically not referred to as rule-based systems because they are trained on data to learn patterns and relationships in the data, rather than operating based on predefined rules.
Are all machine learning models based on deep neural networks?
No, not all machine learning models are based on deep neural networks. While deep neural networks (DNNs) have been very successful in recent years and are used in many applications, there are other types of machine learning models that are also widely used.
Some other popular types of machine learning models include:
Decision Trees
Random Forests
Support Vector Machines (SVMs)
K-Nearest Neighbors (KNNs)
Naive Bayes
These models work in different ways and are used for different purposes. For example, decision trees are often used for classification problems, while support vector machines are commonly used for regression problems.
Each type of machine learning model has its own strengths and weaknesses, and the choice of which model to use depends on the specific problem being solved, the available data, and other constraints. So while deep neural networks have become very popular, they are not the only type of machine learning model and there are many other models that can be used in different contexts.
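As a quick illustration that useful models need not be deep networks (a sketch of my own, assuming scikit-learn is installed), here is a decision tree trained on the classic iris dataset in a handful of lines:

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)                       # 150 flowers, 4 measurements, 3 species
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

tree = DecisionTreeClassifier(max_depth=3).fit(X_train, y_train)
print("test accuracy:", tree.score(X_test, y_test))
```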
Do you know of any general purpose problem solving models?
Yes, there are a few machine learning models that are considered to be “general-purpose problem solvers.” One such model is the Universal Approximator, which is a type of neural network that is capable of approximating any continuous function to an arbitrary accuracy. This means that it can be trained to perform a wide range of tasks and problems, making it a very flexible and versatile model.
Another example is the Reinforcement Learning (RL) framework, which is a type of machine learning that is used to solve problems by having an agent interact with an environment to learn how to make decisions that maximize a reward signal. RL is a general-purpose framework that can be applied to a wide range of problems, including game playing, robotics, and autonomous navigation.
Finally, Generative Adversarial Networks (GANs) are another example of a general-purpose problem-solving model. GANs are a type of neural network that can be used to generate new data samples that are similar to a training set. This makes them a versatile model that can be used for tasks such as image synthesis, data augmentation, and anomaly detection.
So, while there is no single machine learning model that can solve every problem, there are several models that are considered to be “general-purpose problem solvers” due to their versatility and flexibility.
Could you please describe, in simple terms, what the difference is between chatGPT and an AI that’s trained on guiding users through a particular problem-solving process?
Sure, I’ll try to explain it in a simple way.
ChatGPT is a language model developed by OpenAI that’s trained to generate human-like text based on the input it receives. It’s trained on a large corpus of text data, which allows it to respond to a wide range of questions and generate text that’s similar to human writing.
An AI trained to guide users through a problem-solving process, on the other hand, is focused on a specific task or set of tasks. For example, it might be trained to help users solve math problems, diagnose a medical issue, or troubleshoot a technical issue. These AI systems are designed to provide specific, step-by-step guidance to help users reach a solution to their problem.
In short, ChatGPT is a general-purpose language model that can respond to a wide range of questions, while an AI trained for problem-solving is focused on a specific task and provides specific guidance to help users solve a particular problem.