Learning about GPT, from GPT

Surya Dwivedi
3 min read · Dec 11, 2023


I decided to learn in detail about the workings of ChatGPT. While I was searching through the relevant research papers, trying to figure out how they are interrelated, what came first, and so on, it dawned upon me: the subject at hand knows how to talk :) So the following post is based on my understanding from a conversation I had with ChatGPT, and its accuracy depends on how self-aware the model is, which, I mean, even most humans truly aren't at this point :) So please take this with a grain of salt.

My first question was: is there just one single large deep neural network under the hood, or a collection of models that are chosen from based on the prompt? It turns out the latest GPT-3.5 is based on a single model. I had this question because in the first paper where GPT was introduced, 'Improving Language Understanding by Generative Pre-Training', the language model obtained by training GPT was fine-tuned for specific supervised NLP tasks like question answering, similarity, etc., and those task-specific models were then used for analysis.

GPT, for the uninitiated, stands for Generative Pre-trained Transformer. The story of Large Language Models begins with a seminal deep learning paper that pretty much changed the landscape, titled 'Attention Is All You Need'. This paper introduced a novel deep neural network architecture called the 'Transformer' that surpassed benchmarks set by all previous models on various NLP tasks. 'Generative' refers to the fact that it is a language model, i.e., given a sequence of words it tries to predict what the next word should be. That is all GPT does, frankly, but it does it so well that it seems to be having a conversation. 'Pre-trained' means that it has been trained on a very large dataset and can be fine-tuned for specific tasks or datasets.
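To make the "predict the next word" idea concrete, here is a toy sketch. It is nothing like a real transformer (no neural network, no attention, just bigram counts over a made-up corpus), but it shows the core framing: given the words so far, pick the most likely next word based on patterns seen in training data.

```python
from collections import Counter, defaultdict

def train_bigram(corpus):
    """Count, for each word, which words were seen following it."""
    counts = defaultdict(Counter)
    words = corpus.split()
    for prev, nxt in zip(words, words[1:]):
        counts[prev][nxt] += 1
    return counts

def predict_next(counts, word):
    """Return the most frequently observed next word, or None if unseen."""
    if word not in counts:
        return None
    return counts[word].most_common(1)[0][0]

# Tiny made-up "training set"
model = train_bigram("the cat sat on the mat the cat ran")
print(predict_next(model, "the"))  # 'cat' (follows 'the' twice, 'mat' once)
```

GPT does the same kind of thing, except the "counts" are replaced by a transformer with billions of parameters that conditions on the entire preceding context, not just the previous word.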

I then asked about the difference between GPT-1, GPT-2, etc., and it seemed to suggest that the only major difference is that the training dataset grew larger and the number of parameters increased, while the basic architecture is still based on a transformer. It did hint that this is a bit of a simplification, though. It refused to go into details of how the training data was obtained, as that is proprietary. The models after GPT-2 are not open-sourced and are exposed only as an API for inference. GPT-2, however, is open source, and its code and model weights are available for research purposes.

I then decided to make the conversation a bit more fun and delve deeper into its capabilities. I asked if, given a math problem, it actually 'thinks' and arrives at the answer logically, or if its answer is based on seeing a similar pattern in the training data, without applying any logical reasoning. The response was, and I quote,

I don’t “think” or “understand” in the way humans do. My responses are generated based on patterns and information present in the data on which I was trained. When you present a math problem, I don’t perform mathematical reasoning in the traditional sense; instead, I generate responses by predicting what comes next in a sequence, drawing on patterns learned during training. (ChatGPT, 2023)

I asked whether it is not possible to incorporate logical reasoning into the responses too, at least to verify them for correctness; after all, some early AI systems were purely symbolic, based on logic. This was the response:

While I don’t have the capability for true logical reasoning or understanding, developers can implement certain rule-based systems or logic within specific applications that use my responses. However, this is typically done externally, in the application code, rather than being an intrinsic part of my architecture. (ChatGPT, 2023)
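Here is a minimal sketch of what that external, rule-based checking could look like. The function names and the claim format are made up for illustration; the idea is simply that application code, not the model, verifies an arithmetic claim like "7 * 6 = 42" before trusting it.

```python
import ast
import operator

# Only plain arithmetic is allowed; anything else is rejected.
OPS = {ast.Add: operator.add, ast.Sub: operator.sub,
       ast.Mult: operator.mul, ast.Div: operator.truediv}

def safe_eval(expr):
    """Evaluate a simple arithmetic expression via the ast module
    (no arbitrary code execution, unlike eval)."""
    def walk(node):
        if isinstance(node, ast.Expression):
            return walk(node.body)
        if isinstance(node, ast.Constant) and isinstance(node.value, (int, float)):
            return node.value
        if isinstance(node, ast.BinOp) and type(node.op) in OPS:
            return OPS[type(node.op)](walk(node.left), walk(node.right))
        raise ValueError("unsupported expression")
    return walk(ast.parse(expr, mode="eval"))

def verify_claim(claim):
    """Check a model's claim of the form '<expression> = <value>'."""
    expr, _, value = claim.partition("=")
    return safe_eval(expr.strip()) == float(value.strip())

print(verify_claim("7 * 6 = 42"))  # True
print(verify_claim("7 * 6 = 43"))  # False
```

Real systems do fancier versions of this, e.g. handing math off to a calculator or code interpreter, but it is all wrapped around the model rather than built into it, exactly as the quote says.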

My further conversations were about biases in the training data, how it handles them, its own thoughts on how it handles them, political correctness, etc., but in the interest of not making this post too long, I'll save those for another time. Thanks for your time!
