What’s the deal with GPT-3?

Jake Tauscher
5 min read · Jul 23, 2020


Over the last month and a half, you have probably heard the tech world abuzz about OpenAI’s GPT-3. “This changes everything” has been a phrase thrown about on tech Twitter. But what is GPT-3? How does it work? And why is everyone so excited?

What is GPT-3?

GPT-3 is the latest model released by OpenAI, a leading AI development company with “a mission to ensure that artificial general intelligence benefits all of humanity.”

GPT stands for ‘Generative Pre-trained Transformer’, and it is a text-generating neural network. As the name suggests, it is the third model of this type released by OpenAI. These models are built on a simple premise: given a stretch of text, some words are much more likely to come next than others. So, when given a ‘seed’ of language, like “On a dark and stormy night…”, the model can continue generating text from there. The twist with GPT-3 is not really a twist: the model is just far larger and more powerful than its predecessors, and so can generate extremely realistic text.

So how does the model work?

How does a neural network predict text? Well, it predicts the probability of each possible next word, based on the words that came before. So, basically, the model takes as input all of the words that came previously in the text, then calculates, for every word in its lexicon, the probability that it comes next (a .01% chance it is A, a .0001% chance it is AARDVARK, etc.). Then, the model can generate text by randomly picking the next word according to these probabilities!
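To make that concrete, here is a toy sketch of the sampling step in Python. The words and probabilities are made up for illustration; in GPT-3 they come out of the neural network itself, over a vastly larger vocabulary.

```python
import random

# Toy illustration of the sampling step described above. The probabilities
# are invented for a tiny lexicon; in GPT-3 they come from the network.
def sample_next_word(next_word_probs):
    """Pick the next word at random, weighted by the model's probabilities."""
    words = list(next_word_probs.keys())
    weights = list(next_word_probs.values())
    return random.choices(words, weights=weights, k=1)[0]

# Pretend the model, given the seed "On a dark and stormy", assigned these:
probs = {"night": 0.62, "evening": 0.21, "sea": 0.05, "aardvark": 0.0001}
print(sample_next_word(probs))  # usually prints "night"
```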

So, how does the model learn these probabilities? Well, it needs to train on a ton of written data. OpenAI assembled a training dataset of roughly 499 billion tokens of text from the internet (a token is roughly a word, or a piece of a word). The data comes mostly from Common Crawl, a non-profit that scrapes the web monthly, downloads content from billions of websites, and puts it in a format that is easy to analyze. So, OpenAI had a ton of text data from the internet. Then, to train the neural network, you feed in a string of text as the input (independent variable) and the next word as the output (dependent variable).
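As a rough sketch of what those input/output pairs look like, here is a toy example. Real training operates on sub-word tokens and contexts of up to 2,048 tokens, so treat this purely as an illustration of the structure, not OpenAI’s actual pipeline.

```python
# Toy sketch: turn raw text into (context, next word) training pairs.
# GPT-3 actually works on sub-word tokens and much longer contexts; this
# just shows the independent/dependent variable structure described above.
def make_training_pairs(text, context_size=4):
    words = text.split()
    pairs = []
    for i in range(context_size, len(words)):
        context = words[i - context_size:i]  # input: the preceding words
        target = words[i]                    # output: the word that comes next
        pairs.append((context, target))
    return pairs

sample = "it was a dark and stormy night and the rain fell in torrents"
for context, target in make_training_pairs(sample)[:3]:
    print(context, "->", target)
# ['it', 'was', 'a', 'dark'] -> and
# ['was', 'a', 'dark', 'and'] -> stormy
# ['a', 'dark', 'and', 'stormy'] -> night
```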

The amount of processing power needed to train this model was incredible: the supercomputer OpenAI trained on has 285,000 CPU cores and 10,000 GPUs. Just to give a sense of how far out of reach this computing power is for a normal person, each of the GPUs OpenAI used costs roughly $10,000 retail, so the GPUs alone come to about $100M. Counting the CPUs as well, the retail cost of this computer would be on the order of ~$200M (just for the CPUs + GPUs).

So, what was the result of this training? A truly massive model. The first model of this type, GPT-1, had 110 million parameters, and GPT-2 had 1.5 billion parameters. GPT-3 has a whopping 175 billion parameters.

Why is everyone so excited?

The thing the model is best at is generating original text: basically, writing articles. Which makes sense, since this is exactly what it was trained to do. And OpenAI’s own studies showed it does very well at this. In a test run by OpenAI, people were able to tell that 200-word articles were written by GPT-3 (versus a human) with only 52% accuracy, barely better than the 50% they would get by randomly guessing.

However, this is not what excites people the most! This model is so large, and so adept, that it can serve as the basis for other applications. This is a common approach in fields like language and vision, where training a model from scratch requires a ton of data: you take the generalizable capabilities of a pre-trained model, then adapt it to your specific needs. With GPT-3, that adaptation can be as light as showing the model a few examples in the prompt.

So, for example, with access to the GPT-3 API, you could give the model a few examples of translating text from Spanish to French, then give it Spanish text, and it will be able to translate it. Training a translation model from scratch would require a lot of original data, but GPT-3 already has the underlying capability. You just need to guide it toward what you want.
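As a sketch of what that could look like against the API, here is a few-shot translation prompt. The engine name, example sentences, and exact client call are assumptions based on the Python client as it was offered around GPT-3’s launch, so treat this as illustrative rather than a definitive integration.

```python
import openai  # OpenAI's Python client, as offered around GPT-3's launch

openai.api_key = "YOUR_API_KEY"  # placeholder; requires API access

# A few examples "teach" the model the task inside the prompt itself.
prompt = (
    "Translate Spanish to French.\n"
    "Spanish: El gato duerme. French: Le chat dort.\n"
    "Spanish: Me gusta leer. French: J'aime lire.\n"
    "Spanish: Hace buen tiempo hoy. French:"
)

response = openai.Completion.create(
    engine="davinci",   # assumed engine name
    prompt=prompt,
    max_tokens=30,
    temperature=0.3,
    stop=["\n"],        # stop at the end of the translated line
)
print(response.choices[0].text.strip())
```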

Even more useful, you can use GPT-3 to write code. Given the proliferation of open-source code on the internet, the data that GPT-3 ingested includes a ton of code. So, GPT-3 can be prompted to write code from a natural language description (e.g. “I want to write a Python function that takes a list of words and returns the longest”). This has huge implications for the accessibility of certain tasks that previously required extensive coding experience!
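For reference, here is the kind of function that prompt describes. This one is written by hand as an illustration of the target output, not an actual GPT-3 completion.

```python
# The function described in the example prompt above, written by hand
# as an illustration of the target output (not an actual GPT-3 completion).
def longest_word(words):
    """Return the longest word in the list (the first one in case of a tie)."""
    return max(words, key=len)

print(longest_word(["neural", "network", "transformer"]))  # transformer
```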

But, the model is not perfect.

The model has not been successful at everything it tries. For example, although it aced a test of 2-digit addition, it scored only ~10% accuracy on 5-digit addition. This is interesting! On one hand, this is something that could probably be improved by giving the model more examples of 5-digit addition; it seems it simply did not see enough of it on the internet to get good at it. On the other hand, even though 10% accuracy is not impressive, it is doing far better than a random guess, which suggests the model is able to infer some generalities about arithmetic even without the specific training data.
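For a back-of-the-envelope sense of that baseline (my own rough estimate, not a figure from the paper): a blind guess over the range of possible five-digit sums would almost never be correct.

```python
# Back-of-the-envelope baseline for the five-digit addition test (a rough
# estimate, not from the paper): how often would a uniformly random guess
# over the range of possible sums be correct?
smallest_sum = 10_000 + 10_000   # 20,000
largest_sum = 99_999 + 99_999    # 199,998
possible_sums = largest_sum - smallest_sum + 1
print(f"Chance of a random correct guess: {1 / possible_sums:.6%}")
# Chance of a random correct guess: 0.000556%
```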

One other thing to watch, for those looking to use GPT-3 to get out of homework: the model can be prone to plagiarism, since nothing stops it from reusing verbatim passages it absorbed from its training data. For example, one user experimenting with the interface asked for a poem by Shakespeare and got something that sounds a lot like Shakespeare. However, the first two lines were pulled directly from a poem by a different English poet, Alexander Pope. Randomness can often give the impression of originality, but it is important to remember how the model actually works!

I have seen some negative feedback from CS people on GPT-3. What is that about?

Within the NLP community, there is a sense among some that GPT-3 is a bit disappointing. GPT-2 was groundbreaking: this sort of original, generalizable text-generating model had not been seen before. Some corners of the internet have been disappointed that GPT-3 is essentially just a bigger GPT-2. Basically, it is a feat of engineering, not of theoretical computer science. And, given how fast computing power grows, just making a bigger model may not be seen as a breakthrough 5–10 years down the road.

However, that should not diminish the accomplishment. This is an insanely huge model that does very cool things. Just remember, as GPT-3 becomes more broadly available, don’t trust everything you read!

And, if you want, you can read the original paper yourself: https://arxiv.org/pdf/2005.14165.pdf. It has a lot more discussion of the model’s performance on various tests, like reading comprehension, reasoning, and trivia.
