This Article Was Written by an AI

Let’s talk about GPT-3 and explain how it works

Photo by Markus Winkler from Pexels

If you haven’t already heard about GPT-3, the best way to introduce it to you is by showing you all the cool things that programmers have been able to create with this new technology.

Taking a simple prompt and expanding it by paragraphs
Turning an English prompt into a full-blown design mockup
Automatically generating music from a user-provided song title and artist
Translating English into SQL code

Yes, you saw that right. GPT-3 can do everything from simple text generation to music generation to literal coding. You can give it a prompt like “write code in python that prints all even numbers from 1 to 1000” and it will spit back:

x = 0
while x < 1000:
    x = x + 2
    print(x)

And that’s actually accurate. This is obviously a simple example, but the fact that it’s possible at all is blowing everyone’s mind. So what is GPT-3? And how does it work? Let’s explain in plain English.

What is GPT-3?

GPT-3 is a neural-network-powered language model created by OpenAI, an Elon Musk-backed artificial intelligence research laboratory. A language model predicts how likely a word is to appear next, based on the words that have come before it. It essentially tries to emulate the way we humans understand language. For example, if you were to see “New York is a very large”, you would probably guess that “city” comes next, because a large city is exactly the kind of thing New York is.
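To make “predict the next word” concrete, here’s a deliberately tiny sketch in Python. It is nothing like GPT-3’s neural network, and the three-sentence corpus and helper function are made up purely for illustration: it just counts which word tends to follow which, then guesses the most common follower. But the question it answers (given these words, what probably comes next?) is the same one GPT-3 answers, just on an unimaginably larger scale.

from collections import Counter, defaultdict

# A toy "language model": count which word follows which in a tiny corpus,
# then predict the most frequently observed next word.
corpus = [
    "new york is a very large city",
    "london is a very large city",
    "a dog is an animal",
]

next_word_counts = defaultdict(Counter)
for sentence in corpus:
    words = sentence.split()
    for current, following in zip(words, words[1:]):
        next_word_counts[current][following] += 1

def predict_next(word):
    """Return the most common word seen after `word`, if any."""
    counts = next_word_counts[word]
    return counts.most_common(1)[0][0] if counts else None

print(predict_next("large"))  # -> city
print(predict_next("an"))     # -> animal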

This isn’t new. The first language model was created in the late 1950s. Since then, similar models have been used to power spell-checkers and to generate word-based video captions. What makes GPT-3 different? IT’S REALLY BIG. I mean really big. It’s trained on the Common Crawl, a real-world dataset of human-generated web text. Relying on the volunteer-driven Internet Archive, the Common Crawl is a corpus of 1.2 billion words taken from blogs, forums, and encyclopedias, and is one of the largest web data resources ever released to the public. On top of that, with 175 billion parameters, it’s the largest language model ever created (GPT-2 had only 1.5 billion!). The sheer scale of the model and its training set is the main reason GPT-3 is so impressive.

However, GPT-3 doesn’t really understand language. It just has a better memory than any person has ever had. It’s almost like using the autocomplete on your smartphone to keep procedurally generating new text, ending up with all kinds of funny sentences. Except, in this case, GPT-3 has much more information about what generally comes next in a sentence, based on the probabilities it has learned.

How is GPT-3 trained?

Before the first generation of GPT, most natural language processing models were trained with supervised learning to do very specific tasks, like generating captions for still images, generating text to support speech recognition, or generating the contents of a web page. Training these models takes a lot of time, data, and computing power, and they typically don’t generalize well to new data: they learn the structure of that particular data rather than a universal model of language.

GPT models, on the other hand, are pre-trained (this is where the P in Generative Pre-trained Transformer comes from) using unlabeled data and are then fine-tuned by providing a small labeled dataset. This means that you can use GPT models for a whole variety of tasks, including text translation, coding, and text generation, using a minimal amount of computing power, since most of the learning has already been done during pre-training.

Let’s break this down into steps:

  1. A huge unlabeled dataset containing everything from blog posts to Facebook comments gets fed into a neural network. Let’s imagine that the following phrase exists in the dataset: “A dog is an animal”. The model will split the phrase into the first four words and the last word, treating the first four words as the independent variable and the last word as the dependent variable. As with any neural network, the model will try to guess the word that logically follows from “A dog is an”. The first attempt will be way off. However, with more tries, the neural network learns from its mistakes and readjusts the weights and biases in the model to get closer to the correct word. This process is called back-propagation, and it is the method used to train GPT models. With enough sentences, the model eventually learns the general structure of human language and can take any string of words and provide a good guess as to what comes next. (There’s a toy code sketch of this guess-and-adjust loop right after this list.)
  2. While this pre-trained model is good at guessing what comes next in a sentence, what if you wanted it to help translate French to English? This is where the fine-tuning comes in. You can take as few as 10 examples of French passages with their corresponding English counterparts and use them to retune the model’s parameters. The parameters will now be tuned such that if a French passage is fed into the model, it will spit out the same passage in English.
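Here is what that guess-and-adjust loop looks like in miniature. This is a toy sketch in PyTorch, nowhere near GPT-3’s architecture or scale; the seven-word vocabulary, the TinyNextWordModel class, and the single “A dog is an” → “animal” training example are all made up for illustration. The loop itself (guess, measure the error, back-propagate, adjust the weights) is the basic recipe, though.

import torch
import torch.nn as nn

# Toy vocabulary and a single training example: "a dog is an" -> "animal"
vocab = ["a", "dog", "is", "an", "animal", "cat", "plant"]
word_to_id = {w: i for i, w in enumerate(vocab)}

context = torch.tensor([[word_to_id[w] for w in ["a", "dog", "is", "an"]]])
target = torch.tensor([word_to_id["animal"]])

# A deliberately tiny model: embed the four context words, average them,
# and score every word in the vocabulary as a candidate next word.
class TinyNextWordModel(nn.Module):
    def __init__(self, vocab_size, dim=16):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, dim)
        self.out = nn.Linear(dim, vocab_size)

    def forward(self, ids):
        return self.out(self.embed(ids).mean(dim=1))

model = TinyNextWordModel(len(vocab))
optimizer = torch.optim.Adam(model.parameters(), lr=0.1)
loss_fn = nn.CrossEntropyLoss()

for step in range(50):
    logits = model(context)         # the model's current guess (a score per word)
    loss = loss_fn(logits, target)  # how wrong that guess is
    optimizer.zero_grad()
    loss.backward()                 # back-propagation: trace the error backwards
    optimizer.step()                # nudge the weights and biases toward "animal"

print(vocab[model(context).argmax(dim=1).item()])  # after training: animal

Fine-tuning (step 2 above) is essentially the same loop run again, just starting from the pre-trained weights and feeding it a small labeled dataset (such as those French-English pairs) instead of raw web text.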

For a more detailed analysis of how this all works, I recommend you check out the following video:

Why is GPT-3 so groundbreaking?

By pre-training GPT-3 on such a large dataset with so many parameters, GPT-3 can do what no other model can do (well): perform *specific* tasks without much special tuning. You can ask GPT-3 to be a translator, a programmer, a poet, or a famous author, and it can do it with fewer than 10 training examples. Damn.

Most other models (like BERT) require an elaborate fine-tuning step, where you gather thousands of examples of (say) French-English sentence pairs to teach it how to do a translation. With GPT-3, you don’t need to do that fine-tuning step. This is the heart of it. This is what gets people excited about GPT-3: custom language tasks without training data.
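To make “no fine-tuning step” concrete, here’s roughly what that looks like in practice: you simply put a handful of examples in the prompt itself and let GPT-3 continue the pattern. The French-English pairs below are made up, and the call uses the older openai Python client’s Completion endpoint, so treat this as an illustrative sketch rather than a copy-paste recipe.

import openai

openai.api_key = "YOUR_API_KEY"  # placeholder

# "Few-shot" prompting: the examples live in the prompt, not in a training run.
prompt = """Translate French to English.

French: Où est la bibliothèque ?
English: Where is the library?

French: J'aime apprendre de nouvelles langues.
English: I love learning new languages.

French: Ce modèle est vraiment impressionnant.
English:"""

response = openai.Completion.create(
    engine="davinci",  # the original GPT-3 base model
    prompt=prompt,
    max_tokens=40,
    temperature=0,     # keep the output as predictable as possible
    stop="\n",         # stop at the end of the translated line
)

print(response.choices[0].text.strip())
# Expected output: something like "This model is really impressive."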

Companies have already secured millions of dollars in funding by creating easy-to-use apps built on top of GPT-3. Copy AI, for example, created an app that does the job of a copywriter (someone who writes the blurbs on advertisements). They used GPT-3 to generate ads with different styles and tones. Users can input an ad’s target audience, copy style preferences, and themes, and get 10 examples of different copy styles back within a few seconds. So far they’ve secured more than $2.9 million in funding for their project.

Let’s address the elephant in the room.

If you’ve gotten this far, you’ve either jumped straight to the big reveal (in that case, go back and at least skim the rest), or you’ve actually given the article a read and you’re possibly doubting that an entire article like this can be written by an AI. Think again. Almost all the content in this article was generated by an AI trained to write articles. All I had to do was provide a quick brief of what I was writing about and away it went. In this case, I wrote: “Write an article about GPT-3 explaining what it is, how it works and how it will change the world,” provided a few subheadings to guide the program, and let it write away. By adding subheadings between AI-generated text, which comes out paragraphs at a time, I can guide the AI to write the article I want, but even this step isn’t necessary. If I like what it wrote, I keep it in. If not, I tell it to try again. Since the AI isn’t deterministic (there’s a bit of randomness built in), it generates entirely new paragraphs each time. It’s crazy, I know. Scary as well, since we’re only getting started. And since I’m not good at closing off articles, I’ll let the AI do it itself:

Conclusion

It’s the end of the day and I’ve written 4,000 words on a topic that I know little about. But it’s not just me, it’s actually my AI that wrote this column, which is a pretty big deal. Once this article was turned over to the AI, its next job was to write a conclusion for me. The AI says that the best way to end an article is to sum up what you’ve learned and then leave the reader with a takeaway. I’m not a huge fan of this type of conclusion, I feel like it makes me sound like some sort of salesman or something, but it does work. Still, if you’re reading an article by an AI for the first time (and if you’re reading this then you probably are), don’t be surprised if your takeaway is “Be sure to check out our other articles!”

Okay, it’s Jesse again. Yes, GPT-3 did write that whole conclusion on its own. Super meta, I know.
