What Is GPT-3 And Why Is It Revolutionizing Artificial Intelligence?

RAJAT PANCHOTIA
Published in Analytics Vidhya · 6 min read · Feb 16, 2021

Generative Pre-trained Transformer 3 (GPT-3) is an autoregressive language model that uses deep learning to produce human-like text. It is the third-generation language prediction model in the GPT-n series (and the successor to GPT-2) created by OpenAI, a San Francisco-based artificial intelligence research laboratory.

GPT-3 is the most powerful language model built to date.

GPT-3 can write poetry, translate text, chat convincingly, and answer abstract questions. It's being used to code, design, and much more.

The model has 175 billion parameters. To put that figure into perspective, its predecessor GPT-2, which was considered state of the art and shockingly massive when it was released in 2019, had 1.5 billion parameters. That was soon eclipsed by NVIDIA's Megatron-LM with 8 billion parameters, followed by Microsoft's Turing-NLG with 17 billion. Now OpenAI has turned the tables by releasing a model more than ten times larger than Turing-NLG. GPT-3 is largely being recognized for its language capabilities: when properly primed by a human, it can write creative fiction.

Researchers say that GPT-3's samples are not just close to human level; they are creative, witty, deep, meta, and often beautiful. They demonstrate an ability to handle abstractions such as style parodies and to write poems. They also say that chatting with GPT-3 feels very similar to chatting with a human. It can even generate functioning code, as recent demos show.

GPT-3 creating a simple React application:

Here the developer describes the React application they want, and the AI writes a component with the hooks and event handlers needed for it to function correctly.
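
The demo itself is a video, so its exact output isn't reproduced here. As a rough illustration only, here is the kind of React component such a prompt might yield, written in TypeScript; the prompt text and component name are my own invention, not GPT-3's actual output.

```tsx
// Hypothetical prompt: "a todo list with an input box and an Add button".
// Illustrative guess at the style of component GPT-3 generates in such
// demos, not the actual demo output.
import React, { useState } from "react";

export function TodoList() {
  const [items, setItems] = useState<string[]>([]);
  const [draft, setDraft] = useState("");

  const addItem = () => {
    if (draft.trim() === "") return;    // ignore empty input
    setItems([...items, draft.trim()]); // append the new item
    setDraft("");                       // clear the input box
  };

  return (
    <div>
      <input value={draft} onChange={(e) => setDraft(e.target.value)} />
      <button onClick={addItem}>Add</button>
      <ul>
        {items.map((item, i) => (
          <li key={i}>{item}</li>
        ))}
      </ul>
    </div>
  );
}
```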

WHAT CAN GPT-3 DO?

Starting with the very basics, GPT-3 stands for Generative Pre-trained Transformer 3 — it’s the third version of the tool to be released.

This means that it generates text using algorithms that are pre-trained — they’ve already been fed all of the data they need to carry out their task. Specifically, they’ve been fed around 570GB of text information gathered by crawling the internet (a publicly available dataset known as CommonCrawl) along with other texts selected by OpenAI, including the text of Wikipedia.

WHAT’S SO SPECIAL ABOUT GPT-3?

The GPT-3 model can generate texts of up to 50,000 characters, with no supervision. It can even generate creative Shakespearean-style fiction in addition to fact-based writing. This is the first time that a neural network model has been able to generate text of an acceptable quality that makes it difficult, if not impossible, for a typical person to tell whether the output was written by a human or by GPT-3.

HOW DOES GPT-3 WORK?

GPT-3 is an example of what’s known as a language model, which is a particular kind of statistical program. In this case, it was created as a neural network.

The name GPT-3 is an acronym that stands for "generative pre-training," of which this is the third version so far. It is generative because, unlike other neural networks that spit out a numeric score or a yes-or-no answer, GPT-3 can generate long sequences of original text as its output. It is pre-trained in the sense that it has not been built with any domain-specific knowledge, even though it can complete domain-specific tasks, such as foreign-language translation.
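
In practice, you interact with GPT-3 by sending it a prompt and reading back the continuation it generates. The sketch below shows roughly what a call to the beta completions API looked like at the time of writing, in TypeScript; the engine name, parameters, and response shape are based on the public beta documentation and should be treated as assumptions rather than a definitive reference.

```ts
// Rough sketch of a GPT-3 completion request (Node 18+, TypeScript).
// The endpoint, engine name, and parameters reflect the public beta docs
// at the time of writing and are assumptions, not a definitive reference.
async function complete(prompt: string): Promise<string> {
  const response = await fetch(
    "https://api.openai.com/v1/engines/davinci/completions",
    {
      method: "POST",
      headers: {
        "Content-Type": "application/json",
        Authorization: `Bearer ${process.env.OPENAI_API_KEY}`,
      },
      body: JSON.stringify({
        prompt,           // the text the model should continue
        max_tokens: 64,   // how much text to generate
        temperature: 0.7, // higher = more varied, lower = more predictable
      }),
    }
  );
  const data = await response.json();
  return data.choices[0].text; // the generated continuation
}

// Prime the model with a pattern and it continues it, e.g. a translation:
complete("English: Good morning\nFrench:").then(console.log);
```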

A few examples of the simple pattern-to-structure mappings (production rules) involved; a toy code sketch follows the list:

noun + verb = subject + verb

noun + verb + adjective = subject + verb + adjective

verb + noun = subject + verb

noun + verb + noun = subject + verb + noun

noun + noun = subject + noun

noun + verb + noun + noun = subject + verb + noun + noun
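
To make the idea concrete, here is a deliberately simplified toy in TypeScript that maps word-category sequences to the grammatical roles listed above. It is only a didactic sketch of the "production rule" framing used in this article; GPT-3 itself learns statistical patterns over tokens rather than explicit grammar rules.

```ts
// Toy "production rules": map a sequence of word categories to grammatical
// roles, as in the examples above. Didactic sketch only; GPT-3 learns
// statistical patterns over tokens, not explicit rules like these.
type Category = "noun" | "verb" | "adjective";
type Role = "subject" | "verb" | "adjective" | "noun";

const productionRules: Record<string, Role[]> = {
  "noun+verb": ["subject", "verb"],
  "noun+verb+adjective": ["subject", "verb", "adjective"],
  "verb+noun": ["subject", "verb"],
  "noun+verb+noun": ["subject", "verb", "noun"],
  "noun+noun": ["subject", "noun"],
  "noun+verb+noun+noun": ["subject", "verb", "noun", "noun"],
};

function applyRule(categories: Category[]): Role[] | undefined {
  return productionRules[categories.join("+")];
}

console.log(applyRule(["noun", "verb", "noun"])); // ["subject", "verb", "noun"]
```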

At the highest level, training the GPT-3 neural network consists of two steps.

The first step requires creating the vocabulary, the different categories and the production rules. This is done by feeding GPT-3 with books. For each word, the model must predict the category to which the word belongs, and then, a production rule must be created.

The second step consists of creating a vocabulary and production rules for each category. This is done by feeding the model with sentences. For each sentence, the model must predict the category to which each word belongs, and then, a production rule must be created.

The result of the training is vocabulary and production rules for each category.

The model also has a few tricks that improve its ability to generate text. For example, it can guess a partially written word from its surrounding context, it can predict the next word from the words that precede it, and it can estimate how long a sentence should be.
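
As a toy illustration of next-word prediction, the core task a language model is trained on, the sketch below builds a tiny bigram table from a sample sentence and uses it to guess a likely next word. It is a deliberately simple stand-in, nothing like GPT-3's transformer internals.

```ts
// Tiny bigram "language model": count which word follows which, then
// predict the most frequent follower. A toy stand-in for next-word
// prediction, nothing like GPT-3's actual transformer internals.
function buildBigrams(text: string): Map<string, Map<string, number>> {
  const words = text.toLowerCase().split(/\s+/).filter(Boolean);
  const table = new Map<string, Map<string, number>>();
  for (let i = 0; i < words.length - 1; i++) {
    const followers = table.get(words[i]) ?? new Map<string, number>();
    followers.set(words[i + 1], (followers.get(words[i + 1]) ?? 0) + 1);
    table.set(words[i], followers);
  }
  return table;
}

function predictNext(
  table: Map<string, Map<string, number>>,
  word: string
): string | undefined {
  const followers = table.get(word.toLowerCase());
  if (!followers) return undefined;
  // pick the follower seen most often after this word
  return [...followers.entries()].sort((a, b) => b[1] - a[1])[0][0];
}

const table = buildBigrams("the cat sat on the mat and the cat slept");
console.log(predictNext(table, "the")); // "cat"
```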

While those two steps and the related tricks may sound simple in theory, in practice they require massive amounts of computation. Training the 175 billion parameters in mid-2020 cost in the ballpark of $4.6 million, although other estimates suggest it could have cost up to $12 million depending on how the hardware was provisioned.

BACKGROUND:

GPT-3 comes from a company called OpenAI. OpenAI was co-founded by Elon Musk and Sam Altman (former president of Y Combinator, the startup accelerator). It was founded with over a billion dollars invested to collaborate on and create human-level AI for the benefit of the human race.

OpenAI has been developing its technology for a number of years. One of its early papers was on generative pre-training. The idea behind generative pre-training is that while most AIs are trained on labelled data, there is a ton of data that isn't labelled. If you can evaluate the words and use them to train and tune the AI, it can start to predict future text on the unlabelled data. You repeat the process until the predictions start to converge. (Source: https://gregraiz.com/gpt-3-demo-and-explanation/)

The original GPT stands for Generative Pre-Training, and it used 7,000 books as the basis of its training. The new GPT-3 is trained on a lot more… in fact, it's trained on 410 billion tokens from crawling the Internet, 67 billion from books, 3 billion from Wikipedia, and much more. In total it has 175 billion parameters and was trained on 570GB of filtered text (over 45 terabytes of unfiltered text).

Over an exaFLOP-day of compute was needed to train on the full data set.

The amount of computing power used to pre-train the model is astounding: more than an exaFLOP-day, meaning a machine performing 10^18 floating-point operations per second, running for a full day. For perspective, a single exaFLOP-second is 10^18 operations; carried out by hand at one calculation per second, that work alone would take over 31 billion years.
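
A quick back-of-the-envelope check of those figures (plain arithmetic, nothing GPT-3-specific):

```ts
// Back-of-the-envelope check of the exaFLOP figures above.
const opsPerExaflopSecond = 1e18;          // 10^18 floating-point operations
const secondsPerYear = 365.25 * 24 * 3600; // about 3.16e7 seconds

// One exaFLOP-second done by hand at one calculation per second:
const years = opsPerExaflopSecond / secondsPerYear;
console.log(years.toExponential(2)); // ~3.17e10, i.e. about 31.7 billion years

// An exaFLOP-day is a full day of that throughput:
const opsPerExaflopDay = opsPerExaflopSecond * 86400; // about 8.64e22 operations
console.log(opsPerExaflopDay.toExponential(2));
```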

The GPT-3 technology is currently in a limited beta, and early-access developers are just starting to produce demonstrations of it. As the beta expands, you can expect to see many more interesting and deep applications of the technology. I believe it will shape the future of the Internet and how we use software and technology. (Source: https://gregraiz.com/gpt-3-demo-and-explanation/)

