Transforming your thoughts on OpenAI’s GPT-3 algorithm.

A Quick Introduction.

OpenAI’s GPT algorithms have been reshaping the domain of Natural Language Processing since the first model was released in 2018. Since then, a series of GPT algorithms has followed, more commonly called “GPT-n”. In 2020, a group of engineers and researchers at OpenAI described GPT-3 as the third-generation “state of the art language model”. And rightfully so: the leap over GPT-2 essentially comes down to the size of the models. GPT-2 contained 1.5 billion parameters, compared to GPT-3’s 175 billion. It’s a really, really big model.

Technical Overview

GPT-3 has given computers the ability, to an extent, to understand and generate human language. This lets a computer receive and respond to emails, power chatbots for customers on websites, and even write code on its own.

Breaking it down into smaller chunks, we can understand what exactly it does. At a very high level, GPT-3 is a language model: given a sequence of tokens (roughly, words), it predicts the next token. It can do this even on examples it has never been trained on, largely thanks to the enormous scale of its training corpus and parameter count.
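
To make this concrete, here is a minimal sketch of next-token prediction. GPT-3 itself is only reachable through OpenAI’s API, so this example stands in the freely available GPT-2 from the Hugging Face transformers library (an assumption for illustration; the prediction mechanism is the same):

import torch
from transformers import GPT2LMHeadModel, GPT2Tokenizer

# GPT-2 stands in for GPT-3 here; both predict the next token the same way.
tokenizer = GPT2Tokenizer.from_pretrained("gpt2")
model = GPT2LMHeadModel.from_pretrained("gpt2")

prompt = "The capital of France is"
input_ids = tokenizer.encode(prompt, return_tensors="pt")
with torch.no_grad():
    logits = model(input_ids).logits

# The logits at the last position score every vocabulary entry as a
# candidate next token; pick the highest-scoring one.
next_token_id = logits[0, -1].argmax()
print(tokenizer.decode(next_token_id))  # most likely " Paris"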

Developing GPT-3

GPT-3 is known as a Transformer, meaning it is given a prompt and generates an output. This isn’t really new technology; attention-based models have been mainstream ever since Google released the BERT model in 2018. As mentioned before, this is the third generation of the GPT language models. One of the main improvements of GPT-3 over its predecessor is the ability to complete tasks in specialized areas such as music and storytelling. According to Kevin Vu, a writer for DZone, “GPT-3 can now go further with tasks like answering questions, writing essays, text summarization, language translation, and generating computer code.” At its release, GPT-3 was widely considered the largest neural network ever created. It is also not a model that needs additional training for your specific task, which makes it task-agnostic. One goal in improving on GPT-2 was to reduce contamination of the training data, which is sourced from the internet: training data can in theory overlap with test data, causing the model to memorize answers rather than generalize.

Domain Use Cases

Developers have come up with some really unique use cases implementing GPT-3 in small tasks. Here is an example of someone using GPT-3 to generate SQL queries against his own database.

towardsai.net also compiled a list of “crazy use-cases of GPT-3”, including auto-completing spreadsheets on steroids, full-blown UI design, and more.

Insights from Architecture

GPT-3 uses the same attention-based architecture as its GPT-2 predecessor. The smallest model (125 million parameters) has 12 attention layers, each with 12 heads of dimension 64, while the largest model (175 billion parameters) has 96 layers, each with 96 heads of dimension 128. The batch size and learning rate were scaled with the model: GPT-3 125M was trained with a batch size of 0.5M tokens and a learning rate of 0.0006, while GPT-3 175B used a batch size of 3.2M tokens and a learning rate of 0.00006. In short, the larger the model, the larger the batch size and the smaller the learning rate.
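
To make the arithmetic concrete, here is a small sketch of how the head counts and head dimensions above multiply out to each model’s hidden size (d_model = n_heads × d_head). The layer, head, and dimension values are the ones quoted above from the GPT-3 paper:

# Head count times head dimension gives each model's hidden size (d_model).
configs = {
    "GPT-3 Small (125M)": {"n_layers": 12, "n_heads": 12, "d_head": 64},
    "GPT-3 175B": {"n_layers": 96, "n_heads": 96, "d_head": 128},
}
for name, c in configs.items():
    d_model = c["n_heads"] * c["d_head"]
    print(f"{name}: {c['n_layers']} layers, d_model = {d_model}")
# GPT-3 Small (125M): 12 layers, d_model = 768
# GPT-3 175B: 96 layers, d_model = 12288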

Is it Ethical?

Artificial Intelligence has always tip-toed along the edge of ethical risk, and Natural Language Processing as a domain already has one toe over the line. Because GPT-3 has been so successful at creating news articles that humans have difficulty distinguishing from human-written ones, the algorithm has the potential for harmful effects. Any time a computer can create something a human has a hard time telling apart from the real thing, you run the risk of phishing attacks, impersonation, fake articles, spam, and misinformation. Often the only practical difference between a human and a computer writing something is whether the public can tell who, or what, wrote the content.

Programmatic Introduction

I’m going to share a quick introduction from Twilio on how to start using GPT-3. This includes creating an environment, generating an OpenAI API key, and finally sending requests via Python.

Create an Environment

Here, Twilio has us create a directory for the project, and inside it we create a Python virtual environment.

For Mac:

$ mkdir twilio-openai-bot
$ cd twilio-openai-bot
$ python3 -m venv venv
$ source venv/bin/activate
(venv) $ pip install openai twilio flask python-dotenv pyngrok

For Windows:

$ md twilio-openai-bot
$ cd twilio-openai-bot
$ python -m venv venv
$ venv\Scripts\activate
(venv) $ pip install openai twilio flask python-dotenv pyngrok

The last command uses pip, the Python package installer, to install the five packages that we are going to use in this project, which are:

- openai: the official OpenAI client library we will use to talk to the GPT-3 engine
- twilio: the Twilio helper library, used in the later parts of Twilio’s tutorial to connect the bot to messaging
- flask: a lightweight web framework for the bot’s webhook
- python-dotenv: loads configuration, such as the API key, from a .env file
- pyngrok: exposes the local Flask server to the internet during development

Configuration

As mentioned above, using GPT-3 requires an API key from OpenAI. The only way to obtain one is by being accepted into their private beta program, which you can apply for here.

After you get an API key, create a .env file inside your project and insert the following line of code.

OPENAI_KEY=your-openai-api-key-here

If you are unfamiliar with .env files, we will touch on them in the next section. If you plan on publishing your project on GitHub or any other public source control platform, make sure this file is excluded, as you don’t want to share your private key.
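
For example, if you use Git, you can keep the file out of your repository by listing it in a .gitignore file at the project root (a small illustrative sketch; the tutorial itself doesn’t prescribe one):

# .gitignore: keep secrets and the virtual environment out of source control
.env
venv/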

Sending GPT-3 requests in Python

In this section, we create the script to establish a connection and interact with the OpenAI GPT-3 engine.

First, create a file called chatbot.py and enter the following code.

import os
from dotenv import load_dotenv
import openai

load_dotenv()
openai.api_key = os.environ.get('OPENAI_KEY')
completion = openai.Completion()

start_chat_log = '''Human: Hello, who are you?
AI: I am doing great. How can I help you today?
'''

The load_dotenv() function loads the OpenAI key we put in our .env file into the environment. Note how we read the OPENAI_KEY variable in the following line to initialize the openai module with the key. The completion variable holds the actual client to the engine; this is the object we will use to send queries.

Twilio also added a start_chat_log variable, containing two lines that prime the engine. Once the bot is up and running, they suggest trying different interactions in this variable to see how the bot changes its responses accordingly. This is the fun part of GPT-3!
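
For instance, here is a purely hypothetical variation that primes the bot with a persona; swap it in for the original start_chat_log and every answer takes on the new tone:

# A hypothetical alternative chat log that primes the bot with a pirate persona.
start_chat_log = '''Human: Hello, who are you?
AI: Arr! I be a salty old sea-dog of a chatbot. What be yer question?
'''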

Now let's write a function that makes a GPT-3 query. Add the following function at the bottom of chatbot.py:

def ask(question, chat_log=None):
    if chat_log is None:
        chat_log = start_chat_log
    prompt = f'{chat_log}Human: {question}\nAI:'
    response = completion.create(
        prompt=prompt, engine="davinci", stop=['\nHuman'], temperature=0.9,
        top_p=1, frequency_penalty=0, presence_penalty=0.6, best_of=1,
        max_tokens=150)
    answer = response.choices[0].text.strip()
    return answer

There's a lot to unpack here, so let's take it in smaller chunks.

Overall, the ask function takes the user's question as its first argument, followed by an optional chat log. If the chat log is not provided, the function uses start_chat_log instead. When the function finishes running, it returns the bot's response.

On the fourth line of the function, we build a prompt variable that appends the new question to the chat log, with labels that make it easy to read who says what; this is an online chat, after all.
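
For example, with the default start_chat_log and a hypothetical question like “What is GPT-3?”, the prompt sent to the engine would look like this:

Human: Hello, who are you?
AI: I am doing great. How can I help you today?
Human: What is GPT-3?
AI: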

The completion.create() function is where the request is finally made to the GPT-3 engine. This function takes a number of arguments; if you would like to learn more about them, I suggest you read the OpenAI reference docs.
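
As a rough gloss, here is the same call again with each argument annotated (the annotations are my own summary, not Twilio’s; the reference docs remain the authority):

response = completion.create(
    prompt=prompt,         # the chat log plus the new question
    engine="davinci",      # the largest GPT-3 engine available in the beta
    stop=['\nHuman'],      # stop generating when a new "Human" turn would begin
    temperature=0.9,       # higher values make the output more random and creative
    top_p=1,               # nucleus sampling cutoff; 1 leaves it disabled
    frequency_penalty=0,   # penalize verbatim repetition (0 turns this off)
    presence_penalty=0.6,  # nudge the model toward introducing new topics
    best_of=1,             # completions generated server-side to pick the best of
    max_tokens=150)        # cap on the length of the generated answer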

Now that we know the bare bones of how this function works, let's test it out! First, navigate to your project directory and ensure you have activated your virtual environment. Next, start a Python shell in your terminal and type in the following commands.

(venv) $ python3
>>> from chatbot import ask
>>> ask('Who played Forrest Gump in the movie?')
'Oh my, that is a tough one! Forrest Gump was played by Tom Hanks.'
>>> ask('How long does it take to travel from Los Angeles to Dublin?')
'It takes about 12 hours to fly from Los Angeles to Dublin. You may want to fly through Heathrow Airport in London.'

Pretty cool, right? You can follow the rest of this tutorial in Twilio’s introduction linked above.

Conclusion

The GPT-3 algorithm seems to have blown the domain wide open. We can only hope to see this rapid development of neural network architectures continue while keeping ethical reasoning in mind.

References

https://lambdalabs.com/blog/demystifying-gpt-3/
