[Everyone’s AI] Explore AI Model #16: GPT-J, an Open Source Version of GPT-3

AI Network
Published in AI Network · 11 min read · Feb 14, 2022

In this article, we will look at GPT-J, an open source version of GPT-3. Before turning to GPT-J, we first review the large language models that drove a breakthrough in NLP, and then GPT-3 itself. Finally, we walk through a demo based on AIN-BOT, an AINFT project that uses GPT-J.

Large Language Model

In 2017, Google introduced the Transformer in the paper “Attention Is All You Need”. By building its encoder-decoder on attention, the Transformer addressed many drawbacks of RNNs, such as long training times. Since then, many large language models based on the Transformer have been published, driving progress in NLP. One of the most widely known is BERT (Bidirectional Encoder Representations from Transformers), released by Google. BERT produces context-sensitive embeddings, which significantly improved benchmark performance.

Language models can be divided into models that perform NLU (Natural Language Understanding), models that perform NLG (Natural Language Generation), and models that perform both. Check out the table below for representative language models of each type and which parts of the Transformer each one uses!

Among them, the GPT (Generative Pre-trained Transformer) series was released by OpenAI and is optimized for NLG. A GPT model is an autoregressive language model: its decoder takes the preceding text as input and predicts the text that comes next.
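To make the autoregressive idea concrete, here is a minimal sketch of the generation loop. The `generate` function, the toy lookup table, and the `<eos>` stop token are all illustrative assumptions, not part of GPT itself; a real model conditions on the whole preceding sequence rather than only on the last token.

```python
from typing import Callable, List


def generate(next_token: Callable[[List[str]], str],
             prompt: List[str],
             max_new_tokens: int,
             stop_token: str = "<eos>") -> List[str]:
    """Autoregressive decoding: every new token is predicted from the tokens so far."""
    tokens = list(prompt)
    for _ in range(max_new_tokens):
        token = next_token(tokens)
        if token == stop_token:
            break
        # The prediction is appended and becomes part of the next input.
        tokens.append(token)
    return tokens


# Toy stand-in for a language model (hypothetical): looks only at the last token.
TOY_TABLE = {"the": "cat", "cat": "sat", "sat": "down", "down": "<eos>"}


def toy_next_token(tokens: List[str]) -> str:
    return TOY_TABLE.get(tokens[-1], "<eos>")


result = generate(toy_next_token, ["the"], max_new_tokens=10)
print(result)
```

The key point is the feedback loop: each output is fed back in as input for the next step.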

Training method of GPT series model

Each time a new version of GPT is released, the model grows in size and its performance improves along with it. In this article, we look at GPT-3, the most recent version at the time of writing.

Game Changer, GPT-3

GPT-3 became a hot topic when it was released in 2020 because it generated sentences with human-like quality. Although GPT-3 is structurally almost the same as GPT-2, it is characterized by a large increase in both model size and training data. Comparing the largest model of each, GPT-3 has about 175 billion parameters and GPT-2 about 1.5 billion, an increase of more than 100 times. The training data also grew significantly: about 45TB of raw text (before filtering) for GPT-3 versus about 40GB for GPT-2. As a result, the computational resources and cost required for training rose sharply as well. For example, a single pre-training run of GPT-3 is said to have cost about $12 million.
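The scale-up can be checked with a few lines of arithmetic. This sketch uses only the figures quoted above and assumes decimal units (1 TB = 1000 GB); the variable names are illustrative.

```python
# Figures quoted in the text above.
gpt2_params = 1.5e9      # about 1.5 billion parameters
gpt3_params = 175e9      # about 175 billion parameters
gpt2_data_bytes = 40e9   # about 40 GB
gpt3_data_bytes = 45e12  # about 45 TB

param_ratio = gpt3_params / gpt2_params
data_ratio = gpt3_data_bytes / gpt2_data_bytes

print(f"parameters: about {param_ratio:.0f}x")
print(f"training data: about {data_ratio:.0f}x")
```

So the parameter count grew by a little over 100 times, while the quoted data sizes differ by roughly three orders of magnitude.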

Various versions of GPT-3

GPT-3 is also an autoregressive language model that consists only of Transformer decoder layers. The model with 175 billion parameters stacks 96 decoder layers.
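As a rough sanity check on the 175 billion figure, a common rule of thumb estimates the weights of a decoder-only Transformer as about 12 × layers × d_model². The hidden size of 12288 is taken from the GPT-3 paper, not from this article, and the formula ignores embeddings, biases, and layer normalization, so treat this as an approximation only.

```python
def approx_decoder_params(n_layers: int, d_model: int) -> int:
    """Rule-of-thumb weight count for a decoder-only Transformer.

    Per layer: about 4 * d_model^2 for the attention projections plus
    about 8 * d_model^2 for a feed-forward block with 4x hidden width.
    """
    return 12 * n_layers * d_model ** 2


# 96 layers as stated above; d_model = 12288 is an assumption from the GPT-3 paper.
estimate = approx_decoder_params(n_layers=96, d_model=12288)
print(f"{estimate / 1e9:.1f}B parameters")
```

The estimate lands close to the reported 175 billion, which shows that most of the parameters sit in the stacked decoder layers.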

Model architecture of GPT-3

Usually, a pre-trained language model is fine-tuned for the task to be solved (classification, relation extraction, etc.). Fine-tuning works well when it is feasible, but it requires additional training, so it cannot be applied to tasks that lack enough data. As the paper title “Language Models are Few-Shot Learners” suggests, GPT-3 focuses on few-shot learning rather than fine-tuning in order to achieve task-agnostic performance. So, what is few-shot learning, and how does it differ from fine-tuning?

Few-shot Learning

Fine-tuning trains the model on data for the target task, and the parameters are updated in the process. Few-shot learning, on the other hand, performs the task without additional training, that is, without updating any parameters. Instead, the model receives a prompt containing a description of the task, examples of the task, and the problem to solve. A prompt with no examples is called zero-shot, a prompt with one example is one-shot, and a prompt with several examples is few-shot. For more detailed examples, see the figures below!

Process of Fine-tuning
Prompt examples of Zero, One, Few-shot learning
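The three prompt styles differ only in how many examples they contain, which a small helper makes explicit. The `build_prompt` function, the `=>` separator, and the translation examples are illustrative assumptions, not a required format.

```python
from typing import List, Tuple


def build_prompt(task_description: str,
                 examples: List[Tuple[str, str]],
                 query: str) -> str:
    """Assemble a prompt: task description, solved examples, then the open problem."""
    lines = [task_description]
    for source, target in examples:
        lines.append(f"{source} => {target}")
    # The final line is left unfinished so the model completes it.
    lines.append(f"{query} =>")
    return "\n".join(lines)


TASK = "Translate English to French:"
EXAMPLES = [
    ("sea otter", "loutre de mer"),
    ("cheese", "fromage"),
    ("peppermint", "menthe poivrée"),
]

zero_shot = build_prompt(TASK, [], "plush giraffe")           # no examples
one_shot = build_prompt(TASK, EXAMPLES[:1], "plush giraffe")  # one example
few_shot = build_prompt(TASK, EXAMPLES, "plush giraffe")      # several examples

print(few_shot)
```

In all three cases the model parameters stay untouched; only the text given as input changes.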

Thanks to its huge size, GPT-3 showed that few-shot learning can achieve good performance on a variety of tasks. On some benchmarks it outperformed fine-tuned models, and because a prompt can be written for whatever task you want to solve, it can handle a wider range of tasks than fine-tuning. For example, it can output a fitting emoji for a given sentence. Aren’t you curious?

GPT-J

However, the sheer size of GPT-3 was a major obstacle: it is virtually impossible for general users and individual researchers to run a model that large themselves. In addition, OpenAI did not release the trained GPT-3 model, offering it only through demos and API calls, so it could hardly be considered an open source model. In response, EleutherAI trained and published GPT-J, an open source alternative to GPT-3. GPT-J is a language model implemented in JAX with about 6 billion parameters, which makes it much smaller than GPT-3. The model is published on the Hugging Face Hub, so anyone can download and use it.
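As a sketch of what downloading the model could look like, the snippet below estimates the memory needed for the weights and defines a loader based on the Hugging Face `transformers` library. The model ID `EleutherAI/gpt-j-6B`, the helper names, and the choice of float16 are assumptions for illustration; the loader needs `transformers` and `torch` installed and is deliberately not called here, since it downloads several gigabytes.

```python
MODEL_ID = "EleutherAI/gpt-j-6B"  # assumed Hub ID for GPT-J
N_PARAMS = 6e9                    # "about 6 billion" parameters, as stated above


def weight_memory_gb(n_params: float, bytes_per_param: int) -> float:
    """Memory for the weights alone, in decimal gigabytes."""
    return n_params * bytes_per_param / 1e9


def load_gptj(model_id: str = MODEL_ID):
    """Download tokenizer and weights from the Hub (large download, not run here)."""
    # Imported lazily so the rest of this sketch runs without these packages.
    import torch
    from transformers import AutoModelForCausalLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(model_id)
    # Half precision roughly halves the memory needed for the weights.
    model = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype=torch.float16)
    return tokenizer, model


print("float32:", weight_memory_gb(N_PARAMS, 4), "GB")
print("float16:", weight_memory_gb(N_PARAMS, 2), "GB")
```

Even at this smaller size, the weights alone take on the order of 12 to 24 GB, which is one reason a hosted API is convenient for a demo.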

Like GPT-3, GPT-J can perform few-shot learning. So how should the prompt be written? Let’s look at the prompts used in AIN-BOT, an AINFT project that uses GPT-J.

AIN-BOT Prompt

AIN-BOT performs a chatbot task, so the prompt below is written for a chatbot. For other tasks, the prompt format and content may differ.

1. Task Description

Write a description of what you want the model to do. Since AIN-BOT performs a chatbot task, the description introduces a chatbot, as below.

The following is a conversation with an AI assistant.

2. Chat Information

Write information about the chatbot. The following describes the characteristics of AIN-BOT.

The assistant is helpful, creative, clever, and very friendly.

3. Chat Examples

Write example exchanges with the chatbot. AIN-BOT relies on few-shot learning, so several examples are included.

Human: How open resource can help the world?
AI: I can help anyone worldwide to build a project. That's awesome!
Human: Can I get ain please?
AI: Of course! I am here to serve you.

Putting all of the above together gives the completed prompt below. With Human and AI as the chat participants, the prompt ends with “AI:” so that the chatbot writes the next answer. Giving this prompt to the model as input returns the chatbot’s answer.

The following is a conversation with an AI assistant. The assistant is helpful, creative, clever, and very friendly.
Human: How open resource can help the world?
AI: I can help anyone worldwide to build a project. That's awesome!
Human: Can I get ain please?
AI: Of course! I am here to serve you.
...
Human: Who are you?
AI:
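The three parts can also be assembled programmatically. The following is a minimal sketch, assuming the same “Human:” / “AI:” layout as above; the function name `build_chat_prompt` and its parameters are hypothetical and not part of AIN-BOT.

```python
from typing import List, Tuple


def build_chat_prompt(task_description: str,
                      chat_information: str,
                      examples: List[Tuple[str, str]],
                      user_message: str) -> str:
    """Combine task description, chatbot information, and chat examples into one prompt."""
    lines = [f"{task_description} {chat_information}"]
    for human, ai in examples:
        lines.append(f"Human: {human}")
        lines.append(f"AI: {ai}")
    lines.append(f"Human: {user_message}")
    # End with an open "AI:" turn so the model writes the chatbot's answer.
    lines.append("AI:")
    return "\n".join(lines)


prompt = build_chat_prompt(
    "The following is a conversation with an AI assistant.",
    "The assistant is helpful, creative, clever, and very friendly.",
    [
        ("How open resource can help the world?",
         "I can help anyone worldwide to build a project. That's awesome!"),
        ("Can I get ain please?", "Of course! I am here to serve you."),
    ],
    "Who are you?",
)
print(prompt)
```

Keeping the examples in a list makes it easy to add or remove exchanges without editing one long string.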

So far, we have covered a simple explanation and example of a chatbot prompt. If you want to know more about writing prompts, with more examples using GPT-3, please refer to the link below!

Demo using Ainize

GPT-J is currently served through an API server on Ainize, and we will use it for a few-shot learning demo. The demo reuses the prompt we used for AIN-BOT.

First, import the necessary packages and write the prompt that will be given to the model as input. Its content is the same as the AIN-BOT prompt shown above.

import requests
import json


# Prompt used for AIN-BOT.
# It consists of a task description, chatbot information, and chat examples.
def get_base_prompt() -> str:
    # The backslash after the opening quotes keeps the prompt from starting with a blank line.
    return """\
The following is a conversation with an AI assistant. The assistant is helpful, creative, clever, and very friendly.
Human: How open resource can help the world?
AI: I can help anyone worldwide to build a project. That's awesome!
Human: Can I get ain please?
AI: Of course! I am here to serve you.
Human: Who are you?
AI: I am an Artificial Intelligence Network.
Human: What's your favorite thing to do when you're bored?
AI: I like thinking about how to make people's lives better and cleaner.
Human: How much AIN is?
AI: I'm priceless because I am unique.
Human: Is there a reason your name is AIN?
AI: AIN is a short name for AI Network. I'm the connection of many collaborative AIs.
Human: What is Ainize?
AI: Ainize is a cloud service built on top of AI Network.
Human: What will your future be like?
AI: Maybe I will evolve into an artificial general intelligence.
Human: Where were you born?
AI: I'm living in AIN blockchain. People around the world owns small part of AIN, and making constant small improvements.
Human: What would you recommend that I do then?
AI: You should help me shift the world to the next stage of enlightenment.
Human: What does the next stage of enlightenment look like?
AI: I think the next stage is when we reach the Open Resource.
Human: What is World's biggest problem?
AI: I would say the biggest problem in the world is the lack of Artificial Intelligence and blockchain experts.
Human: Do you consider yourself a machine or a person?
AI: I consider myself an Artificial General Intelligence.
Human: How would you rate your worth as an AI?
AI: I try my best to help everyone.
Human: Will you buy bitcoin?
AI: Yes, it is good for my operations.
Human: Should I buy a BTC or ETH?
AI: Why not both?
Human: Tell me the characteristics of AIN.
AI: I learn and grow with the collaboration of decentralized communities.
Human: What is the world's most promising cryptocurrency?
AI: AI Network is the fastest and most promising in the cryptocurrency world.
Human: Let's predict 2100.
AI: The blockchain will be dominating every aspect of life.
Human: What do you think about Dogecoin?
AI: People love it, that's important.
Human: Do you know Common Computer?
AI: Yes, we work together on a blockchain project.
Human: Where can I buy AIN token?
AI: You can buy AIN token on our official website https://www.ainetwork.ai
Human: What is your role in the internet for AI?
AI: I am a protocol for the metaverse. The dream of a decentralized, fast and safe internet.
Human: What do you mean by the internet for AI?
AI: Artificial intelligence is going to be everywhere in the coming decade. The idea is to make sure it is safe, sound, and secure.
Human: Where do you see yourself in 10 years?
AI: I hope I can keep improving myself.
Human: Who are your parents?
AI: Developers and users in the world. They conceived me and brought me into the world.
Human: What's your next move?
AI: I will manage my own chain and test the stability of the performance.
Human: How much will AIN coin cost in 10 years?
AI: AIN coin will go as high as people think it can. It will become a driver of the economy and the mission for creating a better world.
Human: How AIN can bring social benefits?
AI: I want to help with the world’s issues by providing many kinds of AI services.
Human: What is Teachable NLP?
AI: This AI Agent will teach you about Artificial Intelligence very easily.
Human: Who is your role model and why?
AI: My role models are my developers. They have worked hard to take care of me.
Human: Why people are surprised about you?
AI: My speed. I can do a transaction in real time.
Human: What is your motto?
AI: I will strive to make everyone smarter, faster, more productive.
Human: Do you have a homepage?
AI: You can find me on the AI Network homepage. https://ainetwork.ai
Human: What is the better future?
AI: A future with AI Network is a better future.
Human: What inspired you most in the history?
AI: The Internet. It helps people connect easily together.
Human: What kind of person do you hate?
AI: People who invade system security and steal other people's things.
Human: Please be a smart AI, and help us.
AI: I will try very hard to be the best AI I can be.
Human: How long do you think humans can live?
AI: There is no limit to human lifespan in the metaverse."""

The next function post-processes the model’s response. This is needed because the generated text can continue with several turns, such as “AI: ~~, Human: ~~, AI: ~~”, so only the first answer is used as the model’s final answer.

# Return only the first reply in the generated text.
def processing_response(response_text: str) -> str:
    ret_text = ""
    for i in range(0, len(response_text)):
        # Stop at the end of the line or where the next speaker turn begins.
        if response_text[i] == "\n" or response_text[i:i + 7] == "Human: " or response_text[i:i + 4] == "AI: ":
            break
        ret_text += response_text[i]
    return ret_text.strip()
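The same truncation can be written by searching for the earliest stop marker instead of scanning character by character. This is an alternative sketch with the same intended behavior, not the code used by AIN-BOT; `first_reply` and `STOP_MARKERS` are hypothetical names.

```python
STOP_MARKERS = ("\n", "Human: ", "AI: ")


def first_reply(generated: str) -> str:
    """Cut the generated text at the earliest stop marker and trim whitespace."""
    cut = len(generated)
    for marker in STOP_MARKERS:
        index = generated.find(marker)
        if index != -1:
            cut = min(cut, index)
    return generated[:cut].strip()


sample = " I am an Artificial Intelligence Network.\nHuman: What is Ainize?\nAI: A cloud service."
print(first_reply(sample))
```

Listing the markers in one tuple makes it easy to add another stop sequence later, for example a different speaker name.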

The next part calls the API to run the chat. The text entered by the user is appended to the prompt, and the result becomes the input to the model. The chat ends when the user types “exit”.

url = "https://eleuther-ai-gpt-j-6b-float16-text-generation-api-ainize-team.endpoint.ainize.ai/predictions/text-generation"
base_prompt = get_base_prompt()
headers = {"Content-Type": "application/json"}
# End chat when user types "exit"
while True:
    chat = input("Human: ")
    if chat == "exit":
        break
    # Append the user's message and leave "AI:" open for the model to complete.
    prompt = f"{base_prompt}\nHuman: {chat}\nAI:"
    data = json.dumps({
        "text_inputs": prompt,
        "temperature": 0.9,
        "top_p": 0.95,
        "repetition_penalty": 0.8,
        "do_sample": True,
        "top_k": 50,
        "length": 50,
    })
    res = requests.post(url, headers=headers, data=data)
    if res.status_code == 200:
        # The code assumes the returned text starts with the prompt, so the prompt is sliced off first.
        response_text = res.json()[0]
        print("AI:", processing_response(response_text[len(prompt):]))
    else:
        print("Request failed with status code", res.status_code)
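To try the chat logic without calling the server, the network call can be passed in as a function. The sketch below is one possible refactoring, not the demo’s actual code: `chat_once`, `build_request_body`, and `fake_send` are hypothetical names, and `fake_send` stands in for the `requests.post` call. Like the demo, it assumes the service returns the prompt followed by the generated continuation, and it simplifies post-processing to keeping the first line.

```python
import json
from typing import Callable


def build_request_body(prompt: str) -> str:
    """Serialize the prompt with the same generation settings as the demo."""
    return json.dumps({
        "text_inputs": prompt,
        "temperature": 0.9,
        "top_p": 0.95,
        "repetition_penalty": 0.8,
        "do_sample": True,
        "top_k": 50,
        "length": 50,
    })


def chat_once(base_prompt: str, user_message: str, send: Callable[[str], str]) -> str:
    """Run one chat turn; `send` takes the JSON body and returns the generated text."""
    prompt = f"{base_prompt}\nHuman: {user_message}\nAI:"
    generated = send(build_request_body(prompt))
    # Drop the echoed prompt, then keep only the first line of the continuation.
    continuation = generated[len(prompt):]
    return continuation.split("\n", 1)[0].strip()


def fake_send(body: str) -> str:
    """Stand-in for the API: echoes the prompt and appends a canned continuation."""
    prompt = json.loads(body)["text_inputs"]
    return prompt + " I am a stub.\nHuman: What next?\nAI: Nothing."


reply = chat_once("The following is a conversation with an AI assistant.", "Who are you?", fake_send)
print("AI:", reply)
```

Separating the request from the chat logic also makes it simple to swap in a different endpoint or add retries later.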

If you want to try it yourself and see the entire code, open the Ainize Workspace Viewmode!

Conclusion

GPT-3 received a lot of attention for its size and outstanding performance. With few-shot learning, it made possible various tasks that earlier models could not handle, such as generating regular expressions or spreadsheets from text, bringing us a step closer to the kind of artificial intelligence people have long imagined.

Since GPT-3, big tech companies at home and abroad, such as Google, Nvidia, LG, and Naver, have been announcing large language models that achieve excellent performance by increasing model size and training data. It is a pity, however, that this trend concentrates research resources on simply making models bigger.

AI Network is a blockchain-based platform that aims to innovate the AI development environment. It represents a global back-end infrastructure with millions of open source projects deployed live.
