Published in Geek Culture

Exploring the Text Generation with GPT-NeoX

In a quest to open-source a GPT-3-like 175B-parameter model, EleutherAI has been doing instrumental work, releasing one model after another. The latest to come out of their magic box is GPT-NeoX!

What is GPT-NeoX?

GPT-NeoX, or GPT-NeoX-20B, is an autoregressive language model with 20 billion parameters, trained on The Pile dataset in collaboration with CoreWeave.

At the time of its release, it was claimed to be the largest publicly available pre-trained general-purpose autoregressive language model.

What type of applications can we build using it?

With the help of the GPT-NeoX model, we can build applications such as text summarization, paraphrasing, code generation, content writing, text auto-correction, text auto-completion, chatbots, and text generation.

Is the model available to use?

Yes, this model is available to use. Links to the model weights are given on the GPT-NeoX GitHub page; the repository link is shared in References.

Text-generation with GPT-NeoX

To generate text from GPT-NeoX we need to perform a series of steps. But before that, we will need an instance that can load the GPT-NeoX model.

As per the GPT-NeoX GitHub repository, we will need at least 2 GPUs with more than 45GB of GPU RAM in total, as well as 30–40GB of system RAM (memory).
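Before downloading anything, it is worth verifying that the instance actually meets these requirements. A minimal check, assuming a Linux machine with the NVIDIA driver installed:

```shell
# Print each GPU's name and total memory (skipped when no NVIDIA driver is present)
if command -v nvidia-smi >/dev/null 2>&1; then
    nvidia-smi --query-gpu=name,memory.total --format=csv
fi
# Print system RAM in gibibytes; the "Mem:" row should show 30+ GB total
free -g
```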

We can generate three types of text using GPT-NeoX.

  • Unconditional Text Generation: the model generates text without being given any input.
  • Conditional Text Generation: in contrast to unconditional generation, conditional generation needs the beginning of a text. After we provide a starting sentence, the model predicts the next token, and so on.
  • Interactive Text Generation: allows multiple rounds of back-and-forth between a user and the language model via a command-line interface.
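Which of the three modes runs is controlled by the generation config rather than by a dedicated command. As a sketch (the option names below are our reading of the repository's sample text_generation.yml; verify against your checkout), the relevant fragment looks like:

```shell
# Hypothetical excerpt of a generation config; "text-gen-type" selects the mode
# (valid values include "unconditional", "input-file", and "interactive")
cat <<'EOF'
{
  "text-gen-type": "interactive",
  "maximum_tokens": 102,
  "temperature": 1.0
}
EOF
```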

Steps to set up GPT-NeoX on a GCP VM instance

To set up GPT-NeoX on GCP/AWS we will need an instance with 45GB of GPU RAM and 40GB of CPU RAM. On AWS, a g4dn.12xlarge or similar instance will do.

Step 1: Install the necessary Ubuntu dependencies:

sudo apt update
sudo apt install python3-pip python3-dev build-essential libssl-dev libffi-dev python3-setuptools
sudo apt install virtualenv
sudo apt install git
sudo apt install wget
sudo apt install vim

Step 2: Set up the GitHub repository:

git clone https://github.com/EleutherAI/gpt-neox
cd gpt-neox/
virtualenv env_gpt_neox --python=python3
source env_gpt_neox/bin/activate
pip install torch==1.8.2+cu111 torchvision==0.9.2+cu111 torchaudio==0.8.2 -f https://download.pytorch.org/whl/lts/1.8/torch_lts.html
pip install -r requirements/requirements.txt
python ./megatron/fused_kernels/setup.py install

While following step 2, we faced an error while installing mpi4py and solved it with the approach below:

pip install pip --upgrade
sudo apt install libopenmpi-dev
pip install -r requirements/requirements.txt

If you face a YAML error, run the command below:

pip install -U PyYAML

Step 3: Download the model and change the configuration:

The following command will download about 39GB of data.

wget --cut-dirs=5 -nH -r --no-parent --reject "index.html*" -P 20B_checkpoints

Set pipe-parallel-size to half of your GPU count in the “./configs/20B.yml” file. Say you have 4 GPUs; then your pipe-parallel-size will be 2.

Open the configuration file with the command:

vim ./configs/20B.yml

And in the file, replace “pipe-parallel-size: 4” with “pipe-parallel-size: 2”.
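The same edit can be scripted with sed instead of vim. Below we demonstrate the substitution on a sample line; in practice you would run `sed -i` with the same expression on ./configs/20B.yml (this assumes the key appears in the file quoted exactly as shown):

```shell
# Rewrite pipe-parallel-size from 4 to 2 (shown on a sample line)
echo '"pipe-parallel-size": 4,' | sed 's/"pipe-parallel-size": 4/"pipe-parallel-size": 2/'
# → "pipe-parallel-size": 2,
```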

Step 4: Generate text with GPT-NeoX:

To generate text unconditionally, run the command below:

python ./deepy.py generate.py ./configs/20B.yml

For conditional text generation:

Create a prompt.txt file, place your inputs in it separated by newlines (“\n”), and then run the command below.
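For example, a two-prompt file can be created like this (the prompts themselves are just illustrative):

```shell
# Each line of prompt.txt is treated as one independent prompt
printf 'Electric cars will\nThe future of AI is\n' > prompt.txt
cat prompt.txt
```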

python ./deepy.py generate.py ./configs/20B.yml -i prompt.txt -o sample_outputs.txt

The output will be generated in the “sample_outputs.txt” file.

Sample text generated by GPT-NeoX

For unconditional text generation, below are some sample outputs:

1. Response: {
"context": "",
"text": "A systematic review and meta-analysis of the efficacy of endoscopic balloon dilation for anastomotic strictures after esophagectomy.\nEndoscopic balloon dilation (EBD) is widely used to treat anastomotic strictures after esophagectomy; however, there is no consensus on its efficacy. This systematic review and meta-analysis was performed to assess the efficacy of EBD for anastomotic strictures after esophagectomy. A literature search was performed on PubMed, Embase, and the Cochrane Library to identify eligible studies",
"length": 102,
"finished": false,
"message": null,
"duration_seconds": 9.70399785041809
2. Response: {
"context": "",
"text": "Aquaman (2018)\n\nAquaman is the DC Comics hero who can command the seas. He is a powerful swimmer and can communicate with sea life.\n\nAquaman has been the king of Atlantis since the day he was born. He is the only child of the King of Atlantis, Tom Curry, and Queen Atlanna. Aquaman\u2019s parents were killed in a mysterious accident when he was a young boy, and Aquaman was raised by his step-mother",
"length": 102,
"finished": false,
"message": null,
"duration_seconds": 9.708730936050415

Sample output for conditional text generation:

"context": "Electric cars will",
"text": " not be able to compete with conventional vehicles without a government-backed $15,000 rebate.\n\nIn fact, he said the government would need to spend hundreds of billions of dollars to encourage people to switch from their gas guzzlers to electric cars.\n\n\"I can't see it happening. It's not economic. What are we doing with the $15,000 rebate? It's not economic. It's not going to happen.\"\n\nMr Joyce said the lack of infrastructure and battery technology meant electric vehicles were not viable for consumers at this stage.\n\nHe said the Government should instead focus on developing more efficient engines, encouraging people to switch to smaller cars and investing in public transport.\n\n\"We need to get out of this mindset that we're going to have to have electric vehicles to reduce emissions,\" he said....",
"length": 694,
"finished": true,
"message": null,
"duration_seconds": 69.99025082588196
Note: "We truncated the text response as we just providing the sample to show the response format."

Time taken by the model to generate tokens

We also noted down the time taken to generate tokens (after the model was loaded) to check the speed of the GPT-NeoX model.

The Google Cloud Platform VM Instance with the following configuration was used to generate tokens:

Machine type: n1-standard-16
16 vCPUs
4 Nvidia Tesla T4 GPUs
Ubuntu 20.04

Time taken to generate tokens:

For 100 tokens it takes around 8–12 seconds
For 200 tokens it takes around 20–25 seconds
For 512 tokens it takes around 45–55 seconds
For 1024 tokens it takes around 100–110 seconds
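Dividing the token counts by the midpoints of these ranges shows that throughput is roughly constant at about 9–10 tokens per second, regardless of output length:

```shell
# Tokens generated and approximate midpoint time (seconds) for each row above
echo "100 10 200 22.5 512 50 1024 105" |
awk '{ for (i = 1; i < NF; i += 2) printf "%d tokens: %.1f tokens/sec\n", $i, $i / $(i+1) }'
```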

We hope you liked our experimentation with GPT-NeoX. We are now working on fine-tuning GPT-NeoX and improving the model's performance/inference time.


Link to GPT-NeoX paper:

Link to GPT-NeoX Github Repo:

Originally published at Exploring the Text generation with GPT-NeoX on April 7, 2022.


