Text Generation with GPT-2 in Action

Bill Huang
Aug 24, 2019


Hello friends, this is the third post of my series “NLP in Action”. In this series, I share how to do NLP tasks with SOTA techniques using a “code-first” approach, which is inspired by fast.ai.

And I am also looking forward to your feedback and suggestions.
My series “NLP in Action” contains:

About Text Generation

Text generation is a common NLP task. The goal is to generate text from a given context and make the newly generated text indistinguishable from human-written text.

You can treat text generation as a task where the computer writes text like a human. With text generation, we can let the computer write stories, poems, news articles, and more for us. The better the model, the more human-like the generated text will be.

Here comes a fun test: of the two sentences below, can you tell which one was written entirely by a human, and which one has a human-written beginning with the rest generated by a computer?

Sentence A:
Apple was founded by Steve Jobs, Steve Wozniak, and Ronald Wayne in April 1976. Its stock began trading on the New York Stock Exchange on June 1, 1978. It was the first computer company to be listed on the NAS.

Sentence B:
Apple was founded by Steve Jobs, Steve Wozniak, and Ronald Wayne in April 1976 to develop and sell Wozniak’s Apple I personal computer, though Wayne sold his share back within 12 days. It was incorporated as Apple Computer, Inc.

About LM

A language model (LM) is a keystone of NLP. A language model is a probability distribution over sequences of words: given the context, it tells you how likely each possible next token is.

Language models are a natural fit for text generation, since a language model tells you the most probable next token based on your context. We can treat text generation as a recurrent next-token prediction process.

Here is an example:

When you say only “Merry Christmas and happy”, a language model may predict, based on the context, that the next token is “new”. The context then becomes “Merry Christmas and happy new”, and considering this text, the language model may predict the next most reasonable token to be “year”.

In this example, when you give “Merry Christmas and happy”, the language model generates “new year” for you, so the full text becomes “Merry Christmas and happy new year”.
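To make this concrete, here is a minimal sketch of a single round of next-token prediction. It uses the small pretrained GPT-2 from Hugging Face’s transformers library (introduced properly in the “In Action” section below), so treat it as an illustration rather than part of the main walkthrough.

```python
# A minimal sketch of one round of next-token prediction with a language model.
import torch
from transformers import GPT2LMHeadModel, GPT2Tokenizer

tokenizer = GPT2Tokenizer.from_pretrained("gpt2")  # small GPT-2, for illustration only
model = GPT2LMHeadModel.from_pretrained("gpt2")
model.eval()

context = "Merry Christmas and happy"
input_ids = torch.tensor([tokenizer.encode(context)])  # shape: (1, seq_len)

with torch.no_grad():
    logits = model(input_ids)[0]               # shape: (1, seq_len, vocab_size)

next_id = int(torch.argmax(logits[0, -1]))     # index of the most probable next token
print(tokenizer.decode([next_id]))             # may well print " new" for this context
```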

Does it sound like magic?

About GPT-2

GPT-2 is a language model from OpenAI’s paper “Language Models are Unsupervised Multitask Learners”. GPT is a variation of the famous Transformer architecture proposed by the Google Brain team in their paper “Attention Is All You Need”.

With deep stacks of Transformer layers, GPT-2 can capture long-term dependencies in text. The model was trained unsupervised, by predicting the next word on 40GB of Internet text. As a result, the model has a good sense of what the next token should be, given the context.

The full GPT-2 model contains 1.5 billion parameters and achieved SOTA performance on many language modeling benchmarks; it is especially strong at text generation.

Due to concerns about misuse, OpenAI did not release the full version of GPT-2. The GPT-2 model we use in this post is the large version with 774 million parameters, released by OpenAI on Aug 20, 2019.

In this post, I will show how to use the GPT-2 model to do text generation.

In Action

As the saying goes, “No water, no swimming; no sailing, no boating.” If we want a clearer understanding of text generation, it is better to get our hands on the code.

Here, I will use the excellent transformers library developed by Hugging Face. This library contains state-of-the-art pre-trained models for Natural Language Processing (NLP) such as GPT-2, XLNet, BERT, etc.

The process of doing text generation with GPT-2 has 4 steps:
1. Load the model and tokenizer
2. Set the generation parameters
3. Encode the context into token ids
4. Generate new text with the model based on the context

All the code is shown in the Jupyter notebook here.

And I will give a brief introduction to each step.

1. Load the model and tokenizer

It is highly recommended to pre-download the model and vocabulary files into a local folder before using the model and tokenizer, so that loading is smooth and repeatable.

The files include:
1. pytorch_model.bin
2. config.json
3. vocab.json
4. merges.txt

Download each of the files above, keep the names as listed, and put them all in the same folder, e.g. “gpt2-large”.
We can load the model and tokenizer with:
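Here is a minimal sketch of this step, assuming the four files above sit in a local folder named “gpt2-large”:

```python
# Load the GPT-2 large model and tokenizer from a local folder.
import torch
from transformers import GPT2LMHeadModel, GPT2Tokenizer

model_path = "gpt2-large"  # folder with pytorch_model.bin, config.json, vocab.json, merges.txt

tokenizer = GPT2Tokenizer.from_pretrained(model_path)
model = GPT2LMHeadModel.from_pretrained(model_path)

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
model.to(device)
model.eval()  # inference only, no training
```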

2. Set the generation parameters

For text generation, we need to tell the model how long the generated text should be, how to decide whether a candidate next token is acceptable, and how predictable or surprising the text should be.
The parameters include:

  • Seed, an integer used to seed the random number generators, so that runs are reproducible
  • Temperature, a float that rescales the next-token probabilities; lower values make the output more predictable, higher values make it more surprising
  • Max_len, an integer defining how many tokens the model will generate
  • Top_k, an integer; at each step of next-token prediction, the model only keeps the K most probable candidate tokens
  • Top_p, a float for nucleus sampling; at each step, the model only keeps the smallest set of tokens whose cumulative probability reaches this value

You can change the parameters as you like, to make this generation process like life, a box of chocolates ^-^
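Here is a minimal sketch of this step; the exact values are illustrative, not the only reasonable choice:

```python
# Generation parameters; change them to taste.
import random

import numpy as np
import torch

seed = 42           # fixes the random number generators so runs are reproducible
temperature = 1.0   # < 1.0 makes output more predictable, > 1.0 more surprising
max_len = 50        # how many new tokens to generate
top_k = 40          # keep only the 40 most probable candidate tokens at each step
top_p = 0.9         # nucleus sampling: keep the smallest token set with cumulative probability >= 0.9

random.seed(seed)
np.random.seed(seed)
torch.manual_seed(seed)
```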

3. Encode the context into token ids

We can set whatever context we want, then tokenize it and turn it into token ids (the model converts these into embeddings internally). Then we wait for the magic to happen in the next step.
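Continuing from step 1, here is a minimal sketch of this step, using the context from the Apple example later in the post:

```python
# Turn the raw context string into token ids the model understands.
context = "Apple was founded by Steve Jobs, Steve Wozniak, and Ronald Wayne in April 1976."

context_tokens = tokenizer.encode(context)  # list of integer token ids
input_ids = torch.tensor(context_tokens, dtype=torch.long,
                         device=device).unsqueeze(0)  # shape: (1, seq_len)
```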

4. Generate new text with the model based on the context

The process of generating text includes the following steps (a minimal code sketch follows the list):

  1. Sampling: to pick a reasonable next token for the context, the language model computes each word’s conditional probability given the context. The higher a word’s probability, the more likely it is to be picked as the next token.
  2. Filtering: since there are tons of candidate words for the next token, we set up filters (here, top-k and top-p) so that only the plausible ones are considered.
  3. Beam search: a sentence is fluent not only because each word connects well to the one before it, but also because all the words form a coherent whole. In the same way, the model can look further ahead during generation, trying to pick words that not only fit the preceding token but also connect well with the whole sentence.
  4. Recurrent decision: after picking a reasonable candidate token for the next position, the model repeats the process described above, choosing a token for each position in turn until the generated text reaches the required length.
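Here is a minimal sketch of steps 1, 2 and 4 (sampling, top-k/top-p filtering, and the recurrent loop), continuing from the earlier steps. Beam search is left out to keep the sketch short, and the filtering helper is written for a single, unbatched sequence of logits:

```python
import torch
import torch.nn.functional as F

def top_k_top_p_filtering(logits, top_k=0, top_p=0.0, filter_value=-float("inf")):
    """Mask out all but the top-k tokens and/or the nucleus of tokens
    whose cumulative probability stays within top_p (1-D logits only)."""
    if top_k > 0:
        kth_best = torch.topk(logits, top_k)[0][-1]   # k-th largest logit
        logits[logits < kth_best] = filter_value
    if top_p > 0.0:
        sorted_logits, sorted_indices = torch.sort(logits, descending=True)
        cumulative_probs = torch.cumsum(F.softmax(sorted_logits, dim=-1), dim=-1)
        to_remove = cumulative_probs > top_p          # drop tokens past the nucleus...
        to_remove[1:] = to_remove[:-1].clone()        # ...but always keep at least
        to_remove[0] = False                          # the single most probable token
        logits[sorted_indices[to_remove]] = filter_value
    return logits

generated = input_ids  # token ids from step 3
with torch.no_grad():
    for _ in range(max_len):
        outputs = model(generated)
        next_token_logits = outputs[0][0, -1, :] / temperature              # step 1: score candidates
        filtered = top_k_top_p_filtering(next_token_logits,
                                         top_k=top_k, top_p=top_p)          # step 2: filter
        next_token = torch.multinomial(F.softmax(filtered, dim=-1),
                                       num_samples=1)                       # step 1: sample one token
        generated = torch.cat((generated, next_token.unsqueeze(0)), dim=1)  # step 4: repeat

print(tokenizer.decode(generated[0].tolist()))
```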

It is such a great joy to play with text generation. I kept changing the context and the generation parameters described above, and got some results that I could hardly tell were written by a computer.

E.g:
Context:
Apple was founded by Steve Jobs, Steve Wozniak, and Ronald Wayne in April 1976.
Generated text:
Its stock began trading on the New York Stock Exchange on June 1, 1978. It was the first computer company to be listed on the NAS
Combined result:
Apple was founded by Steve Jobs, Steve Wozniak, and Ronald Wayne in April 1976. Its stock began trading on the New York Stock Exchange on June 1, 1978. It was the first computer company to be listed on the NAS

Summary

Text generation is an NLP task that generates text based on a given context, and with a language model we can let the computer generate text for us. Deep Transformer models like GPT-2 have a good sense of which word is most reasonable given the context, which helps us achieve SOTA text generation results.

Disclaimer

Text generation is an NLP task, and I wrote this post for educational purposes only. Please don’t misuse this model and technology.
