This Machine Learning Medium post does not exist

Testing OpenAI’s GPT-2 text generator model to write the next killer Medium post

Cecelia Shao
Comet
6 min read · Mar 21, 2019


A snapshot of an interactive visualization of a generator and discriminator interacting, from GAN Lab

We’re feeding in the first paragraph from the ‘Checklist for Debugging Neural Networks’ Medium post

Our model prompt: Machine learning code can be notoriously difficult to debug with bugs that are expensive to chase. Even for simple, feedforward neural networks, you often have to make several decisions around network architecture, weight initialization, and network optimization — all of which can lead to insidious bugs in your machine learning code.

—————————————— Output 1 ——————————————

So, in essence, what we are talking about here is a neural network implementation based on a real learning algorithm with a deep learning implementation. Let us assume Python works without a training layer or in the form of an iterating list that is constantly recalculated. The first step in implementing a deep learning neural network is to write a function which takes an input model as a input point, iterates over every value with it, then tries to generate each element with a new iteration of any model in the iterator. The first step in generating a model is to read an image from the input map. This image is then iterated over in the iterators that do not contain any input. In the following code, we’re iterating over the input set of the previous iteration of each model. We end up with an image that is not only different but different in content.

In order to use an iterator’s value input to generate the model, we’ve added two new ones to the input sets: the current input value and the first value of the next iteration. This iterator gets as it iterates over each of the output models, updating its value. Finally, we add a second, new value to this array. This new value is then used to produce a new iteration of the iterators until we find the next value. The iteration of the iterators in the next iteration of the model is represented by an array.

So, let’s say we want to generate an image for training in a simple image-learning algorithm, and then use it in the next iteration of the model. At first the iterators are iterated over:

>>> a = a.iter() >>> the = a.iter()

>>> inimage_train( a : “The first image is from the first image” ) [5, 10] >>> a = np.linspace(0.8f, 1.0f, 0)

>>> inimage_train( np.pi, -5.0f)

>>> result = np.array([ 6, 12.9f ], a): [‘the second image is from the first image’])

You get the idea.

The original dataset of trained models (in the original dataset) was not fully training with respect to the input in the last iteration. In the dataset training with respect to the input, there were still a couple of limitations that hindered the implementation of the original model. This is what we will see as the first important constraint.

—————————————— Output 2 ——————————————

For example, a data entry for a new data source is not necessarily necessarily a data entry for all the data. So one could easily build a system that uses feedforward neural networks and feedforward random input to solve one big problem that can be tackled quickly and accurately by a large, integrated network.

Many of the new methods in my post are for making training training queries to see if a specific function performs a specific thing. But the problem of finding the right combination of functions to train will only increase the complexity of training.

So how deep can training data become? A very important part of my post is to point out that many of the most popular and well designed training algorithms are pretty much the same as you used during your college days. So let’s imagine that each of these algorithms were developed for you by experienced network designers from other disciplines so they can be used by your company to understand how deep these techniques are in each dataset. The point here is that a trained analysis has a clear impact on the likelihood of getting a training query correct.

So, before we get to the final part of this post, I want to give you a quick update about what I am talking about. When I say the following at the end of this post, I’ve never done anything like this before, a few years ago I used an earlier article to try and explain some of what I was doing.

So, let’s run through my favorite example of this. Imagine if you were to write an algorithm for a single training problem.

In general, one of the ideas most commonly used to solve this problem is the use of data structures as networks. For instance, the network that runs the training algorithm knows exactly the position and time of every trainee. Each trainee has a very different way of calculating her first trainee rank because each data entry determines the class of the trainee. So, for each trainee, data entry 1 tells the algorithm the class name of each training, which tells the network which data entry is the class who will train it and how to calculate the number of next trainees and so on. As more and more data entry is available, you can do a lot more work to fit all the data points into a single trainee. But this is all the more reason to use data structures because they are always a good idea. If you can get more data from any data source, you’ll probably be fine.

The training problem can be solved by having two trained algorithms.

Our Reaction 👀

…not bad? While the sentence structure and flow of the generated output look realistic enough, if you actually read the content, it doesn’t make much sense. It’s amazing to see how flexible the GPT-2 model is with different data and topics — for example, it generated pseudocode in the first output.

We also tried using prompts with more proper nouns from our post on NYC Machine Learning meetups (see below). The content quickly goes off topic after the <|endoftext|> break and pivots to political content from the White House.

From machine learning meetups to the Trump Administration?

As some next steps, it would be interesting to:

  • try to have the model generate different reading levels of content (e.g. a 3rd grader reading level vs. college reading level)
  • identify which prompts the model performs ‘better’ or ‘worse’ on
  • be able to give the model more structured prompts (e.g. an article outline) instead of just the starting prompt

How to use OpenAI’s GPT-2

You can easily test out GPT-2 with your own model prompts in less than 10 minutes. Here’s some guidance on how you can set it up:

1. Clone the GPT-2 GitHub repository
$ git clone https://github.com/openai/gpt-2.git
$ cd gpt-2

2. Activate a virtual environment

$ python3 -m venv ~/venv-gpt2
$ . ~/venv-gpt2/bin/activate

3. In that virtual environment, install the required dependencies:

$ pip install tensorflow-gpu==1.12
$ pip install -r requirements.txt

4. Download the publicly available 117M parameter version of the model:

$ python3 download_model.py 117M 

5. Since the OpenAI GPT-2 paper used top-k random sampling with k = 40, we’ll use that setting as well.

Following Fan et al., 2018, top-k random sampling means the model assigns a probability to every word in the vocabulary being the next word, then randomly selects the next word from only the k most likely options. A larger k increases the chance of introducing unlikely words, which can make the generated sentences more interesting but can also degrade the output.

$ sed -i 's/top_k=0/top_k=40/g' src/interactive_conditional_samples.py
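To make the effect concrete, here’s a minimal NumPy sketch of what top-k filtering does (our own illustration, not code from the GPT-2 repo): keep the k highest-scoring logits, renormalize them into a distribution, and sample from that.

import numpy as np

def top_k_sample(logits, k=40):
    # Minimal illustration of top-k random sampling (Fan et al., 2018);
    # not the GPT-2 repository's implementation.
    logits = np.asarray(logits, dtype=np.float64)
    top_ids = np.argsort(logits)[-k:]              # indices of the k largest logits
    top_logits = logits[top_ids]
    probs = np.exp(top_logits - top_logits.max())  # numerically stable softmax over survivors
    probs /= probs.sum()
    return int(np.random.choice(top_ids, p=probs)) # sample one of the k candidates

# Toy vocabulary of five 'words' with made-up scores; k=2 keeps ids 2 and 4.
print(top_k_sample([0.1, -1.3, 2.0, 0.4, 1.7], k=2))

Depending on the version of the repo you cloned, interactive_conditional_samples.py may also expose a --top_k flag directly (its arguments are wired up via python-fire), which would let you skip the sed edit.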

6. Begin testing the model 🚀

$ python3 src/interactive_conditional_samples.py

Further Reading

👉🏼Read OpenAI’s blog post announcing GPT-2:

👉🏼See Allen AI’s interactive demo (includes probabilities for multiple outputs):

👉🏼See HuggingFace’s PyTorch reimplementations, pre-trained models and fine-tuning examples:
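If you want to try the PyTorch route, here is a minimal sketch of generating a continuation with HuggingFace’s reimplementation. It assumes today’s transformers package and its GPT2Tokenizer/GPT2LMHeadModel API; the package and function names may differ in the release that was current when this post was written.

# A minimal sketch assuming HuggingFace's `transformers` package
# (pip install transformers torch); names may differ in older releases.
from transformers import GPT2LMHeadModel, GPT2Tokenizer

tokenizer = GPT2Tokenizer.from_pretrained('gpt2')  # the 117M model
model = GPT2LMHeadModel.from_pretrained('gpt2')

prompt = 'Machine learning code can be notoriously difficult to debug'
input_ids = tokenizer.encode(prompt, return_tensors='pt')

# Sample a continuation with the same top-k setting used above.
output = model.generate(input_ids, max_length=100, do_sample=True, top_k=40)
print(tokenizer.decode(output[0]))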

Found this article interesting? Follow us (Comet.ml) on Medium and check out some other relevant articles below! Please 👏 this article to share it!
