Fine-tuning GPT-2 on Harry Potter texts for free

Manjeet Singh
6 min readApr 7, 2020

Even if you have only a passing interest in tech and AI, you must’ve heard of GPT-2, the fantastic AI language model released by OpenAI in 2019 that can generate human-like passages of text. The model is based on the transformer architecture, which was introduced by Google researchers in 2017.

GPT-2 generated samples (zero-shot)

The GPT-2 model has a wide variety of applications in creative writing, text translation and summarization. With slight modifications to the training data, it can even perform well on QnA-type problems.

GPT-2 was trained in an unsupervised manner on a huge text dump of 8 million web pages.

There are four variants of the GPT-2 model, which were released in stages by OpenAI due to concerns about misuse, especially the generation of fake news. The models were released in order of size (and thus complexity):

  • 124M
  • 345M
  • 774M
  • 1558M or 1.5B

That’s cool and all but how do I use it?

On its own, the GPT-2 model is pretty capable: all you need is a text prompt about a topic and it’ll continue the text using the information it has learnt. This is known as zero-shot learning.

Some samples follow:

You can try generating your own samples here.

Fine-Tuning GPT-2 (for free)

Now we come to the crux of this story. In playing around with the models, you may want one tuned for a specific task instead of the general-purpose beast it is. Maybe you want it to write song lyrics like Taylor Swift, or short stories in the style of J.R.R. Tolkien. All of this is possible with the technique of fine-tuning.

In essence, fine-tuning in ML is just re-training a model on new data, starting from its existing weights. Instead of learning patterns from scratch, the model builds on what it already knows, which makes the training step much shorter: instead of training for days or weeks, you could be done in a few hours.

Now, to go about the business of fine-tuning a model yourself, you’d need access to the saved model weights, the model definition in a framework of your choice (TensorFlow, PyTorch, etc.) and a lot of boilerplate code to load the model and create a training pipeline.

That’s a useful exercise in its own right, but in the interest of time and sanity, we already have a beautiful library called gpt-2-simple which takes the headache out of fine-tuning GPT-2. If you look at the README on GitHub, you’ll find a link to a Colab notebook showing an example use case. That’s what we’re going to use here.

We’re going to train the 1.5B model on Google Colab for free! No free lunch you say? 😁

To get started, create a copy of the Colab notebook and save it to your Google Drive.
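The first cell of the notebook does roughly the following: install gpt-2-simple and import it (the exact cell in the notebook may look slightly different):

```python
# Rough sketch of the notebook's setup cell: install and import gpt-2-simple
!pip install -q gpt-2-simple

import gpt_2_simple as gpt2
```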

Next, specify the model variant you want to download.
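In code, downloading is a single gpt-2-simple call; the cell looks roughly like this, with the model name being whichever variant you picked:

```python
import gpt_2_simple as gpt2

# Download the chosen variant into a local models/ folder
gpt2.download_gpt2(model_name="124M")
```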

Next, you need to mount your Google Drive in the Colab VM using the cell in the notebook. It’ll ask for an authorisation code, which you’ll get by clicking on the link it shows.
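The notebook wraps the Drive mount in a helper, roughly:

```python
import gpt_2_simple as gpt2

# Mounts your Google Drive inside the Colab VM;
# Colab prompts for the authorisation code mentioned above
gpt2.mount_gdrive()
```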

You need to upload your data (song lyrics, the contents of books, etc.) as a text file to your Google Drive and provide the filename in the notebook.
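Copying the file from Drive into the Colab VM is again one helper call; the filename below is just a placeholder for whatever .txt you uploaded:

```python
import gpt_2_simple as gpt2

# Placeholder name: use whatever .txt file you uploaded to your Drive
file_name = "taylor_swift_lyrics.txt"

# Copies the file from Google Drive into the Colab VM's working directory
gpt2.copy_file_from_gdrive(file_name)
```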

Once the basic setup is done, you can proceed with fine-tuning the model.
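The fine-tuning cell boils down to something like this; the parameter values are illustrative (the 500 steps matches the Taylor Swift run described below):

```python
import gpt_2_simple as gpt2

# Same placeholder filename as before
file_name = "taylor_swift_lyrics.txt"

sess = gpt2.start_tf_sess()

# Fine-tune the downloaded weights on the uploaded text file
gpt2.finetune(sess,
              dataset=file_name,      # the .txt copied from Drive above
              model_name="124M",      # must match the variant you downloaded
              steps=500,
              restore_from="fresh",   # use "latest" to resume an earlier run
              run_name="run1",
              print_every=10,
              sample_every=100,
              save_every=100)
```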

Depending upon the model variant, the data size, the number of steps and the hardware you’re given, training can take anywhere from 10–15 minutes to 4–5 hours. Since Colab notebooks have an idle timeout of around 90 minutes, you need to find a way to keep the browser window open and active. One way I did that was to open the Colab page on my phone.

Even with that, your notebook may time out within 4–5 hours depending upon the hardware chosen (GPU/TPU instances tend to time out sooner in my experience, even when they are actively being used by your code).

It is simply less painful to train the smaller models like 124M and 345M, and these may work well for many tasks. For example, I fine-tuned the 124M model on Taylor Swift lyrics. Here is the output I got after 500 steps of training:

“White Christmas” is a copied line but other lines are novel
Some groovy rap!

Great! Add some music and you are the next Taylor Swift. 😜

Caveat: depending upon the dataset size and how it’s laid out, the model may simply learn to copy it without creating any novel content. You need to manually check the output to see if this is the case. While training on a one-liner joke dataset, I ran into exactly this problem.

Here’s the output from the 774M model:

In my opinion, the 774M model generated more novel lyrics and had a better vocabulary.

According to the Colab notebook, it’s not possible to train the 774M and 1.5B models in the Colab environment. But I still wanted to try them out; otherwise, what’s the fun, right?

I noticed that different machine configurations were being randomly assigned to my Colab environment:

  • 12 GB RAM with a K80 GPU (8 GB), which is fine for training the 345M model but insufficient for the bigger ones.
  • 24 GB RAM with a P100/V100 GPU (16 GB): this could train the 774M model in about 2 hours on the Taylor Swift lyrics.
  • 35 GB RAM with a TPU: this was a rare configuration, but I was able to get it on several occasions (there is also a 12 GB RAM TPU configuration, but it’s insufficient). Using this, I was finally able to train the 1.5B model on the Harry Potter texts. However, the model is huge (6.75 GB) and trains quite slowly. It took me 3–4 tries, with multiple usage-limit errors, to train the model to 350 steps over two days.

Remember to save your checkpoint every 100 or so steps (by manually stopping the training) to your Google Drive or local PC, otherwise the checkpoint will be gone once the Colab instance is destroyed.
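gpt-2-simple has helpers for exactly this; a minimal sketch of saving a checkpoint to Drive and pulling it back in a fresh session:

```python
import gpt_2_simple as gpt2

# After stopping training, copy the checkpoint for this run to Google Drive
gpt2.copy_checkpoint_to_gdrive(run_name="run1")

# In a new Colab session (after mounting Drive again), pull the checkpoint
# back and resume training by passing restore_from="latest" to gpt2.finetune
gpt2.copy_checkpoint_from_gdrive(run_name="run1")
```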

To build the Harry Potter dataset, I simply took the text-only version of the books from here and pasted everything into a single text file.
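If you’d rather not paste by hand, a tiny script can do the concatenation. This is just a hypothetical sketch assuming the books sit as .txt files in a local books/ folder:

```python
import glob

# Concatenate all book text files into a single training file
with open("harry_potter_all.txt", "w", encoding="utf-8") as out:
    for path in sorted(glob.glob("books/*.txt")):
        with open(path, encoding="utf-8") as book:
            out.write(book.read() + "\n\n")
```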

Finally, once the training was complete, I generated some sample output from the saved model.
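Generation from the saved checkpoint looks roughly like this; the prefix and sampling parameters are illustrative, not the exact ones I used:

```python
import gpt_2_simple as gpt2

sess = gpt2.start_tf_sess()

# Load the fine-tuned weights saved under checkpoint/run1
gpt2.load_gpt2(sess, run_name="run1")

# Generate a few samples continuing a prompt
gpt2.generate(sess,
              run_name="run1",
              prefix="Harry looked up at the castle",
              length=250,
              temperature=0.8,
              nsamples=3,
              batch_size=3)
```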

Yeah those forelegs are not worth the hype at all!
What’s wrong with you Harry?!

Sometimes the model took something from the real world and added it to the Harry Potter universe.

Programming in Hogwarts? Fascinating 🧐

Other times it exported the Harry Potter universe to the real world.

Daily Prophet editor approves of this!

Conclusion

This turned out to be a fascinating experiment and I am chuffed that it went well. I don’t expect to become a novelist overnight, but if I were one, a tool like this would help me out a lot. Generative models like GPT-2 have a bright future ahead and will only become more capable and powerful over time.
