Beginner’s Guide to Retrain GPT-2 (117M) to Generate Custom Text Content

Ng Wai Foong
May 13
Image taken from https://openai.com/blog/better-language-models/

In this article, we will explore the steps required to retrain GPT-2 (117M) on a custom text dataset on Windows. To start, GPT-2 is a large transformer-based language model that was trained to generate synthetic text samples in response to a variety of user prompts. Check out the official blog post to find out more about GPT-2:

The original version has 1.5 billion parameters, but the creator, the OpenAI team, did not release the full pre-trained model due to concerns about malicious applications of the technology. Having said that, they did release a smaller version with 117 million parameters that can be retrained on a custom text dataset.

There are altogether 5 sections in this article.

  1. Setup and installation
  2. Preparing custom text dataset
  3. Training GPT-2
  4. Generate samples
  5. Conclusion

[Section 1] Setup and installation

As I have mentioned in the introduction, I will be using Windows in this tutorial. However, it should work on any other operating system as well. I am training on Windows 10 Pro with the following specifications:

  1. Processor: Intel(R) Core(TM) i7–8550U CPU @ 1.80GHz 1.99 GHz
  2. Installed memory (RAM): 16.0 GB (15.9 GB usable)
  3. System type: 64-bit Operating System, x64-based processor

Question: What do you call a computer without GPU?

Answer: CPU!

I am not joking, as I will be using just the CPU to do the training, simply because I don’t have a GPU. If you are in the same boat as me, do expect the training time to be exceptionally long before you achieve good results.

Download required files

Go to the following GitHub repository (https://github.com/nshepperd/gpt-2) and click on the “Clone or download” button.

After that, you will see the following popup.

Click on “Download ZIP” to get the files. Advanced users are free to use any other method available. Once you have extracted the archive, you should have the following files:

Moving Python files to resolve path and directory issues (optional)

There are times when paths and directories can drive us insane, especially when you are a Windows user dealing with the command prompt. To get around this, I will be using a crude but working method: copy encode.py and train.py and paste them into the src folder. For those who wish to distribute training across multiple GPUs or machines using Horovod, kindly copy train-horovod.py as well. This step simply makes our life easier without the need to modify PYTHONPATH. It is definitely not the cleanest way and differs from the steps provided in the official GitHub repository, but for beginners like me, if it works, why not!
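If you prefer to script this copy step instead of doing it by hand in Explorer, a tiny helper like the following works. This is just my own convenience snippet (run from the extracted repository root), not part of the repository:

# Copy the training scripts into the src folder so they can be run from there
# without modifying PYTHONPATH. Run this from the extracted repository root.
import shutil

for name in ["encode.py", "train.py", "train-horovod.py"]:
    shutil.copy(name, "src")
    print("copied", name, "into src/")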

Installing Python

You should already have Python installed in your system. In case you have accidentally uninstalled it, feel free to download and install it from the following link.

I am using Python 3.7.1 installed in a virtual environment via Anaconda. You can check the version in the command prompt with the following command:

python --version

If you encounter an error that reads ‘python’ is not recognized as an internal or external command, operable program or batch file, it means that Python is not installed (or not on your PATH). Try to resolve this on your own before you proceed to the next part.

Installing Python modules (Method 1)

The first method is to install via the requirements.txt file provided. A requirements file lists the necessary modules (including their versions) that are needed to run the .py files; it is one of the files that you downloaded in the previous section. Check the path to requirements.txt, then install the required modules using the following command in the command prompt. The -r in the command is a shortcut for --requirement, which means “install from a requirements file”.

pip install -r path/to/requirements.txt

Installing Python modules (Method 2)

The second method is to install them manually, one by one. Run each of the following commands one at a time.

pip install fire>=0.1.3
pip install regex==2017.4.5
pip install requests==2.21.0
pip install tqdm==4.31.1

Check installed modules

Kindly check if you have installed the following modules:

  1. fire
  2. regex
  3. requests
  4. tqdm

You can check them via the following command:

pip list
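Alternatively, if you would rather verify the modules from within Python than scan through the pip list output, a quick check like this (my own snippet, not part of the repository) will do:

# Try importing each required module and print its version (or flag it as missing).
import importlib

for name in ["fire", "regex", "requests", "tqdm"]:
    try:
        module = importlib.import_module(name)
        print(name, getattr(module, "__version__", "(version unknown)"))
    except ImportError:
        print(name, "is NOT installed")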

Download 117M model

The first thing to do is to download the base model. In the command prompt, navigate to the root folder that contains download_model.py and type the following command:

python download_model.py 117M

Once you have completed the download, you should have a 117M folder with the following files:

  1. checkpoint
  2. encoder.json
  3. hparams.json
  4. model.ckpt.data-00000-of-00001
  5. model.ckpt.index
  6. model.ckpt.meta
  7. vocab.bpe
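If you want to double-check that the download finished cleanly, a small script like the one below lists any missing files. It is my own convenience snippet and assumes the default layout where download_model.py creates the folder under models/117M; adjust model_dir if yours ended up elsewhere (for example src/models/117M):

# Check that all 7 files of the 117M base model are present.
import os

expected = [
    "checkpoint", "encoder.json", "hparams.json",
    "model.ckpt.data-00000-of-00001", "model.ckpt.index",
    "model.ckpt.meta", "vocab.bpe",
]
model_dir = os.path.join("models", "117M")  # adjust if your 117M folder lives elsewhere
missing = [f for f in expected if not os.path.isfile(os.path.join(model_dir, f))]
print("All files present!" if not missing else "Missing: " + ", ".join(missing))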

We will now move on to the next section.

[Section 2] Preparing custom text dataset

You can use any kind of text data you can find, as long as it is in English. Examples include:

  1. Light novels
  2. Poems
  3. Song lyrics
  4. Questions and answers
  5. Synopsis or abstract
  6. News, letters or articles

You can still train it on other languages, but bear in mind that you may not achieve good results even if you have sufficient data and time. You can combine all the data into one single text file or split it into multiple text files in one directory. If you are putting everything into a single file, remember to delimit each piece of data with

<|endoftext|>

For example, if I am using Jpop lyrics as training data, I will delimit each song’s lyrics with <|endoftext|>.

Kimi ni sayonara 
Tsugeru made de ii
Dare yori soba ni ite hoshii
Sonna futari no ketsumatsu o shitte mo
Deae te yokatta to omoiaeru made
suki da yo
imasara da kedo
iwasete
sayonara no mae ni
<|endoftext|>sake ta mune no kizuguchi ni afure nagareru pain in the dark
kasane ae ta shunkan no tsunagaru omoi tokashi te
same nai netsu ni unasare te
saigo no koe mo kikoe nai
Don't cry
koware sou na hodo dakishime tara kimi ga furue te i ta
sotto kazasu tenohira ni fure te mise te
never untill the end
koboreochiru suna no you ni hakanai negai o close to the light
toji ta kimi no omokage ni kare nai namida nijin de
hodoi ta yubi no sukima kara
inori ga fukaku tsukisasaru

On a side note, you don’t strictly need the delimiter to train, but having one allows the model to learn the formatting of the training data. For example, if you just want the lyrics, you can lump all of them together without the delimiter. However, if you would like to generate the artist name, title and genre as well, kindly include such data in your training set and delimit each entry. It is advisable to narrow the scope to one genre or label for best performance.
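If your data is spread across multiple text files, a short script like this can merge everything into one training file with the delimiter between entries. It is a minimal sketch assuming one song per .txt file inside a folder called lyrics (both names are just examples; adjust them to your own setup):

# Merge all .txt files in the "lyrics" folder into a single training file,
# separating each song with the <|endoftext|> delimiter.
import glob

songs = []
for path in sorted(glob.glob("lyrics/*.txt")):
    with open(path, encoding="utf-8") as f:
        songs.append(f.read().strip())

with open("lyric.txt", "w", encoding="utf-8") as out:
    out.write("\n<|endoftext|>\n".join(songs))

print("Wrote", len(songs), "songs to lyric.txt")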

In this tutorial, I retrained GPT-2 with Jpop lyrics (in romaji format) due to the following reasons:

  1. Curiosity
  2. Some Jpop lyrics have English words
  3. Song lyrics usually contain quite a lot of repetition, especially during the chorus.

I built the training data manually via the copy-and-paste method from the following website:

I browsed through the first few song lyrics to build an 8,000+ line text file as training data. Admittedly, it is not the best training data, but it is good enough for a simple tutorial.

Once you have prepared your training data, move the folder or files into the src directory. I saved my training data in one single text file named lyric.txt. The next step is to encode it so that it can be reused across multiple runs. Let’s run encode.py. Change the command prompt directory to the src folder and type the following command (remember that we copied encode.py into the src folder; you would have to specify PYTHONPATH if you were using the one at the root folder):

python encode.py lyric.txt lyric.npz

lyric.txt is the input file while lyric.npz is the output file. Feel free to modify the names accordingly, but make sure that you point to the correct path if your dataset is located in another folder. The output file lyric.npz will be generated after the encoding. It should only take a few seconds, depending on the size of your dataset. Kindly refer to the encode.py file to find out more:
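For the curious, encode.py is quite short: it loads the raw text, runs it through the BPE encoder that ships with the model and writes the resulting token chunks to a compressed .npz file. Roughly, it boils down to something like this (a simplified sketch based on the repository’s version; the exact argument names may differ in your copy):

# Simplified sketch of what encode.py does, using the repository's own
# encoder.py and load_dataset.py helpers (run from the src folder).
import numpy as np
import encoder
from load_dataset import load_dataset

enc = encoder.get_encoder("117M")               # loads encoder.json and vocab.bpe
chunks = load_dataset(enc, "lyric.txt", 50000)  # raw text -> arrays of BPE token ids
np.savez_compressed("lyric.npz", *chunks)       # save so encoding is done only once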

Once you are all set, we can now move to the next section to train GPT-2.

[Section 3] Training GPT-2

Before we start the training, double-check that your command prompt points to the same directory as train.py. It should already be the correct directory if you just performed the encoding step.

Normal training

Kindly refer to the train.py file to find out more about the available arguments:

Type the following command:

python train.py --dataset lyric.npz

If you would like to see more samples, you can adjust how often they are generated (the default is once every 100 steps). For example, to output 3 samples every 50 steps, type the following command instead:

python train.py --dataset lyric.npz --sample_every 50 --sample_num 3

There is also an option to increase the batch size and adjust the learning rate. Make sure that you have sufficient RAM to handle any increase in batch size (the default is 1). The learning rate is used for fine-tuning the model: if you notice that the model is not learning anything (the loss is not decreasing), you can reduce the learning rate further. For example, you can use the following command to train with a batch size of 2 and a learning rate of 0.0001:

python train.py --dataset lyric.npz --batch_size 2 --learning_rate 0.0001

If you are using the latest Python modules, do not be alarmed if you happen to see a lot of warning messages such as:

  1. Tensorflow binary was not compiled to use: AVX2
  2. … is deprecated and will be removed in future version.
  3. deprecated in favor of operator or …

You should have the following output after a while:

Example of the training output after running train.py

From the image above, we can decipher the output [340 | 75.38] loss=0.66 avg=0.66 as follows:

  1. 340: Refers to the number of training steps. Think of it as a counter that increases by 1 after each step.
  2. 75.38: Time elapsed since the start of training, in seconds. You can use the first step as a reference to gauge how long it takes to run one step.
  3. loss and avg: These refer to the cross-entropy (log loss) and the average loss, respectively. You can use them to gauge the performance of your model. In theory, as the number of training steps increases, the loss should decrease until it converges at a certain value. The lower, the better.
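As a side note, avg is not a plain mean over all steps: the training script keeps a running, exponentially weighted average so that recent steps matter more. Conceptually it behaves like the small sketch below (an illustration only, not the exact code from train.py; the 0.99 decay is an assumption):

# Illustration of an exponentially weighted running average of the loss,
# similar in spirit to the avg value printed during training.
def update_avg(numer, denom, loss, decay=0.99):
    numer = numer * decay + loss
    denom = denom * decay + 1.0
    return numer, denom

numer, denom = 0.0, 0.0
for step, loss in enumerate([1.20, 0.95, 0.80, 0.70, 0.66], start=1):  # toy losses
    numer, denom = update_avg(numer, denom, loss)
    print(f"[{step}] loss={loss:.2f} avg={numer / denom:.2f}")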

Training using Horovod

For those who wish to train GPT-2 across multiple GPUs, you can try the following command (all of it on one line):

mpirun -np 4 -H localhost:4 -bind-to none -map-by slot -x NCCL_DEBUG=INFO -x LD_LIBRARY_PATH -x PATH -x PYTHONPATH=src -mca pml ob1 -mca btl ^openib train-horovod.py --dataset lyric.npz

How to stop training?

You can stop the training with Ctrl+C. By default, the model is saved once every 1,000 steps and a sample is generated once every 100 steps. After you have interrupted the process, a checkpoint folder and a samples folder will have been generated for you. Inside each folder, you will find another folder called run1 (you can modify this via the run_name argument). The samples folder contains example output from the model; you can view it in any text editor to evaluate your model. The checkpoint folder contains the data necessary to resume your training in the future. Each saved model carries a suffix corresponding to the number of steps run. If you saved at step 100, you should have the following:

  1. model-100.data-00000-of-00001
  2. model-100.index
  3. model-100.meta

How to resume training from last checkpoint?

You can simply use the following code to resume the training:

python train.py --dataset lyric.npz

How to resume training from different checkpoint?

If you have created different checkpoints and want to resume training from one of them, you can use the following command with the path to the checkpoint folder:

python train.py --restore_from path/to/checkpoint --dataset lyric.npz

How to resume training using different run_name?

You can modify the output folder by specifying the run_name argument. This allows you to keep track of separate runs so that you can fine-tune each one and resume it from a particular checkpoint.

python train.py --dataset lyric.npz --run_name run2

This will output the saved model into a new folder called run2. I managed to train 500 steps over a 3-day period with an average loss of 0.25. We will see how to generate samples using the saved model in the next section.

[Section 4] Generate samples

Create a folder for the model

In the src/models folder, you should have just one folder, called 117M. Create another folder to store your model alongside the original model. I made a new folder called lyric, so I now have two folders in src/models: one called 117M and the other called lyric.

Go to src/checkpoint/run1 folder, and copy the following files:

  1. checkpoint
  2. model-xxx.data-00000-of-00001
  3. model-xxx.index
  4. model-xxx.meta

xxx refers to the step number. Since I have trained for 501 steps, I have model-501.index.

Paste them into the newly created folder (in my case, the folder is called lyric). Next, go to the 117M folder and copy the following files:

  1. encoder.json
  2. hparams.json
  3. vocab.bpe

Paste them into the lyric folder. Double-check that you now have 7 files in it. With this, we are ready to generate samples. There are two ways to generate them.

Generate unconditional sample

An unconditional sample is generated randomly, without taking any user input into account. Think of it as a random sample. Make sure you are in the src directory and type the following in the command prompt:

python generate_unconditional_samples.py --model_name lyric

Wait for it to run and you should see samples being generated one by one. It will keep running until you stop it with Ctrl+C. This is the default behavior. We can tune the parameters to obtain different variations of samples. Kindly refer to generate_unconditional_samples.py for more information:

The most important parameters are top_k and temperature; a small sketch showing how they interact follows the list below.

  • top_k: Integer value controlling diversity. 1 means only 1 word (token) is considered at each step, resulting in deterministic completions, while 40 means 40 words are considered at each step. 0 (default) is a special setting meaning no restriction. 40 is generally a good value.
  • temperature: Float value controlling randomness via the Boltzmann distribution. A lower temperature results in less random completions; as the temperature approaches zero, the model becomes deterministic and repetitive. A higher temperature results in more random completions. The default value is 1.
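To make these two parameters concrete, here is a minimal sketch in plain NumPy of how temperature scaling and top_k filtering shape the choice of the next token. It is an illustration only; the real model does the equivalent inside its TensorFlow sampling graph:

# Toy demonstration of temperature and top_k sampling over 5 candidate tokens.
import numpy as np

def sample_next_token(logits, temperature=1.0, top_k=0):
    logits = np.asarray(logits, dtype=np.float64) / temperature
    if top_k > 0:
        # Keep only the top_k highest-scoring tokens; discard everything else.
        cutoff = np.sort(logits)[-top_k]
        logits = np.where(logits < cutoff, -np.inf, logits)
    probs = np.exp(logits - np.max(logits))
    probs /= probs.sum()
    return np.random.choice(len(probs), p=probs)

logits = [2.0, 1.5, 0.3, -1.0, -2.5]  # pretend scores for 5 candidate tokens
print(sample_next_token(logits, temperature=0.8, top_k=2))  # only token 0 or 1 possible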

For example, to generate using a temperature of 0.8 and a top_k of 40, type in the following:

python generate_unconditional_samples.py --temperature 0.8 --top_k 40 --model_name lyric

Generate interactive conditional sample

Interactive conditional sampling refers to generating samples based on user input. In other words, you type in some text and GPT-2 will do its best to fill in the rest. The command and available parameters are the same as for unconditional sampling. Do not enter your input together with the command, as you will be prompted for it later on in the command prompt. Type the following command:

python interactive_conditional_samples.py --temperature 0.8 --top_k 40 --model_name lyric

Once it is running, you can type in your input and press Enter. The downside of this interactive mode is that you will run into issues if your input contains line breaks. If you experience this issue, the best way around it is to put the text in a file and modify the code to read from it, as sketched below. Kindly refer to interactive_conditional_samples.py to learn more:
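As an illustration of that workaround, the snippet below reads a multi-line prompt from a text file; you would replace the input(...) call inside interactive_conditional_samples.py with something along these lines. Note that prompt.txt is just an example file name, and the variable holding the prompt may be named differently in your copy of the script:

# Read a multi-line prompt from a text file instead of typing it interactively.
with open("prompt.txt", encoding="utf-8") as f:
    raw_text = f.read().strip()
print("Prompt:", raw_text)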

Feel free to play around with the generated model to see what it can produce. There might be surprising findings that can make or break your day. In the next section, we will analyse some samples that I have generated and wrap up this tutorial.

[Section 5] Conclusion

Analyse generated samples

By default, a random sample is generated once every 100 steps. Since I ran 500 steps, I have 5 samples generated during the training process. Let’s have a look at each of them and pick out the interesting parts.

Sample 1

...
Kokoro no hibi namida kono demo you ni
(Sou, tatta hikeru no ai wa, shittomo mai you ni yo)
Kore ikiteku kizutsu ga kimi wa jikiriteru
Kore ikiteku kizutsu ga kimi wa jikiriteru
(Sou, tatta hikeru no ai wa, jibun ni yo)
Kokoro no hibi dake na hoshi ga hanau kedo
(Tsukamete hiroi hanabiru ni zutto yoko)
Kore ikiteku kizutsu ga kimi wa jikiriteru
Kore ikiteku kizutsu ga kimi wa jikiriteru
(Sou, tatta hikeru no ai wa)
...

We can see that there is quite a lot of repetition. However, a few sentences begin with the same words but end with different words (as shown by the words in bold).

Sample 2

...
Nani mo itsumo kodoku dake
Itsumo itsumo kodoku
Kirei no shirin no youni
Yoko ni shitteiru
Kizutsuki kizutsukeru
Kitto yaru kodoku
...

The sentence highlighted in bold shows that the word “itsumo” is repeated twice and that “itsumo kodoku” is actually part of the sentence above it.

...
Kumo no yureta toki ni irete
Kirei no kodoku
Suteki no oku ni furetakute futteyosu
Yuuki no kumo de you ni
Nogaretesu kizutsukeru ne
Yuuki no yureta kowareru ka
...

We can also notice that in some parts of the lyrics, the word no is repeated in almost every single sentence.

In Japanese, の, which is written as no in romaji, is a possessive particle used to connect nouns. For example, to say “this is my book”, you would say これは私の本です (kore ha watashi no hon desu). It is not a surprise that you will often encounter this particle in song lyrics, but having so many of them is plain wrong, not to mention that it is used incorrectly. In the sentence “kirei no kodoku” (highlighted in bold):

  • kirei: refers to 綺麗（きれい）, which means beautiful. It is a na-adjective and should be followed by the particle na, not no, which makes the sentence grammatically wrong.
  • kodoku: refers to either 孤独 (loneliness) or 蠱毒 (a type of poison called Gu, associated with the culture of South China). Both words are written with the same hiragana (こどく).

As a result, the sentence ends up meaning beautiful loneliness or beautiful poison, which makes it a dark-themed song lyric. Definitely not what we want to hear in a normal Jpop song.

Sample 3

aki 
Natsukashii Kyou mo kawaranai you ni

Kanawanai you ni kirakiri
Tama mo todoka atte hoshikute ii
Konya iki mo tomatta toki ha hen ni
Kakehiki hanashi hen dashitatte
Shiranai furueteta toki ha
Itsumori nanika ni kogeta michi
Tsumetai kokoro ha otozure nami ni
Koyuki ni konya wo
Tsumetai yoru wo kiete yo
Itsuji nara momiji no
Tsuyogari da kedo watashi ni kiete
Ai wa itsuka kono te wo tozashita sono hanabi no yo
Itsumori nanika ni kogeta michi
Tsumetai kokoro ha otozure nami ni
Koyuki ni konya wo
Tsumetai yoru wo kiete yo
Itsuji nara momiji no
Tsuyogari da kedo watashi ni kiete
Ai wa itsuka kono te wo tozashita sono hanabi no yo

Personally, I like this part the most (highlighted in bold). It reads:

  • aki: refers to 秋(あき)which means autumn
  • natsukashii kyou: refers to 懐かしい今日 which means today is a nostalgic day
  • mo: refers to a particle も which means also
  • kawaranai you ni: refers to 変わらないように which means may it not change

To sum it up, it means autumn, may this nostalgic day stay as it is.

...
"itsu de mirai" made
ma sura natsukashii mie tte
sakyuu ga kareru? te wo mada chiriyuku
jibun mae no bokura nda
dakedo kamawanai you ni
tooku made ita
...

There are also some random words that do not make sense in Japanese, such as nda (highlighted in bold).

Sample 4

...
Happy Days azu pregnancy?

Doudou saishuu atsuku
Hageshiku atsuku dakara
Trouble spots ai no oto deepakou atashi ni sakaramono
Ato me mo nakute
Kimi no te wo meguri
Kirawaranai ippai
Manmaru tte kankei abishime yo
Happy Days ai no oto deepakou soko de anata ga
Hitori nagara dake de
Tondeiki ne
Yami ame mo nai
...

Up until now, the generated samples have all been in broken Japanese. However, remember that I did include samples with a mixture of English and Japanese as well. In this sample, we are finally able to see output that includes both languages. However, it is far from what I expected. As you can see, the model mixed in a few random English words that do not make sense and even created a new word, “deepakou”.

Sample 5

...
We're gonna make it
Katachi no nai futari
Hikisakarete shimawanai
We're gonna make it real
We're gonna make it
Tatoe no hane wo
Ima wo ai shiteru you ni
We're gonna make it realWe're gonna make it real
Katachi no nai futari
Kanashii tsuki no koe wo
Hikisakarete shimawanai
We're gonna make it real
We're gonna make it real
Katachi no nai futari
Kanashii tsuki no koe wo
...

This is GPT-2’s attempt at making it real. However, pasting that line all over the place is not a good choice.

...
Strong, be strong wasurenaide
Hisshi ni shita nara tsuki ni aru no ni
Strong, be in control jibun no aida ga
Kokoro no naka no youni omoi wo maretari kitto
bokura ga nia wazu ni nijimari iyasora no naka de
nakanai de natta toki da nagasete
nakitaku natta toki da inai no ni
nakitaku natta toki da inai no ni
nakitaku natta toki da inai no ni
nakitaku natta toki da inai no ni
...

GPT-2 is trying to appear strong but loses control at the last moment. The results are still pretty bad at this point. If you ever find yourself in this situation, the solution is to increase the training time or add more training data. Since this is just a simple tutorial, I will stop the training for now. Let’s move on to the summary to wrap up this tutorial.

Summary

Congratulations, you have completed a simple tutorial on retraining GPT-2 with your own custom dataset. What can we learn from this? Well, GPT-2 gives us an idea of how useful AI can be at generating content. Although this is just the smaller version of GPT-2, it can still generate samples that are worth keeping. The main point is not that AI will replace us in generating content, but rather that we can leverage AI to help us generate valuable content in a shorter time frame. For example, I can retrain GPT-2 to generate random song lyrics as samples, then use those samples as reference when writing my own lyrics. Who knows, I might be able to create a masterpiece? Most of the time, we only lack a few hints and some guidance from the people around us to move forward with our work. That’s all for this tutorial; I hope that you enjoyed it. I look forward to seeing how AI can be used to ignite creativity and spark joy in our daily lives!

Reference

Feel free to check out the following links for more information on GPT-2:

  1. https://openai.com/blog/better-language-models/
  2. https://github.com/nshepperd/gpt-2
  3. https://www.gwern.net/GPT-2
