Guide to fine-tuning Text Generation models: GPT-2, GPT-Neo and T5
We go through the basics of massive language models, look at the different open-source options, and then compare them by fine-tuning each one on a sentiment detection task.
The code used in this article can be found here — GPT and T5. To read more about text generation models, see this. For more such articles, visit my website or have a look at my latest short book on data science. You can also connect with me on LinkedIn.
Introduction
Recent research in NLP has led to the release of multiple massive pre-trained text generation models such as GPT-{1,2,3}, GPT-{Neo, J} and T5. If we (the audience, you and me included) were not already impressed by their parameter counts running into the billions, we were enthralled by the ease with which they can be applied to a completely new, unseen task without training for a single epoch! While this is fine for quick experiments, for any real production deployment it is still recommended to train the model further on the specific task. This is called fine-tuning, and in this article we will practically learn how to fine-tune some of the best (read: state-of-the-art) language models currently available.
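To make the "without training for a single epoch" idea concrete, here is a minimal sketch of using a pre-trained model out of the box. It assumes the Hugging Face transformers library; the model name and prompt are illustrative choices of mine, not taken from the article's code.

```python
# A minimal sketch: load a pre-trained GPT-2 and generate text
# with zero task-specific training (assumes `pip install transformers`).
from transformers import pipeline

# Download and wrap the pre-trained model; no fine-tuning has happened yet.
generator = pipeline("text-generation", model="gpt2")

# Ask the untouched model to continue an arbitrary prompt.
prompt = "This movie was surprisingly"
outputs = generator(prompt, max_length=30, num_return_sequences=1)
print(outputs[0]["generated_text"])
```

This kind of zero-shot usage is exactly what makes the models attractive for quick experiments; the rest of the article is about going one step further and fine-tuning them for a specific task such as sentiment detection.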