TDS Archive

An archive of data science, data analytics, data engineering, machine learning, and artificial intelligence writing from the former Towards Data Science Medium publication.


Guide to fine-tuning Text Generation models: GPT-2, GPT-Neo and T5

12 min read · Jul 11, 2021


The code used in this article can be found here — GPT and T5. To read more about text generation models, see this. For more articles like this, visit my website or have a look at my latest short book on data science. You can also connect with me on LinkedIn.

Introduction

Recent research in NLP has led to the release of multiple massive pre-trained text generation models like GPT-{1,2,3}, GPT-{Neo, J} and T5. Even if we (the audience, including you and me) were not impressed by tunable parameter counts running into the billions, we were enthralled by the ease with which these models can be applied to a completely new, unseen task without training for a single epoch! While this is fine for quick experiments, for any real production deployment it is still recommended to train the model further on the specific task. This is called fine-tuning, and in this article we will practically learn the ways to fine-tune some of the best (read: state-of-the-art) language models currently…
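To make the idea of fine-tuning concrete before diving into the full walkthrough, here is a minimal sketch of causal language model fine-tuning with the Hugging Face Transformers Trainer. It is not the article's linked code; the dataset (a small WikiText-2 slice) and the hyperparameters are placeholder assumptions chosen only for illustration.

```python
from datasets import load_dataset
from transformers import (AutoModelForCausalLM, AutoTokenizer,
                          DataCollatorForLanguageModeling, Trainer,
                          TrainingArguments)

model_name = "gpt2"  # the same recipe works for "EleutherAI/gpt-neo-125M"
tokenizer = AutoTokenizer.from_pretrained(model_name)
tokenizer.pad_token = tokenizer.eos_token  # GPT-2 has no pad token by default
model = AutoModelForCausalLM.from_pretrained(model_name)

# Placeholder dataset: a 1% slice of WikiText-2, just to keep the sketch runnable.
dataset = load_dataset("wikitext", "wikitext-2-raw-v1", split="train[:1%]")

def tokenize(batch):
    # Truncate long lines; the collator below handles padding and labels.
    return tokenizer(batch["text"], truncation=True, max_length=128)

tokenized = dataset.map(tokenize, batched=True, remove_columns=["text"])

# mlm=False gives standard next-token (causal LM) training targets.
collator = DataCollatorForLanguageModeling(tokenizer=tokenizer, mlm=False)

args = TrainingArguments(
    output_dir="gpt2-finetuned",      # hypothetical output path
    per_device_train_batch_size=4,
    num_train_epochs=1,
    logging_steps=50,
)

trainer = Trainer(model=model, args=args,
                  train_dataset=tokenized, data_collator=collator)
trainer.train()
```

The same pattern extends to T5 by switching to a sequence-to-sequence model class and feeding input/target text pairs; the article's linked notebooks cover those details.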


Written by Mohit Mayank

Senior Data Scientist | AI/ML Researcher | Creator of “Jaal” | Author of “Lazy Data Science Guide” | LinkedIn & Twitter: @imohitmayank
