OpenAI GPT: Generative Pre-Training for Language Understanding

Understanding Transformer-Based Self-Supervised Architectures

Rohan Jagtap
DataSeries



Language Modeling is currently the biggest trend in NLP. Most major NLP tasks follow the same pattern: self-supervised pre-training of a language model on a large corpus, followed by fine-tuning the model on the required downstream task. Since the pre-training stage is unsupervised and the fine-tuning stage is supervised, the overall setup is also a case of semi-supervised learning. A minimal sketch of this two-stage workflow follows below.
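
To make the two-stage pattern concrete, here is a minimal PyTorch sketch (not the paper's code): stage one trains on raw token sequences with a next-token objective, and stage two reuses the same backbone with a small classification head on a labeled task. The tiny GRU backbone, vocabulary size, and toy data are illustrative placeholders, not GPT's actual architecture or hyperparameters.

```python
import torch
import torch.nn as nn

VOCAB_SIZE, D_MODEL, NUM_CLASSES = 1000, 64, 2    # toy sizes, not GPT's

class TinyBackbone(nn.Module):
    """A tiny stand-in for GPT's Transformer decoder, just to show the workflow."""
    def __init__(self):
        super().__init__()
        self.embed = nn.Embedding(VOCAB_SIZE, D_MODEL)
        self.rnn = nn.GRU(D_MODEL, D_MODEL, batch_first=True)

    def forward(self, tokens):                    # tokens: (batch, seq_len)
        hidden, _ = self.rnn(self.embed(tokens))
        return hidden                             # (batch, seq_len, D_MODEL)

backbone = TinyBackbone()
lm_head = nn.Linear(D_MODEL, VOCAB_SIZE)          # used during pre-training
cls_head = nn.Linear(D_MODEL, NUM_CLASSES)        # added for fine-tuning

# Stage 1: self-supervised pre-training -- predict the next token, no labels needed.
pretrain_opt = torch.optim.Adam(
    list(backbone.parameters()) + list(lm_head.parameters()), lr=1e-3)
tokens = torch.randint(0, VOCAB_SIZE, (8, 20))    # a toy batch of "unlabeled corpus"
logits = lm_head(backbone(tokens)[:, :-1])        # predict token t+1 from the prefix up to t
lm_loss = nn.functional.cross_entropy(
    logits.reshape(-1, VOCAB_SIZE), tokens[:, 1:].reshape(-1))
lm_loss.backward()
pretrain_opt.step()

# Stage 2: supervised fine-tuning -- reuse the pre-trained backbone on a labeled task.
finetune_opt = torch.optim.Adam(
    list(backbone.parameters()) + list(cls_head.parameters()), lr=1e-4)
labels = torch.randint(0, NUM_CLASSES, (8,))      # toy downstream labels (e.g. sentiment)
cls_logits = cls_head(backbone(tokens)[:, -1])    # classify from the last hidden state
cls_loss = nn.functional.cross_entropy(cls_logits, labels)
cls_loss.backward()
finetune_opt.step()
```

The point of the pattern is that the backbone weights learned in stage one are carried over into stage two, so the (much smaller) labeled dataset only has to adapt the model to the task rather than teach it the language from scratch.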

In this article, I’ll be delineating OpenAI GPT, one of the most important and fundamental models in language understanding, which helped lay the foundation of language modeling. It was also one of the pioneers of scaling up NLP models, with its 110M training parameters (which may seem small today, but was a great deal when the model came out).

Generative Pre-Training

As mentioned earlier, GPT is one of the pioneers in language understanding and modeling: it proposes pre-training a language model on a huge corpus of unlabeled data and then fine-tuning it on each downstream task. With that said, let’s move on to the specifics of GPT.
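
Concretely, the pre-training stage maximizes a standard left-to-right language-modeling likelihood over an unlabeled token corpus $\mathcal{U} = \{u_1, \ldots, u_n\}$, where $k$ is the size of the context window and $\Theta$ are the model parameters:

$$
L_1(\mathcal{U}) = \sum_i \log P\left(u_i \mid u_{i-k}, \ldots, u_{i-1}; \Theta\right)
$$

Fine-tuning then optimizes a task-specific supervised objective on labeled data, optionally keeping this language-modeling loss around as an auxiliary term.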

The Architecture
