BERT: Pre-Training of Transformers for Language Understanding
Understanding Transformer-Based Self-Supervised Architectures
Pre-trained language models have taken over a majority of tasks in NLP. The 2017 paper, “Attention Is All You Need”, which proposed the Transformer architecture, changed the course of NLP. Building on it, several architectures such as BERT and OpenAI GPT evolved by leveraging self-supervised learning.
In this article, we discuss BERT: Bidirectional Encoder Representations from Transformers, which was proposed by Google AI in the paper “BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding”. It is one of the groundbreaking models that achieved state-of-the-art results on many downstream tasks and is still widely used.
Overview
BERT leverages a fine-tuning-based approach to applying pre-trained language models: a common architecture is first trained on a relatively generic task, and then fine-tuned on specific downstream tasks that are broadly compatible with the pre-training setup, as sketched below.
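To make the workflow concrete, here is a minimal fine-tuning sketch using the Hugging Face transformers library (not part of the original paper; the checkpoint name, toy sentences, and labels are illustrative): the pre-trained BERT encoder is loaded with a fresh classification head, and the whole network is updated on a downstream task with a small learning rate.

```python
# Minimal fine-tuning sketch (illustrative; assumes the Hugging Face
# `transformers` library and PyTorch are installed).
import torch
from transformers import BertTokenizer, BertForSequenceClassification

# Load the pre-trained encoder plus a new, randomly initialized
# classification head for a 2-class downstream task.
tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")
model = BertForSequenceClassification.from_pretrained(
    "bert-base-uncased", num_labels=2
)

# Toy labeled examples standing in for a real downstream dataset.
texts = ["a delightful film", "a tedious, overlong mess"]
labels = torch.tensor([1, 0])  # 1 = positive, 0 = negative

inputs = tokenizer(texts, padding=True, truncation=True, return_tensors="pt")

# One gradient step of fine-tuning: the entire network (encoder and
# classification head together) is updated with a small learning rate.
optimizer = torch.optim.AdamW(model.parameters(), lr=2e-5)
model.train()
loss = model(**inputs, labels=labels).loss
loss.backward()
optimizer.step()
print(f"training loss: {loss.item():.4f}")
```

Note that fine-tuning updates all of the pre-trained weights, not just the new head; this is what distinguishes the fine-tuning approach from feature-based approaches that keep the pre-trained representations frozen.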