An Overview of the Various BERT Pre-Training Methods

Joseph Gatto
May 14
[Figure: BERT pre-training visualization, from the BERT paper.]

If you are interested in machine learning, then over the past few years you have likely heard of the Transformer, a model architecture that has revolutionized Natural Language Processing.

A very popular variation of the Transformer is BERT, which uses Transformer Encoders to learn text representations from unlabeled corpora. How does it learn from unlabeled data, you ask? Well, it…
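To make "learning from unlabeled data" concrete, here is a minimal sketch of the token-corruption step behind BERT's masked language modeling objective: a fraction of tokens (15% in the BERT paper) is replaced with a [MASK] token, and the model is trained to predict the originals. The function name and the plain-Python tokenization are illustrative assumptions, not the authors' actual implementation.

```python
import random

def mask_tokens(tokens, mask_prob=0.15, mask_token="[MASK]", seed=0):
    """Randomly replace a fraction of tokens with [MASK].

    Returns the corrupted sequence plus per-position labels:
    the original token where masking occurred, None elsewhere.
    This is an illustrative sketch, not BERT's real preprocessing
    (which also sometimes keeps or randomly swaps masked tokens).
    """
    rng = random.Random(seed)  # seeded for reproducibility
    masked, labels = [], []
    for tok in tokens:
        if rng.random() < mask_prob:
            masked.append(mask_token)
            labels.append(tok)   # model must recover this token
        else:
            masked.append(tok)
            labels.append(None)  # no prediction at this position
    return masked, labels

tokens = "the cat sat on the mat".split()
masked, labels = mask_tokens(tokens)
```

During pre-training, the loss is computed only at the masked positions, which is why the unmasked labels are None here.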