BERT, DistilBERT, RoBERTa, and XLNet: A Simplified Explanation

Wired Wisdom · Published in DataSeries · Jul 9, 2020

Happy Thursday! Today in Everything Simplified we will take a look at BERT, DistilBERT, RoBERTa, and XLNet.


In 2018, the tech giant Google released a state-of-the-art language model called BERT (Bidirectional Encoder Representations from Transformers). It took the NLP field by storm and outperformed many state-of-the-art models on different types of tasks, including question answering. BERT is a pre-trained model that processes and understands words in relation to one another rather than one by one, so we can say that BERT models have the capability to understand the full context of a text.
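
To make that bidirectional idea concrete, here is a minimal sketch (my own illustration, assuming the Hugging Face `transformers` library and the `bert-base-uncased` checkpoint, neither of which the article prescribes) that asks a pre-trained BERT to fill in a masked word using context on both sides of the blank.

```python
# Minimal sketch: BERT's masked-word prediction uses context from BOTH sides.
from transformers import pipeline

# Load a pre-trained BERT with its masked-language-model head.
fill_mask = pipeline("fill-mask", model="bert-base-uncased")

# "[MASK]" is BERT's mask token; the prediction depends on the words
# to the left AND the right of the blank, not just the preceding ones.
predictions = fill_mask("The doctor prescribed some [MASK] for my headache.")

for p in predictions:
    print(p["token_str"], round(p["score"], 3))
```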

Over time, new variations of BERT appeared that improved accuracy and reduced computational cost in many cases. Let’s discuss all of them today.

Comparison of the four models (figure)

BERT

BERT models use unsupervised learning on large amounts of unlabelled text (with masked tokens) for pre-training, and these pre-trained models can then be fine-tuned on small supervised datasets to achieve high accuracy. BERT makes use of the Transformer architecture and stacks multiple Transformer encoders on top of each other. It used…
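
As a rough sketch of that pre-train-then-fine-tune recipe (again assuming the Hugging Face `transformers` library and the `bert-base-uncased` checkpoint, which are my choices rather than the article's), the snippet below loads the pre-trained stack of Transformer encoders, attaches a fresh classification head, and performs a single supervised fine-tuning step.

```python
import torch
from transformers import AutoTokenizer, AutoModelForSequenceClassification

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")

# Pre-trained stack of Transformer encoders plus a fresh, untrained
# classification head (num_labels=2 for a binary task such as sentiment).
model = AutoModelForSequenceClassification.from_pretrained(
    "bert-base-uncased", num_labels=2
)
print(model.config.num_hidden_layers)  # 12 stacked encoder layers in BERT-base

# A tiny supervised batch to illustrate fine-tuning.
batch = tokenizer(["I loved this movie.", "Terrible film."],
                  padding=True, return_tensors="pt")
labels = torch.tensor([1, 0])

optimizer = torch.optim.AdamW(model.parameters(), lr=2e-5)
outputs = model(**batch, labels=labels)  # forward pass returns the loss
outputs.loss.backward()                  # gradients flow through all layers
optimizer.step()                         # one fine-tuning update
```

In practice you would loop over a labelled dataset for a few epochs; the point is that only a small labelled set is needed because the heavy lifting was already done during unsupervised pre-training.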
