
BERT for Everyone

How Google’s State-of-the-Art Natural Language Processing AI works, in layman’s terms

Andre Ye · Feb 28, 2020


In late 2018, Google unveiled the Bidirectional Encoder Representations from Transformers (BERT) model; a year later, the company announced that it would impact roughly 1 in 10 queries in its search engine. Its neural network-based technique for natural language processing (NLP) broke several records on NLP benchmarks. There has been a lot of activity in the machine learning community around BERT, but I have yet to come across a tutorial that isn’t hyper-technical and jam-packed with intricate technicalities.

With today’s programming tools, it isn’t necessary to understand all of the mathematics behind a technique, especially one as complex as BERT. This article explains the gist of how BERT works intuitively, without subjecting the reader to an onslaught of equations.

Let’s get started!

NLP models need to be pretrained: it takes a human several years to get a solid grasp of any language, and even with the speedup computers offer, a model can’t learn a language in a few minutes or even a day. Before BERT, pretraining was limited to word embeddings, which map each word to a vector that captures some aspects of its meaning; the embedding for ‘watermelon’, for example, sits close to those of ‘green’, ‘fruit’, and ‘seed’. The embeddings are trained on a massive unlabeled set of…
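To make the word-embedding idea concrete, here is a minimal sketch of the pre-BERT approach. The gensim library, the GloVe dataset name, and the specific similarity calls below are my assumptions for illustration, not something the article prescribes.

```python
# A rough sketch of pre-BERT word embeddings, assuming the gensim library and
# its downloadable GloVe vectors are available (dataset name is illustrative).
import gensim.downloader as api

# Load 50-dimensional GloVe vectors (downloaded on first use).
vectors = api.load("glove-wiki-gigaword-50")

# Each word maps to one fixed vector, regardless of the sentence it appears in.
print(vectors["watermelon"][:5])  # first 5 of the 50 dimensions

# Nearby vectors capture aspects of meaning: 'watermelon' lands near other fruit words.
print(vectors.most_similar("watermelon", topn=3))
print(vectors.similarity("watermelon", "fruit"))
```

Note that every word gets a single static vector here no matter the surrounding sentence, which is exactly the limitation of pre-BERT pretraining that the article is describing.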
