Do-BERT

(Image source: Pixabay)

BERT (Bidirectional Encoder Representations from Transformers) has taken the world of NLP (Natural Language Processing) by storm.

Language text is essentially a sequence of words, so traditional methods like RNNs (Recurrent Neural Networks) and LSTMs (Long Short-Term Memory networks) used to be ubiquitous in language modeling (predicting the next word — remember typing an SMS?). But they struggled to remember words that appeared far back in the sequence. Then came ‘Attention Is All You Need’ and its architecture, the Transformer.
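The core idea that lets the Transformer handle long-range context is attention. As a minimal sketch (not from the original post, and using PyTorch purely for illustration), scaled dot-product attention lets every word look at every other word directly, instead of passing information step by step as an RNN does:

```python
import torch
import torch.nn.functional as F

def scaled_dot_product_attention(query, key, value):
    """Each output position is a weighted sum over ALL positions,
    so distant words are as reachable as neighboring ones."""
    d_k = query.size(-1)
    scores = query @ key.transpose(-2, -1) / d_k ** 0.5  # pairwise relevance
    weights = F.softmax(scores, dim=-1)                   # normalize to attention weights
    return weights @ value                                # blend the value vectors

# Toy usage: one "sentence" of 5 token vectors, each of size 8 (self-attention)
x = torch.randn(1, 5, 8)
out = scaled_dot_product_attention(x, x, x)
print(out.shape)  # torch.Size([1, 5, 8])
```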

BERT is a Transformer-based machine learning technique for NLP pre-training, developed in 2018 by Jacob Devlin and his colleagues at Google.
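Because BERT is pre-trained, you can load it and get contextual word representations without training anything yourself. Below is a hedged sketch that assumes the Hugging Face `transformers` library and the `bert-base-uncased` checkpoint, neither of which is mentioned in the original post:

```python
from transformers import AutoTokenizer, AutoModel
import torch

# Load the pretrained tokenizer and encoder (weights download on first run)
tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModel.from_pretrained("bert-base-uncased")

# Encode a sentence; BERT attends to both the left and right context of every token
inputs = tokenizer("BERT reads text bidirectionally.", return_tensors="pt")
with torch.no_grad():
    outputs = model(**inputs)

# One 768-dimensional contextual vector per token (including [CLS] and [SEP])
print(outputs.last_hidden_state.shape)
```

These token vectors are what downstream tasks (classification, question answering, etc.) are fine-tuned on top of.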

The following sketchnote gives an overview of BERT:

References

  • “Transformer: A Novel Neural Network Architecture for Language Understanding” — Google AI Blog (link)
  • “A Visual Guide to Using BERT for the First Time” — Jay Alammar (link)
  • “The Illustrated BERT, ELMo, and co. (How NLP Cracked Transfer Learning)” — Jay Alammar (link)
  • “The Illustrated Transformer” — Jay Alammar (link)
  • “Explaining BERT Simply Using Sketches” — Rahul Agarwal (link)
  • “Attention Is All You Need” — Ashish Vaswani et al. (link)

Originally published at LinkedIn


Yogesh Haribhau Kulkarni (PhD)
Google Developer Experts

PhD in Geometric Modeling | Google Developer Expert (Machine Learning) | Top Writer 3x (Medium) | More at https://www.linkedin.com/in/yogeshkulkarni/