Attention (Plus) Is All You Need

Lijesh Shetty
Published in Analytics Vidhya
9 min read · Apr 19, 2020


The motivation for this article is to discuss a few Natural Language Processing (NLP) models and the exciting developments in that space, and to showcase implementations of those models.

Steffi Graf & Martina Navratilova (Credit: gettyimages)

Growing up, I was glued to my television set whenever there was a match between Martina Navratilova, a seasoned champion, and Steffi Graf, a budding one (and my celebrity crush). Martina was ubiquitous, while Steffi was the promising newcomer poised to become the new champ. The duel between NLP models mirrors their rivalry, and it is what keeps me glued these days.

RNN Everywhere

Like Martina, until recently Recurrent Neural Networks (RNNs) were everywhere and considered one of the best solutions for NLP problems. Newer models seek to unseat the RNN and become the new champ. Before we get into the different model types, let's briefly look at word embeddings and the progress that has been made in that space.

Word Embeddings

Words need to be represented numerically for a model to understand and use them in its calculations. Word2Vec and GloVe, both shallow models, were widely used for this task. Word2Vec builds a numeric (vector) representation of each word that captures its meaning, its similarity and semantic relationships to other words, and its grammatical (syntactic) behavior.
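To make the idea concrete, here is a minimal sketch of how word vectors encode similarity. The three-dimensional vectors below are made-up illustrative values, not real Word2Vec output (trained embeddings are learned from a corpus and typically have 100–300 dimensions); the point is that related words end up pointing in similar directions, which we can measure with cosine similarity.

```python
from math import sqrt

# Toy 3-dimensional "embeddings" (illustrative values only; real Word2Vec
# vectors are learned from a large corpus, not hand-written like this).
embeddings = {
    "king":  [0.90, 0.80, 0.10],
    "queen": [0.85, 0.75, 0.20],
    "apple": [0.10, 0.20, 0.90],
}

def cosine_similarity(u, v):
    """Cosine of the angle between two vectors: 1.0 means same direction."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = sqrt(sum(a * a for a in u))
    norm_v = sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

# Related words score close to 1.0; unrelated words score much lower.
print(round(cosine_similarity(embeddings["king"], embeddings["queen"]), 3))
print(round(cosine_similarity(embeddings["king"], embeddings["apple"]), 3))
```

With real embeddings from a library such as gensim, the same cosine measure is what powers "most similar word" queries and the famous king − man + woman ≈ queen analogies.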
