TDS Archive

An archive of data science, data analytics, data engineering, machine learning, and artificial intelligence writing from the former Towards Data Science Medium publication.


Deep Learning

Beautifully Illustrated: NLP Models from RNN to Transformer

Explaining their complex mathematical formulas with working diagrams

Albers Uzila
Published in TDS Archive · 12 min read · Oct 11, 2022


Photo by Rubaitul Azad on Unsplash
Table of Contents
· Recurrent Neural Networks (RNN)
  ∘ Vanilla RNN
  ∘ Long Short-Term Memory (LSTM)
  ∘ Gated Recurrent Unit (GRU)
· RNN Architectures
· Attention
  ∘ Seq2seq with Attention
  ∘ Self-attention
  ∘ Multi-head Attention
· Transformer
  ∘ Step 1. Adding Positional Encoding to Word Embeddings
  ∘ Step 2. Encoder: Multi-head Attention and Feed Forward
  ∘ Step 3. Decoder: (Masked) Multi-head Attention and Feed Forward
  ∘ Step 4. Classifier
· Wrapping Up

Natural Language Processing (NLP) is a challenging problem in deep learning because computers can't work with raw words directly. To make text usable by a model, we first need to convert words into vectors before feeding them in. The resulting vectors are called word embeddings.
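To make this concrete, here is a minimal sketch (not from the article) of an embedding lookup using PyTorch's nn.Embedding; the toy vocabulary, vocabulary ids, and embedding dimension are illustrative assumptions:

```python
import torch
import torch.nn as nn

# Toy vocabulary mapping each word to an integer id (assumed for illustration).
vocab = {"<pad>": 0, "the": 1, "cat": 2, "sat": 3, "on": 4, "mat": 5}

# An embedding layer is a learnable lookup table:
# one row per vocabulary entry, each row an embedding_dim-dimensional vector.
embedding = nn.Embedding(num_embeddings=len(vocab), embedding_dim=8)

# Convert a sentence to ids, then look up a (seq_len, embedding_dim) tensor of vectors.
sentence = ["the", "cat", "sat", "on", "the", "mat"]
ids = torch.tensor([vocab[w] for w in sentence])
vectors = embedding(ids)

print(vectors.shape)  # torch.Size([6, 8])
```

The rows of this table start out random and are trained along with the rest of the model, so the vectors the network ultimately consumes are learned representations of the words, not hand-crafted features.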

Those embeddings can then be used to solve the task at hand, such as sentiment classification, text generation, named entity recognition, or machine translation. They are processed in a clever way such that the performance of the model for some tasks becomes on par with that of…
