TDS Archive

An archive of data science, data analytics, data engineering, machine learning, and artificial intelligence writing from the former Towards Data Science Medium publication.

Intuition Builder: How to Wrap Your Mind Around Transformer’s Attention Mechanism

A Transformer Attention Mechanism Explainer for the Rest of Us

10 min read · Dec 16, 2021


Attention calculation — Animation created by the author

Table of Contents

· Motivation
· Fundamental Building Blocks: A Special Vector Called ‘Embedding’
· Core Mechanism: Dot Product of Vectors
· Let’s Apply it on Something Easier: Recommender System
· Now We Can Talk About Attention: How YouTube Finds The Video You Search (QKV system)
· Attention in Translation, It’s All Very Natural
· Self-Attention: Finds the More Sophisticated Self
· Conclusion

Motivation

If you work in the artificial intelligence industry or are studying to get into it, there is little chance that you haven’t heard about the Transformer. Google introduced it in its landmark paper Attention Is All You Need, Vaswani et al. (2017). It very quickly gained popularity among NLP researchers, and people re-implemented major NLP results that used to be built on RNNs/LSTMs using the Transformer instead. Transformer-based pre-trained language models such as OpenAI’s GPT-3, along with tools like Hugging Face, quickly gained traction in industrial and business realms. But it…
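Before diving into the sections above, it may help to see the whole mechanism in one place. Below is a minimal NumPy sketch of the scaled dot-product attention defined in Vaswani et al. (2017), Attention(Q, K, V) = softmax(QKᵀ/√d_k)·V; the function name and the toy shapes are my own choices for illustration, not anything from the article.

```python
import numpy as np

def scaled_dot_product_attention(Q, K, V):
    """Attention(Q, K, V) = softmax(Q @ K.T / sqrt(d_k)) @ V,
    as defined in Vaswani et al. (2017)."""
    d_k = Q.shape[-1]
    # Dot product of each query with each key, scaled by sqrt(d_k)
    scores = Q @ K.T / np.sqrt(d_k)
    # Numerically stable softmax over the keys (rows sum to 1)
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights = weights / weights.sum(axis=-1, keepdims=True)
    # Each output row is a weighted average of the value vectors
    return weights @ V

# Toy example: 2 queries attending over 3 key/value pairs, embedding size 4
rng = np.random.default_rng(0)
Q = rng.normal(size=(2, 4))
K = rng.normal(size=(3, 4))
V = rng.normal(size=(3, 4))
out = scaled_dot_product_attention(Q, K, V)
print(out.shape)  # (2, 4): one output vector per query
```

The dot product here is the same similarity measure the article builds up through the recommender-system analogy: queries that point in a similar direction to a key get a larger weight on that key's value.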




Written by Michael Li

Data Scientist | Blogger | Product Manager | Developer | Pentester | https://www.linkedin.com/in/michael-li-dfw
