Intuition Builder: How to Wrap Your Mind Around Transformer’s Attention Mechanism
A Transformer Attention Mechanism Explainer for the Rest of Us
Table of Contents
· Motivation
· Fundamental Building Blocks: A Special Vector Called ‘Embedding’
· Core Mechanism: Dot Product of Vectors
· Let’s Apply It to Something Easier: Recommender Systems
· Now We Can Talk About Attention: How YouTube Finds The Video You Search (QKV system)
· Attention in Translation, It’s All Very Natural
· Self-Attention: Finds the More Sophisticated Self
· Conclusion
Motivation
If you work in the artificial intelligence industry or are studying to get into it, there is little chance you haven’t heard of the Transformer. Google introduced it in its landmark paper Attention Is All You Need, Vaswani et al. (2017). It very quickly gained popularity among NLP researchers, and major NLP work that used to be done with RNNs/LSTMs was re-implemented with the Transformer. Transformer-based pre-trained language models such as OpenAI’s GPT-3 and tools like Hugging Face quickly gained traction in industry and business. But it…

