Attention Is Not All You Need: Google & EPFL Study Reveals Huge Inductive Biases in Self-Attention Architectures

Synced | Published in SyncedReview | Mar 10, 2021

The 2017 paper Attention Is All You Need introduced the transformer, an architecture built on attention mechanisms, marking one of the biggest machine learning (ML) breakthroughs of recent years. A recent study proposes a new way to analyze self-attention, its inductive biases, and the problem of rank collapse.

Attention-based architectures have proven effective for improving ML applications in natural language processing (NLP), speech recognition, and most recently in computer vision. Research aimed at understanding the inner workings of transformers and attention in general, however, has been limited.

In the paper Attention Is Not All You Need: Pure Attention Loses Rank Doubly Exponentially with Depth, a research team from Google and EPFL (École polytechnique fédérale de Lausanne) proposes a novel approach that sheds light on the operation and inductive biases of self-attention networks (SANs), finding that pure attention loses rank doubly exponentially with respect to network depth.
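Concretely, the paper tracks how far the token representations X_l at depth l are from a rank-1 matrix whose rows are all identical. The following is a schematic form of the bound; c stands in for constants that, in the paper, depend on norm assumptions about the attention weight matrices:

```latex
% Residual of X after removing its best rank-1 approximation with identical rows.
% res(X) = 0 means every token carries the same representation.
\[
  \operatorname{res}(X) \;=\; X - \mathbf{1}\,x^{\top},
  \qquad x \;=\; \operatorname*{arg\,min}_{x}\; \lVert X - \mathbf{1}\,x^{\top} \rVert .
\]
% Each pure-attention layer shrinks the residual at a cubic rate,
% so iterating over L layers yields a doubly exponential decay:
\[
  \lVert \operatorname{res}(X_{l+1}) \rVert \;\le\; c\, \lVert \operatorname{res}(X_{l}) \rVert^{3}
  \quad\Longrightarrow\quad
  \lVert \operatorname{res}(X_{L}) \rVert \;\le\; c^{\frac{3^{L}-1}{2}}\, \lVert \operatorname{res}(X_{0}) \rVert^{3^{L}} .
\]
```

The 3^L in the exponent is what makes the collapse doubly exponential: once c times the squared residual of the input drops below 1, a handful of layers suffices to drive every token toward the same representation.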

The researchers summarize their main contributions as follows:

- A systematic study of the transformer's building blocks, revealing opposing forces: self-attention drives the token representations toward rank collapse, while skip connections and MLPs counteract it (a toy demonstration follows below).
- A new method for analyzing SANs via a path decomposition, which treats a deep SAN as an ensemble of many shallow, single-head networks.
- Experimental verification of the theory on common transformer architectures.
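To make the rank-collapse claim tangible, here is a minimal NumPy sketch, not the paper's code: the weight scales, dimensions, and the SVD-based residual are illustrative choices. It stacks single-head attention layers with no skip connections, MLPs, or LayerNorm, and tracks how much of the token matrix's energy lies outside its best rank-1 approximation:

```python
import numpy as np

rng = np.random.default_rng(0)

def softmax(z):
    # Numerically stable row-wise softmax.
    z = z - z.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def pure_attention_layer(X, d_k=32):
    # One single-head self-attention layer with random Gaussian weights,
    # deliberately omitting the skip connection, MLP, and LayerNorm.
    d = X.shape[-1]
    Wq = rng.standard_normal((d, d_k)) / np.sqrt(d)
    Wk = rng.standard_normal((d, d_k)) / np.sqrt(d)
    Wv = rng.standard_normal((d, d)) / np.sqrt(d)
    A = softmax((X @ Wq) @ (X @ Wk).T / np.sqrt(d_k))  # (tokens, tokens)
    return A @ (X @ Wv)

def relative_rank1_residual(X):
    # Fraction of X's energy outside its best rank-1 approximation,
    # computed from the singular values (a scale-invariant proxy for
    # the paper's res(X) = X - 1 x^T residual).
    s = np.linalg.svd(X, compute_uv=False)
    return np.sqrt((s[1:] ** 2).sum() / (s ** 2).sum())

X = rng.standard_normal((64, 32))  # 64 tokens, model width 32
print(f"depth  0: residual = {relative_rank1_residual(X):.3e}")
for depth in range(1, 9):
    X = pure_attention_layer(X)
    print(f"depth {depth:2d}: residual = {relative_rank1_residual(X):.3e}")
```

Swapping the return line for `return X + A @ (X @ Wv)` restores the skip connection; in runs of this sketch the residual then typically stays well away from zero, consistent with the paper's claim that skip connections counteract rank collapse.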


AI Technology & Industry Review — syncedreview.com | Newsletter: http://bit.ly/2IYL6Y2 | Share My Research http://bit.ly/2TrUPMI | Twitter: @Synced_Global