Google, Cambridge U & Alan Turing Institute Propose PolyViT: A Universal Transformer for Image, Video, and Audio Classification

Synced
Nov 30, 2021 · 4 min read

The original 2017 transformer model was designed for natural language processing (NLP), where it achieved state-of-the-art (SOTA) results. Its performance intrigued machine learning researchers, who have since successfully adapted the attention-based architecture to perception tasks in other…