Warsaw U, Google & OpenAI’s Terraformer Achieves a 37x Speedup Over Dense Baselines on 17B-Parameter Transformer Decoding

Synced
Dec 3, 2021 · 4 min read

While large-scale transformer architectures have significantly advanced the state-of-the-art on most natural language processing (NLP) tasks, in many real-life applications these models can be prohibitively expensive to train, and their decoding speeds are often undesirably sluggish.