Published in The Techlife
NVIDIA’s Pay Attention When Required — A Transformer optimization approach

Optimizing self-attention and feed-forward transformer blocks to achieve higher performance with lower compute time

Photo by Christian Wiediger on Unsplash

The PAR Transformer achieves 35% lower compute time than Transformer-XL by replacing roughly 63% of the self-attention blocks with feed-forward blocks.
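To make that idea concrete, here is a minimal PyTorch sketch of a PAR-style stack in which a block pattern string decides whether each position in the stack is a self-attention block or a feed-forward block. The class names, the example pattern, and the hyperparameters below are illustrative assumptions for this article, not the searched architecture or configuration from NVIDIA's paper.

```python
import torch
import torch.nn as nn

class FeedForwardBlock(nn.Module):
    """Pre-norm feed-forward block with a residual connection."""
    def __init__(self, d_model: int, d_ff: int, dropout: float = 0.1):
        super().__init__()
        self.norm = nn.LayerNorm(d_model)
        self.ff = nn.Sequential(
            nn.Linear(d_model, d_ff),
            nn.GELU(),
            nn.Dropout(dropout),
            nn.Linear(d_ff, d_model),
            nn.Dropout(dropout),
        )

    def forward(self, x):
        return x + self.ff(self.norm(x))

class SelfAttentionBlock(nn.Module):
    """Pre-norm multi-head self-attention block with a residual connection."""
    def __init__(self, d_model: int, n_heads: int, dropout: float = 0.1):
        super().__init__()
        self.norm = nn.LayerNorm(d_model)
        self.attn = nn.MultiheadAttention(
            d_model, n_heads, dropout=dropout, batch_first=True
        )

    def forward(self, x):
        h = self.norm(x)
        out, _ = self.attn(h, h, h, need_weights=False)
        return x + out

class PARStyleTransformer(nn.Module):
    """Stack built from a pattern string: 's' = self-attention, 'f' = feed-forward."""
    def __init__(self, pattern: str, d_model: int = 512,
                 n_heads: int = 8, d_ff: int = 2048):
        super().__init__()
        self.blocks = nn.ModuleList([
            SelfAttentionBlock(d_model, n_heads) if c == "s"
            else FeedForwardBlock(d_model, d_ff)
            for c in pattern
        ])

    def forward(self, x):
        for block in self.blocks:
            x = block(x)
        return x

# Illustrative 12-block pattern: attention concentrated in the earlier blocks,
# feed-forward blocks making up the majority of the stack.
model = PARStyleTransformer(pattern="ssfsffsfffff")
x = torch.randn(2, 128, 512)   # (batch, sequence length, d_model)
print(model(x).shape)          # torch.Size([2, 128, 512])
```

The point of the sketch is only to show the knob the paper turns: rather than pairing every self-attention sublayer with a feed-forward sublayer, the mix and ordering of the two block types becomes a design choice, and most positions can be feed-forward only.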

Mostafa Ibrahim

Software Eng. University College London Computer Science Graduate. Passionate about Machine Learning in Healthcare. Top writer in AI