‘MrsFormer’ Employs a Novel Multiresolution-Head Attention Mechanism to Cut Transformers’ Compute and Memory Costs
Transformers have seen continuous improvements since their introduction in 2017 and are now the standard in the natural language processing and computer vision research fields. The wider real-world application of transformers is, however, limited by the massive computation and memory demands of their self-attention mechanisms when…
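For context, standard scaled dot-product attention materializes an N × N score matrix over a sequence of N tokens, so both compute and memory grow quadratically with sequence length. Below is a minimal NumPy sketch of that baseline bottleneck; all names and shapes are illustrative and not taken from the MrsFormer paper itself:

```python
import numpy as np

def scaled_dot_product_attention(Q, K, V):
    """Standard single-head attention.

    Q, K, V: (N, d) arrays for a sequence of N tokens.
    The score matrix is (N, N), so time and memory scale as O(N^2) in N.
    """
    d = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d)  # (N, N) -- the quadratic bottleneck
    # Numerically stable row-wise softmax over the score matrix.
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)
    return weights @ V  # (N, d)

# Example: doubling N quadruples the size of the (N, N) score matrix.
rng = np.random.default_rng(0)
N, d = 1024, 64
Q, K, V = (rng.standard_normal((N, d)) for _ in range(3))
out = scaled_dot_product_attention(Q, K, V)
print(out.shape)  # (1024, 64)
```

This quadratic scaling is the cost that efficiency-oriented attention variants such as the one proposed in MrsFormer aim to reduce.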