LSTM Is Back! A Deep Implementation of the Decades-old Architecture Challenges ViTs on Long Sequence Modelling

In less than two years since their introduction, vision transformers (ViT) have revolutionized the computer vision field, leveraging transformer architectures’ powerful self-attention mechanisms to eliminate the need for convolutions and advance the state-of-the-art on image classification tasks. More…