New Study Suggests Self-Attention Layers Could Replace Convolutional Layers on Vision Tasks

Synced · Published in SyncedReview · Jan 28, 2020 · 3 min read

Nowhere has AI experienced greater development or breakthroughs in recent years than in the field of natural language processing (NLP) — and “transformers” are the not-so-secret new technology behind this revolution. The key difference between transformers and traditional methods such as recurrent neural networks or convolutional neural networks is that transformers can simultaneously attend to every word of an input text. Transformers’ impressive performance across a wide range of NLP tasks is enabled by a novel attention mechanism which captures meaningful inter-dependencies between words in a sequence by calculating both positional and content-based attention scores.
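To make that idea concrete, here is a minimal, illustrative sketch of an attention score that combines a content term with a relative positional term. It is not any particular model's actual code; the array names, dimensions, and the relative-position lookup are assumptions made for the example.

```python
import numpy as np

def softmax(z, axis=-1):
    z = z - z.max(axis=axis, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)

rng = np.random.default_rng(0)
T, d = 5, 8                                   # sequence length, model dimension (illustrative)
X = rng.normal(size=(T, d))                   # token representations
Wq, Wk, Wv = (rng.normal(size=(d, d)) for _ in range(3))
rel_pos = rng.normal(size=(2 * T - 1, d))     # hypothetical learned relative-position embeddings

Q, K, V = X @ Wq, X @ Wk, X @ Wv
content_scores = Q @ K.T                      # how much token i's content matches token j's
# positional term: query i also scores the relative offset (j - i)
pos_scores = np.array([[Q[i] @ rel_pos[j - i + T - 1] for j in range(T)]
                       for i in range(T)])
attn = softmax((content_scores + pos_scores) / np.sqrt(d))
output = attn @ V                             # each token becomes a weighted mix of all tokens
print(output.shape)                           # (5, 8)
```

Because every query scores every key in one step, each token can attend to the entire sequence simultaneously, which is the property the article contrasts with recurrent and convolutional processing.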

Inspired by the performance of attention mechanisms in NLP, researchers have explored applying them to vision tasks. Google Brain Team researcher Prajit Ramachandran proposed that self-attention layers could completely replace convolutional layers in vision models while achieving state-of-the-art performance. To test this hypothesis, researchers from Ecole Polytechnique Federale de Lausanne (EPFL) put forth theoretical and empirical evidence indicating that self-attention layers can indeed match the performance of convolutional layers.

From a theoretical perspective, the researchers used a constructive proof to show that a multi-head self-attention layer can simulate any convolutional layer.
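As a rough illustration of why such a construction is possible, the NumPy sketch below uses an idealized hard-attention limit in which each head attends to exactly one fixed pixel offset, so the heads together reproduce a K x K convolution. It is a toy example with hypothetical shapes, not the paper's exact parameterization.

```python
import numpy as np

H, W_img, C_in, C_out, K = 6, 6, 3, 4, 3           # image size, channels, kernel size (illustrative)
rng = np.random.default_rng(0)
x = rng.normal(size=(H, W_img, C_in))              # input feature map
kernel = rng.normal(size=(K, K, C_in, C_out))      # convolution weights

# Reference: plain KxK convolution with zero padding, stride 1.
pad = K // 2
xp = np.pad(x, ((pad, pad), (pad, pad), (0, 0)))
conv_out = np.zeros((H, W_img, C_out))
for i in range(H):
    for j in range(W_img):
        patch = xp[i:i + K, j:j + K]               # (K, K, C_in)
        conv_out[i, j] = np.einsum('klc,klcd->d', patch, kernel)

# "Attention" version: one head per kernel offset. Each head's attention is a
# one-hot distribution on the key pixel at that offset, and the head's
# value/output projection carries the corresponding slice of the kernel.
attn_out = np.zeros((H, W_img, C_out))
for di in range(K):                                # one head per offset (di, dj)
    for dj in range(K):
        W_head = kernel[di, dj]                    # (C_in, C_out)
        shifted = xp[di:di + H, dj:dj + W_img]     # each query "attends" to pixel (q + offset)
        attn_out += shifted @ W_head               # sum over heads

print(np.allclose(conv_out, attn_out))             # True: the heads reproduce the convolution
```

In practice the one-hot attention pattern would come from a suitably parameterized relative positional encoding rather than from explicit shifting, but the sketch captures the spirit of the equivalence.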

The researchers set the parameters of a multi-head self-attention layer so that it behaved like a convolutional layer, then conducted a series of experiments to validate the applicability of the proposed theoretical construction, comparing a fully attentional model comprising six multi-head self-attention layers against a standard ResNet18 on the CIFAR-10 dataset.

(Figure: Test accuracy on CIFAR-10)

In the tests, the self-attention models performed reasonably well, except for the variant using learned embeddings with content-based attention, a shortfall attributed mainly to its increased number of parameters. The researchers nonetheless conclude, with both theoretical and empirical support, that any convolutional layer can be expressed by self-attention layers, and that fully attentional models can learn to combine local behavior and content-based global attention.

The paper On the Relationship Between Self-Attention and Convolutional Layers is on arXiv.

Author: Hecate He | Editor: Michael Sarazen

Thinking of contributing to Synced Review? Sharing My Research welcomes scholars to share their own research breakthroughs with global AI enthusiasts.

We know you don’t want to miss any story. Subscribe to our popular Synced Global AI Weekly to get weekly AI updates.

Need a comprehensive review of the past, present and future of modern AI research development? Trends of AI Technology Development Report is out!

2018 Fortune Global 500 Public Company AI Adaptivity Report is out!
Purchase a Kindle-formatted report on Amazon.
Apply for Insight Partner Program to get a complimentary full PDF report.
