Want better accuracy for classification models? Let’s use “Attention maps”!

Published in

Moonvision

2 min read · Aug 11, 2019

A major concern for research groups and companies working on classification tasks in computer vision is the case where images from different classes look very similar. Standard convolutional neural network classifiers often perform poorly on such datasets.

To address this common yet crucial issue in supervised learning, MoonVision proposes few-shot classification with an attention mechanism, which helps the convolutional neural network learn the fine-grained local details of an image. Because attention maps focus on these details, they help the network distinguish between images that look very similar but belong to different categories. This can increase the accuracy of deep learning architectures.
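To make the idea concrete, here is a minimal sketch of spatial attention pooling in plain Python. It is not MoonVision's implementation, which the post does not describe. It assumes the network has already produced a grid of local feature vectors, and the function name `spatial_attention_pool`, the `query` vector, and the toy values are all hypothetical.

```python
import math


def spatial_attention_pool(feature_map, query):
    """Pool an H x W grid of C-dim feature vectors into one C-dim vector.

    feature_map: list of rows, each row a list of C-dim feature vectors.
    query: hypothetical C-dim vector used to score each location.
    Returns (attention_map, pooled_vector).
    """
    # Score every spatial location with a dot product against the query.
    scores = [[sum(f * q for f, q in zip(cell, query)) for cell in row]
              for row in feature_map]

    # Softmax over all locations (subtract the max for numerical stability).
    peak = max(max(row) for row in scores)
    exps = [[math.exp(s - peak) for s in row] for row in scores]
    total = sum(sum(row) for row in exps)
    attention = [[e / total for e in row] for row in exps]

    # Attention-weighted sum of the local feature vectors.
    channels = len(query)
    pooled = [0.0] * channels
    for att_row, feat_row in zip(attention, feature_map):
        for weight, cell in zip(att_row, feat_row):
            for c in range(channels):
                pooled[c] += weight * cell[c]
    return attention, pooled


# Toy 2x2 feature map with 2 channels; the bottom-right cell stands in
# for a discriminative part such as a logo (made-up numbers).
toy_features = [
    [[0.1, 0.0], [0.0, 0.1]],
    [[0.1, 0.1], [3.0, 2.0]],
]
toy_query = [1.0, 1.0]
attention_map, pooled_vector = spatial_attention_pool(toy_features, toy_query)
```

In this toy case nearly all of the attention weight lands on the bottom-right cell, so the pooled vector is dominated by that one local feature instead of being averaged together with the background.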

Qualitative results on the Cars196 dataset using our few-shot classification with an attention mechanism.

Let’s see the magic of “Attention maps” for better interpretability

Here we show where our model focuses its attention during training on the CUB200 and Cars196 datasets, compared with other methods that only visualize the activations of different layers of the model.
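For readers who want to produce this kind of visualization themselves, the sketch below shows one common way to turn a coarse attention grid into an image-sized heatmap: min-max normalize the weights, then upsample by nearest neighbour. The helper `attention_to_heatmap`, the `scale` parameter, and the sample grid are hypothetical. The post does not say how its figures were rendered.

```python
def attention_to_heatmap(attention, scale):
    """Min-max normalize an attention grid and upsample it by `scale`."""
    flat = [v for row in attention for v in row]
    low, high = min(flat), max(flat)
    span = (high - low) or 1.0  # avoid division by zero on a flat map
    normalized = [[(v - low) / span for v in row] for row in attention]

    # Nearest-neighbour upsampling: repeat each cell `scale` times per axis.
    heatmap = []
    for row in normalized:
        wide_row = [v for v in row for _ in range(scale)]
        heatmap.extend(list(wide_row) for _ in range(scale))
    return heatmap


# Hypothetical 2x2 attention grid, upsampled to 4x4.
coarse_attention = [[0.1, 0.2], [0.3, 0.4]]
heatmap = attention_to_heatmap(coarse_attention, scale=2)
```

The resulting grid holds values in the range 0 to 1 and can be overlaid on the input image as a transparency or colour map. A real pipeline would more likely use bilinear interpolation from an image library to get smoother maps.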

Indeed, local features play a critical role in many fine-grained visual recognition tasks. Typical deep neural networks designed for image classification are good at extracting high-level global features, but the features of local details are often lost. This limits how well the network can use local details to tell apart images from different classes. For example, without local details a model cannot learn the fine-grained cues of an image, and nothing pushes it to focus on the most discriminative parts, such as logos and lights, when identifying different cars.
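A small numeric illustration of why global features can miss local details: the sketch below compares two made-up 4x4 activation maps that differ in a single cell. Global max pooling is used here only as a simple stand-in for a pooling that is sensitive to local detail. It is not the attention mechanism described in this post, and all names and values are assumptions.

```python
def global_average_pool(activation):
    """Average every cell of a single-channel activation map."""
    flat = [v for row in activation for v in row]
    return sum(flat) / len(flat)


def global_max_pool(activation):
    """Keep only the strongest cell of a single-channel activation map."""
    return max(v for row in activation for v in row)


# Two hypothetical cars whose activation maps are identical except for
# one cell, standing in for a small detail such as a logo.
car_a = [[1.0] * 4 for _ in range(4)]
car_b = [[1.0] * 4 for _ in range(4)]
car_b[2][3] = 5.0  # the only differing detail

avg_gap = abs(global_average_pool(car_a) - global_average_pool(car_b))
max_gap = abs(global_max_pool(car_a) - global_max_pool(car_b))
```

Here `avg_gap` is 0.25 while `max_gap` is 4.0: averaging over all 16 cells shrinks the one distinguishing detail by a factor of 16, which is the dilution effect the paragraph above describes.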

Conclusion

Our few-shot classification with an attention mechanism is a state-of-the-art image classifier that extracts high-level global features as well as low-level local features to get better results.

Check out what we do at https://www.moonvision.io/ and check our platform at https://app.moonvision.io/signup.

Written by Ankit Kumar