PyTorch: Directly use pre-trained AlexNet for Image Classification and Visualization of the activation maps

Published in Analytics Vidhya · 5 min read · Sep 2, 2020

Image of tiger credit: https://phys.org/news/2019-11-indian-authorities-exaggerated-tiger.html (CC0 Public Domain)

Hello everyone. Today I would like to introduce one of the most classic Convolutional Neural Networks (CNNs), AlexNet [1], the first CNN-based (deep learning-based) method to win the ImageNet Large Scale Visual Recognition Challenge (ILSVRC), in 2012. In this blog, you will learn how to:

Directly use a pre-trained AlexNet for class prediction (the original AlexNet can classify 1,000 classes such as tiger, bicycle, shark, etc.)
Visualize which features are selected inside AlexNet for classification (we will look at the activations at each convolutional layer to see which features are passed on to the next layer)

source code for this blog: https://gitlab.com/ronctli1012/blog1-pretrained-alexnet-and-visualization

Preparation

You should have basic knowledge of CNNs (e.g. you have heard of AlexNet before and know a bit about its structure)
Anaconda (anaconda.com): 1.) we usually use Anaconda to create a development environment; 2.) all the required packages are listed in “requirement.txt”, or you can use the provided “torch_gpu.yml” to create the environment for this blog directly. See our source code for details
PyTorch (pytorch.org): 1.) PyTorch is one of the commonly used frameworks for implementing CNNs (others include TensorFlow, Keras, etc.); 2.) if you do not have a GPU, you can still follow this blog by installing the CPU version of PyTorch

Pre-processing the Input

data_transforms : used to pre-process the input before feeding it into the pre-trained AlexNet.

transforms.Resize((224,224)) : the input is resized to 224x224
transforms.ToTensor() : converts the input image to a torch Tensor in channel-first layout, with pixel values scaled to the range [0, 1]
transforms.Normalize([mean], [standard deviation]) : normalizes each channel of the input as (x - mean) / standard deviation

Read and Transform Input

opt.test_img is the input parameter that gives the file name of the test image. Note that the test image should be stored in the “alexnet_images” folder.

Define Network

After pre-processing the input, we have to define our model. You can see that we need just one line of code to get the pre-trained AlexNet. As we only do testing in this blog, we can directly switch the model to evaluation mode (i.e. alexnet.eval()), which disables dropout so that the predictions are deterministic. Then, we can feed the pre-processed input to the model and get the predicted result.

visualize_activation_maps(batch_img, alexnet) is a function to visualize the feature selection at each convolutional layer inside AlexNet. As there are 5 convolutional layers in AlexNet, this function generates 5 images, which are stored in your current working directory.

Understanding the Prediction Results

As mentioned at the very beginning, the original AlexNet can classify objects into 1,000 classes. Therefore, we first map the class indices to their corresponding labels and display the first 5 class labels.
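
One simple way to do this mapping is to keep the labels in a text file with one label per line, so that line i holds the label of class i. The sketch below assumes that layout; the function names and the file format are my assumptions, not necessarily what the repository uses. The two sample lines are the first two ImageNet classes.

```python
def parse_labels(lines):
    """Turn an iterable of text lines into a list of labels, skipping blank lines."""
    return [line.strip() for line in lines if line.strip()]


def load_labels(path):
    """Read labels from a text file with one label per line (assumed format)."""
    with open(path, encoding="utf-8") as f:
        return parse_labels(f)


# Small in-memory sample standing in for the full 1,000-line label file.
sample = ["tench, Tinca tinca\n", "goldfish, Carassius auratus\n", "\n"]
labels = parse_labels(sample)
print(labels[:5])  # ['tench, Tinca tinca', 'goldfish, Carassius auratus']
```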

Note that the output of AlexNet is a vector of length 1,000 containing one raw score per class. First, we sort the output in descending order of these scores to rank the classes. Then, we use a softmax function to normalize this 1,000-length vector into a probability vector. Softmax preserves the order of the scores, so the ranking is the same before and after normalization. Each element in this probability vector represents a class probability (i.e. how likely it is that the input belongs to that class).

Finally, we display the 5 classes with the highest probability.
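
The softmax and top-5 steps can be sketched as below. Here `torch.topk` does the ranking in one call instead of a full sort, and the helper name `top_k_predictions` and the toy 10-class input are only for illustration.

```python
import torch


def top_k_predictions(logits, labels, k=5):
    """Return the k most likely (label, percentage) pairs for the first image."""
    probs = torch.softmax(logits, dim=1)[0]    # raw scores -> probabilities
    top_probs, top_idx = torch.topk(probs, k)  # highest probabilities first
    return [(labels[i], p * 100.0)
            for p, i in zip(top_probs.tolist(), top_idx.tolist())]


# Toy example with 10 classes instead of 1,000.
toy_labels = [f"class_{i}" for i in range(10)]
toy_logits = torch.zeros(1, 10)
toy_logits[0, 3] = 5.0
toy_logits[0, 7] = 2.0

results = top_k_predictions(toy_logits, toy_labels)
for label, pct in results:
    print(f"{label}: {pct:.4f}%")
```

With the real model, `logits` would be the 1 x 1,000 output of AlexNet and `labels` the list of 1,000 class names.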

Directly Use

Sample: the output of running the “alexnet_main.py” script using the “tiger.jpg” input image

Again! All the material can be found at: https://gitlab.com/ronctli1012/blog1-pretrained-alexnet-and-visualization

For Windows users, you can simply run the script by typing the following single line in the command window (i.e. cmd):
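
The exact command depends on how the script parses its arguments. As a hedged sketch: since the script reads the file name from opt.test_img, it plausibly defines a `--test_img` option with argparse, roughly as below. The flag name, its default, and the invocation in the comment are assumptions inferred from that variable name, so please check the repository for the actual usage.

```python
import argparse


def build_parser():
    """Build a parser with a single image-name option (hypothetical flag)."""
    parser = argparse.ArgumentParser(
        description="Classify an image with a pre-trained AlexNet")
    # Flag name inferred from opt.test_img; not confirmed by the post.
    parser.add_argument("--test_img", type=str, default="tiger.jpg",
                        help="file name of an image inside the alexnet_images folder")
    return parser


# Assumed invocation from the command window:
#   python alexnet_main.py --test_img tiger.jpg
opt = build_parser().parse_args(["--test_img", "tiger.jpg"])
print(opt.test_img)  # tiger.jpg
```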

You can see the predicted results (a probability of 91.6405% for the class “tiger, Panthera tigris”) and visualize the features passed through AlexNet.

There should be 5 images in your current working directory, one per convolutional layer. Red indicates the most strongly activated (i.e. most important) features at that layer.

From left to right and top to bottom: the most important features at the 1st layer, the 2nd layer, 3rd layer, 4th layer and 5th layer

As you can see, simple edge features are highly activated (i.e. more important) at early layers such as layer 1. At the 5th layer, the head of the tiger is highlighted. This suggests that the model (AlexNet) treats it as an important feature for classifying this object as a tiger.

I hope that next time we can discuss feature selection inside a model in more depth. Feature representation is a very important topic in today’s deep learning research.

Now, you can download some images from the Internet and save them inside the “alexnet_images” folder. Then type the following command in your command window:

Note that xxx.jpg is the file name of your image. Let’s see what predicted results you get! :)

References

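[1] A. Krizhevsky, I. Sutskever and G. E. Hinton, “ImageNet Classification with Deep Convolutional Neural Networks,” NIPS 2012. https://papers.nips.cc/paper/4824-imagenet-classification-with-deep-convolutional-neural-networks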

This is my first time writing a blog post to share what I have learned. Why do I want to do this? Because I would like to change something. I hope that writing can change my mindset and help me be myself in the future.

If you like, please leave a comment and tell me what you think! :) Thanks for your attention, and I hope you enjoyed this post. See you later!