Concept of AlexNet:- Convolutional Neural Network

Abhijeet Pujara
Nov 3, 2020 · 5 min read

This article covers nine parts:

  1. What is AlexNet?
  2. The Architecture of AlexNet.
  3. Dataset.
  4. ReLU.
  5. Dropout.
  6. Pros of AlexNet.
  7. Cons of AlexNet.
  8. AlexNet With Python.
  9. Conclusion.

What is AlexNet?

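AlexNet is a convolutional neural network designed by Alex Krizhevsky, Ilya Sutskever, and Geoffrey Hinton. ILSVRC (ImageNet Large Scale Visual Recognition Challenge) is a competition where research teams evaluate their algorithms on a huge dataset of labeled images (ImageNet) and compete to achieve higher accuracy on several visual recognition tasks. AlexNet won the 2012 edition by a wide margin, which had a huge impact on how teams approached the competition afterward.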

The Architecture of AlexNet

AlexNet contains 8 layers with weights:

5 convolutional layers

3 fully connected layers.

Each layer is followed by a ReLU activation, except the last one, whose output is fed to a softmax that produces a distribution over the 1000 class labels. Dropout is applied in the first two fully connected layers. Max-pooling is applied after the first, second, and fifth convolutional layers. The network was trained split across two GPUs, so the kernels of the second, fourth, and fifth convolutional layers are connected only to those kernel maps in the previous layer that reside on the same GPU. The kernels of the third convolutional layer are connected to all kernel maps in the second layer. The neurons in the fully connected layers are connected to all neurons in the previous layer.
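The layer sequence above can be traced with a little arithmetic. The sketch below is an illustration rather than the author's code: it assumes a 227×227 input (the size commonly used so that the first convolution yields a 55×55 map) and takes kernel sizes, strides, and padding from the original AlexNet paper, not from this article. Each step applies the standard output-size formula floor((n + 2p - k) / s) + 1.

```python
def out_size(n, kernel, stride=1, pad=0):
    """Spatial size after a convolution or pooling window."""
    return (n + 2 * pad - kernel) // stride + 1

# (name, kernel, stride, pad). These values are assumptions taken from the
# original paper; pooling follows conv layers 1, 2, and 5 only.
LAYERS = [
    ("conv1", 11, 4, 0),
    ("pool1", 3, 2, 0),
    ("conv2", 5, 1, 2),
    ("pool2", 3, 2, 0),
    ("conv3", 3, 1, 1),
    ("conv4", 3, 1, 1),
    ("conv5", 3, 1, 1),
    ("pool5", 3, 2, 0),
]

def trace_sizes(input_size=227):
    """Return the spatial size of the feature map after each layer."""
    sizes = {}
    n = input_size
    for name, kernel, stride, pad in LAYERS:
        n = out_size(n, kernel, stride, pad)
        sizes[name] = n
    return sizes

sizes = trace_sizes()
for name, n in sizes.items():
    print(f"{name}: {n}x{n}")

# The fifth conv layer has 256 kernels, so this is the input width
# of the first fully connected layer.
flattened = sizes["pool5"] ** 2 * 256
print("flattened features:", flattened)
```

Under these assumptions the map shrinks from 55×55 to 6×6, which is why the first fully connected layer sees 6 × 6 × 256 = 9216 inputs.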


Dataset

ImageNet is an image database of over 15 million labeled high-resolution images belonging to roughly 22,000 categories. The competition uses a subset of ImageNet's images and challenges researchers to achieve the lowest top-1 and top-5 error rates. The top-5 error rate is the fraction of test images for which the correct label is not among the five labels the model considers most probable. AlexNet works on RGB images of a fixed size, so every training and test image must first be rescaled to 256×256. In the original paper, the network itself is then fed 224×224 crops taken from these 256×256 images.
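To make the two error rates concrete, here is a small sketch in plain Python. The function name `top_k_error` and the toy scores are hypothetical, made up for illustration; a real evaluation would use the model's 1000 class scores per image.

```python
def top_k_error(scores, labels, k):
    """Fraction of examples whose true label is not among the k highest scores."""
    misses = 0
    for row, label in zip(scores, labels):
        # Class indices ordered from highest to lowest score.
        ranked = sorted(range(len(row)), key=lambda c: row[c], reverse=True)
        if label not in ranked[:k]:
            misses += 1
    return misses / len(labels)

# Hypothetical scores for 3 images over 6 classes (illustration only).
scores = [
    [0.10, 0.50, 0.20, 0.05, 0.10, 0.05],  # true class 1 is ranked first
    [0.30, 0.10, 0.25, 0.20, 0.10, 0.05],  # true class 2 is ranked second
    [0.40, 0.30, 0.10, 0.10, 0.06, 0.04],  # true class 5 is ranked last
]
labels = [1, 2, 5]

top1 = top_k_error(scores, labels, k=1)
top5 = top_k_error(scores, labels, k=5)
print(f"top-1 error: {top1:.3f}, top-5 error: {top5:.3f}")
```

The second image counts as a top-1 mistake but a top-5 hit, which is why the top-5 error is never higher than the top-1 error.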


ReLU

An important feature of AlexNet is its use of the ReLU (Rectified Linear Unit) nonlinearity.

Tanh or sigmoid activation functions used to be the usual choice when training a neural network model.

AlexNet showed that deep CNNs using the ReLU nonlinearity could be trained much faster than those using saturating activation functions like tanh or sigmoid.

In the original paper this was tested on the CIFAR-10 dataset, where a four-layer CNN with ReLUs reached a 25% training error rate six times faster than the same network with tanh units.

Let's see why it trains faster with the ReLUs. The ReLU function is given by

f(x) = max(0,x)

[Figure: plots of the two functions, tanh and ReLU]

The tanh function saturates at very high or very low values of x. In these regions, the slope of the function is very close to zero, which can slow down gradient descent.

The ReLU function's slope does not shrink toward zero for large positive values of x; it stays at 1, which helps the optimization converge faster. For negative values of x the slope is exactly zero, but in practice enough neurons in a network receive positive inputs for training to proceed.

ReLU wins over the sigmoid function, too, for the same reason.
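The saturation argument is easy to check numerically. The sketch below, using only the standard library, evaluates the slope of each activation at a few sample inputs; the helper names are made up for this example.

```python
import math

def tanh_grad(x):
    """Derivative of tanh: 1 - tanh(x)^2."""
    return 1.0 - math.tanh(x) ** 2

def sigmoid_grad(x):
    """Derivative of the logistic sigmoid: s(x) * (1 - s(x))."""
    s = 1.0 / (1.0 + math.exp(-x))
    return s * (1.0 - s)

def relu_grad(x):
    """Derivative of max(0, x); taken as 0 at x = 0 by convention."""
    return 1.0 if x > 0 else 0.0

for x in (-5.0, -1.0, 0.5, 1.0, 5.0):
    print(
        f"x={x:+.1f}  tanh'={tanh_grad(x):.4f}  "
        f"sigmoid'={sigmoid_grad(x):.4f}  relu'={relu_grad(x):.1f}"
    )
```

At x = 5 the tanh and sigmoid slopes are already far below 0.01, while the ReLU slope is still 1, so the gradient passed back through a ReLU unit is not shrunk by the activation itself.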

Dropout

The overfitting problem: AlexNet had 60 million parameters, which made overfitting a major concern.
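The 60 million figure can be checked with a short calculation. This is a rough sketch: the layer widths (96, 256, 384, 384, and 256 kernels, and 4096-unit fully connected layers) come from the original paper rather than from this article, and the halved input depth for the second, fourth, and fifth convolutional layers reflects their same-GPU connectivity.

```python
def conv_params(out_channels, kernel, in_channels_per_kernel):
    """Weights plus one bias per kernel."""
    return out_channels * (kernel * kernel * in_channels_per_kernel + 1)

def fc_params(n_in, n_out):
    """Weights plus one bias per output unit."""
    return n_out * (n_in + 1)

# Layer sizes are assumptions taken from the original AlexNet paper.
params = {
    "conv1": conv_params(96, 11, 3),
    "conv2": conv_params(256, 5, 48),    # sees half of conv1's 96 maps
    "conv3": conv_params(384, 3, 256),   # sees all of conv2's maps
    "conv4": conv_params(384, 3, 192),   # sees half of conv3's 384 maps
    "conv5": conv_params(256, 3, 192),   # sees half of conv4's 384 maps
    "fc6": fc_params(6 * 6 * 256, 4096),
    "fc7": fc_params(4096, 4096),
    "fc8": fc_params(4096, 1000),
}

total = sum(params.values())
fc_share = (params["fc6"] + params["fc7"] + params["fc8"]) / total

for name, count in params.items():
    print(f"{name}: {count:,}")
print(f"total: {total:,}")
print(f"share in fully connected layers: {fc_share:.1%}")
```

Under these assumptions the total is just under 61 million, and the great majority of the parameters sit in the three fully connected layers.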

Two methods to reduce overfitting:

  1. Data Augmentation
  2. Dropout.

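Data Augmentation. The training set is enlarged artificially with label-preserving transformations: random 224×224 crops and horizontal reflections of the 256×256 images, plus small changes to the intensities of the RGB channels.

Dropout. During training, the output of each hidden neuron in the first two fully connected layers is set to zero with probability 0.5, so the network cannot rely on any single neuron being present.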
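Both methods can be sketched in plain Python. This is an illustration under stated assumptions, not the training code used for AlexNet: the crop size of 224 and the dropout probability of 0.5 follow the original paper, the function names are hypothetical, and an image is represented as a simple list of pixel rows.

```python
import random

def random_crop_flip(image, crop=224, rng=random):
    """Take a random crop x crop patch and mirror it half of the time.

    image is a list of rows, each row a list of pixels.
    """
    height, width = len(image), len(image[0])
    top = rng.randint(0, height - crop)    # randint is inclusive at both ends
    left = rng.randint(0, width - crop)
    patch = [row[left:left + crop] for row in image[top:top + crop]]
    if rng.random() < 0.5:
        patch = [row[::-1] for row in patch]  # horizontal reflection
    return patch

def dropout(activations, p=0.5, rng=random):
    """Training-time dropout: zero each activation with probability p."""
    return [0.0 if rng.random() < p else a for a in activations]

def scale_at_test(activations, p=0.5):
    """At test time all units are kept and outputs are scaled by (1 - p)."""
    return [a * (1.0 - p) for a in activations]

rng = random.Random(0)  # fixed seed so the example is repeatable
image = [[(r, c) for c in range(256)] for r in range(256)]  # dummy 256x256 image
patch = random_crop_flip(image, rng=rng)
print(len(patch), len(patch[0]))

hidden = [1.0] * 1000
dropped = dropout(hidden, rng=rng)
print("units zeroed:", dropped.count(0.0))
```

Each call returns a different patch or mask, so the network rarely sees exactly the same input or the same set of active neurons twice.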


Pros of AlexNet

  1. Many of its methods, such as the conv+pooling design, dropout, GPUs, parallel computing, and ReLU, are still the industry standard for computer vision.
  2. The unique advantage of AlexNet is the direct image input to the classification model.
  3. The convolution layers can automatically extract the edges of the images, and the fully connected layers learn from these features.
  4. Theoretically, the complexity of visual patterns can be effectively extracted by adding more conv layers.

Cons of AlexNet

  1. Large convolution filters (such as 5×5) fell out of favor shortly afterward.
  2. The weights are initialized from a normal distribution, which does not effectively address the vanishing gradient problem; it was later replaced by the Xavier method.
  3. Its performance has been surpassed by more complex models such as GoogLeNet (6.7% top-5 error) and ResNet (3.6% top-5 error).

AlexNet With Python
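Below is a minimal PyTorch sketch of the architecture described in this article. PyTorch is an assumption here, since the article does not name a framework, and so are the 227×227 input size and the kernel sizes, strides, and padding, which follow the original paper. The same-GPU connectivity of the second, fourth, and fifth convolutional layers is imitated with `groups=2`, and the local response normalization layers of the original paper are left out for brevity.

```python
import torch
from torch import nn

class AlexNet(nn.Module):
    """Sketch of AlexNet: 5 convolutional and 3 fully connected layers."""

    def __init__(self, num_classes=1000):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(3, 96, kernel_size=11, stride=4),
            nn.ReLU(inplace=True),
            nn.MaxPool2d(kernel_size=3, stride=2),   # pooling after conv1
            nn.Conv2d(96, 256, kernel_size=5, padding=2, groups=2),
            nn.ReLU(inplace=True),
            nn.MaxPool2d(kernel_size=3, stride=2),   # pooling after conv2
            nn.Conv2d(256, 384, kernel_size=3, padding=1),
            nn.ReLU(inplace=True),
            nn.Conv2d(384, 384, kernel_size=3, padding=1, groups=2),
            nn.ReLU(inplace=True),
            nn.Conv2d(384, 256, kernel_size=3, padding=1, groups=2),
            nn.ReLU(inplace=True),
            nn.MaxPool2d(kernel_size=3, stride=2),   # pooling after conv5
        )
        self.classifier = nn.Sequential(
            nn.Flatten(),
            nn.Linear(256 * 6 * 6, 4096),
            nn.ReLU(inplace=True),
            nn.Dropout(p=0.5),                       # dropout on first FC layer
            nn.Linear(4096, 4096),
            nn.ReLU(inplace=True),
            nn.Dropout(p=0.5),                       # dropout on second FC layer
            nn.Linear(4096, num_classes),            # raw class scores
        )

    def forward(self, x):
        return self.classifier(self.features(x))

model = AlexNet().eval()  # eval() turns dropout off for inference
with torch.no_grad():
    logits = model(torch.zeros(1, 3, 227, 227))  # one dummy RGB image
    probs = torch.softmax(logits, dim=1)         # distribution over classes
print(probs.shape)
```

The model returns raw scores and softmax is applied separately, because loss functions such as `nn.CrossEntropyLoss` expect unnormalized scores during training.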


Conclusion

AlexNet's results depended on its choice of methods such as dropout and data augmentation, which improved the network's performance.

AlexNet introduced practices for ConvNets that are still in use today, such as ReLU and dropout.

It is not easy to reach a low classification error without overfitting.

For a clearer understanding, please visit this video.

Happy Learning !!!

Happy coding :)

And Don’t forget to clap clap clap…
