Concept of AlexNet: Convolutional Neural Network

Abhijeet Pujara
Published in Analytics Vidhya
5 min read · Nov 3, 2020


This article covers nine parts:

  1. What is AlexNet?
  2. The Architecture of AlexNet.
  3. Dataset.
  4. ReLU.
  5. Dropout.
  6. Pros of AlexNet.
  7. Cons of AlexNet.
  8. AlexNet With Python.
  9. Conclusion.

What is AlexNet?

AlexNet is the name given to a convolutional neural network architecture that won the ImageNet Large Scale Visual Recognition Challenge (ILSVRC) in 2012.

ILSVRC is a competition where research teams evaluate their algorithms on a huge dataset of labeled images (ImageNet) and compete to achieve higher accuracy on several visual recognition tasks. AlexNet's win had a huge impact on how teams approached the competition afterward.

The Architecture of AlexNet

AlexNet contains 8 layers with weights:

5 convolutional layers

3 fully connected layers.

ReLU activation is applied after every layer except the last one, whose output is fed to a softmax that produces a distribution over the 1000 class labels. Dropout is applied in the first two fully connected layers. The network also applies max-pooling after the first, second, and fifth convolutional layers. The network was trained split across two GPUs: the kernels of the second, fourth, and fifth convolutional layers are connected only to those kernel maps in the previous layer that reside on the same GPU, whereas the kernels of the third convolutional layer are connected to all kernel maps in the second layer. The neurons in the fully connected layers are connected to all neurons in the previous layer.
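
To see how the image shrinks as it passes through the convolutional and pooling layers, here is a minimal sketch. The kernel sizes, strides, and padding values are not given in this article; they are taken from the original paper and common re-implementations, and the 227×227 input size is an assumption chosen so that the arithmetic works out evenly.

```python
def out_size(n, kernel, stride=1, pad=0):
    """Spatial output size of a convolution or pooling layer."""
    return (n + 2 * pad - kernel) // stride + 1

# (name, kernel, stride, pad): assumed values, not stated in this article
LAYERS = [
    ("conv1", 11, 4, 0),
    ("pool1", 3, 2, 0),
    ("conv2", 5, 1, 2),
    ("pool2", 3, 2, 0),
    ("conv3", 3, 1, 1),
    ("conv4", 3, 1, 1),
    ("conv5", 3, 1, 1),
    ("pool5", 3, 2, 0),
]

def trace(input_size=227):
    """Return the feature-map side length after each layer."""
    sizes = {}
    n = input_size
    for name, kernel, stride, pad in LAYERS:
        n = out_size(n, kernel, stride, pad)
        sizes[name] = n
    return sizes

sizes = trace()
for name, n in sizes.items():
    print(f"{name}: {n}x{n}")
```

Under these assumptions the feature map goes from 55×55 after the first convolution down to 6×6 after the last pooling layer, which is what the first fully connected layer receives.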


Dataset

ImageNet is an image database of over 15 million labeled high-resolution images belonging to roughly 22,000 categories. The competition uses a subset of ImageNet’s images and challenges researchers to achieve the lowest top-1 and top-5 error rates. AlexNet requires a fixed input size, so every image in the training set and every test image is first rescaled to 256×256. The network itself is then fed 224×224 patches cropped from these 256×256 images.
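
The top-1 and top-5 error rates can be made concrete with a small sketch. The function name `top_k_error` and the toy scores are made up for illustration (a real ILSVRC model outputs 1000 scores per image, not 6).

```python
def top_k_error(scores, labels, k):
    """Fraction of examples whose true label is not among the k highest scores."""
    misses = 0
    for row, label in zip(scores, labels):
        # Class indices sorted from highest score to lowest
        ranked = sorted(range(len(row)), key=lambda c: row[c], reverse=True)
        if label not in ranked[:k]:
            misses += 1
    return misses / len(labels)

# Toy example: 4 images, 6 classes (hypothetical numbers)
scores = [
    [0.10, 0.50, 0.20, 0.05, 0.10, 0.05],
    [0.30, 0.10, 0.25, 0.20, 0.10, 0.05],
    [0.40, 0.20, 0.15, 0.10, 0.10, 0.05],
    [0.60, 0.10, 0.10, 0.10, 0.05, 0.05],
]
labels = [1, 2, 5, 0]

print(top_k_error(scores, labels, k=1))  # 0.5
print(top_k_error(scores, labels, k=5))  # 0.25
```

The second image is a top-1 miss but a top-5 hit, which is why the top-5 error is always less than or equal to the top-1 error.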


ReLU

An important feature of AlexNet is its use of the ReLU (Rectified Linear Unit) nonlinearity.

Tanh and sigmoid used to be the standard activation functions for training neural network models.

AlexNet showed that deep CNNs with the ReLU nonlinearity can be trained much faster than the same networks with saturating activation functions such as tanh or sigmoid.

The authors tested this on the CIFAR-10 dataset, where a four-layer CNN with ReLUs reached a 25% training error rate six times faster than an equivalent network with tanh neurons.

Let's see why training is faster with ReLUs. The ReLU function is given by

f(x) = max(0,x)

[Figure: plots of the two functions, tanh and ReLU.]

The tanh function saturates at very high or very low values of x. In these regions, the slope of the function is very close to zero, which can slow down gradient descent.

The ReLU function’s slope does not shrink toward zero for large positive values of x; it stays at 1, which helps the optimization converge faster. For negative values of x the slope is zero, but most of the neurons in a neural network usually end up with positive inputs.

ReLU wins over the sigmoid function, too, for the same reason.
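
The saturation argument is easy to check numerically. This is a small illustration using only the standard library; the helper names are made up for this sketch.

```python
import math

def relu(x):
    """f(x) = max(0, x)"""
    return max(0.0, x)

def relu_grad(x):
    """Slope of ReLU: 1 for positive inputs, 0 otherwise."""
    return 1.0 if x > 0 else 0.0

def tanh_grad(x):
    """Slope of tanh: 1 - tanh(x)^2, which shrinks toward 0 as |x| grows."""
    return 1.0 - math.tanh(x) ** 2

for x in (-5.0, -1.0, 0.5, 5.0):
    print(f"x={x:+.1f}  tanh'={tanh_grad(x):.6f}  relu'={relu_grad(x):.1f}")
```

At x = 5 the tanh slope has almost vanished, while the ReLU slope is still exactly 1.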

The Overfitting Problem

AlexNet had 60 million parameters, which made overfitting a major concern.

The authors used two methods to reduce overfitting:

  1. Data Augmentation
  2. Dropout.

Data Augmentation

The authors generated image translations and horizontal reflections, which increased the size of the training set by a factor of 2048. They also performed Principal Component Analysis (PCA) on the RGB pixel values to alter the intensities of the RGB channels, which reduced the top-1 error rate by more than 1%.
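
A translation plus reflection can be sketched as a random crop followed by an optional mirror. This is a minimal illustration, not the authors' code: it assumes 224×224 patches taken from a 256×256 image, and represents the image as a plain list of pixel rows. The factor of 2048 is consistent with 32 × 32 shifts times 2 reflections, though that breakdown is an inference rather than something stated above.

```python
import random

def random_crop_and_flip(image, crop=224, rng=random):
    """Return a random crop x crop patch, mirrored horizontally half the time.

    image is a list of rows, each row a list of pixels.
    """
    height, width = len(image), len(image[0])
    top = rng.randrange(height - crop + 1)
    left = rng.randrange(width - crop + 1)
    patch = [row[left:left + crop] for row in image[top:top + crop]]
    if rng.random() < 0.5:
        patch = [row[::-1] for row in patch]  # horizontal reflection
    return patch

# Toy 256x256 "image" whose pixels record their own (row, col) position
image = [[(r, c) for c in range(256)] for r in range(256)]
patch = random_crop_and_flip(image)
print(len(patch), len(patch[0]))  # 224 224
```

Because a new patch is drawn every time an image is used, the network rarely sees exactly the same input twice.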


Dropout

The second technique that AlexNet used to avoid overfitting was dropout. It consists of setting the output of each hidden neuron to zero with a probability of 0.5. The neurons that are “dropped out” in this way do not contribute to the forward pass and do not participate in backpropagation, so every time an input is presented, the neural network samples a different architecture.
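
Here is a minimal sketch of the idea on a plain list of activations. The function names are made up for illustration. The test-time step (use every neuron but multiply its output by 0.5) follows the original paper rather than this article; most modern libraries scale during training instead.

```python
import random

def dropout_train(activations, p=0.5, rng=random):
    """Training: zero each activation independently with probability p."""
    return [0.0 if rng.random() < p else a for a in activations]

def dropout_test(activations, p=0.5):
    """Test time: keep every neuron, scale outputs by the keep probability."""
    return [a * (1.0 - p) for a in activations]

rng = random.Random(0)           # seeded only to make the demo repeatable
hidden = [0.8, 1.5, 0.3, 2.0]    # hypothetical hidden-layer outputs

print(dropout_train(hidden, rng=rng))  # some entries zeroed at random
print(dropout_test(hidden))            # all entries halved
```

Calling `dropout_train` twice on the same input generally zeroes a different subset, which is the "different architecture every time" effect.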

Pros of AlexNet

  1. AlexNet is considered a milestone of CNNs for image classification.
  2. Many of its methods, such as the conv+pooling design, dropout, GPUs, parallel computing, and ReLU, are still industry standards for computer vision.
  3. The unique advantage of AlexNet is the direct image input to the classification model.
  4. The convolution layers can automatically extract the edges of the images, and the fully connected layers learn from these features.
  5. Theoretically, the complexity of visual patterns can be extracted effectively by adding more conv layers.

Cons of AlexNet

  1. AlexNet is not deep enough compared to later models such as VGGNet, GoogLeNet, and ResNet.
  2. The use of large convolution filters (5×5) fell out of favor shortly afterward.
  3. Initializing the weights from a normal distribution does not effectively solve the vanishing gradient problem; it was later replaced by the Xavier method.
  4. Its performance is surpassed by more complex models such as GoogLeNet (6.7% top-5 error) and ResNet (3.6% top-5 error).

AlexNet With Python
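
As a framework-free starting point, the sketch below writes the eight weighted layers down as plain Python data and tallies their parameters. The channel counts, kernel sizes, and the 6×6 feature map entering the first fully connected layer are assumptions taken from the original paper, not from this article. Setting `groups=2` models the layers whose kernels only see the feature maps on their own GPU.

```python
# (name, kernel, in_channels, out_channels, groups): assumed values
CONV_LAYERS = [
    ("conv1", 11, 3, 96, 1),
    ("conv2", 5, 96, 256, 2),
    ("conv3", 3, 256, 384, 1),
    ("conv4", 3, 384, 384, 2),
    ("conv5", 3, 384, 256, 2),
]

# (name, inputs, outputs): fc6 assumes a 256 x 6 x 6 feature map
FC_LAYERS = [
    ("fc6", 256 * 6 * 6, 4096),
    ("fc7", 4096, 4096),
    ("fc8", 4096, 1000),
]

def conv_params(kernel, in_ch, out_ch, groups):
    """Weights plus one bias per output channel."""
    return kernel * kernel * (in_ch // groups) * out_ch + out_ch

def fc_params(n_in, n_out):
    """Weights plus one bias per output neuron."""
    return n_in * n_out + n_out

def count_parameters():
    counts = {}
    for name, kernel, in_ch, out_ch, groups in CONV_LAYERS:
        counts[name] = conv_params(kernel, in_ch, out_ch, groups)
    for name, n_in, n_out in FC_LAYERS:
        counts[name] = fc_params(n_in, n_out)
    return counts

counts = count_parameters()
total = sum(counts.values())
for name, n in counts.items():
    print(f"{name}: {n:,}")
print(f"total: {total:,}")
```

Under these assumptions the total comes to roughly 61 million, in line with the 60 million figure quoted earlier, and the three fully connected layers account for most of it. The same layer table can be translated into a model definition in a deep learning framework of your choice.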


Conclusion

AlexNet is a landmark work in supervised learning and achieved excellent results.

The choice of methods such as dropout and data augmentation was also important, since they helped the network's performance.

AlexNet introduced changes to ConvNets that remain in use today, such as ReLU and dropout.

Achieving low classification error without overfitting is not easy.

For a clearer understanding, please watch this video.

Happy Learning !!!

Happy coding :)

And Don’t forget to clap clap clap…


