Common CNN Architectures In Depth

Fraidoon Omarzai
Jul 26, 2024


Learn almost everything about the most common CNN architectures.

1. LeNet

  • One of the first CNN architectures (LeNet-5, LeCun et al., 1998)
  • Often called the "hello world" of deep learning
  • Roughly 60k parameters
  • Applied to handwritten digit recognition (MNIST)
  • Activation function in hidden layers: sigmoid/tanh
  • Activation function in output layer: softmax
  • Pooling: average pooling (a minimal sketch follows this list)
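
To make the layer layout concrete, here is a minimal PyTorch sketch of a LeNet-5-style network; the 1x32x32 input and exact layer sizes follow common descriptions of the paper, so treat it as an illustration rather than the original implementation:

```python
import torch
import torch.nn as nn

class LeNet5(nn.Module):
    """LeNet-5-style network for 1x32x32 inputs (e.g. padded MNIST digits)."""
    def __init__(self, num_classes=10):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(1, 6, kernel_size=5),   # 32x32 -> 28x28
            nn.Tanh(),
            nn.AvgPool2d(2),                  # 28x28 -> 14x14
            nn.Conv2d(6, 16, kernel_size=5),  # 14x14 -> 10x10
            nn.Tanh(),
            nn.AvgPool2d(2),                  # 10x10 -> 5x5
        )
        self.classifier = nn.Sequential(
            nn.Flatten(),
            nn.Linear(16 * 5 * 5, 120),
            nn.Tanh(),
            nn.Linear(120, 84),
            nn.Tanh(),
            nn.Linear(84, num_classes),       # softmax is applied by the loss (e.g. CrossEntropyLoss)
        )

    def forward(self, x):
        return self.classifier(self.features(x))

print(sum(p.numel() for p in LeNet5().parameters()))  # roughly 60k parameters
```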

2. AlexNet

  • The architecture that popularized CNNs
  • Similar to LeNet but much deeper
  • Winner of the ImageNet Large Scale Visual Recognition Challenge (ILSVRC) in 2012
  • To reduce overfitting, the authors used 50% dropout and data augmentation
  • Activation function in hidden layers: ReLU
  • Activation function in output layer: softmax
  • Pooling: max pooling (see the snippet below)
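
One quick way to poke at the architecture, assuming a recent torchvision is installed, is its reference implementation; this snippet just instantiates an untrained AlexNet-style network and inspects the dropout layers and parameter count mentioned above:

```python
import torch
from torchvision import models

model = models.alexnet(weights=None)   # untrained AlexNet-style network
print(model.classifier)                # contains two Dropout(p=0.5) layers before the final linear layers
print(sum(p.numel() for p in model.parameters()))  # roughly 61 million parameters

x = torch.randn(1, 3, 224, 224)        # ImageNet-sized input
print(model(x).shape)                  # torch.Size([1, 1000])
```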

3. ZFNet

  • Winner of ILSVRC 2013 (Zeiler and Fergus)
  • Essentially a tuned AlexNet: smaller first-layer filters (7x7, stride 2 instead of 11x11, stride 4)
  • Introduced deconvolutional visualizations to inspect what each layer learns

4. GoogLeNet

  • Winner of the ImageNet Large Scale Visual Recognition Challenge (ILSVRC) in 2014
  • A 22-layer-deep CNN that reduced the number of parameters from 60 million (AlexNet) to around 4 million (see the comparison below)
  • The network was much deeper than previous CNNs; this was made possible by subnetworks called inception modules
  • A local response normalization layer was used to encourage the preceding layers to learn a variety of features
  • Consists of nine inception modules
  • 40% dropout
  • Activation function in hidden layers: ReLU
  • Activation function in output layer: softmax
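
One way to sanity-check the parameter claim, assuming a recent torchvision is installed, is to compare the reference implementations; torchvision's GoogLeNet differs slightly from the paper, so expect the same ballpark rather than exactly 4 million:

```python
from torchvision import models

def count_params(m):
    return sum(p.numel() for p in m.parameters())

alexnet = models.alexnet(weights=None)
googlenet = models.googlenet(weights=None, aux_logits=False, init_weights=True)

print(f"AlexNet:   {count_params(alexnet) / 1e6:.1f}M parameters")
print(f"GoogLeNet: {count_params(googlenet) / 1e6:.1f}M parameters")  # roughly an order of magnitude fewer
```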

5. VGGNet

  • Runner-up in the ILSVRC 2014 challenge
  • Classical, very uniform architecture: repeated 3x3 convolutions followed by 2x2 max pooling (sketched below)
  • Much deeper (16 or 19 weight layers in VGG-16/VGG-19)
  • Activation function in hidden layers: ReLU
  • Activation function in output layer: softmax
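
Because the design is so uniform, a single building block captures most of it; here is a minimal sketch with illustrative channel sizes:

```python
import torch
import torch.nn as nn

def vgg_block(in_channels, out_channels, num_convs):
    """Stack of 3x3 conv + ReLU layers followed by 2x2 max pooling, as in VGG."""
    layers = []
    for _ in range(num_convs):
        layers += [nn.Conv2d(in_channels, out_channels, kernel_size=3, padding=1),
                   nn.ReLU(inplace=True)]
        in_channels = out_channels
    layers.append(nn.MaxPool2d(kernel_size=2, stride=2))
    return nn.Sequential(*layers)

block = vgg_block(64, 128, num_convs=2)
x = torch.randn(1, 64, 112, 112)
print(block(x).shape)  # torch.Size([1, 128, 56, 56]): spatial size halved, channels doubled
```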

6. ResNet

  • Winner of the ImageNet Large Scale Visual Recognition Challenge (ILSVRC) in 2015
  • The key to training such a deep network (up to 152 layers) is skip connections (shortcut connections), which let the signal bypass layers; a minimal block is sketched below
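
A minimal sketch of a basic residual block with an identity skip connection; the channel sizes and the BatchNorm/ReLU placement follow common implementations and are assumptions here:

```python
import torch
import torch.nn as nn

class BasicResidualBlock(nn.Module):
    """Two 3x3 convolutions plus an identity skip connection (same channels, stride 1)."""
    def __init__(self, channels):
        super().__init__()
        self.conv1 = nn.Conv2d(channels, channels, kernel_size=3, padding=1, bias=False)
        self.bn1 = nn.BatchNorm2d(channels)
        self.conv2 = nn.Conv2d(channels, channels, kernel_size=3, padding=1, bias=False)
        self.bn2 = nn.BatchNorm2d(channels)
        self.relu = nn.ReLU(inplace=True)

    def forward(self, x):
        out = self.relu(self.bn1(self.conv1(x)))
        out = self.bn2(self.conv2(out))
        return self.relu(out + x)  # skip (shortcut) connection: add the input back

x = torch.randn(1, 64, 56, 56)
print(BasicResidualBlock(64)(x).shape)  # torch.Size([1, 64, 56, 56])
```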

7. Inception Network

  • If we do not know which filter size to use (1x1, 3x3, or 5x5), we can apply them all in parallel and concatenate the results
  • A 1x1 convolution is applied first to shrink the number of channels, which reduces the computational cost (see the sketch below)
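
A minimal sketch of an inception module with 1x1 bottlenecks in front of the 3x3 and 5x5 branches; the branch widths are illustrative choices, not necessarily the paper's exact numbers:

```python
import torch
import torch.nn as nn

class InceptionModule(nn.Module):
    """Parallel 1x1, 3x3, 5x5 and pooling branches, concatenated along the channel dimension."""
    def __init__(self, in_ch, c1, c3_reduce, c3, c5_reduce, c5, pool_proj):
        super().__init__()
        self.branch1 = nn.Sequential(nn.Conv2d(in_ch, c1, 1), nn.ReLU(inplace=True))
        self.branch3 = nn.Sequential(
            nn.Conv2d(in_ch, c3_reduce, 1), nn.ReLU(inplace=True),   # 1x1 bottleneck
            nn.Conv2d(c3_reduce, c3, 3, padding=1), nn.ReLU(inplace=True),
        )
        self.branch5 = nn.Sequential(
            nn.Conv2d(in_ch, c5_reduce, 1), nn.ReLU(inplace=True),   # 1x1 bottleneck
            nn.Conv2d(c5_reduce, c5, 5, padding=2), nn.ReLU(inplace=True),
        )
        self.branch_pool = nn.Sequential(
            nn.MaxPool2d(3, stride=1, padding=1),
            nn.Conv2d(in_ch, pool_proj, 1), nn.ReLU(inplace=True),
        )

    def forward(self, x):
        branches = [self.branch1(x), self.branch3(x), self.branch5(x), self.branch_pool(x)]
        return torch.cat(branches, dim=1)  # concatenate along channels

x = torch.randn(1, 192, 28, 28)
module = InceptionModule(192, c1=64, c3_reduce=96, c3=128, c5_reduce=16, c5=32, pool_proj=32)
print(module(x).shape)  # torch.Size([1, 256, 28, 28]): 64 + 128 + 32 + 32 channels
```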

8. 1x1 Convolution (Network in network)

  • Reduces the number of channels (the depth dimension) while leaving the spatial size unchanged (tiny example below)
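
A tiny example of a 1x1 convolution used as a channel-reduction (bottleneck) layer; the 256-to-64 channel sizes are arbitrary:

```python
import torch
import torch.nn as nn

x = torch.randn(1, 256, 28, 28)             # 256 channels, 28x28 feature map
reduce = nn.Conv2d(256, 64, kernel_size=1)  # 1x1 conv mixes channels at each pixel independently
print(reduce(x).shape)                      # torch.Size([1, 64, 28, 28]): same spatial size, fewer channels
```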

9. MobileNet

  • Designed for low computational cost at deployment (mobile and embedded devices)
  • Uses depthwise separable convolutions, which are much cheaper than standard convolutions (see the sketch below)
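
A minimal sketch of a depthwise separable convolution, split into a per-channel 3x3 depthwise convolution followed by a 1x1 pointwise convolution; the channel sizes are arbitrary and the parameter comparison is only indicative:

```python
import torch
import torch.nn as nn

def depthwise_separable_conv(in_ch, out_ch):
    """Depthwise 3x3 conv (one filter per input channel) followed by a 1x1 pointwise conv."""
    return nn.Sequential(
        nn.Conv2d(in_ch, in_ch, kernel_size=3, padding=1, groups=in_ch, bias=False),  # depthwise
        nn.BatchNorm2d(in_ch), nn.ReLU(inplace=True),
        nn.Conv2d(in_ch, out_ch, kernel_size=1, bias=False),                          # pointwise
        nn.BatchNorm2d(out_ch), nn.ReLU(inplace=True),
    )

def count(m):
    return sum(p.numel() for p in m.parameters())

standard = nn.Conv2d(64, 128, kernel_size=3, padding=1, bias=False)
separable = depthwise_separable_conv(64, 128)
print(count(standard), count(separable))  # the separable version has far fewer parameters
```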

10. U-Net

  • Encoder-decoder architecture shaped like a "U": a contracting path followed by an expanding path
  • Skip connections copy encoder feature maps to the corresponding decoder stages
  • Widely used for image segmentation, originally biomedical image segmentation

11. EfficientNet

  • Scales network depth, width, and input resolution together using a single compound scaling coefficient
  • A family of models (EfficientNet-B0 to B7) with strong accuracy/efficiency trade-offs
