Common CNN Architectures In Depth
Learn the key ideas behind the most common CNN architectures.
1. LeNet
- One of the first CNN architectures
- The "hello world" of deep learning
- Roughly 60k parameters
- Applied to handwritten digit recognition
- Activation function in hidden layer: sigmoid/tanh
- Activation function in output layer: softmax
- Pooling: average pooling (see the sketch below)
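Below is a minimal LeNet-5 sketch in PyTorch, assuming the classic configuration (32x32 grayscale input, tanh activations, average pooling); the layer sizes follow the common presentation of the paper and are not taken from this article.

```python
import torch
import torch.nn as nn

class LeNet5(nn.Module):
    def __init__(self, num_classes=10):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(1, 6, kernel_size=5),   # 32x32 -> 28x28
            nn.Tanh(),
            nn.AvgPool2d(kernel_size=2),      # 28x28 -> 14x14
            nn.Conv2d(6, 16, kernel_size=5),  # 14x14 -> 10x10
            nn.Tanh(),
            nn.AvgPool2d(kernel_size=2),      # 10x10 -> 5x5
        )
        self.classifier = nn.Sequential(
            nn.Flatten(),
            nn.Linear(16 * 5 * 5, 120),
            nn.Tanh(),
            nn.Linear(120, 84),
            nn.Tanh(),
            nn.Linear(84, num_classes),  # softmax is applied by the loss
        )

    def forward(self, x):
        return self.classifier(self.features(x))

model = LeNet5()
print(sum(p.numel() for p in model.parameters()))  # ~61k parameters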
2. AlexNet
- The architecture that popularized CNN
- Similar to LeNet but much deeper and wider (about 60 million parameters)
- Winner of the ImageNet Large-Scale Visual Recognition Challenge (ILSVRC) in 2012
- To reduce overfitting, the authors used 50% dropout and data augmentation (see the sketch below)
- Activation function in hidden layer: ReLU
- Activation function in output layer: softmax
- Pooling: max pooling
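A short sketch of the two overfitting remedies named above, as they might look in PyTorch; the 50% dropout rate matches the bullet, while the specific augmentation transforms and layer widths are illustrative assumptions.

```python
import torch.nn as nn
from torchvision import transforms

# Data augmentation: random crops and flips multiply the effective
# training set size (specific transforms assumed for illustration).
train_transforms = transforms.Compose([
    transforms.RandomResizedCrop(224),
    transforms.RandomHorizontalFlip(),
    transforms.ToTensor(),
])

# Dropout in the fully connected head: each forward pass randomly
# zeroes 50% of activations, discouraging co-adaptation.
classifier_head = nn.Sequential(
    nn.Dropout(p=0.5),
    nn.Linear(4096, 4096),
    nn.ReLU(inplace=True),
    nn.Dropout(p=0.5),
    nn.Linear(4096, 1000),  # 1000 ImageNet classes
)
```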
3. ZFNet
- Winner of ILSVRC 2013
- A refined AlexNet: smaller filters and stride in the first convolutional layer, tuned with the help of deconvolution-based visualizations of the learned features
4. GoogLeNet
- Winner of the ImageNet Large-Scale Visual Recognition Challenge (ILSVRC) in 2014
- A 22-layer-deep CNN that nevertheless cut the parameter count from 60 million (AlexNet) to about 4 million
- The network was much deeper than previous CNNs; this was made possible by subnetworks called inception modules (see section 7 below)
- Local response normalization (LRN) layers were used to encourage the preceding layers to learn diverse features, as in the sketch below
- Consists of nine inception modules
- 40% dropout
- Activation function in hidden layer: ReLU
- Activation function in output layer: softmax
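A minimal example of a local response normalization layer in PyTorch; the hyperparameters here are illustrative, not taken from the GoogLeNet paper.

```python
import torch
import torch.nn as nn

# LRN normalizes each activation by the activity of its neighboring
# channels, so nearby filters compete rather than learn the same feature.
lrn = nn.LocalResponseNorm(size=5, alpha=1e-4, beta=0.75, k=2.0)
x = torch.randn(1, 64, 56, 56)  # a batch of feature maps
print(lrn(x).shape)             # shape is unchanged: (1, 64, 56, 56)
```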
5. VGGNet
- The runner-up in the ILSVRC 2014 challenge
- A classical architecture: a plain stack of convolution and pooling layers
- Much deeper (16-19 weight layers), built almost entirely from uniform 3x3 convolutions (see the block sketch below)
- Activation function in hidden layer: ReLU
- Activation function in output layer: softmax
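A sketch of the VGG idea, assuming the VGG-16 configuration: each block stacks uniform 3x3 convolutions and ends with 2x2 max pooling.

```python
import torch.nn as nn

def vgg_block(in_channels, out_channels, num_convs):
    layers = []
    for _ in range(num_convs):
        layers += [
            nn.Conv2d(in_channels, out_channels, kernel_size=3, padding=1),
            nn.ReLU(inplace=True),
        ]
        in_channels = out_channels
    layers.append(nn.MaxPool2d(kernel_size=2, stride=2))  # halves H and W
    return nn.Sequential(*layers)

# VGG-16's convolutional stages: 2 + 2 + 3 + 3 + 3 = 13 conv layers
features = nn.Sequential(
    vgg_block(3, 64, 2),
    vgg_block(64, 128, 2),
    vgg_block(128, 256, 3),
    vgg_block(256, 512, 3),
    vgg_block(512, 512, 3),
)
```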
6. ResNet
- Winner of the ImageNet Large-Scale Visual Recognition Challenge (ILSVRC) in 2015
- The key to being able to train such a deep network (the winning variant had 152 layers) is skip connections (shortcut connections), sketched below
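A minimal residual block sketch, assuming equal input and output channels so the identity shortcut needs no projection; the skip connection simply adds the block's input to its output, letting gradients flow straight through.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class ResidualBlock(nn.Module):
    def __init__(self, channels):
        super().__init__()
        self.conv1 = nn.Conv2d(channels, channels, 3, padding=1, bias=False)
        self.bn1 = nn.BatchNorm2d(channels)
        self.conv2 = nn.Conv2d(channels, channels, 3, padding=1, bias=False)
        self.bn2 = nn.BatchNorm2d(channels)

    def forward(self, x):
        out = F.relu(self.bn1(self.conv1(x)))
        out = self.bn2(self.conv2(out))
        return F.relu(out + x)  # the shortcut: add the input to the output

x = torch.randn(1, 64, 32, 32)
print(ResidualBlock(64)(x).shape)  # (1, 64, 32, 32)
```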
7. Inception Network
- If we do not know which filter size to use (1x1, 3x3, or 5x5), we can apply them all in parallel and concatenate their outputs
- A 1x1 convolution is applied first to shrink the number of channels, which reduces the computational cost (see the sketch below)
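A minimal inception module sketch following the idea above; the branch widths are the commonly cited values for GoogLeNet's first inception module, used here purely for illustration.

```python
import torch
import torch.nn as nn

class InceptionModule(nn.Module):
    def __init__(self, in_ch):
        super().__init__()
        self.branch1 = nn.Conv2d(in_ch, 64, kernel_size=1)
        self.branch3 = nn.Sequential(
            nn.Conv2d(in_ch, 96, kernel_size=1),           # 1x1 bottleneck
            nn.Conv2d(96, 128, kernel_size=3, padding=1),
        )
        self.branch5 = nn.Sequential(
            nn.Conv2d(in_ch, 16, kernel_size=1),           # 1x1 bottleneck
            nn.Conv2d(16, 32, kernel_size=5, padding=2),
        )
        self.branch_pool = nn.Sequential(
            nn.MaxPool2d(kernel_size=3, stride=1, padding=1),
            nn.Conv2d(in_ch, 32, kernel_size=1),
        )

    def forward(self, x):
        # Concatenate the parallel branches channel-wise: 64+128+32+32 = 256
        return torch.cat(
            [self.branch1(x), self.branch3(x),
             self.branch5(x), self.branch_pool(x)],
            dim=1,
        )

x = torch.randn(1, 192, 28, 28)
print(InceptionModule(192)(x).shape)  # (1, 256, 28, 28)
```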
8. 1x1 Convolution (Network in Network)
- Reduces the number of depth channels while leaving the spatial dimensions unchanged, as shown below
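A two-line illustration: a 1x1 convolution leaves height and width untouched while mixing channels, here shrinking an assumed 256 channels down to 64.

```python
import torch
import torch.nn as nn

reduce = nn.Conv2d(256, 64, kernel_size=1)  # only the channel count changes
x = torch.randn(1, 256, 28, 28)
print(reduce(x).shape)  # (1, 64, 28, 28)
```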
9. MobileNet
- Low computational cost at deployment, designed for mobile and embedded devices
- Uses depth-wise separable convolutions, which are much cheaper than standard convolutions (see the sketch below)
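A sketch of a depth-wise separable convolution with assumed channel counts: a per-channel (depthwise) 3x3 convolution followed by a 1x1 (pointwise) convolution, plus a parameter comparison against a standard 3x3 convolution.

```python
import torch
import torch.nn as nn

def depthwise_separable(in_ch, out_ch):
    return nn.Sequential(
        # groups=in_ch applies one 3x3 filter per input channel (depthwise)
        nn.Conv2d(in_ch, in_ch, kernel_size=3, padding=1, groups=in_ch),
        # the 1x1 conv then mixes information across channels (pointwise)
        nn.Conv2d(in_ch, out_ch, kernel_size=1),
    )

x = torch.randn(1, 32, 112, 112)
print(depthwise_separable(32, 64)(x).shape)  # (1, 64, 112, 112)

# Parameter comparison with a standard 3x3 convolution:
std = nn.Conv2d(32, 64, kernel_size=3, padding=1)
sep = depthwise_separable(32, 64)
print(sum(p.numel() for p in std.parameters()))  # 18,496
print(sum(p.numel() for p in sep.parameters()))  # 2,432
```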