VGG-16: Convolutional Neural Network
VGG stands for Visual Geometry Group, the Oxford lab that designed the network. The main purpose of the work was to understand how the depth of a CNN affects the accuracy of the model. The number of parameters is kept manageable by using a 3×3 kernel size in all convolution layers. If you want to read more about CNNs and their architectures, see https://medium.com/@ommore524 .
The input to the VGG network is an RGB image. The AlexNet architecture used kernel sizes of 11×11, 5×5, and 3×3; in VGG-16, instead of the 11×11 and 5×5 kernels, only 3×3 kernels are applied. VGG-16 contains 13 convolution layers and 3 fully connected layers.
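To make the "13 + 3 = 16" layer count concrete, here is a minimal pure-Python sketch of the VGG-16 layer layout (the channel numbers follow the paper's "D" configuration; the `"M"` marker for max pooling is a convention borrowed from common implementations, not from the paper itself):

```python
# VGG-16 layer configuration: numbers are the output channels of 3x3
# convolution layers; "M" marks a 2x2 max-pooling layer.
VGG16_CONFIG = [64, 64, "M", 128, 128, "M", 256, 256, 256, "M",
                512, 512, 512, "M", 512, 512, 512, "M"]

conv_layers = [c for c in VGG16_CONFIG if c != "M"]
print(len(conv_layers))  # 13 convolution layers
# plus 3 fully connected layers (4096 -> 4096 -> 1000) gives the "16".
```

Counting only the weight-bearing layers (convolutions and fully connected layers, not the pooling layers) is what gives the network its name.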
Using multiple convolution layers with smaller kernels instead of a single layer with a larger kernel reduces the number of parameters on the one hand; on the other hand, the authors argue it is equivalent to stacking more non-linear mappings, which increases the network's expressive power.
At each stage, the network applies 2–3 convolution layers and then a max pooling layer. After each pooling step the number of kernels is doubled from the previous stage, so the network extracts progressively richer features. At the end of the network, three fully connected layers are used. Note that the original VGG-16 does not use batch normalisation; modern variants (e.g. VGG-16-BN) add it after each convolution to keep the ReLU activations from growing unchecked during training. Dropout is applied in the fully connected layers to reduce overfitting.
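The stage structure above also explains where VGG-16's famous ~138M parameters come from. A quick back-of-the-envelope count (assuming the standard 224×224 input and 1000 output classes, with a bias term per filter) looks like this:

```python
# Parameter count for VGG-16: each tuple is (in_channels, out_channels)
# of a 3x3 convolution; note how channels double after each pooling stage.
conv_channels = [(3, 64), (64, 64),
                 (64, 128), (128, 128),
                 (128, 256), (256, 256), (256, 256),
                 (256, 512), (512, 512), (512, 512),
                 (512, 512), (512, 512), (512, 512)]

conv_params = sum(cin * cout * 3 * 3 + cout for cin, cout in conv_channels)

# After five 2x2 poolings, the 224x224 feature map is 7x7 with 512 channels.
fc_sizes = [(512 * 7 * 7, 4096), (4096, 4096), (4096, 1000)]
fc_params = sum(nin * nout + nout for nin, nout in fc_sizes)

total_params = conv_params + fc_params
print(total_params)  # 138357544 -- about 138M parameters
```

Notice that the fully connected layers account for the large majority of the parameters; the 3×3 convolutions are comparatively cheap.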
Summary:
- A smaller 3 * 3 convolution kernel and a deeper network are used. A stack of two 3 * 3 kernels has the same field of view as one 5 * 5 kernel, and a stack of three 3 * 3 kernels has the same field of view as one 7 * 7 kernel. On the one hand, this means fewer parameters: three stacked 3 * 3 kernels use only (3 * 3 * 3) / (7 * 7) ≈ 55% of the parameters of a single 7 * 7 kernel. On the other hand, the extra non-linear transformations between the stacked layers increase the CNN's ability to learn features.
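Both claims in that bullet are easy to check numerically. A small sketch (the helper `stacked_rf` is mine, for illustration):

```python
# Receptive field of a stack of 3x3 convolutions with stride 1:
# each extra layer widens the field of view by (kernel - 1) pixels.
def stacked_rf(n_layers, kernel=3):
    rf = 1
    for _ in range(n_layers):
        rf += kernel - 1
    return rf

print(stacked_rf(2))  # 5 -> same field of view as one 5x5 kernel
print(stacked_rf(3))  # 7 -> same field of view as one 7x7 kernel

# Parameter comparison for C input and C output channels (ignoring biases):
# three 3x3 layers cost 3 * 9 * C*C weights vs. 49 * C*C for one 7x7 layer.
ratio = (3 * 3 * 3) / (7 * 7)
print(round(ratio * 100))  # 55 -> about 55% of the parameters
```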
- In some VGGNet configurations, a 1 * 1 convolution kernel is introduced. Without affecting the spatial dimensions of the input and output, it adds a non-linear transformation that increases the expressive power of the network while keeping the amount of computation low.
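To see why a 1 * 1 convolution is cheap, compare its weight count with a 3 * 3 layer at the same channel width (the 256-channel figure is just an illustrative choice):

```python
# A 1x1 convolution mixes channels at each pixel without changing the
# spatial size. For C_in inputs and C_out outputs it costs C_in * C_out
# weights per position (ignoring bias) vs. 9 * C_in * C_out for a 3x3 layer.
c_in, c_out = 256, 256
w_1x1 = c_in * c_out
w_3x3 = 9 * c_in * c_out
print(w_1x1)  # 65536
print(w_3x3)  # 589824 -> the 1x1 layer is 9x cheaper
```

So each 1 * 1 layer buys an extra ReLU non-linearity for roughly a ninth of the cost of a 3 * 3 layer.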