CNN Architectures — VGGNet

Small filters, deeper network.

Gary (Chang, Chih-Chun)
Deep Learning
Jun 4, 2018

https://arxiv.org/pdf/1409.1556.pdf

The Visual Geometry Group at the University of Oxford presented a deeper ConvNet with up to 19 layers, achieving a 7.3% top-5 error rate on the ILSVRC 2014 test set (first place in localization, second in classification).

Highlights of the Paper

  • ILSVRC’14 2nd in classification, 1st in localization
  • Training procedure similar to Krizhevsky et al. 2012 (AlexNet)
  • No Local Response Normalisation
  • Use ensembles for best results
  • FC7 features generalize well to other tasks

Overall Architecture

Here are the details of the layers in VGG16.

There are 138M parameters in VGG16. As for memory usage, the forward pass alone stores roughly 24M values × 4 bytes, or almost 96MB per image (~2× that for the backward pass). The memory usage can be estimated by adding up each layer’s output size.
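
The 138M figure can be verified by summing weights and biases layer by layer. A quick sketch, using the paper's configuration D (VGG16) with 3×3 conv kernels throughout:

```python
# Count VGG16 (configuration D) parameters layer by layer.
# Each conv entry is (in_channels, out_channels), kernel 3x3;
# each FC entry is (in_features, out_features).
convs = [(3, 64), (64, 64),                    # block 1
         (64, 128), (128, 128),                # block 2
         (128, 256), (256, 256), (256, 256),   # block 3
         (256, 512), (512, 512), (512, 512),   # block 4
         (512, 512), (512, 512), (512, 512)]   # block 5
fcs = [(7 * 7 * 512, 4096), (4096, 4096), (4096, 1000)]

conv_params = sum(3 * 3 * cin * cout + cout for cin, cout in convs)
fc_params = sum(fin * fout + fout for fin, fout in fcs)
total = conv_params + fc_params
print(total)  # 138357544, i.e. ~138M
```

Note how lopsided the total is: the three FC layers hold ~124M of the ~138M parameters, which is why later architectures dropped the big FC layers.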

A stack of three 3x3 conv layers with stride 1 has the same effective receptive field as one 7x7 conv layer, while the extra layers (deeper) add more non-linearities. Besides, with C channels in and out, three 3x3 conv layers contain 3x(3x3xCxC) = 27C² parameters, fewer than the 7x7xCxC = 49C² of a single 7x7 conv layer.

Three 3x3 filters with stride 1 have the same receptive field as one 7x7 filter
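
Both claims are easy to check in code. This small sketch grows the receptive field layer by layer (3x3 kernels, stride 1) and compares the parameter counts for an arbitrary channel width C:

```python
def stacked_receptive_field(kernel: int, layers: int, stride: int = 1) -> int:
    """Receptive field of `layers` stacked convs with the same kernel/stride."""
    rf, jump = 1, 1
    for _ in range(layers):
        rf += (kernel - 1) * jump  # each layer widens the field by (k-1)*jump
        jump *= stride             # stride 1 keeps the jump at 1
    return rf

print(stacked_receptive_field(3, 3))  # 7 -- same as one 7x7 conv

C = 256  # channel width, same in and out, as in the paper's comparison
three_3x3 = 3 * (3 * 3 * C * C)  # 27 * C^2
one_7x7 = 7 * 7 * C * C          # 49 * C^2
print(three_3x3 < one_7x7)  # True: roughly 45% fewer parameters
```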

Training

  1. Batch size 256
  2. SGD Momentum 0.9
  3. Dropout 0.5
  4. Learning rate 0.01, divided by 10 manually when validation accuracy plateaus (the learning rate was decreased 3 times in total, and training was stopped after 370K iterations, i.e. 74 epochs)
  5. L2 weight decay 5e-4
  6. Data augmentation similar to Krizhevsky et al. 2012 (AlexNet)
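
Step 4 above (divide the learning rate by 10 when validation accuracy stops improving) can be sketched as a simple plateau check. The `patience` value here is my assumption for illustration, not from the paper:

```python
class PlateauLRSchedule:
    """Divide the learning rate by `factor` when val accuracy stops improving."""

    def __init__(self, lr=0.01, factor=10.0, patience=3, max_drops=3):
        self.lr = lr
        self.factor = factor
        self.patience = patience    # epochs to wait before dropping (assumed)
        self.max_drops = max_drops  # the paper reports 3 decreases in total
        self.best = 0.0
        self.bad_epochs = 0
        self.drops = 0

    def step(self, val_accuracy):
        if val_accuracy > self.best:
            self.best = val_accuracy
            self.bad_epochs = 0
        else:
            self.bad_epochs += 1
            if self.bad_epochs >= self.patience and self.drops < self.max_drops:
                self.lr /= self.factor
                self.drops += 1
                self.bad_epochs = 0
        return self.lr

sched = PlateauLRSchedule()
for acc in [0.40, 0.55, 0.60, 0.60, 0.60, 0.60]:
    lr = sched.step(acc)
print(lr)  # 0.001 after the first plateau-triggered drop
```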

Result

Implementation

I built the VGG net in TensorFlow.
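
The TensorFlow code itself isn't reproduced in this post, but the stack is easy to sanity-check without any framework: walk a 224x224 input through VGG16's conv/pool configuration and confirm the shapes. This is a hand-rolled shape calculator, not the actual implementation:

```python
# Walk an input shape through VGG16's conv/pool stack.
# 'M' marks a 2x2 max pool with stride 2; numbers are conv output channels
# (3x3 kernels, stride 1, padding 1, so convs preserve spatial size).
cfg = [64, 64, 'M', 128, 128, 'M', 256, 256, 256, 'M',
       512, 512, 512, 'M', 512, 512, 512, 'M']

h = w = 224
channels = 3
for layer in cfg:
    if layer == 'M':
        h, w = h // 2, w // 2  # each pool halves the spatial dimensions
    else:
        channels = layer       # conv changes only the channel count
print((h, w, channels))  # (7, 7, 512) -- flattened, this feeds FC-4096
```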

Some of the wording is taken from the source paper and from the Stanford CS231n lecture slides.

If you like this article and consider it useful for you, please support it with 👏.

If you have any question, please feel free to let me know!
