LeNet-5, AlexNet, VGG-16 from Deeplearning.ai

Shahariar Rabby
4 min read · Nov 2, 2017


Why look at case studies?

In this course, Andrew will show you some case studies of convolutional neural networks. We often see that a neural network architecture that works well on one vision task also works well on other computer vision tasks. After this week you will feel comfortable reading some of the computer vision research papers and find yourself able to understand them. He will talk about five neural net architectures:

  • LeNet-5
  • AlexNet
  • VGG
  • ResNet
  • Inception

Classic Networks

LeNet-5 starts with an image of 32 x 32 x 1, and the goal was to recognize handwritten digits. In the first step, we use six 5 x 5 filters with a stride of 1 and get 28 x 28 x 6. With a stride of 1 and no padding, the dimensions shrink from 32 x 32 to 28 x 28. Then we use average pooling with a filter width of 2 and a stride of 2, which reduces the dimensions by a factor of 2 and leaves 14 x 14 x 6. Then we use another convolutional layer with sixteen 5 x 5 filters and end up with 10 x 10 x 16. At that time people used valid padding, which is why the height and width shrink at each layer. Then another pooling layer brings us to 5 x 5 x 16. The next layer is a fully connected layer with 120 nodes: the previous layer's 400 values (5 * 5 * 16) are connected to these 120 neurons. Then these 120 nodes are connected to a layer of 84 nodes, which is used to produce yhat with 10 possible values, recognizing the digits 0 to 9. In the modern version of this network, we would use a softmax function with a ten-way classification output.

This LeNet-5 has only about 60k parameters. But today we use networks with anywhere from 10 million to 100 million parameters.
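To make the layer-by-layer description concrete, here is a minimal LeNet-5-style sketch in Keras (the framework choice, the ReLU activations, and the softmax output are assumptions for a modern version; the original network used sigmoid/tanh nonlinearities and a different output layer):

```python
# Minimal LeNet-5-style sketch in Keras (assumed framework; the lecture does
# not prescribe one). Modern variant: ReLU activations and a softmax output.
import tensorflow as tf
from tensorflow.keras import layers, models

def lenet5():
    model = models.Sequential([
        layers.Input(shape=(32, 32, 1)),                     # 32 x 32 x 1 grayscale digit
        layers.Conv2D(6, 5, strides=1, activation="relu"),   # -> 28 x 28 x 6 (valid padding)
        layers.AveragePooling2D(pool_size=2, strides=2),     # -> 14 x 14 x 6
        layers.Conv2D(16, 5, strides=1, activation="relu"),  # -> 10 x 10 x 16
        layers.AveragePooling2D(pool_size=2, strides=2),     # -> 5 x 5 x 16
        layers.Flatten(),                                     # -> 400 values
        layers.Dense(120, activation="relu"),                 # fully connected, 120 nodes
        layers.Dense(84, activation="relu"),                  # fully connected, 84 nodes
        layers.Dense(10, activation="softmax"),               # ten-way digit classifier
    ])
    return model

lenet5().summary()   # roughly 60k trainable parameters
```

Calling summary() confirms the roughly 60k trainable parameters mentioned above.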

From this network, we see that as we go deeper (left to right), the height and width tend to go down and the number of channels increases.

Conv → pool → Conv → pool → fully connected → fully connected → output. This type of layer arrangement is quite common nowadays.

AlexNet starts with 227 x 227 x 3 input images. In the first layer, it applies 96 11 x 11 filters with a stride of 4. Because it uses a stride of 4, the dimensions reduce to 55 x 55. Then it applies max pooling with a 3 x 3 filter and a stride of 2, and the volume reduces to 27 x 27 x 96. Then it applies a 5 x 5 SAME-padding convolution and ends up with 27 x 27 x 256. Then again a max pooling with a 3 x 3 filter and a stride of 2 gives 13 x 13 x 256. Then another 3 x 3 SAME-padding convolution gives 13 x 13 x 384, and a 3 x 3 SAME convolution again ends up with 13 x 13 x 384. Then one more SAME convolution and a max pool end up with 6 x 6 x 256, which is 9216 values when flattened. Then we connect it to a fully connected layer with 4096 nodes, then another fully connected layer, and use a softmax function for a 1000-class output.

AlexNet has about 60 million parameters compared to LeNet's 60k parameters, and it uses the ReLU activation function.
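A rough AlexNet-style sketch in Keras (assumed framework) that follows the volumes described above; details from the original paper such as local response normalization and the two-GPU split are omitted:

```python
# AlexNet-style sketch in Keras (assumed framework), matching the layer sizes
# described in the text; normalization layers and the GPU split are left out.
import tensorflow as tf
from tensorflow.keras import layers, models

def alexnet(num_classes=1000):
    model = models.Sequential([
        layers.Input(shape=(227, 227, 3)),
        layers.Conv2D(96, 11, strides=4, activation="relu"),       # -> 55 x 55 x 96
        layers.MaxPooling2D(pool_size=3, strides=2),                # -> 27 x 27 x 96
        layers.Conv2D(256, 5, padding="same", activation="relu"),   # -> 27 x 27 x 256
        layers.MaxPooling2D(pool_size=3, strides=2),                # -> 13 x 13 x 256
        layers.Conv2D(384, 3, padding="same", activation="relu"),   # -> 13 x 13 x 384
        layers.Conv2D(384, 3, padding="same", activation="relu"),   # -> 13 x 13 x 384
        layers.Conv2D(256, 3, padding="same", activation="relu"),   # -> 13 x 13 x 256
        layers.MaxPooling2D(pool_size=3, strides=2),                # -> 6 x 6 x 256
        layers.Flatten(),                                            # -> 9216 values
        layers.Dense(4096, activation="relu"),
        layers.Dense(4096, activation="relu"),
        layers.Dense(num_classes, activation="softmax"),             # 1000-class output
    ])
    return model

alexnet().summary()   # on the order of 60 million parameters
```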

VGG-16: The remarkable thing about the VGG-16 net is that they said, “Instead of having so many hyperparameters, let’s use a much simpler network where you focus on just having Conv layers that are just 3 by 3 filters with a stride of 1 and always use SAME padding, and make all your max pooling layers 2 by 2 with a stride of 2”.

One really cool thing about VGG is that it really simplified these neural network architectures.

VGG-16 model

It is a really deep network, but you can easily understand the architecture from the picture. This network has a total of about 138 million parameters.
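Below is a VGG-16-style sketch in Keras (assumed framework) that applies the rule quoted above: every convolution is 3 x 3 with a stride of 1 and SAME padding, and every max pool is 2 x 2 with a stride of 2. For real use, Keras also ships a pretrained version as tf.keras.applications.VGG16.

```python
# VGG-16-style sketch in Keras (assumed framework): only 3x3 SAME convolutions
# with stride 1 and 2x2 max pools with stride 2, as described in the text.
import tensorflow as tf
from tensorflow.keras import layers, models

def vgg16(num_classes=1000):
    def conv_block(model, filters, n_convs):
        # n_convs 3x3 SAME convolutions followed by one 2x2 max pool
        for _ in range(n_convs):
            model.add(layers.Conv2D(filters, 3, padding="same", activation="relu"))
        model.add(layers.MaxPooling2D(pool_size=2, strides=2))

    model = models.Sequential([layers.Input(shape=(224, 224, 3))])
    conv_block(model, 64, 2)     # -> 112 x 112 x 64 after pooling
    conv_block(model, 128, 2)    # -> 56 x 56 x 128
    conv_block(model, 256, 3)    # -> 28 x 28 x 256
    conv_block(model, 512, 3)    # -> 14 x 14 x 512
    conv_block(model, 512, 3)    # -> 7 x 7 x 512
    model.add(layers.Flatten())
    model.add(layers.Dense(4096, activation="relu"))
    model.add(layers.Dense(4096, activation="relu"))
    model.add(layers.Dense(num_classes, activation="softmax"))
    return model

vgg16().summary()   # about 138 million parameters
```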
