DLOA (Part-16)-GoogLeNet CNN

Dewansh Singh
Published in Learn AI With Me
7 min read · May 16, 2023

Hey readers, I hope you are all doing well, safe and sound. The previous blog briefly discussed VGGNet CNN and implemented it in Python; if you haven’t read it yet, you can go through this link. In this blog, we’ll discuss GoogLeNet CNN, how it works, and its key features.

Introduction

GoogLeNet, also known as Inception-v1, is a convolutional neural network architecture developed by researchers at Google for image classification tasks. It was introduced in 2014 and won the classification task of the ImageNet Large Scale Visual Recognition Challenge (ILSVRC) that same year with a top-5 error rate of 6.67%, a significant improvement over the previous year’s winner.

The main contribution of GoogLeNet was the development of the Inception module, which allows for the efficient use of network depth and width, leading to improved accuracy with fewer parameters and reduced computational cost.

GoogLeNet CNN Architecture

This architecture first processes the input image with a series of convolutional and pooling layers that extract low-level features and reduce the spatial resolution. The output of this initial processing is then fed into the Inception modules, which consist of multiple parallel branches of convolutional and pooling layers with varying filter sizes. The outputs of these branches are concatenated along the channel dimension and fed into the next layer.

The Inception module allows for a trade-off between the depth and width of the network, which can significantly reduce the number of parameters required to achieve high accuracy. Additionally, GoogLeNet uses auxiliary classifiers at intermediate layers to encourage learning discriminative features and prevent overfitting.

Architecture and Working of GoogLeNet CNN

Let’s take a closer look at the architecture of GoogLeNet and how it works.

The GoogLeNet architecture is 22 layers deep and is built from convolutional layers, pooling layers, Inception modules, a fully connected layer, and a softmax layer. The input to the network is a 224x224x3 RGB image, and the output is a probability distribution over 1000 classes.

The following is a brief overview of the GoogLeNet architecture:

  1. Input layer: The input to the network is a 224x224x3 RGB image.
  2. Convolutional layer: The first layer is convolutional with 64 filters of size 7x7 and a stride of 2. This layer performs local feature extraction on the input image.
  3. Max pooling layer: The output of the first convolutional layer is passed through a max pooling layer with a filter size of 3x3 and a stride of 2. This layer reduces the spatial dimensions of the output and provides translation invariance.
  4. Local response normalization layer: The output of the max pooling layer is passed through a local response normalization (LRN) layer, which normalizes each activation using the responses of neighboring channels and is intended to help generalization.
  5. Convolutional layer: The output of the LRN layer is passed through another convolutional layer with 64 filters of size 1x1. This layer mixes information across channels before the more expensive 3x3 convolution that follows.
  6. Convolutional layer: The output of the previous convolutional layer is passed through another convolutional layer with 192 filters of size 3x3.
  7. Max pooling layer: The output of the 3x3 convolutional layer is passed through another max pooling layer with a filter size of 3x3 and a stride of 2.
  8. Inception modules: The output of the second max pooling layer is passed through a series of nine Inception modules, with max pooling layers between the groups of modules. Each Inception module consists of multiple branches of convolutional and pooling layers of varying filter sizes, whose outputs are concatenated and passed on to the next layer. The Inception module allows for efficient use of network depth and width.
  9. Auxiliary classifiers: Two of the intermediate Inception modules feed auxiliary classifiers that encourage the learning of discriminative features and help prevent overfitting. Each auxiliary classifier consists of an average pooling layer, a 1x1 convolutional layer, a fully connected layer, and a softmax layer trained with a cross-entropy loss.
  10. Average pooling layer: The output of the last Inception module is passed through an average pooling layer with a filter size of 7x7 and a stride of 1. This layer computes the average value of each feature map, resulting in a single scalar value for each feature map.
  11. Dropout layer: The output of the average pooling layer is passed through a dropout layer, which randomly drops a fraction of the activations (40% in the original paper) to prevent overfitting.
  12. Fully connected layer: The output of the dropout layer is flattened and passed through a fully connected layer with 1000 neurons, one for each class in the dataset.
  13. Softmax layer: The output of the fully connected layer is passed through a softmax layer, which computes a probability distribution over the 1000 classes.
  14. Output layer: The final layer is the output layer, which outputs the predicted class label based on the highest probability in the softmax layer.
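
To make the list above more concrete, here is a small sketch that traces how the spatial size of the feature map shrinks from 224 down to 1 as the image moves through the network. The padding values and the rounding-up behaviour of the pooling layers are assumptions chosen so that the sizes match the ones reported for GoogLeNet (112, 56, 28, 14, 7, 1). The max pooling layers between the groups of Inception modules are included, and the names `out_size`, `LAYERS`, and `trace_sizes` are just illustrative.

```python
import math


def out_size(size, kernel, stride, padding=0, ceil_mode=False):
    """Spatial output size of a convolution or pooling layer."""
    span = (size + 2 * padding - kernel) / stride
    return (math.ceil(span) if ceil_mode else math.floor(span)) + 1


# (name, kernel, stride, padding, ceil_mode)
# kernel=None marks a stage that keeps the spatial size unchanged.
# Padding and ceil_mode are assumptions, not values stated in this blog.
LAYERS = [
    ("conv 7x7, stride 2", 7, 2, 3, False),
    ("max pool 3x3, stride 2", 3, 2, 0, True),
    ("conv 1x1", 1, 1, 0, False),
    ("conv 3x3", 3, 1, 1, False),
    ("max pool 3x3, stride 2", 3, 2, 0, True),
    ("inception 3a-3b", None, None, None, None),
    ("max pool 3x3, stride 2", 3, 2, 0, True),
    ("inception 4a-4e", None, None, None, None),
    ("max pool 3x3, stride 2", 3, 2, 0, True),
    ("inception 5a-5b", None, None, None, None),
    ("avg pool 7x7, stride 1", 7, 1, 0, False),
]


def trace_sizes(size=224):
    """Return (layer name, spatial size) pairs for a square input."""
    sizes = [("input", size)]
    for name, kernel, stride, padding, ceil_mode in LAYERS:
        if kernel is not None:
            size = out_size(size, kernel, stride, padding, ceil_mode)
        sizes.append((name, size))
    return sizes


for name, size in trace_sizes():
    print(f"{name:<24} {size} x {size}")
```

The LRN, dropout, fully connected, and softmax layers are left out because they do not change the spatial size.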

There are 22 parameterized layers in the GoogLeNet architecture, namely the convolutional and fully connected layers. If we also include the non-parameterized pooling layers, there are a total of 27 layers in the GoogLeNet model.
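
The counts of 22 and 27 can be checked with a quick tally. This is a sketch under two assumptions: each Inception module is counted as two layers deep (a 1x1 reduction followed by a larger convolution), and only the pooling layers that sit between stages are counted, not the pooling branch inside each module. The `GROUPS` table is illustrative.

```python
# (layer group, layers with parameters, pooling layers)
# Assumption: each Inception module contributes a depth of 2.
GROUPS = [
    ("conv 7x7", 1, 0),
    ("max pool", 0, 1),
    ("conv 1x1 + conv 3x3", 2, 0),
    ("max pool", 0, 1),
    ("inception 3a, 3b", 2 * 2, 0),
    ("max pool", 0, 1),
    ("inception 4a to 4e", 5 * 2, 0),
    ("max pool", 0, 1),
    ("inception 5a, 5b", 2 * 2, 0),
    ("average pool", 0, 1),
    ("fully connected", 1, 0),
]

parameterized = sum(with_params for _, with_params, _ in GROUPS)
with_pooling = parameterized + sum(pooling for _, _, pooling in GROUPS)

print("parameterized layers:", parameterized)
print("including pooling layers:", with_pooling)
```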

In the GoogLeNet architecture diagram, every box represents a layer:

  • Blue Box — Convolutional Layer
  • Green Box — Feature Concatenation
  • Red Box — MaxPool Layer
  • Yellow Box — Softmax Layer

GoogLeNet Layer Description

Input — The GoogLeNet model takes an input image of 224 x 224.

Output — The output layer (or the softmax layer) has 1000 nodes that correspond to 1000 different classes of objects.
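
As a rough illustration of what the output layer does, the sketch below turns 1000 raw scores into probabilities with softmax and picks the most likely class. The scores here are made up for the example; in the real network they would come from the fully connected layer. The helper names `softmax` and `predict` are hypothetical.

```python
import math


def softmax(logits):
    """Convert raw scores into probabilities that sum to 1."""
    peak = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def predict(logits):
    """Return (index of the most likely class, its probability)."""
    probs = softmax(logits)
    best = max(range(len(probs)), key=probs.__getitem__)
    return best, probs[best]


# Made-up scores for 1000 classes, with class 42 scored highest.
logits = [0.0] * 1000
logits[42] = 5.0

class_index, confidence = predict(logits)
print(class_index, round(confidence, 3))
```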

Key Features of GoogLeNet CNN

Inception Module

The idea of the Inception module is to bring down the number of parameters in a Deep Neural Network; the inception module is built with many small convolutions.

Models like AlexNet have about 60 million parameters, whereas GoogLeNet has only a few million (commonly cited figures range from about 4 to 7 million, depending on how they are counted), even though the architecture of GoogLeNet is much deeper than AlexNet.
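
To see why small convolutions save parameters, here is a back-of-the-envelope sketch. It compares a 5x5 convolution applied directly to 192 input channels with the same 5x5 convolution applied after a 1x1 convolution has first reduced the input to 16 channels. The channel counts follow the first Inception module in the GoogLeNet paper, biases are ignored, and `conv_weights` is just an illustrative helper.

```python
def conv_weights(in_channels, out_channels, kernel):
    """Number of weights in a kernel x kernel convolution (biases ignored)."""
    return in_channels * out_channels * kernel * kernel


# Direct 5x5 convolution: 192 input channels, 32 output channels.
direct = conv_weights(192, 32, 5)

# 1x1 reduction to 16 channels, followed by the 5x5 convolution.
reduced = conv_weights(192, 16, 1) + conv_weights(16, 32, 5)

print("direct 5x5:", direct)
print("1x1 then 5x5:", reduced)
print("saving factor:", round(direct / reduced, 2))
```

With these numbers, the branch that uses the 1x1 reduction needs roughly a tenth of the weights of the direct 5x5 convolution.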

The GoogLeNet model has nine units known as inception units or Inception modules. They all share the same structure but use different numbers of filters.

Inception Unit

An Inception module applies convolutional kernels of different sizes (1x1, 3x3, and 5x5) in parallel, along with a 3x3 max pooling branch. The outputs of all branches are stacked together along the channel dimension at the end of the inception unit. Larger convolutional kernels cover a larger area of the image, while smaller kernels work on a smaller area.

So a 3x3 kernel captures finer, more local detail than a 5x5 kernel, while a 1x1 kernel looks at a single spatial position and mixes information across channels. GoogLeNet also places 1x1 convolutions before the 3x3 and 5x5 kernels to reduce the number of channels and keep the computation cheap.
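
Stacking the branch outputs together simply means adding up their channel counts. The sketch below does this for all nine Inception modules, using filter counts as I recall them from the table in the GoogLeNet paper, so treat the numbers as an assumption to double-check against the paper. The "reduce" columns are the 1x1 convolutions placed before the 3x3 and 5x5 kernels; they do not add to the output channels.

```python
# (name, 1x1, 3x3 reduce, 3x3, 5x5 reduce, 5x5, pool projection)
INCEPTION_TABLE = [
    ("3a", 64, 96, 128, 16, 32, 32),
    ("3b", 128, 128, 192, 32, 96, 64),
    ("4a", 192, 96, 208, 16, 48, 64),
    ("4b", 160, 112, 224, 24, 64, 64),
    ("4c", 128, 128, 256, 24, 64, 64),
    ("4d", 112, 144, 288, 32, 64, 64),
    ("4e", 256, 160, 320, 32, 128, 128),
    ("5a", 256, 160, 320, 32, 128, 128),
    ("5b", 384, 192, 384, 48, 128, 128),
]


def output_channels(row):
    """Channels after concatenating the four branches of one module."""
    _, c1x1, _, c3x3, _, c5x5, pool_proj = row
    return c1x1 + c3x3 + c5x5 + pool_proj


for row in INCEPTION_TABLE:
    print(f"inception {row[0]}: {output_channels(row)} output channels")
```

The last module ends with 1024 channels, which is why the final fully connected layer receives 1024 values after average pooling.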

Auxiliary Classifier

An auxiliary classifier is a small classification unit, and GoogLeNet adds two of them in the middle of the network. We use auxiliary classifiers to tackle the problem of vanishing gradients.

Each auxiliary classifier has a 5x5 average pooling layer (stride 3), a 1x1 convolutional layer with 128 filters, a fully connected layer with 1024 units, a dropout layer, and a final 1000-unit linear layer followed by softmax.

The auxiliary classifier units are attached to the outputs of two of the middle inception units (4a and 4d in the original paper). When we train the neural network, the loss of each auxiliary classifier is multiplied by 0.3 and then added to the total network loss. At inference time, the auxiliary classifiers are discarded.
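
The weighting of the auxiliary losses can be sketched as follows. The probabilities are made up for illustration, and `cross_entropy` and `total_loss` are hypothetical helper names, not part of any library.

```python
import math


def cross_entropy(probs, target):
    """Cross-entropy loss for one example, given predicted probabilities."""
    return -math.log(probs[target])


def total_loss(main_probs, aux_probs_list, target, aux_weight=0.3):
    """Main loss plus each auxiliary loss scaled by aux_weight."""
    loss = cross_entropy(main_probs, target)
    for aux_probs in aux_probs_list:
        loss += aux_weight * cross_entropy(aux_probs, target)
    return loss


# Made-up predictions over 3 classes; the true class is index 0.
main = [0.7, 0.2, 0.1]
aux_1 = [0.5, 0.3, 0.2]
aux_2 = [0.6, 0.3, 0.1]

print(round(total_loss(main, [aux_1, aux_2], target=0), 4))
```

Passing an empty list of auxiliary outputs gives the plain main loss.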

Advantages of GoogLeNet Model

Today, GoogLeNet is used for various Computer Vision applications, including Object Detection, Image Classification, etc.

GoogLeNet CNN Application Distribution

Around 18% of the GoogLeNet applications are based on Image Classification, and about 10% each are Object Detection and Quantization.

  • GoogLeNet has proven to be faster than other image-classification models like VGG.
  • GoogLeNet is much more compact: a pre-trained VGG16 model is 528 MB and a VGG19 model is 549 MB, whereas a pre-trained GoogLeNet is reported at about 96 MB and InceptionV3 at 92 MB.
  • GoogLeNet achieves higher efficiency by reducing the dimensionality of its intermediate feature maps (for example, with 1x1 convolutions) while retaining the important features/information.
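
Model file sizes like the ones above mostly come from the number of parameters. As a rough sketch, assuming each parameter is stored as a 32-bit float (4 bytes) and ignoring any file-format overhead, the VGG16 figure can be reproduced from its widely reported parameter count. The constant `VGG16_PARAMS` and the helper `size_mib` are illustrative.

```python
def size_mib(num_params, bytes_per_param=4):
    """Approximate weight size in MiB, assuming float32 storage."""
    return num_params * bytes_per_param / 2**20


# Widely reported parameter count for VGG16 with its 1000-class head.
VGG16_PARAMS = 138_357_544

print(f"VGG16: about {size_mib(VGG16_PARAMS):.0f} MB")
```

The same arithmetic can be applied to any other model once its parameter count is known, though the size on disk also depends on the file format and on what is stored alongside the weights.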

Conclusion

In conclusion, GoogLeNet was a breakthrough in CNN architecture, especially with the introduction of the Inception module, which allowed for a more efficient and deeper network architecture.

The use of average pooling and 1x1 convolutions also helped to reduce the computational complexity of the network while maintaining high accuracy. GoogLeNet achieved state-of-the-art performance on the ImageNet dataset, surpassing previous models while being more computationally efficient.
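
To get a feel for how much average pooling helps, the sketch below compares the number of weights in the final 1000-class fully connected layer in two cases: fed with the flattened 7x7x1024 feature map, and fed with the 1024 values left after 7x7 average pooling. Biases are ignored, and `fc_weights` is an illustrative helper.

```python
def fc_weights(in_features, out_features):
    """Number of weights in a fully connected layer (biases ignored)."""
    return in_features * out_features


# Flattening the 7x7x1024 feature map directly into the classifier.
flattened = fc_weights(7 * 7 * 1024, 1000)

# Averaging each 7x7 feature map down to one value first.
pooled = fc_weights(1024, 1000)

print("without average pooling:", flattened)
print("with average pooling:", pooled)
print("reduction factor:", flattened // pooled)
```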

The ideas and techniques developed in GoogLeNet have influenced the design of subsequent CNN architectures.

That’s it for now. I hope you liked my blog and got to know about GoogLeNet CNN, its working, and the key features of this CNN.

In the next blog, I will be discussing the implementation of GoogLeNet CNN, as the implementation needs more space than we have here.

If you feel my blogs are helpful, please share them with others.

Till then, stay tuned for the next blog…

***Next Blog***

Dewansh Singh
Learn AI With Me

Software Engineer Intern @BirdVision | ML | Azure | Data Science | AWS | Ex-Data Science Intern @Celebal Technologies