DLOA (Part-17)-Implementation of GoogLeNet CNN

May 16, 2023

Hey readers, I hope you are all doing well, safe, and sound. The previous blog briefly discussed the GoogLeNet CNN. If you haven't read it yet, you can go through this link. In this blog, we'll walk through an implementation of the GoogLeNet CNN.

Recap

GoogLeNet first passes the input image through a series of convolutional and pooling layers to extract features. The output of this initial processing is then fed into the Inception modules, which consist of multiple parallel branches of convolutional and pooling layers with different filter sizes. The outputs of these branches are concatenated along the channel axis and fed into the next layer.

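The Inception module lets the network grow both deeper and wider at a manageable cost. The 1x1 convolutions placed in front of the larger 3x3 and 5x5 convolutions shrink the number of channels those convolutions have to process, which can significantly reduce the number of parameters required to achieve high accuracy.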
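
To make the parameter saving concrete, here is a small back-of-the-envelope sketch. The channel counts (192 in, 16 after the 1x1 reduction, 32 out) and the helper name `conv_params` are assumptions chosen for illustration, not values taken from the code in this blog.

```python
def conv_params(kernel, in_channels, filters):
    """Weights plus biases of a 2D convolution with a square kernel."""
    return kernel * kernel * in_channels * filters + filters


in_channels = 192  # assumed depth of the incoming feature map

# Option A: apply a 5x5 convolution with 32 filters directly to the input.
direct = conv_params(5, in_channels, 32)

# Option B: squeeze the input to 16 channels with a 1x1 convolution first,
# then apply the same 5x5 convolution to the thinner feature map.
reduced = conv_params(1, in_channels, 16) + conv_params(5, 16, 32)

print("direct 5x5:      ", direct)
print("1x1 then 5x5:    ", reduced)
print("reduction factor:", round(direct / reduced, 1))
```

Under these assumed sizes the reduced branch needs roughly a tenth of the parameters of the direct 5x5 convolution, while still producing 32 output channels.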

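Additionally, GoogLeNet uses auxiliary classifiers at intermediate layers to encourage discriminative features in the earlier layers, strengthen the gradient signal during training, and add regularization. The simplified implementation below leaves them out.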

Implementation

Import the necessary libraries: TensorFlow, NumPy, and the Keras layers and model classes.

import tensorflow as tf
import numpy as np
from tensorflow.keras.layers import Input, Conv2D, MaxPooling2D, Dropout, Dense, Flatten, concatenate, AveragePooling2D
from tensorflow.keras.models import Model

Define the inception module for GoogLeNet. It consists of multiple parallel branches with different kernel sizes.

def inception_module(x, f1, f3, f5, mp):
    # 1x1 convolution branch
    conv1 = Conv2D(f1, (1, 1), padding='same', activation='relu')(x)
    # 3x3 convolution branch
    conv3 = Conv2D(f3, (1, 1), padding='same', activation='relu')(x)
    conv3 = Conv2D(f3, (3, 3), padding='same', activation='relu')(conv3)
    # 5x5 convolution branch
    conv5 = Conv2D(f5, (1, 1), padding='same', activation='relu')(x)
    conv5 = Conv2D(f5, (5, 5), padding='same', activation='relu')(conv5)
    # max pooling branch
    pool = MaxPooling2D((3, 3), strides=(1, 1), padding='same')(x)
    pool = Conv2D(mp, (1, 1), padding='same', activation='relu')(pool)
    # concatenate the outputs of all branches
    output = concatenate([conv1, conv3, conv5, pool], axis=-1)
    return output
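
As a rough check on what this module produces, the sketch below computes the number of output channels and trainable parameters by hand. The helper names `conv_params` and `inception_stats` are hypothetical, and the 192 input channels are an assumption that matches the layer feeding the first module later in this walkthrough.

```python
def conv_params(kernel, in_channels, filters):
    """Weights plus biases of a 2D convolution with a square kernel."""
    return kernel * kernel * in_channels * filters + filters


def inception_stats(in_channels, f1, f3, f5, mp):
    """Return (output_channels, parameter_count) for inception_module above."""
    params = (
        conv_params(1, in_channels, f1)    # 1x1 branch
        + conv_params(1, in_channels, f3)  # 1x1 in front of the 3x3
        + conv_params(3, f3, f3)           # 3x3 convolution
        + conv_params(1, in_channels, f5)  # 1x1 in front of the 5x5
        + conv_params(5, f5, f5)           # 5x5 convolution
        + conv_params(1, in_channels, mp)  # 1x1 after max pooling
    )
    # Max pooling has no weights; concatenation just stacks the channels.
    return f1 + f3 + f5 + mp, params


channels, params = inception_stats(192, f1=64, f3=96, f5=128, mp=32)
print(channels, params)
```

Because every branch uses padding='same' and a stride of 1, the height and width are unchanged and only the channel count grows, which is what makes the concatenation valid.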

Define the input shape of the image and the number of classes.

input_shape = (224, 224, 3)
num_classes = 1000

Define the input layer of the model.

input_layer = Input(shape=input_shape)

Define the first convolutional layer with 64 filters, a kernel size of 7x7, a stride of 2, and 'same' padding.

x = Conv2D(64, (7, 7), strides=(2, 2), padding='same', activation='relu')(input_layer)

Define the first max pooling layer with a pool size of 3x3 and a stride of 2.

x = MaxPooling2D((3, 3), strides=(2, 2), padding='same')(x)

Define the second convolutional layer with 64 filters, a kernel size of 1x1, and 'same' padding.

x = Conv2D(64, (1, 1), padding='same', activation='relu')(x)

Define the third convolutional layer with 192 filters, a kernel size of 3x3, and 'same' padding.

x = Conv2D(192, (3, 3), padding='same', activation='relu')(x)

Define the second max pooling layer with a pool size of 3x3 and a stride of 2.

x = MaxPooling2D((3, 3), strides=(2, 2), padding='same')(x)
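
The layers so far all use padding='same', for which Keras produces an output size of ceil(input_size / stride) regardless of the kernel size. The sketch below traces the spatial size through these first five layers; the helper name `same_padding_output` and the `stem` list are illustrative, not part of the model code.

```python
import math


def same_padding_output(size, stride):
    """Output height/width of a layer that uses padding='same'."""
    return math.ceil(size / stride)


# (layer description, stride) for the layers defined so far
stem = [
    ("conv 7x7", 2),
    ("max pool 3x3", 2),
    ("conv 1x1", 1),
    ("conv 3x3", 1),
    ("max pool 3x3", 2),
]

size = 224  # input height and width
trace = [size]
for name, stride in stem:
    size = same_padding_output(size, stride)
    trace.append(size)
    print(f"{name:>12}: {size}x{size}")
```

The Inception modules that follow keep this size, and the two later stride-2 max pooling layers halve it twice more, from 28 to 14 to 7.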

Apply the first inception module with 64 filters, 96 filters, 128 filters, and 32 filters for the 1x1, 3x3, 5x5, and max pooling branches, respectively.

x = inception_module(x, f1=64, f3=96, f5=128, mp=32)

Apply the second inception module with 128 filters, 128 filters, 192 filters, and 64 filters for the 1x1, 3x3, 5x5, and max pooling branches, respectively.

x = inception_module(x, f1=128, f3=128, f5=192, mp=64)

Apply a max pooling layer with a pool size of 3x3 and a stride of 2.

x = MaxPooling2D((3, 3), strides=(2, 2), padding='same')(x)

Apply the third inception module with 192 filters, 96 filters, 208 filters, and 64 filters for the 1x1, 3x3, 5x5, and max pooling branches, respectively.

x = inception_module(x, f1=192, f3=96, f5=208, mp=64)

Apply the fourth inception module with 160 filters, 112 filters, 224 filters, and 64 filters for the 1x1, 3x3, 5x5, and max pooling branches, respectively.

x = inception_module(x, f1=160, f3=112, f5=224, mp=64)

Apply the fifth inception module with 128 filters, 128 filters, 256 filters, and 64 filters for the 1x1, 3x3, 5x5, and max pooling branches, respectively.

x = inception_module(x, f1=128, f3=128, f5=256, mp=64)

Apply a max pooling layer with a pool size of 3x3 and a stride of 2.

x = MaxPooling2D((3, 3), strides=(2, 2), padding='same')(x)

Apply the sixth inception module with 112 filters, 144 filters, 288 filters, and 64 filters for the 1x1, 3x3, 5x5, and max pooling branches, respectively.

x = inception_module(x, f1=112, f3=144, f5=288, mp=64)

Apply the seventh inception module with 256 filters, 160 filters, 320 filters, and 128 filters for the 1x1, 3x3, 5x5, and max pooling branches, respectively.

x = inception_module(x, f1=256, f3=160, f5=320, mp=128)

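Apply a 7x7 average pooling layer. The feature map is 7x7 at this point, so the pooling window covers the entire map and the layer acts as global average pooling, reducing each channel to a single value.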

x = AveragePooling2D((7, 7))(x)
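
To see why a 7x7 window behaves like global pooling here, the following pure-Python sketch applies average pooling without padding to a single 7x7 channel. The function name `average_pool_valid` and the toy values are assumptions for illustration; the real layer does the same thing for every channel at once.

```python
def average_pool_valid(feature_map, pool, stride):
    """Average pooling over a square 2D list with no padding ('valid')."""
    size = len(feature_map)
    out_size = (size - pool) // stride + 1
    output = []
    for row in range(out_size):
        out_row = []
        for col in range(out_size):
            # Collect every value that falls inside the current window.
            window = [
                feature_map[row * stride + i][col * stride + j]
                for i in range(pool)
                for j in range(pool)
            ]
            out_row.append(sum(window) / len(window))
        output.append(out_row)
    return output


# A toy 7x7 channel holding the values 0..48.
feature_map = [[row * 7 + col for col in range(7)] for row in range(7)]
pooled = average_pool_valid(feature_map, pool=7, stride=7)
print(pooled)  # [[24.0]]
```

With a window as large as the map there is exactly one window position, so the output is a single number: the mean of the whole channel.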

Flatten the output of the previous layer.

x = Flatten()(x)

Apply a dropout layer with a rate of 0.4 to prevent overfitting.

x = Dropout(0.4)(x)
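
For intuition, here is a minimal sketch of what a dropout layer does during training, assuming the usual "inverted dropout" behaviour in which the surviving values are scaled up by 1 / (1 - rate). The function name `inverted_dropout` is hypothetical, and the Keras layer does this internally and only while training.

```python
import random


def inverted_dropout(values, rate, rng):
    """Zero each value with probability `rate`; scale survivors by 1 / (1 - rate)."""
    keep_scale = 1.0 / (1.0 - rate)
    return [0.0 if rng.random() < rate else v * keep_scale for v in values]


rng = random.Random(0)  # fixed seed so the sketch is repeatable
features = [1.0] * 10
dropped = inverted_dropout(features, rate=0.4, rng=rng)
print(dropped)
```

Scaling the surviving values keeps the expected sum of the activations the same, so nothing has to change at inference time when dropout is switched off.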

Apply a dense layer with 1000 units and a softmax activation function to produce the final output probabilities.

output_layer = Dense(num_classes, activation='softmax')(x)
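
The softmax activation turns the raw scores of the dense layer into probabilities that sum to 1. A minimal pure-Python sketch, using three made-up scores instead of 1000, could look like this:

```python
import math


def softmax(logits):
    """Convert raw scores into probabilities that sum to 1."""
    # Subtracting the maximum first avoids overflow in exp() for large scores.
    largest = max(logits)
    exps = [math.exp(value - largest) for value in logits]
    total = sum(exps)
    return [value / total for value in exps]


scores = [2.0, 1.0, 0.1]  # illustrative scores for three classes
probs = softmax(scores)
print([round(p, 3) for p in probs])
```

The largest score always receives the largest probability, so taking the argmax of the output gives the predicted class.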

Define the model with the input and output layers.

model = Model(input_layer, output_layer)

Compile the model with the categorical cross-entropy loss and the Adam optimizer.

model.compile(loss='categorical_crossentropy', optimizer='adam')
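
Categorical cross-entropy expects one-hot encoded labels and penalizes the model according to the probability it assigns to the true class. The sketch below computes it by hand for a single made-up three-class example; the small clipping constant is an assumption added to guard against log(0).

```python
import math


def categorical_crossentropy(y_true, y_pred, eps=1e-7):
    """Cross-entropy between a one-hot label and predicted probabilities."""
    # Clip the predictions so that log() never receives exactly 0.
    clipped = [min(max(p, eps), 1.0 - eps) for p in y_pred]
    return -sum(t * math.log(p) for t, p in zip(y_true, clipped))


y_true = [0.0, 1.0, 0.0]  # one-hot label: the second class is correct
y_pred = [0.1, 0.7, 0.2]  # illustrative softmax output
loss = categorical_crossentropy(y_true, y_pred)
print(round(loss, 4))  # 0.3567
```

Only the term for the true class survives the sum, so the loss is simply -log(0.7) here and shrinks toward 0 as the predicted probability of the correct class approaches 1.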

Print a summary of the model architecture.

model.summary()

Complete Code:

Here’s the full code, including comments to explain each line:

from tensorflow.keras.layers import Input, Conv2D, MaxPooling2D, concatenate, AveragePooling2D, Flatten, Dropout, Dense
from tensorflow.keras.models import Model

def inception_module(x, f1, f3, f5, mp):
    """
    Creates an inception module consisting of 1x1, 3x3, and 5x5 convolutions, as well as a max pooling layer.
    """
    # 1x1 convolution branch
    conv1 = Conv2D(f1, (1,1), padding='same', activation='relu')(x)

    # 3x3 convolution branch
    conv3 = Conv2D(f3, (1,1), padding='same', activation='relu')(x)
    conv3 = Conv2D(f3, (3,3), padding='same', activation='relu')(conv3)

    # 5x5 convolution branch
    conv5 = Conv2D(f5, (1,1), padding='same', activation='relu')(x)
    conv5 = Conv2D(f5, (5,5), padding='same', activation='relu')(conv5)

    # max pooling branch
    pool = MaxPooling2D((3,3), strides=(1,1), padding='same')(x)
    pool = Conv2D(mp, (1,1), padding='same', activation='relu')(pool)

    # concatenate the output of each branch
    output = concatenate([conv1, conv3, conv5, pool], axis=3)

    return output

# define the number of classes in the output layer
num_classes = 1000

# define the input layer
input_layer = Input(shape=(224,224,3))

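# apply the first convolution layer with 64 filters, a kernel size of 7x7, and a stride of 2
x = Conv2D(64, (7,7), strides=(2,2), padding='same', activation='relu')(input_layer)

# apply a max pooling layer with a pool size of 3x3 and a stride of 2
x = MaxPooling2D((3,3), strides=(2,2), padding='same')(x)

# apply the second convolution layer with 64 filters and a kernel size of 1x1
x = Conv2D(64, (1,1), padding='same', activation='relu')(x)

# apply the third convolution layer with 192 filters and a kernel size of 3x3
x = Conv2D(192, (3,3), padding='same', activation='relu')(x)

# apply a second max pooling layer (3x3, stride 2) so the feature map is 7x7 when it reaches the average pooling layer
x = MaxPooling2D((3,3), strides=(2,2), padding='same')(x)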

# apply the first inception module with 64 filters, 96 filters, 128 filters, and 32 filters for the 1x1, 3x3, 5x5, and max pooling branches, respectively
x = inception_module(x, f1=64, f3=96, f5=128, mp=32)

# apply the second inception module with 128 filters, 128 filters, 192 filters, and 64 filters for the 1x1, 3x3, 5x5, and max pooling branches, respectively
x = inception_module(x, f1=128, f3=128, f5=192, mp=64)

# apply a max pooling layer with a pool size of 3x3 and a stride of 2
x = MaxPooling2D((3,3), strides=(2,2), padding='same')(x)

# apply the third inception module with 192 filters, 96 filters, 208 filters, and 16 filters for the 1x1, 3x3, 5x5, and max pooling branches, respectively
x = inception_module(x, f1=192, f3=96, f5=208, mp=16)

# apply the fourth inception module with 160 filters, 112 filters, 224 filters, and 24 filters for the 1x1, 3x3, 5x5, and max pooling branches, respectively
x = inception_module(x, f1=160, f3=112, f5=224, mp=24)

# apply the fifth inception module with 128 filters, 128 filters, 256 filters, and 24 filters for the 1x1, 3x3, 5x5, and max pooling branches, respectively
x = inception_module(x, f1=128, f3=128, f5=256, mp=24)

# apply a max pooling layer with a pool size of 3x3 and a stride of 2
x = MaxPooling2D((3,3), strides=(2,2), padding='same')(x)

# apply the sixth inception module with 256 filters, 160 filters, 320 filters, and 32 filters for the 1x1, 3x3, 5x5, and max pooling branches, respectively
x = inception_module(x, f1=256, f3=160, f5=320, mp=32)

# apply a global average pooling layer to reduce the spatial dimensions of the output
x = AveragePooling2D((7,7), strides=(1,1), padding='valid')(x)

# flatten the output to a 1D array
x = Flatten()(x)

# apply a dropout layer to reduce overfitting
x = Dropout(0.4)(x)

# apply a fully connected layer with 1000 neurons and softmax activation
output_layer = Dense(num_classes, activation='softmax')(x)

# define the model using the input and output layers
model = Model(inputs=input_layer, outputs=output_layer)

# compile the model with categorical cross-entropy loss and Adam optimizer
model.compile(loss='categorical_crossentropy', optimizer='adam')

# print the summary of the model architecture
model.summary()

The code defines a convolutional neural network (CNN) model based on a simplified version of the Inception V1 (GoogLeNet) architecture. Here's a summary of the layers used:

Input layer: Accepts input images with a shape of (224, 224, 3).
Conv2D layer: Applies a convolutional filter with 64 filters and a kernel size of 7x7 to the input image.
MaxPooling2D layer: Reduces the spatial dimensions of the output from the previous layer by applying max pooling with a pool size of 3x3 and a stride of 2.
Inception modules: The model uses 6 inception modules, which are made up of 1x1, 3x3, and 5x5 convolutional branches, as well as a max pooling branch. The number of filters for each branch varies for each module.
MaxPooling2D layer: Reduces the spatial dimensions of the output from the previous layer by applying max pooling with a pool size of 3x3 and a stride of 2.
AveragePooling2D layer: Reduces the spatial dimensions of the output from the previous layer by applying average pooling with a pool size of 7x7 and a stride of 1.
Flatten layer: Flattens the output from the previous layer to a 1D array.
Dropout layer: Applies dropout regularization to the output from the previous layer to reduce overfitting.
Dense layer: Applies a fully connected layer with 1000 neurons and softmax activation to the output from the previous layer to produce the output predictions.

That's it for now. I hope you liked my blog and implemented the GoogLeNet code with me.

In the next blog, I will be discussing the ResNet CNN.

If you feel my blogs are helpful, please share them with others.

Till then, stay tuned for the next blog.


Written by Dewansh Singh