DLOA (Part-15)-VGGNet CNN and Implementation

Dewansh Singh
9 min read · May 12, 2023


Hey readers, hope you are all doing well, safe, and sound. The previous blog briefly discussed the AlexNet CNN and implemented it in Python; if you didn’t read it, you can go through this link. In this blog, we’ll be discussing the VGGNet CNN, how it works, and its implementation.

Introduction

VGGNet is a deep convolutional neural network (CNN) architecture that was proposed by a group of researchers at the Visual Geometry Group (VGG) at the University of Oxford in 2014. The VGGNet architecture has become a cornerstone of modern computer vision systems and has been used in a wide variety of applications including image classification, object detection, and segmentation. In this explanation, we will go over the details of the VGGNet architecture, its design philosophy, and its key innovations.

The main idea behind VGGNet is to use a deep convolutional neural network to learn a hierarchy of features that can be used to accurately classify images. The architecture is characterized by its simplicity and uniformity: it consists of a series of convolutional blocks, each followed by a max-pooling layer, with a few fully connected layers at the end.

VGGNet CNN Architecture

Architecture and Working of VGGNet CNN

The VGGNet architecture is named after the Visual Geometry Group (VGG) at the University of Oxford, where it was developed. The architecture is characterized by its depth and simplicity, and it has achieved state-of-the-art performance on several computer vision tasks, including image classification and object detection.

Let’s take a look at the VGGNet architecture using an example. We will use the VGG16 variant, whose 16 weight layers consist of 13 convolutional layers and 3 fully connected layers; its 5 max-pooling layers carry no learnable parameters and are therefore not counted in the 16.

VGGNet 16 CNN Architecture
  • Input Image: The input to the VGGNet architecture is an RGB image of size 224x224x3 pixels.
  • Convolutional Layers: The bulk of the network consists of convolutional layers, all using 3x3 filters with a stride of 1 pixel. In the VGG16 variant they are arranged in five blocks: the first two layers have 64 filters each, the next two have 128 filters, the following three have 256 filters, and the final two blocks each contain three layers with 512 filters.
  • Max Pooling Layers: After each block of convolutional layers, a max-pooling layer reduces the spatial dimensionality of the output feature maps. All five max-pooling layers in VGG16 use a 2x2 pool with a stride of 2 pixels, halving the width and height each time.
  • Fully Connected Layers: At the end of the network, a few fully connected layers are used to classify the input image. In the VGG16 variant of the architecture, there are three fully connected layers: the first two have 4096 units each, and the last has as many units as the number of classes in the dataset (1,000 for ImageNet).
  • Softmax Layer: The final layer of the network is a softmax layer, which takes the output of the last fully connected layer as input and produces a probability distribution over the classes. The class with the highest probability is chosen as the predicted class for the input image.
  • Training: The VGGNet architecture is trained with backpropagation using the stochastic gradient descent (SGD) optimizer with momentum, minimizing the cross-entropy loss between the predicted class probabilities and the ground-truth labels. Dropout regularization is applied during training to prevent overfitting (a minimal training-configuration sketch follows this list).
  • Inference: Once the network is trained, it can be used to classify new input images. The input image is passed through the network, and the class with the highest predicted probability is chosen as the output.
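To make the training setup concrete, here is a minimal Keras compilation sketch matching the description above. The learning rate and momentum are illustrative placeholders rather than the paper’s exact schedule, and model stands for a VGG-style Keras model such as the one built in the implementation section below.

from tensorflow.keras.optimizers import SGD

# Compile a VGG-style model with momentum SGD and cross-entropy loss
# (learning rate and momentum are illustrative values, not the paper's schedule)
model.compile(
    optimizer=SGD(learning_rate=0.01, momentum=0.9),
    loss='categorical_crossentropy',
    metrics=['accuracy']
)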

Overall, VGGNet is a powerful convolutional neural network that extracts hierarchical features from an input image using convolutional and max-pooling layers and classifies the image using fully connected layers. The VGG16 variant has achieved state-of-the-art performance on several computer vision tasks and is widely used in the research community.
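To see how the spatial resolution shrinks through the network, the short sketch below traces the feature-map size across the five blocks: each 3x3, stride-1, same-padding convolution preserves width and height, while each 2x2, stride-2 max pool halves them.

# Trace VGG16's feature-map size: 224 -> 112 -> 56 -> 28 -> 14 -> 7
size = 224
for filters in [64, 128, 256, 512, 512]:
    size //= 2  # each block ends with a 2x2, stride-2 max pool
    print(f'after the {filters}-filter block: {size}x{size}x{filters}')

# The final 7x7x512 feature map is flattened into 7 * 7 * 512 = 25088
# values before entering the fully connected layers.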

Features of VGGNet 16 CNN

The VGGNet 16 model, also known as the OxfordNet, is a deep convolutional neural network that has achieved state-of-the-art results in image classification tasks. Here are some of its notable features:

  1. Architecture: VGGNet 16 has a total of 16 convolutional and fully connected layers. It follows a simple architecture of stacked convolutional layers with small 3x3 filters and max pooling layers. This architecture allows the network to learn complex features from the images.
  2. Preprocessing: The input images are preprocessed by resizing them to 224x224 and subtracting the mean RGB value of the ImageNet training set from each pixel.
  3. Small filters: VGGNet 16 uses small 3x3 filters in every convolutional layer. Stacking small filters emulates a larger receptive field with fewer parameters and more non-linearity: two stacked 3x3 layers cover a 5x5 receptive field using 2 x (3 x 3) x C^2 = 18C^2 weights, versus 25C^2 for a single 5x5 layer (where C is the number of channels).
  4. Max pooling: Max pooling is applied after each set of convolutional layers. This reduces the spatial dimensions of the feature maps and increases the receptive field of the network.
  5. Fully connected layers: VGGNet 16 has three fully connected layers: the first two have 4096 neurons each, and the third is a 1000-way layer with a softmax activation for classification. These layers combine the learned features into class scores.
  6. Dropout: Dropout regularization is applied to the fully connected layers with a probability of 0.5. This prevents overfitting and improves the generalization of the network.
  7. Transfer learning: VGGNet 16 is often used as a pre-trained model for transfer learning. The model is trained on the large-scale ImageNet dataset, which contains over 1 million images across 1,000 classes. The pre-trained model can be fine-tuned on smaller datasets or used as a feature extractor for downstream tasks (see the short sketch below).

Overall, the VGGNet 16 model has shown remarkable performance in image classification tasks, with a top-5 accuracy of over 90% on the ImageNet dataset. Its simple architecture and use of small filters have inspired many subsequent CNN architectures.
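As a concrete illustration of point 7, here is a minimal transfer-learning sketch that uses the built-in keras.applications.VGG16 model as a frozen feature extractor. The 10-class head and its layer sizes are hypothetical placeholders for whatever downstream task you have.

from tensorflow.keras.applications import VGG16
from tensorflow.keras import layers, models

# Load the convolutional base with ImageNet weights and freeze it
base = VGG16(weights='imagenet', include_top=False, input_shape=(224, 224, 3))
base.trainable = False

# Attach a small task-specific head (10 classes is a hypothetical example)
model = models.Sequential([
    base,
    layers.Flatten(),
    layers.Dense(256, activation='relu'),
    layers.Dropout(0.5),
    layers.Dense(10, activation='softmax')
])
model.compile(optimizer='adam', loss='categorical_crossentropy', metrics=['accuracy'])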

Implementation

In this section, I will implement the VGG16 architecture that we walked through above:

Section 1: Importing Required Libraries

In this section, we import all the required libraries for building the VGGNet model. We use the Keras API bundled with TensorFlow, an open-source deep learning library that provides a user-friendly interface for building and training deep learning models, along with the VGG16-specific preprocess_input and decode_predictions helpers.

import tensorflow as tf
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense, Conv2D, MaxPooling2D, Flatten
from tensorflow.keras.preprocessing import image
from tensorflow.keras.applications.vgg16 import preprocess_input, decode_predictions
import numpy as np
import matplotlib.pyplot as plt

Section 2: Defining the VGG16 Model

In this section, we define the VGG16 architecture layer by layer using the Keras Sequential API. Note that we build the network ourselves rather than loading the ready-made keras.applications.VGG16 class; the matching pre-trained ImageNet weights are loaded into this model in the next section.

# Define the VGGNet model
model = Sequential([
    # Block 1: two 64-filter conv layers (224 -> 112 after pooling)
    Conv2D(64, (3, 3), activation='relu', padding='same', input_shape=(224, 224, 3)),
    Conv2D(64, (3, 3), activation='relu', padding='same'),
    MaxPooling2D((2, 2), strides=(2, 2)),
    # Block 2: two 128-filter conv layers (112 -> 56)
    Conv2D(128, (3, 3), activation='relu', padding='same'),
    Conv2D(128, (3, 3), activation='relu', padding='same'),
    MaxPooling2D((2, 2), strides=(2, 2)),
    # Block 3: three 256-filter conv layers (56 -> 28)
    Conv2D(256, (3, 3), activation='relu', padding='same'),
    Conv2D(256, (3, 3), activation='relu', padding='same'),
    Conv2D(256, (3, 3), activation='relu', padding='same'),
    MaxPooling2D((2, 2), strides=(2, 2)),
    # Block 4: three 512-filter conv layers (28 -> 14)
    Conv2D(512, (3, 3), activation='relu', padding='same'),
    Conv2D(512, (3, 3), activation='relu', padding='same'),
    Conv2D(512, (3, 3), activation='relu', padding='same'),
    MaxPooling2D((2, 2), strides=(2, 2)),
    # Block 5: three 512-filter conv layers (14 -> 7)
    Conv2D(512, (3, 3), activation='relu', padding='same'),
    Conv2D(512, (3, 3), activation='relu', padding='same'),
    Conv2D(512, (3, 3), activation='relu', padding='same'),
    MaxPooling2D((2, 2), strides=(2, 2)),
    # Classifier: flatten + two 4096-unit layers + 1000-way softmax
    Flatten(),
    Dense(4096, activation='relu'),
    Dense(4096, activation='relu'),
    Dense(1000, activation='softmax')
])
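As a quick sanity check, printing the model summary should report roughly 138 million parameters, the well-known size of VGG16:

# Verify the architecture; the summary should show ~138 million parameters
model.summary()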

Section 3: Loading the Weights and Preprocessing the Input Image

In this section, we first load the pre-trained ImageNet weights into the model defined above, and then load and preprocess the input image. Because load_img expects a local file path rather than a URL, we first download the image with tf.keras.utils.get_file. The img_to_array function converts the loaded PIL image into a NumPy array, np.expand_dims adds the batch dimension, and preprocess_input applies the VGG-specific preprocessing (converting to BGR and subtracting the ImageNet mean RGB values from each pixel).

# Load the pre-trained ImageNet weights into the model
# (the .h5 file must be downloaded separately, e.g. from the
# Keras deep-learning-models release page on GitHub)
model.load_weights('vgg16_weights_tf_dim_ordering_tf_kernels.h5')

# Define the URL of the image to classify
url = 'https://images.unsplash.com/photo-1521747116042-5a810fda9665'

# Download the image (load_img cannot read URLs directly) and
# load it at the model's expected 224x224 input size
img_path = tf.keras.utils.get_file('sample.jpg', origin=url)
img = image.load_img(img_path, target_size=(224, 224))
x = image.img_to_array(img)    # PIL image -> NumPy array of shape (224, 224, 3)
x = np.expand_dims(x, axis=0)  # add a batch dimension -> (1, 224, 224, 3)
x = preprocess_input(x)        # VGG preprocessing: RGB->BGR and mean subtraction
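If you would rather not download the weights file by hand, the same pre-trained network can be obtained in one line from keras.applications; this is equivalent to the hand-built model above with the ImageNet weights already loaded:

# Equivalent one-liner: Keras downloads and caches the ImageNet weights
from tensorflow.keras.applications import VGG16
model = VGG16(weights='imagenet')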

Section 4: Making Predictions using the VGG16 Model

In this section, we use the VGG16 model to make predictions on the preprocessed input image. The predict method returns the predicted class probabilities, and the decode_predictions function from tensorflow.keras.applications.vgg16 converts them into human-readable (class_id, class_name, probability) tuples.

# Make a prediction on the image using the VGG16 model
predictions = model.predict(x)

# Decode the top 3 predictions
decoded_predictions = decode_predictions(predictions, top=3)[0]

# Print the decoded predictions
print('Predictions:')
for pred in decoded_predictions:
    print('{}: {:.2%}'.format(pred[1], pred[2]))

Section 5: Displaying the Results

In this section, we display the results of the VGG16 model predictions. The top 3 predicted class names and their probabilities were printed above; here we display the input image using the matplotlib library.

# Display the image
plt.imshow(img)
plt.axis('off')
plt.show()

Complete Code:

import tensorflow as tf
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense, Conv2D, MaxPooling2D, Flatten
from tensorflow.keras.preprocessing import image
from tensorflow.keras.applications.vgg16 import preprocess_input, decode_predictions
import numpy as np
import matplotlib.pyplot as plt

# Define the VGGNet model
model = Sequential([
    # Block 1: two 64-filter conv layers (224 -> 112 after pooling)
    Conv2D(64, (3, 3), activation='relu', padding='same', input_shape=(224, 224, 3)),
    Conv2D(64, (3, 3), activation='relu', padding='same'),
    MaxPooling2D((2, 2), strides=(2, 2)),
    # Block 2: two 128-filter conv layers (112 -> 56)
    Conv2D(128, (3, 3), activation='relu', padding='same'),
    Conv2D(128, (3, 3), activation='relu', padding='same'),
    MaxPooling2D((2, 2), strides=(2, 2)),
    # Block 3: three 256-filter conv layers (56 -> 28)
    Conv2D(256, (3, 3), activation='relu', padding='same'),
    Conv2D(256, (3, 3), activation='relu', padding='same'),
    Conv2D(256, (3, 3), activation='relu', padding='same'),
    MaxPooling2D((2, 2), strides=(2, 2)),
    # Block 4: three 512-filter conv layers (28 -> 14)
    Conv2D(512, (3, 3), activation='relu', padding='same'),
    Conv2D(512, (3, 3), activation='relu', padding='same'),
    Conv2D(512, (3, 3), activation='relu', padding='same'),
    MaxPooling2D((2, 2), strides=(2, 2)),
    # Block 5: three 512-filter conv layers (14 -> 7)
    Conv2D(512, (3, 3), activation='relu', padding='same'),
    Conv2D(512, (3, 3), activation='relu', padding='same'),
    Conv2D(512, (3, 3), activation='relu', padding='same'),
    MaxPooling2D((2, 2), strides=(2, 2)),
    # Classifier: flatten + two 4096-unit layers + 1000-way softmax
    Flatten(),
    Dense(4096, activation='relu'),
    Dense(4096, activation='relu'),
    Dense(1000, activation='softmax')
])

# Load the pre-trained ImageNet weights into the model
# (the .h5 file must be downloaded separately, e.g. from the
# Keras deep-learning-models release page on GitHub)
model.load_weights('vgg16_weights_tf_dim_ordering_tf_kernels.h5')

# Define the URL of the image to classify
url = 'https://images.unsplash.com/photo-1521747116042-5a810fda9665'

# Download the image (load_img cannot read URLs directly) and
# load it at the model's expected 224x224 input size
img_path = tf.keras.utils.get_file('sample.jpg', origin=url)
img = image.load_img(img_path, target_size=(224, 224))
x = image.img_to_array(img)    # PIL image -> NumPy array of shape (224, 224, 3)
x = np.expand_dims(x, axis=0)  # add a batch dimension -> (1, 224, 224, 3)
x = preprocess_input(x)        # VGG preprocessing: RGB->BGR and mean subtraction

# Make a prediction on the image using the VGG16 model
predictions = model.predict(x)

# Decode the top 3 predictions
decoded_predictions = decode_predictions(predictions, top=3)[0]

# Print the decoded predictions
print('Predictions:')
for pred in decoded_predictions:
    print('{}: {:.2%}'.format(pred[1], pred[2]))

# Display the image
plt.imshow(img)
plt.axis('off')
plt.show()

Conclusion

In conclusion, the VGGNet CNN has made significant contributions to the field of computer vision and deep learning. Its simple yet effective architecture, with stacked convolutional layers and small filters, has shown remarkable performance in image classification tasks, particularly on the ImageNet dataset. The VGGNet 16 model, in particular, has become a popular choice for transfer learning and feature extraction in downstream tasks.

Moreover, the VGGNet architecture has inspired many subsequent CNN architectures, such as the Inception and ResNet models. Its success demonstrates the importance of designing deep networks with carefully crafted architectures that balance the trade-off between model complexity and performance.

Overall, the VGGNet CNN has significantly advanced the state-of-the-art in image classification and paved the way for further developments in the field of computer vision and deep learning.

That’s it for now… I hope you liked my blog and got to know about the VGGNet CNN, its working, and the example I took while implementing the code.

In the next blog, I will be discussing the GoogLeNet CNN and its implementation.

If you feel my blogs are helpful, please share them with others.

Till then, stay tuned for the next blog…

***Next Blog***

