Car Brand Classification using VGG16-Transfer Learning

Sameer Kumar
Published in Analytics Vidhya · 6 min read · Jan 25, 2022

Introduction

In this article, we will work through a mini project: car brand classification using a VGG16 transfer learning model. We will build a model that can identify the class of a car, i.e. Audi, Mercedes, or Lamborghini, from its images.

First, I will give you a short overview of the VGG16 architecture, and then we will go through the pre-trained VGG16 model in Keras line by line.

VGG16

VGG16 is a 16-layer convolutional neural network architecture that is widely used for transfer learning. It is quite similar to earlier architectures, since its foundation is still the CNN, but the arrangement of the layers is different.

The standard input image size used by the researchers for this architecture is 224*224*3, where 3 represents the RGB channels. These images pass through two major kinds of layers: convolution and pooling.

In a convolution layer, the filter size is 3*3, the stride is 1, and the padding is 'same'. Padding='same' means that the spatial size of the feature map does not change after the convolution.

In a pooling layer, the filter size is 2*2 and the stride is 2. Every time the image passes through a pooling layer, its height and width are halved, while the number of kernels/filters keeps increasing in the following convolution blocks.
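These sizing rules are easy to verify with a couple of Keras layers (a minimal sketch, assuming TensorFlow 2.x):

```python
from tensorflow.keras import Input
from tensorflow.keras.layers import Conv2D, MaxPooling2D

# Reproduce VGG16's sizing rules on a 224x224x3 input
x = Input(shape=(224, 224, 3))
c = Conv2D(64, (3, 3), strides=1, padding='same')(x)  # 'same' padding keeps 224x224
p = MaxPooling2D(pool_size=(2, 2), strides=2)(c)      # pooling halves it to 112x112
print(c.shape, p.shape)
```

The convolution output stays at 224x224 while the pooling output drops to 112x112, exactly as described above.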

Advantages of VGGNet over AlexNet

  1. VGGNet uses a deeper network (16 layers) compared to AlexNet (8 layers), so VGG can extract more information from the data as it learns many more parameters. Since the ReLU activation function is used throughout, the extra depth does not cause a vanishing gradient problem.
  2. In AlexNet, the filter sizes were chosen after a lot of experimentation and there is no regular pattern to the layers and kernels, but VGG has a uniform architecture where convolution and pooling layers occur in a systematic way.

VGG as a pre-trained model (transfer learning)

VGG16, VGG19, ResNet50, Inception-Net, etc. are state-of-the-art pre-trained models, and we can reuse these models' weights to create our own model.

This reuse of a pre-trained model's weights in our own model is called transfer learning.

All these state-of-the-art models output 1000 categories, since they were trained on the ImageNet dataset, but in our case we require only 3 categories. So we cut off the first (input) layer and the last (output) layer, and replace the output layer with a 3-node (3-class) layer. We will see this in code as well.

ImageNet is a database containing around 14 million images, used widely by computer vision researchers, which is what gives these pre-trained weights their power.

Steps of using VGG16 for Car Brand classification

Let us now go through the implementation of VGG16 using Keras. We will understand every line of code one by one.

A] Import the libraries
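The article's code screenshots are not reproduced here; a minimal set of imports for this walkthrough might look like the following (module paths assume TensorFlow 2.x Keras):

```python
# Keras/TensorFlow imports used throughout the transfer learning workflow
from tensorflow.keras.applications.vgg16 import VGG16, preprocess_input
from tensorflow.keras.layers import Flatten, Dense
from tensorflow.keras.models import Model
from tensorflow.keras.preprocessing.image import ImageDataGenerator
from glob import glob  # to list the class sub-folders
```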

B] Define the size of image we want as per the standard of VGG16

The height and breadth of the input image are 224*224, plus 3 for the RGB channels, which we will add later on.
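A sketch of this step (the variable name IMAGE_SIZE is an assumption):

```python
# VGG16's standard input resolution; the RGB channel count is appended later
IMAGE_SIZE = [224, 224]
print(IMAGE_SIZE + [3])  # the full input shape once the channels are added
```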

C] Provide the training and testing dataset directory

The training folder contains 64 images of the 3 cars and the testing folder contains 58 images of the 3 cars. We can always add more images to these folders for better performance.
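The directory setup might look like this ('Datasets/…' is a hypothetical path; the layout has one sub-folder per brand inside each split):

```python
# Assumed layout: Datasets/train/{audi,lamborghini,mercedes}, same for test
train_path = 'Datasets/train'
valid_path = 'Datasets/test'
```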

D] Load the pre-trained model and define the image size and parameters

We have loaded the model with 3 important parameters. input_shape = image size + [3]; the [3] represents the RGB channels.

weights='imagenet' is important, as it means that we are using the pre-trained weights the model originally learned on ImageNet images.

include_top=False means that we cut off the last dense layers of the pre-trained VGG16 model, since they were trained for the 1000 output categories of ImageNet, whereas we only have 3 categories. We also replace the input layer, because the input image size can then be of our choice.
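Loading the backbone with these parameters might look like this (a sketch, assuming TensorFlow 2.x Keras; the first run downloads the no-top ImageNet weights):

```python
from tensorflow.keras.applications.vgg16 import VGG16

IMAGE_SIZE = [224, 224]

# include_top=False drops VGG16's 1000-way ImageNet head;
# weights='imagenet' keeps the pre-trained convolutional weights
vgg = VGG16(input_shape=IMAGE_SIZE + [3], weights='imagenet', include_top=False)

print(vgg.output_shape)  # feature map left after the final pooling layer
```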

E] Do not train the existing weights

vgg.layers shows the 16 layer objects that have been formed.

layer.trainable=False tells Keras not to train the existing weights for these layers; we want to use them as they are. Since it is a pre-trained model, we have control over the weights: we can decide in which layers to keep the pre-trained weights and in which layers to use trainable weights.
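Freezing the backbone might be sketched as follows (weights=None is used here only to skip the weight download in this standalone sketch; the article loads weights='imagenet', and the freezing mechanics are identical):

```python
from tensorflow.keras.applications.vgg16 import VGG16

vgg = VGG16(input_shape=[224, 224, 3], weights=None, include_top=False)

# Freeze every pre-trained layer so its weights stay fixed during training
for layer in vgg.layers:
    layer.trainable = False
```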

F] Use glob to get number of classes in your dataset

I mentioned earlier that we cut off the last layer of the pre-trained model because we wanted to replace the 1000-category dense layer with just 3 categories. We use glob to get those 3 categories from the folder names.
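A sketch of this step ('Datasets/train' is an assumed path; each brand has its own sub-folder, so the number of matches is the number of classes):

```python
from glob import glob

# One sub-folder per class; the folder count gives the number of output nodes
folders = glob('Datasets/train/*')
num_classes = len(folders)  # 3 for Audi, Lamborghini, Mercedes
```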

vgg.output is the feature-map tensor left after the last pooling layer. We now flatten that tensor and then feed it to the dense network.

G] Flattening

H] Now put this flattened array to the dense network

We use the softmax activation function on the 3 output nodes, one per class, so the scores become class probabilities.

I] Define the model with input and outputs as the parameters
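Steps G to I — flattening vgg.output and attaching the 3-node softmax head — might look like this (a sketch; weights=None only skips the weight download here, where the article uses weights='imagenet'):

```python
from tensorflow.keras.applications.vgg16 import VGG16
from tensorflow.keras.layers import Flatten, Dense
from tensorflow.keras.models import Model

vgg = VGG16(input_shape=[224, 224, 3], weights=None, include_top=False)
for layer in vgg.layers:
    layer.trainable = False

# G] Flatten the (7, 7, 512) feature map into one long vector
x = Flatten()(vgg.output)

# H] One dense node per class; softmax yields class probabilities
prediction = Dense(3, activation='softmax')(x)

# I] Tie the pre-trained input to the new classification head
model = Model(inputs=vgg.input, outputs=prediction)
print(model.output_shape)
```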

J] Cost and optimizer used
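The article does not spell out this step in the text, but categorical cross-entropy with the Adam optimizer is the usual pairing for a one-hot 3-class head, so a sketch might be:

```python
from tensorflow.keras.applications.vgg16 import VGG16
from tensorflow.keras.layers import Flatten, Dense
from tensorflow.keras.models import Model

# Rebuild the model from the earlier steps (weights=None only to keep
# this standalone sketch light; the article uses weights='imagenet')
vgg = VGG16(input_shape=[224, 224, 3], weights=None, include_top=False)
for layer in vgg.layers:
    layer.trainable = False
x = Flatten()(vgg.output)
model = Model(inputs=vgg.input, outputs=Dense(3, activation='softmax')(x))

# Categorical cross-entropy fits one-hot 3-class labels; Adam is a common default
model.compile(loss='categorical_crossentropy',
              optimizer='adam',
              metrics=['accuracy'])
```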

K] Now we use Image Data Generator to import images from the dataset and perform Data Augmentation on training data.
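This step might be sketched as follows (the augmentation values and batch size are assumptions; a temporary placeholder folder layout is created only so the sketch runs standalone — in practice Datasets/train and Datasets/test already hold the car images):

```python
import os
import tempfile
from tensorflow.keras.preprocessing.image import ImageDataGenerator

# Placeholder layout: one empty sub-folder per brand in each split
base = tempfile.mkdtemp()
for split in ('train', 'test'):
    for brand in ('audi', 'lamborghini', 'mercedes'):
        os.makedirs(os.path.join(base, split, brand))

# Augment only the training images; the test images are just rescaled
train_datagen = ImageDataGenerator(rescale=1./255, shear_range=0.2,
                                   zoom_range=0.2, horizontal_flip=True)
test_datagen = ImageDataGenerator(rescale=1./255)

training_set = train_datagen.flow_from_directory(os.path.join(base, 'train'),
                                                 target_size=(224, 224),
                                                 batch_size=32,
                                                 class_mode='categorical')
test_set = test_datagen.flow_from_directory(os.path.join(base, 'test'),
                                            target_size=(224, 224),
                                            batch_size=32,
                                            class_mode='categorical')

# Training would then be something like:
# r = model.fit(training_set, validation_data=test_set, epochs=50)
```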

We received a validation accuracy of 88%.

L] Demo
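A hypothetical demo of classifying a single image might look like this (weights=None and a random array stand in for the trained weights and a real photo, so the printed label is meaningless here; a real demo would load the trained model and use image.load_img):

```python
import numpy as np
from tensorflow.keras.applications.vgg16 import VGG16, preprocess_input
from tensorflow.keras.layers import Flatten, Dense
from tensorflow.keras.models import Model

# Rebuild the model from the earlier steps (untrained in this sketch)
vgg = VGG16(input_shape=[224, 224, 3], weights=None, include_top=False)
x = Flatten()(vgg.output)
model = Model(inputs=vgg.input, outputs=Dense(3, activation='softmax')(x))

# Stand-in for image.load_img('some_car.jpg', target_size=(224, 224))
img = np.random.rand(1, 224, 224, 3) * 255
pred = model.predict(preprocess_input(img))

# flow_from_directory orders classes alphabetically by folder name
labels = ['audi', 'lamborghini', 'mercedes']
print(labels[int(np.argmax(pred))])
```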

Conclusion

This was a short overview of VGG16 architecture through a mini Deep Learning project of car brand classification. Hope you guys liked the explanation :)

Do follow me on Medium for other exciting articles related to Machine learning and Deep learning!!
