Deep Learning Based Vehicle Make-Model (MMR) Classification on the CarConnection Dataset

Sridatta Marati
11 min read · Sep 10, 2020


Predicting Car Make

Contents

  1. Introduction
  2. Source of Data
  3. Existing Approaches
  4. EDA
  5. First Cut Solution
  6. Object Detection using YoloV3 in Tensorflow 2.0
  7. Building Input Pipeline
  8. Gradient Tape with Tensorboard Logs and Checkpointing
  9. Model Explanation
  10. Tensorboard Logs
  11. End to End Inference Test
  12. Final Predictions
  13. Future Work
  14. References
  15. Links to GitHub and LinkedIn

1. Introduction

Intelligent Transportation Systems (ITS) are becoming a reality. ITS are advanced transportation systems that aim to improve and automate the operation and management of transport, thereby improving its efficiency and safety. ITS combine cutting-edge technologies, such as electronic control and communications, with the means and facilities of transportation. Advances in digital image processing and computer vision enable many important ITS applications and components, such as advanced driver-assistance systems (ADAS), automated vehicle surveillance (AVS), traffic and activity monitoring, traffic behavior analysis, and traffic management. Vehicle Make-Model Recognition (MMR) is of great interest in these applications.

Many experiments and much research have been conducted to address the different challenges of vehicle detection and identification. However, fine-grained classification of vehicles has gained attention only recently, and many challenges remain to be addressed.

Traditional vehicle MMR relies on manual human observation or on automated license plate recognition (ALPR). These indirect processes make it hard for an MMR system to meet real-time constraints. With manual observation, it is practically difficult to remember and efficiently distinguish the wide variety of vehicle makes and models. It is a laborious and time-consuming task for a human observer to monitor many screens and record the makes and models of incoming or outgoing vehicles, or even to spot a particular make and model.

We try to automate Make-Model detection by leveraging Convolutional Neural Networks (CNNs).

In this blog, we work specifically with the CarConnection dataset.

About CarConnection:

The Car Connection is an automotive property of Internet Brands, which owns and operates the largest network of car buying and financing resources in North America, including CarsDirect, Motor Authority, Green Car Reports, and Auto Credit Express. Among the many car-shopping websites, The Car Connection provides in-depth buying advice from auto industry experts with proven credentials. Its team constantly drives and evaluates new cars, trucks, minivans, and crossover SUVs to identify the best new vehicles and to help readers choose their next car.

2. Source of Data

The images were scraped from the website and preprocessed, as far as possible, to remove unnecessary images. The dataset can be downloaded from here. The author of the dataset is Nicolas Gervais. Link to his GitHub repository

There are 297,000 pictures in total, of which 198,000 are unique. Many of these are interior images, which are of little use here. After filtering, we should be left with 64,000 images. Each filename ends with 3 random characters that make it a unique ID.

Each filename is a set of tags separated by underscores:

Example: Audi_A5_2013_43_18_210_20_4_73_54_182_24_FWD_4_2_Convertible_eUH.jpg

The tags give the following information: 'Make', 'Model', 'Year', 'MSRP', 'Front Wheel Size (in)', 'SAE Net Horsepower @ RPM', 'Engine Type', 'Width, Max w/o mirrors (in)', 'Height, Overall (in)', 'Length, Overall (in)', 'Gas Mileage', 'Drivetrain', 'Passenger Capacity', 'Passenger Doors', 'Style'
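Because the labels live in the filename, they can be read back with a simple split. The sketch below is a minimal illustration that relies only on what the example filename shows: Make, Model, and Year come first, the 3-character ID comes last, and (an assumption) Make and Model contain no underscores themselves. `parse_filename` is a hypothetical helper, and the middle specification fields are left unparsed.

```python
import os

def parse_filename(path):
    """Split a CarConnection-style filename into a few labelled fields.

    Assumes underscore-separated tags with Make, Model and Year first and the
    unique 3-character ID last; the remaining tags are returned unparsed.
    """
    stem, _ext = os.path.splitext(os.path.basename(path))
    fields = stem.split("_")
    return {
        "make": fields[0],
        "model": fields[1],
        "year": int(fields[2]),
        "specs": fields[3:-1],  # MSRP, wheel size, horsepower, ... (unparsed)
        "id": fields[-1],
    }

info = parse_filename("Audi_A5_2013_43_18_210_20_4_73_54_182_24_FWD_4_2_Convertible_eUH.jpg")
print(info["make"], info["model"], info["year"], info["id"])  # Audi A5 2013 eUH
```

For this project only `info["make"]` is needed as the class label.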

3. Existing Approaches

1. There are very few open-source datasets available for this task; the Stanford Cars dataset, which has 196 Make-Model classes, is one of them. Transfer learning is one common approach to this problem. For convolutional neural networks, it seems reasonable to assume “the deeper the better”, since deeper models should be more capable. However, researchers noticed that beyond a certain depth, performance degrades. The ResNet authors showed that this degradation is not simply a loss of generalization: even the training error of very deep plain networks gets worse, because they become harder to optimize.

One of the problems ResNets help with is the well-known vanishing gradient problem: when a network is very deep, the gradients propagated back from the loss can shrink toward zero before they reach the early layers.
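To make the skip-connection idea concrete, here is a minimal sketch of a basic residual block in tf.keras, following the ResNet design of adding a block's input back to its output. The filter count, kernel size, and 56x56x64 input are arbitrary illustrative values, and the sketch assumes the input already has `filters` channels, so no projection shortcut is needed. Because the block outputs x + F(x), its derivative with respect to x includes an identity term, which gives the gradient a direct path back to earlier layers.

```python
import tensorflow as tf

def residual_block(x, filters, kernel_size=3):
    """Basic residual block: output = ReLU(x + F(x)).

    Assumes x already has `filters` channels, so the identity shortcut can be
    added without a 1x1 projection.
    """
    shortcut = x  # identity path
    y = tf.keras.layers.Conv2D(filters, kernel_size, padding="same", use_bias=False)(x)
    y = tf.keras.layers.BatchNormalization()(y)
    y = tf.keras.layers.ReLU()(y)
    y = tf.keras.layers.Conv2D(filters, kernel_size, padding="same", use_bias=False)(y)
    y = tf.keras.layers.BatchNormalization()(y)
    y = tf.keras.layers.Add()([shortcut, y])  # skip connection: x + F(x)
    return tf.keras.layers.ReLU()(y)

# Illustrative sizes only.
inputs = tf.keras.Input(shape=(56, 56, 64))
outputs = residual_block(inputs, filters=64)
block = tf.keras.Model(inputs, outputs)
print(block.output_shape)  # (None, 56, 56, 64)
```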

2. An alternative approach is to fine-tune ResNet-152. The major issues are the inconsistent number of image samples per class and the varying shapes of individual images. To deal with these issues, one outlined technique uses augmentation transforms: the training images are transformed to broaden the information the model sees, so that it becomes better at recognizing target objects in images with varied contrast and size, or taken from different angles.
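A minimal sketch of such augmentation transforms using `tf.image` ops. The particular transforms and ranges (horizontal flip, a brightness delta of 0.1, a contrast factor between 0.8 and 1.2) are illustrative assumptions, not the settings of the approach described above, and the function expects a float32 image already scaled to [0, 1].

```python
import tensorflow as tf

def augment(image):
    """Randomly perturb one float32 image with values in [0, 1].

    Illustrative choices: horizontal flip, brightness and contrast jitter.
    """
    image = tf.image.random_flip_left_right(image)
    image = tf.image.random_brightness(image, max_delta=0.1)
    image = tf.image.random_contrast(image, lower=0.8, upper=1.2)
    return tf.clip_by_value(image, 0.0, 1.0)  # keep pixel values in range

sample = tf.random.uniform((224, 224, 3))
augmented = augment(sample)
print(augmented.shape)  # (224, 224, 3)
```

Augmentation like this would normally be applied to the training set only, for example with `train_ds.map(lambda x, y: (augment(x), y))`.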

4. EDA

yearly car releases
Top 10 Makes
Bottom 10 Makes

Makes such as Chevrolet, Toyota, Ford, BMW, and Audi appear far more often in the dataset than high-end brands like Tesla and Rolls-Royce, while sports car makes such as Maserati, Ferrari, and McLaren appear very rarely.
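Top and bottom counts like these can be computed from the list of labels with `collections.Counter`. A small sketch, assuming the Makes have already been extracted from the filenames into a list (the labels below are toy data) and using the hypothetical helper name `top_and_bottom`:

```python
from collections import Counter

def top_and_bottom(labels, k=10):
    """Return the k most frequent and the k least frequent labels with counts."""
    ranked = Counter(labels).most_common()  # sorted by count, descending
    return ranked[:k], ranked[::-1][:k]     # bottom list starts with the rarest

# Toy labels; in practice these would be the Makes parsed from the filenames.
makes = ["Toyota"] * 5 + ["Ford"] * 3 + ["BMW"] * 2 + ["Ferrari"] * 1
top, bottom = top_and_bottom(makes, k=2)
print(top)     # [('Toyota', 5), ('Ford', 3)]
print(bottom)  # [('Ferrari', 1), ('BMW', 2)]
```

The same function works for Models or Years by passing those labels instead.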

Top 10 Models
Bottom 10 Models

The Mini Cooper, Ranger, and Honda Civic appear most often. As with the makes, sports models such as the 570GT and 718 Spyder appear very rarely.

Yearly Releases Top 10 Count
Yearly Releases Bottom 10

Most releases in the dataset are from 2019 and 2011, and the fewest are from before 2000.

Type of Car
Door Count
Passenger capacity
Body Style

Although each car has other attributes, such as type, door count, body style, and passenger capacity, we consider only the Make: since this is an image classification task, we focus on extracting features from the images. The metadata could be used to extend the problem, for example to a regression task that predicts the price of each car.

5. First Cut Solution

Steps:

1.

We consider only the ‘Make’ label, because of the class imbalance in the dataset. This gives 42 classes and 64,000 images.

Explanation

We do not know for sure that all 64,000 images show cars. One way to run a sanity check is to send each image through an object detection model and count the car and non-car images.

We will use YoloV3 as our object detection model.

EDA of Car and Non-Car Images:

Yolo Predictions of Car and Non-Car data

Cars detected with a confidence score of at least 0.99

Car and non-car data at a 0.99 confidence threshold

It turns out that 51,000 images are non-car images. Each of these shows only a single part of the car, such as the windshield, a side door, a headlight, the brand logo, the interior, the dashboard, the steering wheel, or the gearbox.

Some samples of non-car images:

Once all the images have been passed through YoloV3, we take the bounding box of each detected car, crop the image to that box, and then perform the classification task.

Instead of considering both Make and Model, we consider only the Make in this project. So we will be predicting the Make for 13,225 images across 41 classes.
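Cropping to a detection box is a simple array slice once the normalized (xmin, ymin, xmax, ymax) box, the format described for YoloV3 in section 6, is converted to pixel indices. A minimal NumPy sketch; `crop_to_box` is a hypothetical helper name:

```python
import numpy as np

def crop_to_box(image, box):
    """Crop an HxWxC image to a normalized (xmin, ymin, xmax, ymax) box."""
    h, w = image.shape[:2]
    xmin, ymin, xmax, ymax = box
    # Convert normalized coordinates to pixel indices and clamp to the image.
    x0 = max(int(round(xmin * w)), 0)
    y0 = max(int(round(ymin * h)), 0)
    x1 = min(int(round(xmax * w)), w)
    y1 = min(int(round(ymax * h)), h)
    return image[y0:y1, x0:x1]

image = np.zeros((400, 600, 3), dtype=np.uint8)
crop = crop_to_box(image, (0.25, 0.1, 0.75, 0.9))
print(crop.shape)  # (320, 300, 3)
```

Clamping guards against boxes that extend slightly past the image border.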

2.

Preparing the Train and Test Dataset

We split the data into train and test sets with an 80-20 split, stratified on the class labels, so that both sets follow the same class distribution.
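A minimal sketch of a stratified 80-20 split with scikit-learn's `train_test_split`, assuming the image paths and their Make labels are held in two parallel lists (toy data below). In scikit-learn, `stratify` takes the label array itself rather than a boolean.

```python
from collections import Counter
from sklearn.model_selection import train_test_split

# Toy stand-ins for the image paths and their Make labels (hypothetical data).
paths = [f"img_{i}.jpg" for i in range(100)]
makes = ["Toyota"] * 60 + ["Audi"] * 30 + ["Tesla"] * 10

train_paths, test_paths, train_y, test_y = train_test_split(
    paths, makes,
    test_size=0.2,    # 80-20 split
    stratify=makes,   # keep the class proportions equal in both splits
    random_state=42,  # reproducible shuffling
)
print(Counter(train_y), Counter(test_y))
```

With these toy labels, each split keeps the 60/30/10 class ratio.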

Train-Test Graph

3.

Creating Input Pipeline

The tf.data API lets us build complex input pipelines from simple, reusable pieces. To create an input pipeline, we must start with a data source. For instance, to construct a Dataset from data in memory, we can use tf.data.Dataset.from_tensors( ) or tf.data.Dataset.from_tensor_slices( ).

Once we have a Dataset, we can transform it into a new Dataset by chaining method calls on the tf.data.Dataset object. We can apply per-element transformations such as Dataset.map( ) and multi-element transformations such as Dataset.batch( ).
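A minimal sketch of these building blocks on toy in-memory data (random 32x32x3 "images" and integer labels, purely illustrative): `from_tensor_slices` creates one element per image, `map` applies a per-element transformation, and `batch` combines consecutive elements.

```python
import tensorflow as tf

# Toy in-memory data: 10 "images" of size 32x32x3 and integer labels.
images = tf.random.uniform((10, 32, 32, 3))
labels = tf.range(10) % 3

def scale(image, label):
    # Per-element transformation: rescale pixel values from [0, 1] to [-1, 1].
    return image * 2.0 - 1.0, label

ds = (
    tf.data.Dataset.from_tensor_slices((images, labels))  # one element per image
    .map(scale)                                            # per-element transform
    .batch(4)                                              # multi-element transform
)
for batch_images, batch_labels in ds:
    print(batch_images.shape, batch_labels.shape)
# (4, 32, 32, 3) (4,)
# (4, 32, 32, 3) (4,)
# (2, 32, 32, 3) (2,)
```

The last batch is smaller because 10 elements do not divide evenly into batches of 4.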

TensorFlow provides full documentation on building efficient pipelines for both image and text data. Click the link here.

4.

Performing Classification Task

We mimic the architectures that were used for a similar problem, i.e. the Stanford Cars dataset. We start by fine-tuning MobileNet, check the results, and iterate from there to improve performance.
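A minimal sketch of a fine-tunable MobileNetV2 classifier in tf.keras, assuming the 41 Make classes and 224x224x3 inputs described here. The classification head (global average pooling, dropout of 0.2, and a dense layer producing logits) is an illustrative assumption rather than the exact head used in this project, and `build_classifier` is a hypothetical name.

```python
import tensorflow as tf

NUM_CLASSES = 41  # Make classes left after the YoloV3 filtering

def build_classifier(num_classes=NUM_CLASSES, weights="imagenet"):
    """MobileNetV2 backbone with a new classification head (illustrative)."""
    base = tf.keras.applications.MobileNetV2(
        input_shape=(224, 224, 3), include_top=False, weights=weights
    )
    base.trainable = True  # fine-tune all backbone layers
    inputs = tf.keras.Input(shape=(224, 224, 3))
    # training=False runs BatchNorm in inference mode (moving stats not updated).
    x = base(inputs, training=False)
    x = tf.keras.layers.GlobalAveragePooling2D()(x)
    x = tf.keras.layers.Dropout(0.2)(x)
    outputs = tf.keras.layers.Dense(num_classes)(x)  # raw logits
    return tf.keras.Model(inputs, outputs)

# weights=None skips the ImageNet download; use the default "imagenet" to fine-tune.
model = build_classifier(weights=None)
print(model.output_shape)  # (None, 41)
```

Running the backbone with `training=False` is a common recommendation when fine-tuning small datasets, since it keeps the pretrained BatchNorm statistics intact.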

We first crop the images to their bounding boxes and then resize them to [224, 224, 3].

The number of image samples per class ranges from 14 to 800 across the 41 classes. We set the batch size to 100 for both the train and test datasets. We also apply data augmentation techniques.

We train our model using a gradient tape: TensorFlow provides the tf.GradientTape API for automatic differentiation. We experimented with Adam and RMSProp as optimizers and with different learning rates. The best combination we found for this problem is the Adam optimizer with a learning rate of 0.00001.
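A minimal sketch of the resize step with `tf.image.resize`. It additionally assumes MobileNetV2's own `preprocess_input`, which scales pixel values from [0, 255] to [-1, 1]; this post does not say which scaling was used, and `to_model_input` is a hypothetical name.

```python
import tensorflow as tf

def to_model_input(crop):
    """Resize a cropped uint8 image to 224x224x3 and scale it for MobileNetV2."""
    image = tf.image.resize(crop, (224, 224))  # bilinear; returns float32
    # MobileNetV2's preprocess_input maps [0, 255] to [-1, 1].
    return tf.keras.applications.mobilenet_v2.preprocess_input(image)

crop = tf.zeros((320, 300, 3), dtype=tf.uint8)
x = to_model_input(crop)
print(x.shape)  # (224, 224, 3)
```

Note that resizing to a square changes the aspect ratio of most crops; padding before resizing is an alternative if distortion turns out to hurt accuracy.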
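A minimal sketch of a custom training step with `tf.GradientTape` and Adam at the learning rate reported above. The sparse categorical cross-entropy loss on logits and the toy dense model in the usage lines are assumptions for illustration; `make_train_step` is a hypothetical helper.

```python
import tensorflow as tf

def make_train_step(model, optimizer, loss_fn, acc_metric):
    """Build a compiled training step that updates `model` with GradientTape."""

    @tf.function
    def train_step(images, labels):
        with tf.GradientTape() as tape:
            logits = model(images, training=True)  # forward pass is recorded
            loss = loss_fn(labels, logits)
        # Automatic differentiation of the loss w.r.t. every trainable weight.
        grads = tape.gradient(loss, model.trainable_variables)
        optimizer.apply_gradients(zip(grads, model.trainable_variables))
        acc_metric.update_state(labels, logits)
        return loss

    return train_step

optimizer = tf.keras.optimizers.Adam(learning_rate=1e-5)  # best setting reported above
loss_fn = tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True)
train_acc = tf.keras.metrics.SparseCategoricalAccuracy()

# Toy model and batch, standing in for MobileNetV2 and a batch of 100 images.
toy_model = tf.keras.Sequential([tf.keras.Input(shape=(8,)), tf.keras.layers.Dense(3)])
train_step = make_train_step(toy_model, optimizer, loss_fn, train_acc)
loss = train_step(tf.random.normal((4, 8)), tf.constant([0, 1, 2, 0]))
print(float(loss), float(train_acc.result()))
```

Wrapping the step in `tf.function` compiles it into a graph, which is usually noticeably faster than running the same Python loop eagerly.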

6. Object Detection using YoloV3 in Tensorflow 2.0

YoloV3 improves on its predecessors. It features multi-scale detection, a stronger feature-extractor network, and some changes to the loss function. As a result, it detects more objects, especially small ones. The feature extractor is Darknet-53, whose design borrows from ResNet: it uses skip connections to mitigate vanishing gradients.

The goal of object detection is to predict a bounding box and a class for each object. The bounding box is represented in normalized xmin, ymin, xmax, ymax format. An anchor box is a prior box with one of several predefined sizes and aspect ratios.

In YoloV3, there are three anchor boxes per grid cell and three grid scales. Therefore, there are 52x52x3, 26x26x3, and 13x13x3 anchor boxes at the three scales.

For each anchor, we need to predict 3 things:

a. The location offset against the anchor box: bx, by, bw, bh. This has 4 values.
b. The objectness score to indicate if this box contains an object. This has 1 value.
c. The class probabilities to tell us which class this box belongs to. This has num_classes values.
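The anchor arithmetic and per-anchor outputs can be checked with a few lines of Python. This sketch assumes a 416x416 input with strides 8, 16, and 32 (the common YOLOv3 configuration, which yields the 52, 26, and 13 grids above) and the 80 COCO classes. `decode_box` follows the decoding from the YOLOv3 paper: the predicted center offset passes through a sigmoid and is added to the cell index, and the width and height scale the anchor prior exponentially. All function names are hypothetical.

```python
import math

INPUT_SIZE = 416         # assumed input resolution (common YOLOv3 setting)
STRIDES = (8, 16, 32)    # downsampling factor of each output scale
ANCHORS_PER_CELL = 3

def grid_sizes(input_size=INPUT_SIZE, strides=STRIDES):
    return [input_size // s for s in strides]

def total_anchors(input_size=INPUT_SIZE):
    return sum(g * g * ANCHORS_PER_CELL for g in grid_sizes(input_size))

def channels_per_cell(num_classes):
    # Each anchor predicts 4 box offsets + 1 objectness score + class scores.
    return ANCHORS_PER_CELL * (4 + 1 + num_classes)

def sigmoid(v):
    return 1.0 / (1.0 + math.exp(-v))

def decode_box(tx, ty, tw, th, cx, cy, pw, ph, grid):
    """Turn raw offsets into a normalized (center x, center y, width, height) box.

    (cx, cy) are the integer indices of the grid cell; (pw, ph) is the anchor
    size normalized to the input image.
    """
    bx = (sigmoid(tx) + cx) / grid  # center stays inside its own cell
    by = (sigmoid(ty) + cy) / grid
    bw = pw * math.exp(tw)          # width and height rescale the anchor prior
    bh = ph * math.exp(th)
    return bx, by, bw, bh

print(grid_sizes())            # [52, 26, 13]
print(total_anchors())         # 10647
print(channels_per_cell(80))   # 255
print(decode_box(0, 0, 0, 0, cx=6, cy=6, pw=0.1, ph=0.2, grid=13))  # (0.5, 0.5, 0.1, 0.2)
```

So each image produces 10,647 candidate boxes before score thresholding and non-maximum suppression.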

Credits: https://blog.paperspace.com/how-to-implement-a-yolo-object-detector-in-pytorch/

A clean implementation of YoloV3 in TensorFlow 2.0 is available here. I cloned the repository and added some conditional statements to meet my requirements.

The modified script keeps an image only if it meets all of the following conditions:

a. The image contains only a single car.

b. The image contains the whole body of the car, not just a single part of the vehicle.

c. The car is detected with a confidence score of at least 0.99.
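A minimal sketch of these three filtering conditions as one Python function. It assumes detections arrive as (class_name, score, normalized box) tuples, a hypothetical layout rather than the yolov3-tf2 output format, and it approximates condition b ("whole body") with a minimum box area of 30% of the image; that threshold is an illustrative guess. `select_car_box` is a hypothetical name.

```python
MIN_SCORE = 0.99     # condition c: detector confidence of at least 0.99
MIN_AREA = 0.30      # hypothetical proxy for condition b (whole car in view)

def select_car_box(detections, min_score=MIN_SCORE, min_area=MIN_AREA):
    """Return the box to crop, or None if the image should be discarded.

    detections: list of (class_name, score, (xmin, ymin, xmax, ymax)) tuples
    with normalized coordinates (an assumed layout).
    """
    cars = [d for d in detections if d[0] == "car"]
    if len(cars) != 1:                 # condition a: exactly one car
        return None
    _, score, box = cars[0]
    if score < min_score:              # condition c: confident detection
        return None
    xmin, ymin, xmax, ymax = box
    if (xmax - xmin) * (ymax - ymin) < min_area:  # condition b (heuristic)
        return None
    return box

print(select_car_box([("car", 0.995, (0.1, 0.2, 0.9, 0.8)), ("person", 0.9, (0.0, 0.0, 0.1, 0.3))]))
# (0.1, 0.2, 0.9, 0.8)
print(select_car_box([("car", 0.995, (0.45, 0.45, 0.55, 0.55))]))  # None: likely a car part
```

Images for which the function returns None are the non-car images counted in the EDA above.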

Once the loop completes, we have two lists, bbox and image, holding the bounding boxes and the corresponding images.

We then crop the images and save them to prepare the train and test datasets.

To learn more about YoloV3, refer to this link.

7. Building Input Pipeline

custom tfdata generator
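A minimal sketch of a comparable tf.data generator, assuming the cropped images were saved as JPEG files and the labels are integer class indices. `load_image` and `make_dataset` are hypothetical names; scaling to [0, 1] and shuffling with a full-size buffer are illustrative choices.

```python
import os
import tempfile
import tensorflow as tf

IMG_SIZE = (224, 224)
BATCH_SIZE = 100  # batch size used in this post
AUTOTUNE = tf.data.AUTOTUNE

def load_image(path, label):
    raw = tf.io.read_file(path)
    image = tf.io.decode_jpeg(raw, channels=3)
    image = tf.image.resize(image, IMG_SIZE) / 255.0  # float32 in [0, 1]
    return image, label

def make_dataset(paths, labels, training, batch_size=BATCH_SIZE):
    ds = tf.data.Dataset.from_tensor_slices((paths, labels))
    if training:
        ds = ds.shuffle(buffer_size=len(paths))  # reshuffled every epoch
    ds = ds.map(load_image, num_parallel_calls=AUTOTUNE)  # decode in parallel
    return ds.batch(batch_size).prefetch(AUTOTUNE)        # overlap with training

# Usage with two tiny synthetic JPEGs written to a temporary directory.
tmp = tempfile.mkdtemp()
paths = []
for i in range(2):
    path = os.path.join(tmp, f"car_{i}.jpg")
    pixels = tf.cast(tf.random.uniform((50, 80, 3), maxval=256, dtype=tf.int32), tf.uint8)
    tf.io.write_file(path, tf.io.encode_jpeg(pixels))
    paths.append(path)
ds = make_dataset(paths, [0, 1], training=True, batch_size=2)
images, labels = next(iter(ds))
print(images.shape, labels.shape)  # (2, 224, 224, 3) (2,)
```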

8. Gradient Tape with Tensorboard Logs and Checkpointing

utility functions for train and test step
train and test step with checkpointing and logging to tensorboard

9. Model Explanation — MobileNet V2

The drive to improve accuracy often comes at a cost: modern state-of-the-art networks require computational resources beyond the capabilities of many mobile and embedded applications. MobileNet is specifically tailored for mobile and resource-constrained environments. The network pushes the state of the art for mobile-tailored computer vision models by significantly decreasing the number of operations and the memory needed while retaining accuracy.

MobileNetV2 has two types of blocks: a residual block with stride 1, and a block with stride 2 for downsizing. The architecture uses depthwise separable convolutions, i.e. a depthwise convolution followed by a pointwise convolution. The basic idea is to replace a full convolutional operator with a factorized version that splits the convolution into two separate layers. The first layer, called a depthwise convolution, performs lightweight filtering by applying a single convolutional filter per input channel. The second layer is a 1 × 1 convolution, called a pointwise convolution, which builds new features by computing linear combinations of the input channels.

For a better understanding, please refer to the linked research paper.
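The saving can be quantified with the cost formulas from the MobileNetV2 paper: a standard k x k convolution on an h x w x d_i input with d_j output channels costs h·w·d_i·d_j·k² multiply-adds, while the depthwise separable version costs h·w·d_i·(k² + d_j), a reduction by a factor of k²·d_j / (k² + d_j). The layer sizes below are illustrative.

```python
def standard_conv_cost(h, w, d_in, d_out, k):
    # Multiply-adds for a k x k convolution producing an h x w x d_out map.
    return h * w * d_in * d_out * k * k

def separable_conv_cost(h, w, d_in, d_out, k):
    depthwise = h * w * d_in * k * k   # one k x k filter per input channel
    pointwise = h * w * d_in * d_out   # 1 x 1 convolution mixes the channels
    return depthwise + pointwise

h = w = 112
d_in, d_out, k = 32, 64, 3  # illustrative layer sizes
std = standard_conv_cost(h, w, d_in, d_out, k)
sep = separable_conv_cost(h, w, d_in, d_out, k)
print(std, sep, round(std / sep, 2))  # 231211008 29302784 7.89
```

For 3 x 3 kernels, the reduction approaches 9x as the number of output channels grows, which is where most of MobileNet's savings come from.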

10. Tensorboard Logs

Accuracy
Loss

11. End To End Inference Test

inference
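A minimal sketch of end-to-end inference on a single image: detect, crop, resize, classify. It assumes a `detect_fn` (hypothetical) that applies the YoloV3 filtering and returns one normalized box or None, a classifier that outputs logits, and MobileNetV2's `preprocess_input` scaling; the toy classifier in the usage lines only stands in for the trained model, so its prediction is meaningless.

```python
import numpy as np
import tensorflow as tf

def predict_make(image, detect_fn, classifier, class_names):
    """Detect -> crop -> resize -> classify one HxWx3 uint8 image."""
    box = detect_fn(image)        # hypothetical: filtered YoloV3 box or None
    if box is None:
        return None               # no single, confidently detected car
    h, w = image.shape[:2]
    xmin, ymin, xmax, ymax = box
    crop = image[int(ymin * h):int(ymax * h), int(xmin * w):int(xmax * w)]
    x = tf.image.resize(crop, (224, 224))
    x = tf.keras.applications.mobilenet_v2.preprocess_input(x)
    logits = classifier(x[tf.newaxis, ...], training=False)  # add batch axis
    probs = tf.nn.softmax(logits, axis=-1)[0].numpy()
    best = int(np.argmax(probs))
    return class_names[best], float(probs[best])

# Toy stand-ins for the detector and the trained MobileNetV2 classifier.
class_names = ["Audi", "BMW", "Toyota"]
toy_classifier = tf.keras.Sequential([
    tf.keras.Input(shape=(224, 224, 3)),
    tf.keras.layers.GlobalAveragePooling2D(),
    tf.keras.layers.Dense(len(class_names)),
])
image = np.random.randint(0, 256, size=(300, 400, 3), dtype=np.uint8)
result = predict_make(image, lambda img: (0.1, 0.1, 0.9, 0.9), toy_classifier, class_names)
print(result)  # (make, probability) from an untrained toy model
```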

12. Final Predictions

13. Future Work

a. This use case can be extended to detecting illegal vehicles. There are many studies on license plate detection, but when the same license plate is recognized on more than one vehicle, it is hard to tell which vehicle is using a fraudulent plate. For this reason, we can leverage vehicle Make-Model classification to detect unauthorized license plates.

b. This can also be extended to automatically detect vehicles that do not obey traffic rules. Manually noting the license plate and car model is tiresome, and Make-Model detection can automate it efficiently.

c. Performance could be improved considerably by scraping more relevant car images along with their class IDs. Larger input image sizes may also help by preserving finer details, which might increase performance.

d. We may be able to achieve higher accuracy by implementing ResNet from scratch and training it on this dataset.
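For idea a, the core check could be as simple as comparing the predicted Make with the Make registered for the recognized plate. A minimal sketch; `flag_mismatch` and the plate-to-Make `registry` dictionary are hypothetical stand-ins for a real registration lookup.

```python
def flag_mismatch(plate, predicted_make, registry):
    """Return True if the predicted Make disagrees with the registered one.

    registry: hypothetical mapping from plate number to registered Make.
    """
    registered = registry.get(plate)
    if registered is None:
        return False  # unknown plate: nothing to compare against
    return registered.lower() != predicted_make.lower()

registry = {"ABC123": "Toyota"}
print(flag_mismatch("ABC123", "Audi", registry))    # True: possible cloned plate
print(flag_mismatch("ABC123", "toyota", registry))  # False
```

Whether an unknown plate should also be flagged is a policy decision; this sketch leaves it unflagged.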

14. References

Applied AI Course: https://www.appliedaicourse.com/

Research paper: https://arxiv.org/pdf/1809.00953.pdf

YoloV3: https://github.com/zzh8829/yolov3-tf2

TensorFlow: https://www.tensorflow.org/tutorials/customization/custom_training_walkthrough

MobileNet V2: https://arxiv.org/abs/1801.04381, https://www.youtube.com/watch?v=HD9FnjVwU8g

15. Links to GitHub and LinkedIn
