Bosch Traffic Sign Recognition: Silver Medal Approach
Objectives:
- Create a baseline model on the German Traffic Sign Recognition Benchmark (GTSRB) dataset.
- Add at least 5 new classes to the dataset.
- Provide an interface to systematically generate and add images with different types of augmentations.
- Apply transformations to the existing dataset to increase its difficulty, both for the existing classes and for the extra classes.
- Connect the output of the classifier to a UI where an analyst can visualize the results and the metrics and diagnose what's going wrong.
- The UI should let users explore the metrics easily enough to understand and answer a variety of diagnostic questions.
- Make the necessary changes to the network/dataset based on the findings of the evaluation step and improve the scores.
Our Approach
Dataset creation method
The original dataset given (GTSRB) contains 43 classes with 40,490 images. One of our main tasks was to make this dataset more difficult. To achieve this, we added 7 more classes to the existing 43. The newly added classes are as follows:
- Separated tracks for pedal cyclists and pedestrians only
- Parking sign
- Tonnage limit
- Route for pedal vehicles only
- No passing cars
- Cross-road
- No stopping
This dataset (now 50 classes) was not very difficult, and our various models achieved near-perfect results. The baseline model was a simple 4-conv-layer network, used only as a tool to gauge the model complexity required for the dataset.
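As a sketch, such a 4-conv-layer baseline might look like the following in PyTorch. The filter counts, 48×48 input size, and layer widths here are our assumptions for illustration, not the exact original configuration:

```python
import torch
import torch.nn as nn

class Baseline(nn.Module):
    """Simple 4-conv-layer classifier. Input size (48x48) and channel
    widths are illustrative assumptions, not the original configuration."""
    def __init__(self, num_classes: int = 50):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(3, 32, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),    # 48 -> 24
            nn.Conv2d(32, 64, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),   # 24 -> 12
            nn.Conv2d(64, 128, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),  # 12 -> 6
            nn.Conv2d(128, 128, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2), # 6 -> 3
        )
        self.classifier = nn.Linear(128 * 3 * 3, num_classes)

    def forward(self, x):
        return self.classifier(self.features(x).flatten(1))

model = Baseline()
logits = model(torch.zeros(1, 3, 48, 48))  # one dummy RGB image
```

A network of this size trains in minutes on GTSRB-scale data, which is what makes it useful as a difficulty probe.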
Further increasing the difficulty
We used a variety of augmentations on our modified dataset to make it even more difficult; for example, center-crop produces images with only partial information. The full set of 13 augmentation types is listed in the Add augmentations section below.
To ensure a high level of difficulty, we applied the Hard level of augmentations. Our final dataset contained 174,952 images.
Upload dataset
- For data input, the user is given two options: uploading new images (zip files only) or uploading data belonging to a particular class.
- Further, the user can also sample from existing data with a desired percentage.
- Finally, the user can visualize the newly added data.
Add augmentations
We allow the user to choose from 13 different types of augmentations and apply them to the uploaded images through an easy-to-use UI, making the dataset more difficult.
The user can specify the min and max range for each augmentation and the probability with which it is applied to the dataset. Users can visualize the augmented dataset and remove images as they see fit.
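A minimal sketch of this min/max-plus-probability scheme: each augmentation is applied with probability p, with its magnitude sampled uniformly from the user's range. The `brightness` function here is a hypothetical example augmentation, not one of our actual 13:

```python
import random
import numpy as np

def apply_augmentations(img: np.ndarray, augmentations: list) -> np.ndarray:
    """Apply each (fn, min_val, max_val, p) augmentation with probability p,
    sampling its magnitude uniformly from [min_val, max_val]."""
    for fn, lo, hi, p in augmentations:
        if random.random() < p:
            img = fn(img, random.uniform(lo, hi))
    return img

# Hypothetical augmentation: additive brightness shift, clipped to [0, 255].
def brightness(img, amount):
    return np.clip(img.astype(np.float32) + amount, 0, 255).astype(np.uint8)

img = np.full((32, 32, 3), 128, dtype=np.uint8)       # uniform gray image
out = apply_augmentations(img, [(brightness, 10, 40, 1.0)])  # p=1.0: always applied
```

Setting p below 1.0 leaves a fraction of images untouched, which is how the dataset keeps a mix of easy and hard examples.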
Training
Once the user is satisfied with the level of difficulty of the dataset, he/she can split the dataset and train a model from the list of pre-trained models available on our platform. The train-test split uses stratified sampling, so the class ratio is maintained in both the train and validation sets.
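The stratified split can be sketched with scikit-learn's `train_test_split`; the toy label counts below are illustrative, not our actual class distribution:

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Toy labels: an imbalanced 3-class dataset (class counts 60 / 30 / 10).
labels = np.repeat([0, 1, 2], [60, 30, 10])
X = np.arange(len(labels)).reshape(-1, 1)  # stand-in for image indices

# stratify=labels keeps the class ratio identical in train and validation.
X_tr, X_val, y_tr, y_val = train_test_split(
    X, labels, test_size=0.2, stratify=labels, random_state=42
)

print(np.bincount(y_tr))   # [48 24  8]
print(np.bincount(y_val))  # [12  6  2]
```

Without `stratify`, a random 80/20 split could leave a rare class almost absent from validation, making its metrics meaningless.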
Available pre-trained models:
- Baseline
- Baseline Augmented
- InceptionV3
- MobileNetV2
- MobileNetV3
We also provide an option to upload a custom model JSON file, which can be trained from scratch.
Model performance and statistics
A dashboard is always shown with each selected model's accuracy on both the base dataset and the much harder augmented dataset.
The user can choose from a dropdown the model whose statistics should be displayed. In addition, testing on any new data added to our already-augmented dataset displays the resulting metrics in the UI, and every model, pre-trained or custom, has a tab showing its architecture.
Along with this, the loss, accuracy and F1 curves can be seen.
Post-experiment evaluation
We provide a UI that enables the user to analyze and understand the pitfalls in the model and the dataset. We also suggest optimal settings for the next experiment that can improve the results. We provide the following features for post-experiment evaluation:
Confusion matrix analysis
A confusion matrix gives us a holistic view of how well our classification model is performing and what kinds of errors it is making.
Here the user can see the confusion matrix and the top 5 mis-predicted classes with their actual classes and the number of such images.
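Extracting the top mis-predicted class pairs amounts to ranking the off-diagonal entries of the confusion matrix. A sketch with NumPy (the 3-class matrix below is a toy example, not our results):

```python
import numpy as np

def top_mispredictions(cm: np.ndarray, k: int = 5):
    """Return the k largest off-diagonal entries of a confusion matrix as
    (actual_class, predicted_class, count) triples."""
    off = cm.copy().astype(np.int64)
    np.fill_diagonal(off, 0)                     # ignore correct predictions
    flat = np.argsort(off, axis=None)[::-1][:k]  # indices of the largest errors
    rows, cols = np.unravel_index(flat, off.shape)
    return [(int(r), int(c), int(off[r, c])) for r, c in zip(rows, cols)]

# Toy 3-class confusion matrix (rows = actual, columns = predicted).
cm = np.array([[50, 3, 1],
               [7, 40, 2],
               [0, 5, 45]])
print(top_mispredictions(cm, k=3))  # [(1, 0, 7), (2, 1, 5), (0, 1, 3)]
```

Surfacing these pairs directly saves the analyst from scanning a 50×50 matrix by eye.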
Loss and accuracy curve analysis
We display the loss and accuracy curves and analyze whether the model is underfitting or overfitting. We then suggest possible steps the user should take to avoid underfitting or overfitting in the next experiment. The suggestions involve changing the number of epochs, choosing a different optimizer, decreasing the dataset difficulty, and altering the learning rate or the depth of the network.
Explainable AI
We go a step further and give the user an option to use an AI technique that can explain the failures of the model. For this purpose, we use Grad-CAM (Gradient-weighted Class Activation Mapping). Grad-CAM uses the gradients of any target concept, flowing into the final convolutional layer, to produce a coarse localization map highlighting the regions of the image that are important for predicting the concept.
Methodology
We first find out the region of interest in our image using pre-trained models to create a bounding box. Then we use Grad-CAM to find out how much of the focus area lies inside the bounding box. We use IoU (Intersection over Union) to quantify this overlapping area.
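The IoU step in the methodology above can be sketched as follows, assuming both the Grad-CAM focus region and the sign's bounding box have been reduced to axis-aligned boxes (the coordinates below are hypothetical):

```python
def iou(box_a, box_b):
    """Intersection-over-Union of two axis-aligned boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(box_a[0], box_b[0]), max(box_a[1], box_b[1])
    ix2, iy2 = min(box_a[2], box_b[2]), min(box_a[3], box_b[3])
    inter = max(0, ix2 - ix1) * max(0, iy2 - iy1)      # overlap area (0 if disjoint)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    return inter / (area_a + area_b - inter)

# Sign bounding box vs. a hypothetical Grad-CAM focus box.
print(iou((0, 0, 10, 10), (5, 5, 15, 15)))  # 25 / 175 ≈ 0.1429
```

A low IoU like this one signals that the model is attending mostly outside the sign, which is exactly the failure mode this analysis is meant to expose.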
Features
We display 4 Grad-CAM images for correctly classified images to see the important regions the model focuses on. Similarly, we show 4 Grad-CAM images for incorrectly classified images to see why the model fails to classify them correctly. We allow the user to select an IoU range with a simple slider; the bar graph dynamically displays, class-wise, all images whose IoU value falls in the selected range.
Metrics used to evaluate how well the models were working
t-SNE plot
t-SNE is non-deterministic, meaning you won't get exactly the same output each time you run it (though the results are likely to be similar). t-SNE tends to cope better with non-linear structure in your data, so odd outliers tend to have less of an effect, and the visible separation between relevant groups is often improved.
t-SNE plot of the Baseline model on the augmented dataset:
t-SNE plot of the InceptionV3 model on the augmented dataset:
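Plots like the ones above can be produced by projecting the model's feature embeddings to 2-D with scikit-learn's `TSNE`. The embeddings below are synthetic stand-ins, and the perplexity value is a tunable assumption:

```python
import numpy as np
from sklearn.manifold import TSNE

# Stand-in for penultimate-layer embeddings of 60 images (two loose clusters).
rng = np.random.default_rng(0)
emb = np.vstack([rng.normal(0, 1, (30, 64)), rng.normal(6, 1, (30, 64))])

# Project the 64-D embeddings down to 2-D for plotting. perplexity must be
# smaller than the number of samples; 10 is an arbitrary choice here.
coords = TSNE(n_components=2, perplexity=10, random_state=0).fit_transform(emb)
print(coords.shape)  # (60, 2)
```

Fixing `random_state` mitigates (but does not remove) the non-determinism noted above.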
Silhouette score
The Silhouette Coefficient for each sample is calculated from the mean intra-cluster distance (a) and the mean nearest-cluster distance (b) as s = (b − a) / max(a, b).
The better the model is at separating the traffic-sign classes in feature space, the higher the silhouette score.
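A minimal sketch of this relationship using scikit-learn's `silhouette_score`, with synthetic embeddings standing in for the model's features:

```python
import numpy as np
from sklearn.metrics import silhouette_score

rng = np.random.default_rng(0)
# Well-separated class embeddings (a good classifier's feature space)...
tight = np.vstack([rng.normal(0, 0.1, (30, 2)), rng.normal(5, 0.1, (30, 2))])
# ...versus heavily overlapping embeddings (a poor classifier's feature space).
loose = np.vstack([rng.normal(0, 3.0, (30, 2)), rng.normal(1, 3.0, (30, 2))])
labels = np.array([0] * 30 + [1] * 30)

print(silhouette_score(tight, labels))  # close to +1
print(silhouette_score(loose, labels))  # close to 0
```

The score ranges from −1 to +1, so it gives a single comparable number per model.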
DBSCAN
Density-Based Spatial Clustering of Applications with Noise (DBSCAN) is a data clustering algorithm.
The better the model is at classifying the traffic signs, the more positive the DBSCAN-based score; the worse the model is at separating the classes, the more negative the score becomes.
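As a sketch of this evaluation, DBSCAN can be run on the model's embeddings and the resulting clustering scored with the silhouette coefficient. The embeddings, `eps`, and `min_samples` below are illustrative assumptions:

```python
import numpy as np
from sklearn.cluster import DBSCAN
from sklearn.metrics import silhouette_score

rng = np.random.default_rng(1)
# Two tight clusters in embedding space, as a well-trained model would produce.
emb = np.vstack([rng.normal(0, 0.2, (40, 2)), rng.normal(5, 0.2, (40, 2))])

# DBSCAN labels dense groups 0, 1, ... and marks sparse points as -1 (noise).
labels = DBSCAN(eps=0.8, min_samples=5).fit_predict(emb)
print(len(set(labels) - {-1}))        # 2 clusters found
print(silhouette_score(emb, labels))  # positive for well-separated clusters
```

If the model's embeddings overlapped, DBSCAN would merge the classes or flag many points as noise, and the score would drop toward or below zero.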
This concludes our solution to the Bosch Traffic Sign Recognition problem of the 9th Inter IIT Tech Meet at IIT Guwahati.
Team Members:
- Aditya Mehndiratta
- Siddharth Jain
- Aishik Rakshit
- Varun Yerram
- Koshik Rajesh
- Eklavya Jain
- Amey Rambatala
- Aayush Sharma
- Debarshi Chanda
- Anjali Soni
- Tushar Bajaj