Traffic Signs Classification with a Convolutional Neural Network
Like many others, I rely on Google Maps quite a bit these days to take me from point A to point B. While it helps me reach my destination with dynamic traffic updates and alternate route suggestions, I couldn't help noticing how much we rely on the traffic sign infrastructure as well. Imagine a few decades from now, when all vehicles are Level 5 autonomous and fully connected, able to communicate with each other and with the transportation infrastructure. In such a scenario, there may not be a need for many of these traffic signs. Until then, however, we still have to rely on them, as vehicles with varying levels of autonomy will have to co-exist harmoniously with human drivers on the road. That means autonomous vehicles must be able to read and understand traffic signs, and to determine the appropriate action. This project focuses on the former: to develop a neural network that reads traffic signs and classifies them correctly.
These are the steps I followed to accomplish the task:
- Explore and Visualize the dataset
- Pre-process and augment the dataset, if needed
- Develop a CNN model
- Train and validate the model
- Optimize the model by experimenting with different hyper-parameters
- Test the model with the test dataset
Let’s delve deeper into each of the steps above to understand the process better.
Explore & Visualize the dataset:
The dataset used for this project is the German Traffic Sign dataset. There are 43 unique traffic signs in the dataset, and a quick look at the histogram of the number of samples for each traffic sign shows that they are unevenly distributed. Each image in the dataset is 32 pixels x 32 pixels x 3 channels (one each for R, G, and B).
Number of training examples = 34799
Number of validation examples = 4410
Number of testing examples = 12630
Image shape = (32, 32, 3)
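The summary statistics and the class histogram above can be computed with a short numpy sketch (the variable names such as `y_train` are assumptions for illustration, not the project's exact code):

```python
import numpy as np

# Toy stand-in labels; in the project, y_train holds the 34799 training
# labels loaded from the German Traffic Sign dataset files.
y_train = np.array([0, 1, 1, 2, 2, 2, 2])

n_train = len(y_train)
n_classes = len(np.unique(y_train))

# Per-class sample counts, i.e. the data behind the histogram.
counts = np.bincount(y_train, minlength=n_classes)

print("Number of training examples =", n_train)
print("Number of classes =", n_classes)
print("Samples per class =", counts.tolist())
```

With the real labels, plotting `counts` as a bar chart reproduces the uneven distribution described above.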
Here are some random images from the input dataset:
Pre-process and augment the dataset, if needed:
All the images in the dataset were normalized so that the data has zero mean and equal variance, which helps the model converge faster. I first ran the model without any image augmentation and found that the training accuracy was quite high compared to the validation accuracy, as seen in the 1st column of Fig. 5 at the end of this post. Since the number of images per class was unevenly distributed, as shown in the histogram of the training set images in the previous step, I augmented images for the under-represented classes so that each class has a minimum of 250 images. A summary of the number of images added to the classes that had fewer than 250 images is shown below:
Adding 70 samples for class 0
Adding 70 samples for class 19
Adding 10 samples for class 24
Adding 40 samples for class 27
Adding 10 samples for class 29
Adding 40 samples for class 32
Adding 70 samples for class 37
Adding 40 samples for class 41
Adding 40 samples for class 42
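The normalization step described earlier can be sketched in numpy; `(pixel - 128) / 128` is one common choice that maps pixel values to roughly [-1, 1], though the project's exact scheme may differ:

```python
import numpy as np

def normalize(images):
    """Scale uint8 RGB images to roughly zero mean and equal variance.

    (pixel - 128) / 128 maps [0, 255] to approximately [-1, 1]. This is
    a sketch of the idea, not the project's exact pre-processing code.
    """
    return (images.astype(np.float32) - 128.0) / 128.0

# A random batch of four 32x32x3 images, as in the dataset.
batch = np.random.randint(0, 256, size=(4, 32, 32, 3), dtype=np.uint8)
normalized = normalize(batch)
```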
For the augmented images, I applied a transform function that randomly rotates, translates, and shears the input image, to avoid exact replicas of images already in the dataset. Credit goes to Vivek for this method. Adding more than 250 samples per traffic sign did not produce any improvement in the results. The total number of training images before and after augmentation was 34799 and 35189, respectively. Here are some random images from the dataset after pre-processing and augmentation:
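The rotate/translate/shear idea can be sketched in pure numpy with a nearest-neighbor affine warp (the original method credited above used a different implementation; the function name and parameter ranges here are illustrative assumptions):

```python
import numpy as np

def random_affine(image, max_angle=15.0, max_shift=2, max_shear=0.1,
                  rng=None):
    """Randomly rotate, translate, and shear an (H, W, C) image.

    A minimal nearest-neighbor warp sketching the augmentation idea;
    edges are handled by clamping to the nearest valid pixel.
    """
    rng = rng or np.random.default_rng()
    h, w = image.shape[:2]
    angle = np.deg2rad(rng.uniform(-max_angle, max_angle))
    tx, ty = rng.uniform(-max_shift, max_shift, size=2)
    shear = rng.uniform(-max_shear, max_shear)

    # Compose a rotation-plus-shear matrix around the image centre.
    c, s = np.cos(angle), np.sin(angle)
    m = np.array([[c, -s + shear], [s + shear, c]])
    cy, cx = (h - 1) / 2.0, (w - 1) / 2.0

    # For each output pixel, look up the corresponding source pixel.
    ys, xs = np.mgrid[0:h, 0:w]
    coords = np.stack([ys - cy - ty, xs - cx - tx])
    src = np.tensordot(np.linalg.inv(m), coords, axes=1)
    sy = np.clip(np.round(src[0] + cy), 0, h - 1).astype(int)
    sx = np.clip(np.round(src[1] + cx), 0, w - 1).astype(int)
    return image[sy, sx]

augmented = random_affine(np.zeros((32, 32, 3), dtype=np.uint8))
```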
Develop a CNN model:
As a starting point, I chose the LeNet architecture (Fig. 4 below) and adapted it for this project, since it is a well-established and proven model. The image below shows the layers, layer sizes, and connectivity of the model. The goal for this project is to achieve at least 93% accuracy on the validation dataset.
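A quick sketch of the shape arithmetic behind the adapted network (the layer sizes below follow the classic LeNet-5 layout with the final layer widened to 43 classes; "valid" padding and stride 1 for the conv layers are assumptions):

```python
def conv_out(size, kernel, stride=1, pad=0):
    """Output spatial size of a conv/pool layer: (W - F + 2P) / S + 1."""
    return (size - kernel + 2 * pad) // stride + 1

# Walk a 32x32x3 traffic-sign image through the layers.
size = 32
size = conv_out(size, 5)            # conv1 5x5      -> 28x28x6
size = conv_out(size, 2, stride=2)  # max pool 2x2   -> 14x14x6
size = conv_out(size, 5)            # conv2 5x5      -> 10x10x16
size = conv_out(size, 2, stride=2)  # max pool 2x2   -> 5x5x16
flat = size * size * 16             # flatten        -> 400
# fully connected: 400 -> 120 -> 84 -> 43 logits (one per sign class)
```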
Train and validate the model:
I started with the LeNet architecture and trained the model on the training dataset. To establish a baseline, I did not add any augmented images to the training set at first. When I ran the model against the validation dataset, the results were not good: see the 1st column of Fig. 5 below. The training accuracy was quite high, but the validation accuracy was poor; high accuracy on the training set combined with low accuracy on the validation set implies overfitting. To address this overfitting problem, there are a number of recommended approaches in the deep-learning toolkit that I could experiment with:
- Weight Initialization.
- Learning Rate.
- Activation Functions.
- Network Topology.
- Batches and Epochs.
- Optimization and Loss.
- Early Stopping.
Hmmm. It's great that there are all these different approaches to make my model perform better, but where do I even start? A head-scratching emoji would have been perfect here, but I couldn't find a straightforward way to insert one. I digress. Moving on….
Optimize the model by experimenting with different hyper-parameters:
This project is part of Udacity's Self-Driving Car program, and these words of wisdom from Bryan Catanzaro, NVIDIA's VP of Applied Deep Learning Research, stuck with me: "Deep learning is a hands-on empirical field. The practice is definitely ahead of theory in many of the cases. Keep experimenting until you find a network and a set of hyper-parameters that work for your problem."

Armed with this knowledge, I began to experiment. First, I augmented images for the classes that had fewer than 250 images, ensuring a minimum of 250 images per class; going beyond 250 did not help improve model performance. For the next set of experiments, I started with the following hyper-parameters: a batch size of 128, a learning rate of 0.001, and 25 epochs, and experimented with different dropout rates on the fully connected layers of the LeNet model. The results are shown in Fig. 5 (columns 2–5).

The results with a dropout of 0.7 look quite promising and meet the project goals: a validation accuracy of 96.6%. We're not quite done yet, since we still need to check how the model performs on the test dataset. The accuracy of the model on the test dataset is ~94%. Great! I have now met all of the goals for this project. There is definitely more room for improvement, and a number of hyper-parameters I could still iterate on, but I'll leave that for a future project. One practical note: run these iterations on an AWS EC2 GPU instance or similar for much faster results.
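The dropout mechanism applied to the fully connected layers can be sketched in numpy as inverted dropout (here "a dropout of 0.7" is interpreted as a keep probability of 0.7, and the numpy implementation is an illustration; the project's actual training code would use the framework's dropout op):

```python
import numpy as np

def dropout(activations, keep_prob, rng=None, training=True):
    """Inverted dropout on a layer's activations.

    With keep_prob=0.7, each unit survives with probability 0.7 and the
    survivors are scaled by 1/0.7 so the expected activation is
    unchanged; at test time the layer is left untouched.
    """
    if not training:
        return activations
    rng = rng or np.random.default_rng()
    mask = rng.random(activations.shape) < keep_prob
    return activations * mask / keep_prob

# During training, roughly 30% of units are zeroed out each step:
fc_out = np.ones((2, 4))
dropped = dropout(fc_out, keep_prob=0.7, rng=np.random.default_rng(0))
```

Because of the 1/keep_prob scaling, no rescaling is needed at test time, which is why `training=False` simply returns the activations unchanged.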
The code for this project can be found on my GitHub page. The Udacity forums and the slack channels were great resources.