German Traffic Sign Classifier

Project Goal and Data Set Explanation

  • Load the German traffic sign data set
  • Explore, summarize and visualize the data set
  • Design, train and test a model architecture
  • Use the model to make predictions on new images
  • Analyze the softmax probabilities of the new images

Source Code:

1. Data Set Summary & Exploration

Here is a visual summary of the number of training images with respect to each label:

There are a total of 34799 training images, 4410 validation images and 12630 test images. Each image has dimensions 32 x 32 x 3. There are 43 labels, one for each of the 43 German traffic sign classes.
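The summary figures above can be read directly from the array shapes. Here is a minimal sketch, assuming the images and labels are held in NumPy arrays; `summarize`, `X_train` and `y_train` are illustrative names, and the arrays below are stand-ins with the reported shapes rather than the real data:

```python
import numpy as np

def summarize(images, labels):
    """Return basic statistics for one data split."""
    return {
        "num_examples": images.shape[0],
        "image_shape": images.shape[1:],
        "num_classes": int(np.unique(labels).size),
    }

# Stand-in arrays with the shapes reported above (not the real data set).
X_train = np.zeros((34799, 32, 32, 3), dtype=np.uint8)
y_train = np.arange(34799) % 43  # placeholder labels covering all 43 classes

summary = summarize(X_train, y_train)
print(summary)
```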

2. Data Distribution Visualization

Here is a visualization of each unique label and corresponding traffic sign:

Note that the x-axis is the Class-ID of each sign and the y-axis shows the number of images for each Class-ID. The Class-ID reference can be found here.
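The per-class counts behind such a histogram can be computed with `np.bincount`. This is only a sketch on toy labels; `class_counts` is an illustrative helper, not a function from the project:

```python
import numpy as np

def class_counts(labels, num_classes=43):
    """Count how many examples carry each Class-ID (0 .. num_classes - 1)."""
    return np.bincount(labels, minlength=num_classes)

# Toy labels only; the real labels come from the data set.
toy_labels = np.array([0, 1, 1, 2, 2, 2, 42])
counts = class_counts(toy_labels)
print(counts[:3], counts[42])
```

The resulting array has one entry per Class-ID, so it can be passed straight to a bar-chart routine with the Class-ID on the x-axis.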

Design and Test a Model Architecture

Image Processing

I tried training the neural network with several image pre-processing techniques and noticed that normalization plays a very important role. Since I have a powerful GPU on my local machine (GTX980Ti), I decided to feed the network three-channel (RGB) images directly instead of converting them to grayscale. First, I tried normalizing with (pixel - 128) / 128, and the validation accuracy could barely reach 85%. Next, I applied standard statistical normalization to the training data using (pixel - pixel.mean()) / pixel.std(), which improved the accuracy dramatically.
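The two normalization schemes can be compared side by side. The sketch below assumes the mean and standard deviation are computed per image (the text does not say whether they are per image or over the whole data set), and the function names and the small `eps` guard are illustrative additions:

```python
import numpy as np

def scale_fixed(image):
    """Map uint8 pixels to roughly [-1, 1) with (pixel - 128) / 128."""
    return (image.astype(np.float32) - 128.0) / 128.0

def standardize(image, eps=1e-7):
    """Zero-mean, unit-variance scaling: (pixel - mean) / std."""
    image = image.astype(np.float32)
    return (image - image.mean()) / (image.std() + eps)

rng = np.random.default_rng(0)
# A dark, low-contrast stand-in image (values 20..59), not a real sign.
dark = rng.integers(20, 60, size=(32, 32, 3), dtype=np.uint8)

fixed = scale_fixed(dark)
standardized = standardize(dark)
print(fixed.min(), fixed.max())                  # stays in a narrow negative band
print(standardized.mean(), standardized.std())   # close to 0 and 1
```

With fixed scaling, a dark image keeps all its values in a narrow negative band, whereas standardization stretches every image to the same mean and spread. That may be one reason the second scheme worked better here.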

To increase the randomness of the data augmentation, I decided to augment on the fly, i.e., to randomly modify each image during every training epoch. I believe brightness may play a big role in image classification, so during each epoch I randomly apply a brightness adjustment to the normalized image using the brightness_adjustment() function. I also tried Gaussian blur as a further augmentation, but it did not noticeably improve the results, so I removed it to speed up data processing.
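One way such on-the-fly brightness augmentation could look is sketched below. This is not the project's brightness_adjustment() function; `random_brightness`, `augment_batch`, `max_shift` and `prob` are hypothetical names and values chosen for illustration, and the adjustment is assumed to be a single additive offset per image:

```python
import numpy as np

def random_brightness(image, rng, max_shift=0.5):
    """Add one random offset to every pixel of a normalized image."""
    shift = rng.uniform(-max_shift, max_shift)
    return image + shift

def augment_batch(batch, rng, prob=0.5):
    """Adjust each image with probability `prob`; the others pass through."""
    out = batch.copy()
    for i in range(len(out)):
        if rng.random() < prob:
            out[i] = random_brightness(out[i], rng)
    return out

rng = np.random.default_rng(42)
batch = np.zeros((4, 32, 32, 3), dtype=np.float32)  # stand-in normalized batch
augmented = augment_batch(batch, rng)
print(augmented[:, 0, 0, 0])  # one offset per image (0 where left untouched)
```

Calling `augment_batch` again in every epoch gives the network a differently adjusted copy of the same image each time, without storing the augmented data.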

Here is an example of an original image and the normalized image:

Here is an example of a normalized image and a brightness-adjusted image (brighter in this case):

Neural Network Models

I experimented with two different Convolutional Neural Network (CNN) models. Here is the one I ended up using (namely nn_model()):

This particular CNN structure is inspired by VGG16. The idea is to stack multiple convolutional layers whose kernel size decreases as the network gets deeper, while the number of filters doubles from one stage to the next (32, 64, 128 …). The intuition is that the first few convolutional layers learn general features such as edges, and the smaller filters in the deeper layers learn more detailed features. To reduce the risk of overfitting, I added two dropout layers, each of which drops 50% of its input from the previous layer.
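To illustrate what a 50% dropout layer does during training, here is a minimal NumPy sketch. It assumes the "inverted" form of dropout used by common deep learning frameworks, where the surviving activations are rescaled; the `dropout` function is illustrative and not taken from nn_model():

```python
import numpy as np

def dropout(activations, rate, rng, training=True):
    """Inverted dropout: zero a fraction `rate` of units, rescale the rest."""
    if not training or rate == 0.0:
        return activations
    keep_prob = 1.0 - rate
    mask = rng.random(activations.shape) < keep_prob
    return activations * mask / keep_prob

rng = np.random.default_rng(0)
x = np.ones((1000, 128))
y = dropout(x, rate=0.5, rng=rng)
print((y == 0).mean())  # fraction of units dropped
print(y.mean())         # mean activation is preserved in expectation
```

At inference time (`training=False`) the input passes through unchanged, which is why the rescaling is applied during training.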

Just for reference, the second CNN model, nn_model_2(), is similar but has far fewer layers:

To train the model, I initialize the weights of the CNN with a mean of 0 and a variance of 0.01. I split the training data set into batches of 256 images and train for 20 epochs.
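The initialization and batching arithmetic can be sketched as follows. The sketch takes the stated variance of 0.01 at face value (a standard deviation of 0.1) and assumes a normal distribution and per-epoch shuffling; the kernel shape and the helper names are hypothetical:

```python
import math
import numpy as np

BATCH_SIZE = 256
NUM_TRAIN = 34799

def init_weights(shape, rng, mean=0.0, variance=0.01):
    """Draw weights from a normal distribution; note std = sqrt(variance)."""
    return rng.normal(loc=mean, scale=math.sqrt(variance), size=shape)

def iterate_batches(num_examples, batch_size, rng):
    """Yield shuffled index arrays that cover every example once per epoch."""
    order = rng.permutation(num_examples)
    for start in range(0, num_examples, batch_size):
        yield order[start:start + batch_size]

rng = np.random.default_rng(0)
w = init_weights((3, 3, 3, 32), rng)  # hypothetical 3x3 kernel, RGB in, 32 filters
batches = list(iterate_batches(NUM_TRAIN, BATCH_SIZE, rng))
print(len(batches), len(batches[-1]))  # 136 batches, the last holding 239 images
```

With 34799 training images and a batch size of 256, one epoch consists of 135 full batches plus a final batch of 239 images, so 20 epochs amount to 2720 weight updates.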

The loss function I used here is the multi-class cross-entropy:

$$-\sum_{c=1}^m y_c \log(p_c)$$, where $m$ is the number of classes, $y_c$ is 1 if $c$ is the true label and 0 otherwise, and $p_c$ is the predicted probability of class $c$. Next, I use the Adam optimization algorithm with a learning rate of 0.0009 to minimize the loss. The training process is relatively fast with a dedicated GPU.
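As a small worked example of the cross-entropy formula, the sketch below evaluates the loss for a single example with three classes instead of 43. The logits are made-up values, and `eps` is an illustrative guard against taking the logarithm of 0:

```python
import numpy as np

def softmax(logits):
    """Numerically stable softmax over the last axis."""
    shifted = logits - logits.max(axis=-1, keepdims=True)
    exp = np.exp(shifted)
    return exp / exp.sum(axis=-1, keepdims=True)

def cross_entropy(one_hot, probs, eps=1e-12):
    """-sum_c y_c * log(p_c) for a single example."""
    return float(-np.sum(one_hot * np.log(probs + eps)))

# Three classes for brevity (the project has 43); the true class is index 1.
y = np.array([0.0, 1.0, 0.0])
p = softmax(np.array([1.0, 3.0, 0.5]))
loss = cross_entropy(y, p)
print(loss, -np.log(p[1]))  # the two values agree up to eps
```

Because the label vector is one-hot, the sum collapses to the negative log-probability of the true class, so the loss approaches 0 as that probability approaches 1.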

CNN Design Process

The design process was iterative. Starting with LeNet from Yann LeCun, the validation accuracy could only reach roughly 85%, even with image augmentation. In my experience, a more complex network structure, i.e., more CNN filters and deeper layers, is needed to train a better model. But there is no free lunch: a more complicated network requires more data to train. I therefore decided not to use a very complex network such as ResNet-50 for this project, and VGG-16 drew my attention instead. The idea is to design a CNN that captures enough features while minimizing the number of layers and parameters, since an over-complicated model is difficult to train. After some trial and error, I ended up with a CNN that has 10 layers.

My final model has a validation accuracy of around 96%.

Test the Model on New Images

To test whether my CNN can correctly classify images that it has never seen before, I found eight German traffic signs on the web:

Before feeding the test images into the pipeline, I resize them to match the input requirement of the CNN (in this case, 32x32x3). The input must also be normalized using the technique introduced earlier. The code is as follows:
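As a rough sketch of this kind of preprocessing, assuming nearest-neighbour resizing done in plain NumPy and per-image standardization (the project itself may have used an image library and different statistics), with `resize_nearest` and `preprocess` as hypothetical helper names:

```python
import numpy as np

def resize_nearest(image, size=(32, 32)):
    """Nearest-neighbour resize of an H x W x C image using index sampling."""
    h, w = image.shape[:2]
    rows = np.arange(size[0]) * h // size[0]
    cols = np.arange(size[1]) * w // size[1]
    return image[rows[:, None], cols[None, :]]

def preprocess(image, eps=1e-7):
    """Resize to 32x32x3, then standardize as (pixel - mean) / std."""
    small = resize_nearest(image).astype(np.float32)
    return (small - small.mean()) / (small.std() + eps)

rng = np.random.default_rng(0)
# Stand-in for a downloaded image of arbitrary size, not a real sign.
web_image = rng.integers(0, 256, size=(120, 90, 3), dtype=np.uint8)
x = preprocess(web_image)
print(x.shape)
```

Whatever resizing method is used, the important point is that new images go through exactly the same normalization as the training data.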

My model successfully identified all of the test images from the web. Here are the prediction results:

The code for making predictions on my final model is the following:

It reads the previously saved model and then uses the evaluate() function to check the accuracy of each prediction.

For the test image No Entry, the model assigns a probability of 1.0 to No Entry.

The top-5 softmax probabilities are the following:

Note that, mathematically, the probabilities must sum to 1.0. If the first entry is 1.0, all the other classes must have a probability of 0. Therefore, any extremely small nonzero value is simply numerical error and can be treated as 0. Here are the probabilities for the other 7 images:

The top-5 softmax probabilities for each of these images are the following:
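For reference, top-5 values of this kind can be extracted from a softmax output roughly as follows. The logits are hypothetical (one dominant class at an arbitrary index, 17) and `top_k` is an illustrative helper, not the project's prediction code:

```python
import numpy as np

def top_k(probs, k=5):
    """Return (class_ids, probabilities) of the k largest entries, descending."""
    ids = np.argsort(probs)[::-1][:k]
    return ids, probs[ids]

# Hypothetical logits for 43 classes with one dominant class (index 17).
logits = np.zeros(43, dtype=np.float32)
logits[17] = 30.0
shifted = logits - logits.max()
probs = np.exp(shifted) / np.exp(shifted).sum()

ids, top = top_k(probs)
print(ids[0], top[0])  # winning class and its probability
print(top[1:])         # tiny but nonzero runner-up values
```

In 32-bit floating point the winning probability is stored as 1.0 while the runners-up remain tiny positive numbers. That is the kind of numerical residue the note above suggests treating as 0.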