Deep Neural Networks. Traffic Sign Classifier.
Machine Learning & Data Science A-Z Guide.
Drop us a message if you’re interested in Blockchain and FinTech software development, or just say hi at Pharos Production Inc.
This is the second project in the Udacity Self-Driving Car Engineer course. The goal is to classify images of traffic signs using a convolutional neural network (CNN). For this project I used LeNet-5 with a few small modifications: I changed the activation layers from ReLU to tanh, and in the loss calculation replaced reduce_mean with reduce_sum. This seems to give a better result, and with tanh there doesn’t appear to be a huge number of dead neurons in the lower layers. Let’s look through the details.
Prepare
The first thing we should do is load all the libraries required for image processing and model training:
- random, math — standard Python libraries to handle all the required math
- numpy — great scientific library for processing data arrays http://www.numpy.org
- pickle — standard library module to serialize and deserialize data https://docs.python.org/3/library/pickle.html
- matplotlib.pyplot — plotting library https://matplotlib.org
- pandas — another great library for processing data https://pandas.pydata.org
- cv2 — image processing library http://opencv.org
- tensorflow — machine learning framework https://www.tensorflow.org
- sklearn — scientific library for data processing and many other things http://scikit-learn.org
Helpers
We should define a couple of helpers to display our data. The first is a distribution function: it shows the distribution of the data, nothing special.
The second is an image gallery function. It uses OpenCV to lay out a nice grid of images with titles.
Yet another function is a simple gallery, but it can show an image with color normalization to uint8 (0–255) even if it contains some weird value like -134.55 in the R channel.
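As a rough sketch, that display normalization can be done with a min-max rescale into uint8 (the function name `to_display_uint8` is hypothetical, not the article’s actual helper):

```python
import numpy as np

def to_display_uint8(img):
    """Rescale an image with arbitrary float values (e.g. -134.55)
    into the displayable uint8 range 0-255 via min-max normalization."""
    img = img.astype(np.float64)
    lo, hi = img.min(), img.max()
    if hi == lo:  # constant image: avoid division by zero
        return np.zeros_like(img, dtype=np.uint8)
    scaled = (img - lo) / (hi - lo) * 255.0
    return scaled.astype(np.uint8)
```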
The last one is for the final task: it visualizes the top-5 probabilities for every image it gets.
Color Correction and Augmentation.
- to_normalized_gray — makes a 3-channel image from a grayscale one, because our placeholders expect data of shape [?, 32, 32, 3]. We also apply OpenCV’s normalization here.
- normalize_mean — uses the mean and the standard deviation to normalize and shift all 3 channels separately, centering each distribution at zero.
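A minimal numpy sketch of these two helpers, using plain numpy in place of OpenCV’s normalization:

```python
import numpy as np

def to_normalized_gray(gray):
    """Stack a single-channel grayscale image into 3 identical channels,
    matching the [?, 32, 32, 3] placeholder shape."""
    return np.stack([gray, gray, gray], axis=-1)

def normalize_mean(img):
    """Shift and scale each of the 3 channels separately so that its
    distribution is centered at zero with unit standard deviation."""
    out = img.astype(np.float64)
    for c in range(out.shape[-1]):
        ch = out[..., c]
        out[..., c] = (ch - ch.mean()) / (ch.std() + 1e-8)
    return out
```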
The next function is used in data augmentation to randomize the brightness level.
The next 3 functions are also used in augmentation, randomizing the transformation: rotation, translation and skew.
The augmentation function itself is pretty simple.
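A simplified numpy-only sketch of such an augmentation step, covering brightness jitter and integer-pixel translation (the article’s version also rotates and skews via OpenCV warps; the function names here are hypothetical):

```python
import numpy as np

rng = np.random.default_rng(0)

def random_brightness(img, max_delta=0.3):
    """Scale pixel intensities by a random factor in [1-max_delta, 1+max_delta]
    and clip back to the valid 0-255 range."""
    factor = 1.0 + rng.uniform(-max_delta, max_delta)
    return np.clip(img.astype(np.float64) * factor, 0, 255).astype(np.uint8)

def random_translate(img, max_shift=3):
    """Shift the image by a random integer offset, filling the border with zeros."""
    dx, dy = rng.integers(-max_shift, max_shift + 1, size=2)
    out = np.zeros_like(img)
    h, w = img.shape[:2]
    src = img[max(0, -dy):h - max(0, dy), max(0, -dx):w - max(0, dx)]
    out[max(0, dy):max(0, dy) + src.shape[0],
        max(0, dx):max(0, dx) + src.shape[1]] = src
    return out

def augment(img):
    """Brightness plus translation jitter — a simplified stand-in
    for the article's augmentation pipeline."""
    return random_translate(random_brightness(img))
```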
We also have functions to flip an image horizontally, vertically, or in both directions. Where it makes sense, flipping also changes the class label.
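A minimal sketch of label-aware flipping; the pairs in `FLIP_CLASS_MAP` are hypothetical placeholders for whatever mirrored sign classes the dataset actually contains:

```python
import numpy as np

# Hypothetical pairs of classes that are mirror images of each other;
# the real mapping depends on the dataset's label list.
FLIP_CLASS_MAP = {19: 20, 20: 19}

def flip_horizontal(img, label):
    """Flip an image left-right; if the class has a mirrored counterpart,
    return the counterpart's label, otherwise keep the label unchanged."""
    return img[:, ::-1], FLIP_CLASS_MAP.get(label, label)
```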
Model
We use LeNet-5. Nothing special here. The convolution layers look the same:
We have 32x32x3 images. A 5x5 convolution with stride 1 and VALID padding (SAME padding would keep the 32x32 size) changes the image to 28x28x16. Next, we pass it through a tanh activation and a 2x2 max pool with stride 2, down to 14x14x16.
Activations and weights look like this:
The second layer is the same; its output is a 5x5x32 tensor.
The next layer flattens this into an 800-value vector, followed by 3 fully connected layers: the first reduces the number of values by 30%, the second by 40%, and the last maps to the 43 classes. The keep probability for the 2 dropout layers between the fully connected layers is set to 0.7.
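The shape arithmetic above can be double-checked with a tiny helper, assuming 5x5 kernels with VALID padding and 2x2/stride-2 pooling:

```python
def conv_out(size, kernel=5, stride=1):
    """Spatial output size of a VALID convolution."""
    return (size - kernel) // stride + 1

def pool_out(size, kernel=2, stride=2):
    """Spatial output size of a max pool."""
    return (size - kernel) // stride + 1

h = conv_out(32)   # 28: conv1 (16 filters) -> 28x28x16
h = pool_out(h)    # 14: pool1 -> 14x14x16
h = conv_out(h)    # 10: conv2 (32 filters) -> 10x10x32
h = pool_out(h)    # 5:  pool2 -> 5x5x32
flat = h * h * 32  # 800 values feed the fully connected stack
```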
To make the network retrainable, I used two slightly different versions of the convolution layer and the fully connected layer. In the second version the variables are defined as -1.0 and without a shape; TensorFlow infers their shape and values automatically from the saved model.
So the structure of the network is:
The training function looks like this. We use batches of 128 items, with a TensorBoard writer logging the loss and accuracy at every step.
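The batching itself can be sketched as a simple shuffled generator (names are hypothetical, not the article’s actual code):

```python
import numpy as np

def batches(X, y, batch_size=128, seed=0):
    """Yield shuffled (X, y) mini-batches of up to batch_size items,
    as used by the training loop described above."""
    idx = np.random.default_rng(seed).permutation(len(X))
    for start in range(0, len(X), batch_size):
        sel = idx[start:start + batch_size]
        yield X[sel], y[sel]
```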
To calculate the loss and everything related to it, we use one-hot encoding, softmax and the Adam optimizer.
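A numpy sketch of that loss computation, using a cross-entropy summed over the batch (the reduce_sum style mentioned earlier) rather than averaged:

```python
import numpy as np

def one_hot(labels, n_classes=43):
    """One-hot encode integer class labels."""
    out = np.zeros((len(labels), n_classes))
    out[np.arange(len(labels)), labels] = 1.0
    return out

def softmax(logits):
    """Numerically stable softmax over the last axis."""
    z = logits - logits.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def loss_sum(logits, labels, n_classes=43):
    """Cross-entropy summed over the batch (reduce_sum),
    instead of averaged (reduce_mean)."""
    probs = softmax(logits)
    return -np.sum(one_hot(labels, n_classes) * np.log(probs + 1e-12))
```

Note that summing instead of averaging makes the loss, and hence the gradient magnitude, scale with the batch size.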
Training
For the training, validation and test sets we have the following classes of images:
You can find their distribution above, at the top of the article.
Train 1. Looking good.
Let’s train our network on grayscale images first. The result is pretty good from the start: 93.7% on the validation set and 91.1% on the test set. Nice!
Train 2. The best.
Now let’s continue training with the same set, but in RGB: 95.6% on the validation set and 93.1% on the test set!
Train 3. Overfitting.
Let’s augment our training set with flipping, transformations and brightness changes. We will flip the following classes:
As you can see, the flipped images look just like normal ones, but they definitely give us more training data.
In total we have 42K transformed and augmented images.
Result
The network has been trained for … steps. Here are the final accuracy and loss results. As you can see from the chart below, we get pretty good accuracy on the second run (Grayscale + RGB) before we start overfitting on the third run with the augmented data:
- validation accuracy 0.956
- test accuracy 0.931
Perhaps we should decrease the learning rate much more, or try starting from the biggest image set, the augmented one. Yet another idea is to add an additional convolution layer. Or maybe we should replace LeNet with something better!
Top-5
Now let’s look at our top-5 predictions for every image in the sample data. Actually, I will show the top 3.
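Picking the top-k classes from a probability vector is a one-liner with argsort — a numpy stand-in for TensorFlow’s tf.nn.top_k:

```python
import numpy as np

def top_k(probs, k=5):
    """Return (indices, values) of the k highest-probability classes,
    highest first."""
    idx = np.argsort(probs)[::-1][:k]
    return idx, probs[idx]
```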
Let’s try some images from the web. Not such a great prediction, but we can live with it.