Deep Neural Networks. Traffic Sign Classifier.

Accuracy for two epochs

Drop us a message if you’re interested in Blockchain and FinTech software development, or just say hi at Pharos Production Inc.

This is the second project in Udacity’s Self-Driving Car Engineer course. The goal is to classify images of traffic signs using convolutional neural networks (CNNs). For this project I used LeNet-5 with a few small modifications: I changed the activation layers from ReLU to tanh, and in the loss calculation changed reduce_mean to reduce_sum. This seems to give a better result, and with tanh there doesn’t appear to be a large number of dead neurons in the lower layers. Let’s go through the details.

Prepare

The first thing we should do is load all the required libraries.

Libraries to import

Here we load all the libraries we’ll need for image processing and model training:
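Those imports might look roughly like this (a sketch of the usual stack for this project; the exact list in the notebook may differ):

    import pickle
    import random

    import numpy as np
    import cv2
    import matplotlib.pyplot as plt
    import tensorflow as tf
    from sklearn.utils import shuffle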

Helpers

We should define a couple of helpers to display our data. The first is a distribution display function. It shows the distribution of the data, nothing special.

Display distribution
Distribution
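A minimal sketch of such a helper, assuming matplotlib and an array of class labels (the name and figure size are my choices, not the notebook’s):

    def display_distribution(labels, n_classes, title='Class distribution'):
        # Plot how many examples each class has.
        plt.figure(figsize=(12, 4))
        plt.hist(labels, bins=n_classes)
        plt.title(title)
        plt.xlabel('Class id')
        plt.ylabel('Number of images')
        plt.show()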

The second is an image gallery function. It uses OpenCV to lay out a nice grid of images with titles.

Image gallery
Image distribution
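A sketch of that idea: tile the images onto one OpenCV canvas and draw each title with cv2.putText. The cell sizes and font here are arbitrary choices, and the images are assumed to be 32x32x3 uint8:

    def image_gallery(images, titles, cols=8):
        # Tile 32x32 images into one big canvas with a title above each cell.
        rows = (len(images) + cols - 1) // cols
        cell_h, cell_w = 48, 34
        canvas = np.zeros((rows * cell_h, cols * cell_w, 3), dtype=np.uint8)
        for i, (img, title) in enumerate(zip(images, titles)):
            r, c = divmod(i, cols)
            y, x = r * cell_h, c * cell_w
            canvas[y + 14:y + 46, x + 1:x + 33] = img
            cv2.putText(canvas, str(title), (x + 1, y + 10),
                        cv2.FONT_HERSHEY_PLAIN, 0.7, (255, 255, 255))
        plt.figure(figsize=(12, 2 * rows))
        plt.imshow(canvas)
        plt.axis('off')
        plt.show()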

Yet another function is a simple gallery, but it can show an image with color normalization to uint8 (0–255) even if it has some weird value like -134.55 in R.

Another image gallery snippet
Image Gallery
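The trick is just a min-max rescale before display; a sketch:

    def simple_gallery(images, cols=8):
        # Show images after stretching whatever range each one has
        # (even negative values like -134.55) onto 0..255.
        rows = (len(images) + cols - 1) // cols
        plt.figure(figsize=(12, 2 * rows))
        for i, img in enumerate(images):
            disp = cv2.normalize(img, None, 0, 255, cv2.NORM_MINMAX)
            plt.subplot(rows, cols, i + 1)
            plt.imshow(disp.astype(np.uint8))
            plt.axis('off')
        plt.show()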

The last one is for the final task. It visualizes the top-5 probabilities for every image it gets.

For top-5 visualization snippet
Top-5 probabilities
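One way to sketch it, pairing each image with a horizontal bar chart (show_top_k and sign_names are illustrative names, not the notebook’s):

    def show_top_k(image, class_ids, probabilities, sign_names):
        # Show the image next to a bar chart of its top-k class probabilities.
        _, (ax_img, ax_bar) = plt.subplots(1, 2, figsize=(8, 2))
        ax_img.imshow(image)
        ax_img.axis('off')
        ax_bar.barh(range(len(probabilities)), probabilities)
        ax_bar.set_yticks(range(len(probabilities)))
        ax_bar.set_yticklabels([sign_names[i] for i in class_ids], fontsize=7)
        ax_bar.invert_yaxis()  # highest probability on top
        plt.tight_layout()
        plt.show()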

Color Correction and Augmentation.

  • to_normalized_gray — we use this to make a 3-channel image from a grayscale one, because our placeholders expect data of shape [?, 32, 32, 3]. We also apply OpenCV’s normalization.
  • normalize_mean — we use the mean and the standard deviation to normalize and shift all 3 channels separately, centering the distribution at zero.
Color correction
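A sketch of both functions under those assumptions (the normalization ranges are my guesses, not the notebook’s exact values):

    def to_normalized_gray(image):
        # Grayscale, normalize, then stack the result back into 3 channels
        # so the image still fits the [?, 32, 32, 3] placeholder.
        gray = cv2.cvtColor(image, cv2.COLOR_RGB2GRAY).astype(np.float32)
        gray = cv2.normalize(gray, None, -1.0, 1.0, cv2.NORM_MINMAX)
        return np.stack([gray, gray, gray], axis=-1)

    def normalize_mean(image):
        # Center each channel at zero and scale by its standard deviation.
        image = image.astype(np.float32)
        for ch in range(3):
            mean, std = image[:, :, ch].mean(), image[:, :, ch].std()
            image[:, :, ch] = (image[:, :, ch] - mean) / (std + 1e-8)
        return image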

We will use this function in data augmentation to randomize the brightness level.

Brightness
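A common way to do this, and a plausible sketch here, is to scale the V channel in HSV space by a random factor (the range is illustrative):

    def random_brightness(image):
        # Randomly brighten or darken via the V channel of HSV.
        hsv = cv2.cvtColor(image, cv2.COLOR_RGB2HSV).astype(np.float32)
        hsv[:, :, 2] = np.clip(hsv[:, :, 2] * np.random.uniform(0.4, 1.3), 0, 255)
        return cv2.cvtColor(hsv.astype(np.uint8), cv2.COLOR_HSV2RGB)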

The next 3 functions will be used in augmentation too, to randomize the transformations: rotation, translation and skewing of the image.

Image transformation
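Sketches of the three, built on OpenCV affine warps (the ranges are illustrative):

    def random_rotation(image, max_angle=15.0):
        # Rotate around the image center by a random angle.
        h, w = image.shape[:2]
        angle = np.random.uniform(-max_angle, max_angle)
        m = cv2.getRotationMatrix2D((w / 2, h / 2), angle, 1.0)
        return cv2.warpAffine(image, m, (w, h))

    def random_translation(image, max_shift=3.0):
        # Shift by a few random pixels in x and y.
        h, w = image.shape[:2]
        tx, ty = np.random.uniform(-max_shift, max_shift, 2)
        m = np.float32([[1, 0, tx], [0, 1, ty]])
        return cv2.warpAffine(image, m, (w, h))

    def random_skew(image, max_shift=3.0):
        # Shear by randomly moving three anchor points of an affine transform.
        h, w = image.shape[:2]
        src = np.float32([[4, 4], [w - 4, 4], [4, h - 4]])
        dst = (src + np.random.uniform(-max_shift, max_shift, src.shape)).astype(np.float32)
        return cv2.warpAffine(image, cv2.getAffineTransform(src, dst), (w, h))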

The augmentation function is pretty simple.

Augmentation
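In a sketch, it just chains the randomized steps above:

    def augment(image):
        # Random brightness plus random geometry, one pass over the image.
        image = random_brightness(image)
        image = random_rotation(image)
        image = random_translation(image)
        return random_skew(image)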

We also have two functions to flip an image horizontally, vertically and in both directions. Where it makes sense, we can flip an image and change its class at the same time.

Image flip.
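A sketch with cv2.flip. The class pairs are examples for the German traffic-sign set (e.g. “turn left ahead” mirrored becomes “turn right ahead”), so check the actual ids before reusing them:

    # Signs that turn into a different, valid sign when mirrored horizontally.
    FLIP_TO_OTHER_CLASS = {19: 20, 20: 19, 33: 34, 34: 33,
                           36: 37, 37: 36, 38: 39, 39: 38}

    def flip_image(image, horizontal=True, vertical=False):
        # cv2.flip codes: 1 = horizontal, 0 = vertical, -1 = both.
        code = -1 if (horizontal and vertical) else (1 if horizontal else 0)
        return cv2.flip(image, code)

    def flip_with_class(image, class_id):
        # Flip horizontally and swap the class when the sign has a mirrored twin.
        return cv2.flip(image, 1), FLIP_TO_OTHER_CLASS.get(class_id, class_id)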

Model

We use LeNet-5, nothing special here. The convolution layers all look the same:

Typical convolution layer

We have 32x32x3 images. A 5x5 convolution with stride 1 and VALID padding turns the image into 28x28x16. Next we pass it through a tanh activation and max-pool it by a factor of two, down to 14x14x16.
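A sketch of such a layer in TensorFlow 1.x style (names and initializers are mine):

    def conv_layer(x, in_depth, out_depth, name):
        # 5x5 convolution, VALID padding, stride 1 -> tanh -> 2x2 max pooling.
        w = tf.Variable(tf.truncated_normal([5, 5, in_depth, out_depth], stddev=0.1),
                        name=name + '_w')
        b = tf.Variable(tf.zeros([out_depth]), name=name + '_b')
        conv = tf.nn.conv2d(x, w, strides=[1, 1, 1, 1], padding='VALID') + b
        act = tf.tanh(conv)
        # e.g. 32x32x3 -> 28x28x16 after the convolution, 14x14x16 after pooling.
        return tf.nn.max_pool(act, ksize=[1, 2, 2, 1], strides=[1, 2, 2, 1],
                              padding='VALID')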

Activations and weights look like this:

Activations
Weights

The second layer is the same; on its output we have a 5x5x32 tensor.

The next layer flattens that into an 800-value vector. Then come 3 fully connected layers: the first reduces the number of values by 30%, the second by another 40%, and the last maps to the 43 classes. The keep probability is set to 0.7 for the 2 dropouts between the fully connected layers.

Fully connected layer
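A sketch of such a layer, again with my own names and initializers:

    def fc_layer(x, in_size, out_size, name, activation=True):
        # Fully connected layer with an optional tanh activation.
        w = tf.Variable(tf.truncated_normal([in_size, out_size], stddev=0.1),
                        name=name + '_w')
        b = tf.Variable(tf.zeros([out_size]), name=name + '_b')
        out = tf.matmul(x, w) + b
        return tf.tanh(out) if activation else out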

To make the network retrainable, I have used two slightly different versions of the convolution layer and the fully connected layer. Variables in the second version are defined as -1.0 and without a shape; TensorFlow will decide their shape and values automagically based on a saved model.

Convolution layer
Fully connected layer
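In code, that presumably comes down to creating the variables with shape validation turned off, for example:

    # Second version: no shape given, so the variable takes whatever shape
    # the restored checkpoint provides.
    w = tf.Variable(-1.0, validate_shape=False, name='conv2_w')
    b = tf.Variable(-1.0, validate_shape=False, name='conv2_b')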

So the structure of the network is:

LeNet
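Putting the pieces above together, the forward pass would look roughly like this, reusing the conv_layer and fc_layer sketches (keep_prob is the dropout placeholder):

    def lenet(x, keep_prob):
        conv1 = conv_layer(x, 3, 16, 'conv1')       # 32x32x3  -> 14x14x16
        conv2 = conv_layer(conv1, 16, 32, 'conv2')  # 14x14x16 -> 5x5x32
        flat = tf.reshape(conv2, [-1, 800])         # 5 * 5 * 32 = 800
        fc1 = tf.nn.dropout(fc_layer(flat, 800, 560, 'fc1'), keep_prob)
        fc2 = tf.nn.dropout(fc_layer(fc1, 560, 336, 'fc2'), keep_prob)
        return fc_layer(fc2, 336, 43, 'fc3', activation=False)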

The training function looks like this. We use batches of 128 items, with a TensorBoard writer logging the loss and accuracy for every step.

Train function
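A sketch of that loop, assuming placeholders x, y and keep_prob, a train_op, and merged summaries already exist:

    def train(sess, x_train, y_train, epochs, batch_size=128):
        writer = tf.summary.FileWriter('./logs', sess.graph)
        step = 0
        for epoch in range(epochs):
            x_train, y_train = shuffle(x_train, y_train)
            for offset in range(0, len(x_train), batch_size):
                batch_x = x_train[offset:offset + batch_size]
                batch_y = y_train[offset:offset + batch_size]
                summary, _ = sess.run(
                    [merged_summaries, train_op],
                    feed_dict={x: batch_x, y: batch_y, keep_prob: 0.7})
                writer.add_summary(summary, step)  # loss + accuracy per step
                step += 1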
Accuracy for two epochs
Loss for two epochs

To calculate the loss and everything related to it, we’re using one-hot encoding, softmax and the Adam optimizer.

Training
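A sketch of that part of the graph, given the logits from the model (43 classes, reduce_sum for the loss as mentioned at the start):

    one_hot_y = tf.one_hot(y, 43)
    cross_entropy = tf.nn.softmax_cross_entropy_with_logits(
        labels=one_hot_y, logits=logits)
    loss = tf.reduce_sum(cross_entropy)  # reduce_sum instead of reduce_mean
    train_op = tf.train.AdamOptimizer().minimize(loss)

    correct = tf.equal(tf.argmax(logits, 1), tf.argmax(one_hot_y, 1))
    accuracy = tf.reduce_mean(tf.cast(correct, tf.float32))
    tf.summary.scalar('loss', loss)
    tf.summary.scalar('accuracy', accuracy)
    merged_summaries = tf.summary.merge_all()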

For the training, validation and test sets we have the following classes of images:

Classes
Image classes

You can find their distribution in the chart at the top of the article.

Train 1. Looking good.

Let’s train our network on grayscaled images first. The result is pretty good from the beginning: 93.7% on the validation set and 91.1% on the test set. Nice!

Grayscale Training

Train 2. The best.

Now let’s continue training with the same set, but in RGB: 95.6% on the validation set and 93.1% on the test set!

RGB set

Train 3. Overfitting.

Let’s augment our training set with flipping, transformations and brightness changes. We will flip the following classes:

Flipped and transformed

As you can see, flipped images look the same as the originals, but they definitely give us more training data.

In total we have 42K transformed and augmented images.

Transformed and augmented distribution
Overfitting epochs
This is bad!

Result

The network has been trained for … steps. Here are the final accuracy and loss results. As you can see from the chart below, we have pretty good accuracy on the second run (grayscale + RGB) before we start overfitting on the third run with the augmented data:

  • validation accuracy 0.956
  • test accuracy 0.931

Possibly we should decrease the learning rate a lot, or we could try starting from the biggest image set, the augmented one. Yet another idea is to add an additional convolution layer. Or maybe we should swap LeNet for something better!

Total accuracy
Total loss

Top-5

Now let’s check the top-5 predictions for every image in the sample data. Actually, I will show the top-3.

Simple snippet
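A sketch of such a snippet, reusing show_top_k from above (sample_images and sign_names are assumed to exist):

    top_k = tf.nn.top_k(tf.nn.softmax(logits), k=5)
    values, indices = sess.run(
        top_k, feed_dict={x: sample_images, keep_prob: 1.0})
    for img, probs, ids in zip(sample_images, values, indices):
        show_top_k(img, ids[:3], probs[:3], sign_names)  # show only the top-3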
Test set

Let’s try some images from the web. The predictions are not so great, but we can live with them.

Test images
Web set

DONE

Link to Jupyter notebook repo

Thanks for reading!


Dmytro Nasyrov
Pharos Production

We build high-load software. I am the founder and CTO of Pharos Production.