Neural Network Tuning with TensorFlow

My struggle with learning to classify thousands of traffic sign images using deep learning. In the end, the pain taught me more than the successful results did.

Param Aggarwal
Jan 25, 2017 · 6 min read

Problem Statement

We use the German Traffic Sign Classification database, which is around 150MB and has around 30,000 labelled traffic signs. The goal is simple: write a piece of code that can take a new image and determine which type of sign it is. How accurately can we do this?

The Terror

Alright, so there is no code to write, right? Oops. Now there are a bazillion parameters of the network to tune. Yes, most of them are going to tune themselves as they see more and more data. But you still need to get the hyper-parameters (the settings that control how all the other parameters are learned) just right.

LeNet-5 Architecture
  1. I tried to normalize the input range. Instead of the colours going from 0 to 255, I tried 0 to 1, -0.5 to 0.5 and 0.1 to 0.9 (a quick sketch of these scalings is below). Though this is definitely recommended, I saw decent performance without it. I am personally against doing manual modifications to the given input, more on that below.
Some classes are better represented and some are under-represented in the data.
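For reference, here is a minimal sketch of the three scalings I tried. The function names are mine, purely for illustration, and the code assumes uint8 RGB images as input:

```python
import numpy as np

def scale_0_1(images):
    # Map uint8 pixel values [0, 255] to [0.0, 1.0]
    return images.astype(np.float32) / 255.0

def scale_centered(images):
    # Map [0, 255] to [-0.5, 0.5]
    return images.astype(np.float32) / 255.0 - 0.5

def scale_0_1_to_0_9(images):
    # Map [0, 255] to [0.1, 0.9]
    return 0.1 + images.astype(np.float32) * (0.8 / 255.0)
```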

Convolution Layer

With this, I learnt an important thing: either you use large convolution filters and get away with a single layer, or you stack several layers with smaller filter/kernel sizes and let the network build up its features step by step, layer by layer. This was an important insight.
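Here is a rough sketch of the two options, written with Keras-style layers purely for illustration (the filter counts and the 32x32x3 input shape are my assumptions, not necessarily what I used in the project):

```python
import tensorflow as tf

# Option A: one layer with a large filter covers a big receptive field in one go.
single_large = tf.keras.Sequential([
    tf.keras.layers.Conv2D(32, kernel_size=7, activation='relu',
                           input_shape=(32, 32, 3)),
])

# Option B: stacked layers with small filters build up the same 7x7 receptive
# field step by step, with fewer weights per layer and more non-linearity.
stacked_small = tf.keras.Sequential([
    tf.keras.layers.Conv2D(16, kernel_size=3, activation='relu',
                           input_shape=(32, 32, 3)),
    tf.keras.layers.Conv2D(32, kernel_size=3, activation='relu'),
    tf.keras.layers.Conv2D(32, kernel_size=3, activation='relu'),
])
```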

Fully-connected Layer

Here, as mentioned above, I tried to go as small as possible. It means there are fewer parameters to tune, and hence faster learning. Remember, the back-propagation algorithm works step by step, going backwards through the layers. So you would rather add more layers than have the algorithm work harder on a single layer, trying to guess so many parameters in the same layer.
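As a sketch, the kind of small classifier head I'm describing looks roughly like this. The layer sizes follow the classic LeNet-5 numbers (120 and 84), the input shape assumes a 5x5x16 output from the convolution stack, and 43 is the number of traffic sign classes in the dataset:

```python
import tensorflow as tf

# A few narrow fully-connected layers rather than one huge layer:
# fewer weights to guess per layer, and back-propagation has an easier job.
classifier_head = tf.keras.Sequential([
    tf.keras.layers.Flatten(input_shape=(5, 5, 16)),  # output of the conv stack
    tf.keras.layers.Dense(120, activation='relu'),
    tf.keras.layers.Dense(84, activation='relu'),
    tf.keras.layers.Dense(43),                         # 43 traffic sign classes (logits)
])
```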

Hyper-parameters

We have heard a lot about the learning rate, and as long as you have enough data, setting a really small learning rate is always the way to go. Nothing much to tune there. The batch size is an interesting one though. If you set it too small, the gradient estimates become too noisy to point in a meaningful direction for the next step. If you set it too high, the network takes far fewer gradient descent steps in total. A number I have seen repeated often here is 128. Start with this. The third parameter is the number of epochs, which mostly says how long you want to train the network. Ideally you should write a small check that stops training once the accuracy stops improving.
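Put together, a minimal training sketch with these choices might look like the following. It uses the Keras API (not my original TensorFlow code), the learning rate and epoch cap are illustrative, and `model`, `X_train` and `y_train` are placeholders for your own network and data:

```python
import tensorflow as tf

# model = ...    your network (e.g. conv stack + classifier head from above)
# X_train, y_train = ...    your training images and integer labels

model.compile(
    optimizer=tf.keras.optimizers.Adam(learning_rate=1e-4),  # small learning rate
    loss='sparse_categorical_crossentropy',
    metrics=['accuracy'],
)

# The "small check that stops training": stop once validation accuracy
# stops improving, instead of hand-picking the number of epochs.
early_stop = tf.keras.callbacks.EarlyStopping(
    monitor='val_accuracy', patience=3, restore_best_weights=True)

model.fit(
    X_train, y_train,
    batch_size=128,          # the commonly repeated starting point
    epochs=100,              # upper bound; early stopping usually ends it sooner
    validation_split=0.2,
    callbacks=[early_stop],
)
```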

Conclusion

So these are my take-aways from this project. It was my date with neural networks, and it seems to have gone OK. There is still much, much more to learn, and the initial opinions I have formed about these networks will help me fine-tune my own understanding of them.


