Possible pitfalls while training a deep neural network.
The network had been training for almost a day. The loss was decreasing and the gradients were flowing well, yet the output was still all zeros: all black, with nothing detected from the input. I have been through such situations myself, and this is my checklist for finding where I might be going wrong.
First things first:
- Pick a basic, well-known model that suits your data (for example, VGG for images).
- Turn off all add-on features such as regularization, batch normalization, and augmentation.
- When fine-tuning a pretrained model, use the same preprocessing steps that were used when it was originally trained.
Issues to be checked:
- Check manually that inputs are mapped correctly to outputs. Sometimes we feed all-zero inputs, repeatedly send the same batch for training, or forget to shuffle the input data. Printing inputs and outputs to the console and inspecting them by hand can help. There was a time when my data loader was faulty :P.
- Check that the relationship we seek between input and output makes sense. We should be able to identify a consistent, non-trivial relationship between the input and the output before expecting the network to perform well. I personally don't think a network can be accurate when the inputs and outputs are unrelated.
- Class balance and enough training samples. Training a network from scratch requires lots of data, and if the number of training samples per class is imbalanced, the loss should be balanced accordingly (for example, by weighting each class).
- Augmentation and regularization. Too much augmentation combined with L2 and dropout regularization may cause the network to underfit.
- Preprocessing statistics must be computed independently of the validation and test data. I think this is a common mistake for everyone, including me. Computing statistics such as the mean and variance (for zero-mean, unit-variance normalization) over the entire dataset leaks information from the validation and test sets into training. Split the data first, compute the statistics on the training set only, and then apply those same statistics to the validation and test data.
- Check your loss function. If you implemented your own loss function, verify the gradient equations and their implementation; unit tests often help. When combining several loss functions, give each term an appropriate weight.
- Check weight initialization. I do not know the exact math behind it, but weight initialization really matters. If unsure, Xavier (Glorot) initialization is a good default.
- Check custom layers. Make sure normalization and gradient flow through these layers are handled correctly. Double-check them, or you will waste weeks like I did. Even a view (reshape) layer can trip you up if it is not coded properly :P.
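The check on inputs and outputs can be partly automated. Below is a minimal Python sketch, assuming each batch is a list of `(input_vector, label)` pairs; the function name `check_batches` and the toy data are hypothetical. It flags all-zero inputs, batches identical to an earlier one, and batches with a single label (which may hint at missing shuffling, though small batches can legitimately contain one class). It complements printing and eyeballing the data rather than replacing it.

```python
def check_batches(batches):
    """Return human-readable warnings about suspicious batches.

    Assumes each batch is a list of (input_vector, label) pairs (hypothetical format).
    """
    warnings = []
    seen = set()
    for i, batch in enumerate(batches):
        inputs = [x for x, _ in batch]
        labels = [y for _, y in batch]
        # All-zero inputs usually mean the loader returned blank data.
        if all(v == 0 for x in inputs for v in x):
            warnings.append(f"batch {i}: all inputs are zero")
        # A batch identical to an earlier one hints at a loader bug.
        key = tuple((tuple(x), y) for x, y in batch)
        if key in seen:
            warnings.append(f"batch {i}: duplicate of an earlier batch")
        seen.add(key)
        # One label across the whole batch can mean the data is not shuffled.
        if len(batch) > 1 and len(set(labels)) == 1:
            warnings.append(f"batch {i}: only one label ({labels[0]}), check shuffling")
    return warnings


# Toy batches (hypothetical) that trigger each warning once.
batches = [
    [([0, 0], 0), ([0, 0], 1)],  # all-zero inputs
    [([1, 2], 0), ([3, 4], 1)],  # looks fine
    [([1, 2], 0), ([3, 4], 1)],  # repeat of the previous batch
    [([5, 6], 1), ([7, 8], 1)],  # single label
]
for warning in check_batches(batches):
    print(warning)
```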
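For class imbalance, one common way to balance the loss is to weight each class inversely to its frequency. The sketch below (the helper name is hypothetical) uses the heuristic `n_samples / (n_classes * n_samples_in_class)`, the same formula scikit-learn uses for `class_weight="balanced"`; other weighting schemes exist and may suit your data better.

```python
from collections import Counter


def balanced_class_weights(labels):
    """Weight each class by n_samples / (n_classes * n_samples_in_class)."""
    counts = Counter(labels)
    n_samples = len(labels)
    n_classes = len(counts)
    return {c: n_samples / (n_classes * n) for c, n in counts.items()}


# Toy imbalanced label list: 90 samples of class 0, 10 of class 1.
labels = [0] * 90 + [1] * 10
weights = balanced_class_weights(labels)
print(weights)
```

Here the rare class gets weight 5.0 and the common class about 0.56, so each class contributes the same total weight (50) to the loss. In PyTorch, such weights can be passed through the `weight` argument of `torch.nn.CrossEntropyLoss`.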
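The preprocessing rule can be illustrated with zero-mean, unit-variance normalization. This sketch uses a plain list of scalar values and hypothetical helper names; with real features you would compute the statistics per feature. What matters is the order: split, fit the statistics on the training split, then apply those training statistics to every split.

```python
import random
import statistics


def fit_standardizer(train_values):
    """Compute mean and standard deviation from the training split only."""
    return statistics.fmean(train_values), statistics.pstdev(train_values)


def apply_standardizer(values, mean, std, eps=1e-8):
    """Normalize values with previously fitted statistics."""
    return [(v - mean) / (std + eps) for v in values]


data = [float(i) for i in range(100)]
random.seed(0)
random.shuffle(data)

# 1. Split first...
train, val, test = data[:70], data[70:85], data[85:]
# 2. ...then fit statistics on the training split only...
mean, std = fit_standardizer(train)
# 3. ...and reuse the same statistics for validation and test.
train_n = apply_standardizer(train, mean, std)
val_n = apply_standardizer(val, mean, std)
test_n = apply_standardizer(test, mean, std)
print(round(statistics.fmean(val_n), 3), round(statistics.fmean(test_n), 3))
```

The validation and test splits will generally not have exactly zero mean after this step, and that is expected: they are treated as unseen data.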
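To unit-test a custom loss, one option is to compare the hand-derived gradient against a numerical estimate from central finite differences. In this sketch, `custom_loss` is a stand-in (plain mean squared error) for your own loss, and all function names are hypothetical. If you work in PyTorch, `torch.autograd.gradcheck` performs a similar comparison for autograd functions.

```python
def custom_loss(preds, targets):
    """Stand-in for a custom loss: mean squared error."""
    n = len(preds)
    return sum((p - t) ** 2 for p, t in zip(preds, targets)) / n


def custom_loss_grad(preds, targets):
    """Hand-derived gradient of custom_loss with respect to preds."""
    n = len(preds)
    return [2 * (p - t) / n for p, t in zip(preds, targets)]


def numerical_grad(loss_fn, preds, targets, h=1e-6):
    """Central differences: (L(p_i + h) - L(p_i - h)) / (2h) for each element."""
    grads = []
    for i in range(len(preds)):
        plus = list(preds)
        plus[i] += h
        minus = list(preds)
        minus[i] -= h
        grads.append((loss_fn(plus, targets) - loss_fn(minus, targets)) / (2 * h))
    return grads


def check_gradient(loss_fn, grad_fn, preds, targets, tol=1e-5):
    """True if analytic and numerical gradients agree within a relative tolerance."""
    analytic = grad_fn(preds, targets)
    numeric = numerical_grad(loss_fn, preds, targets)
    return all(
        abs(a - b) <= tol * max(1.0, abs(a), abs(b))
        for a, b in zip(analytic, numeric)
    )


preds = [0.2, -1.3, 0.7]
targets = [0.0, -1.0, 1.0]
print(check_gradient(custom_loss, custom_loss_grad, preds, targets))  # True
```

A gradient that is off by a constant factor, such as a forgotten `2`, makes `check_gradient` return `False`, which is exactly the kind of bug a loss that "trains" but never converges can hide.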
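For reference on the initialization point, Xavier (Glorot) uniform initialization samples each weight from U(-a, a) with a = sqrt(6 / (fan_in + fan_out)), which gives a weight variance of 2 / (fan_in + fan_out). The sketch below implements that formula with the standard library; the layer sizes are arbitrary. Frameworks provide it directly, for example `torch.nn.init.xavier_uniform_` in PyTorch.

```python
import math
import random


def xavier_uniform(fan_in, fan_out, rng=random):
    """Glorot/Xavier uniform init: U(-a, a) with a = sqrt(6 / (fan_in + fan_out)).

    Returns a fan_out x fan_in weight matrix as nested lists.
    """
    limit = math.sqrt(6.0 / (fan_in + fan_out))
    return [[rng.uniform(-limit, limit) for _ in range(fan_in)] for _ in range(fan_out)]


random.seed(42)
W = xavier_uniform(256, 128)
flat = [w for row in W for w in row]
empirical_var = sum(w * w for w in flat) / len(flat)
print(len(W), len(W[0]))  # 128 256
# The empirical variance should be close to the target 2 / (256 + 128).
print(empirical_var, 2.0 / (256 + 128))
```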
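As for the view layer, one easy mistake (shown as an illustration; it may not be the exact bug described above) is using a reshape to swap axes. A reshape, like PyTorch's `view`, only reinterprets the flat memory order, so turning a channels-by-positions array into positions-by-channels this way scrambles the channels; a transpose (`permute` or `transpose` in PyTorch) is what actually moves the data. The sketch uses nested Python lists and hypothetical helper names.

```python
def reshape(flat, rows, cols):
    """Reinterpret a row-major flat buffer as rows x cols (what view/reshape do)."""
    assert len(flat) == rows * cols
    return [flat[r * cols:(r + 1) * cols] for r in range(rows)]


def transpose(matrix):
    """Actually swap the two axes: element [i][j] moves to [j][i]."""
    return [list(col) for col in zip(*matrix)]


# 2 channels x 3 positions, stored channel-first.
x = reshape([1, 2, 3, 10, 20, 30], 2, 3)  # [[1, 2, 3], [10, 20, 30]]

# Wrong: reshaping to 3 x 2 mixes values from different channels in one row.
wrong = reshape([v for row in x for v in row], 3, 2)  # [[1, 2], [3, 10], [20, 30]]

# Right: transposing keeps each position's channel values together.
right = transpose(x)  # [[1, 10], [2, 20], [3, 30]]
print(wrong)
print(right)
```

Both results have the same shape, which is why this bug does not raise an error and the network simply learns from scrambled features.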
There are many other challenges to deal with, such as vanishing/exploding gradients, the choice of optimizer, and hyperparameters, or the network may simply need more training time. Whatever it is, have fun making the network do what you want it to do.