In troubleshooting a deep network, people often jump to conclusions too early. Before learning how to troubleshoot, we will spend some time on what to look for, so we do not waste hours chasing dead-end leads. In Part 4 of the “Start a Deep Learning project” series, we discuss how to visualize your Deep Learning models and performance metrics.
The 6-part series for “How to start a Deep Learning project?” consists of:
· Part 1: Start a Deep Learning project.
· Part 2: Build a Deep Learning dataset.
· Part 3: Deep Learning designs.
· Part 4: Visualize Deep Network models and metrics.
· Part 5: Debug a Deep Learning Network.
· Part 6: Improve Deep Learning Models performance & network tuning.
Never shoot in the dark. Make an educated guess.
It is important to track every move and to examine results at each step. With the help of a pre-built package like TensorBoard, visualizing the model and its metrics is easy, and the rewards are almost instantaneous.
Data visualization (input, output)
Verify the input and the output of the model. Before feeding data into a model, save some training and validation samples for visual verification. Apply steps to undo the data pre-processing and rescale the pixel values back to [0, 255]. Check a few batches to verify that we are not repeating the same batch of data. The images on the left below are training samples and the one on the right is a validation sample.
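As a minimal sketch of undoing the pre-processing for visual inspection, assuming the inputs were normalized to [-1, 1] with a mean/std of 0.5 (the `denormalize` helper and those constants are illustrative, not from the original):

```python
import numpy as np

def denormalize(images, mean=0.5, std=0.5):
    """Undo an assumed [-1, 1] normalization and rescale to [0, 255]."""
    pixels = images * std + mean              # back to roughly [0, 1]
    pixels = np.clip(pixels * 255.0, 0, 255)  # rescale and clamp
    return pixels.astype(np.uint8)

# A fake normalized batch standing in for real training samples.
batch = np.random.uniform(-1, 1, size=(4, 32, 32, 3)).astype(np.float32)
viewable = denormalize(batch)
print(viewable.dtype, viewable.min(), viewable.max())  # uint8, values in [0, 255]
```

The recovered `uint8` arrays can then be saved or displayed with any image library to eyeball the samples.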
Sometimes, it is worth verifying the histogram of the input data. Ideally, it should be zero-centered and range from -1 to 1. If features are on very different scales, the gradients will either diminish or explode (depending on the learning rate).
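A quick sanity check along these lines can be scripted before training; the tolerance and the `check_input_stats` helper below are illustrative assumptions, not a fixed rule:

```python
import numpy as np

def check_input_stats(x, tol=0.1):
    """Flag inputs that are not roughly zero-centered within [-1, 1]."""
    mean, lo, hi = x.mean(), x.min(), x.max()
    ok = abs(mean) < tol and lo >= -1.0 and hi <= 1.0
    return bool(ok), float(mean), float(lo), float(hi)

rng = np.random.default_rng(0)
good = rng.uniform(-1, 1, size=(1000, 64))    # properly normalized features
bad = rng.uniform(0, 255, size=(1000, 64))    # raw, unnormalized pixels
print(check_input_stats(good)[0])  # True
print(check_input_stats(bad)[0])   # False
```

Plotting `np.histogram(x.ravel())` (or the TensorBoard histogram view) gives the same information visually.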
Save the corresponding model outputs regularly for verification and error analysis. For example, the color in the validation output below is washed out.
Metrics (Loss & accuracy)
Besides logging the loss and the accuracy to stdout regularly, we record and plot them to analyze their long-term trends. The diagram below shows the accuracy and the cross-entropy loss displayed by TensorBoard.
Plotting the cost helps us tune the learning rate. Any prolonged jump in the cost indicates that the learning rate is too high. If it is too low, learning is slow.
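Such a prolonged jump can also be detected programmatically. The sketch below is one possible heuristic; the window size and the 20% threshold are illustrative assumptions:

```python
import numpy as np

def rising_loss(losses, window=5):
    """Heuristic: flag a sustained rise in loss, often a sign the
    learning rate is too high (window and threshold are illustrative)."""
    losses = np.asarray(losses, dtype=float)
    if len(losses) < 2 * window:
        return False
    recent = losses[-window:].mean()
    earlier = losses[-2 * window:-window].mean()
    return bool(recent > earlier * 1.2)  # 20% sustained increase

healthy = [2.0, 1.6, 1.3, 1.1, 0.9, 0.8, 0.7, 0.65, 0.6, 0.58]
diverging = [2.0, 1.6, 1.3, 1.2, 1.1, 1.4, 1.9, 2.6, 3.5, 4.7]
print(rising_loss(healthy))    # False
print(rising_loss(diverging))  # True
```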
Here is another real example where the learning rate is too high. We see a sudden surge in the loss (likely caused by a sudden jump in the gradients).
We use the accuracy plot to tune regularization factors. If there is a large gap between the validation and the training accuracy, the model is overfitting. To reduce overfitting, we increase regularization.
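This gap check is trivial to automate. The 10-point threshold below is an illustrative assumption; the right value depends on the task:

```python
def overfit_gap(train_acc, val_acc, threshold=0.1):
    """Flag a large train/validation accuracy gap (threshold is illustrative)."""
    return (train_acc - val_acc) > threshold

print(overfit_gap(0.98, 0.72))  # True  -> consider increasing regularization
print(overfit_gap(0.90, 0.87))  # False
```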
Weight & bias: We monitor the weights and the biases closely. Below are the distributions of Layer 1’s weights and biases at different training iterations. Large (positive or negative) weights or biases are abnormal. Normally distributed weights are a good sign that training is going well (but not strictly necessary).
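TensorBoard's histogram view gives this per-layer picture; as a lightweight alternative, a few summary statistics already catch the obvious problems. The `weight_report` helper and the `max_abs` cutoff below are illustrative assumptions:

```python
import numpy as np

def weight_report(w, max_abs=3.0):
    """Summarize a layer's weights; very large values are a warning sign."""
    stats = {"mean": float(w.mean()),
             "std": float(w.std()),
             "max_abs": float(np.abs(w).max())}
    stats["suspicious"] = stats["max_abs"] > max_abs
    return stats

rng = np.random.default_rng(0)
w = rng.normal(0.0, 0.05, size=(256, 128))  # a typical initialization scale
print(weight_report(w)["suspicious"])        # False
```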
Activation: For gradient descent to perform at its best, the node outputs before the activation functions should be approximately normally distributed. If they are not, we may apply batch normalization to convolutional layers or layer normalization to RNN layers. We also monitor the number of dead nodes (zero activations) after the activation functions.
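Counting dead nodes after a ReLU can be sketched as follows, assuming a unit is "dead" when it outputs zero for every sample in the batch (the helper name and that definition are assumptions for illustration):

```python
import numpy as np

def dead_node_fraction(pre_activations):
    """Fraction of ReLU units that output zero for every sample in the batch."""
    activations = np.maximum(pre_activations, 0.0)  # ReLU
    dead = np.all(activations == 0.0, axis=0)       # zero across the whole batch
    return float(dead.mean())

rng = np.random.default_rng(1)
z = rng.normal(0.0, 1.0, size=(64, 100))  # batch of 64, 100 units
z[:, :10] = -5.0                          # force 10 units to be dead
print(dead_node_fraction(z))              # 0.1
```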
Gradients: For each layer, we monitor the gradients to identify one of the most serious DL problems: vanishing or exploding gradients. If the gradients diminish quickly from the rightmost layers to the leftmost layers, we have a vanishing gradient problem.
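One way to monitor this is to log the gradient norm per layer each step and compare layers. The thresholds and the `gradient_health` helper below are illustrative assumptions:

```python
import numpy as np

def gradient_health(layer_grads, vanish=1e-6, explode=1e3):
    """Report per-layer gradient norms; tiny norms in the early layers
    suggest vanishing gradients, huge norms suggest exploding gradients
    (thresholds here are illustrative)."""
    report = []
    for name, g in layer_grads:
        norm = float(np.linalg.norm(g))
        if norm < vanish:
            status = "vanishing"
        elif norm > explode:
            status = "exploding"
        else:
            status = "ok"
        report.append((name, norm, status))
    return report

grads = [("layer1", np.full((64, 64), 1e-9)),  # early layer, tiny gradients
         ("layer3", np.full((64, 64), 1e-2))]
for name, norm, status in gradient_health(grads):
    print(name, status)   # layer1 vanishing / layer3 ok
```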
Less commonly, we visualize the CNN filters themselves. This identifies the types of features the model is extracting. As shown below, the first couple of convolutional layers detect edges and colors.
For a CNN, we can also visualize what a feature map is learning. In the following picture, the nine images (on the right) with the highest activations in a particular feature map are captured. A deconvolution network is then applied to reconstruct the spatial image (on the left) from the feature map.
This image reconstruction is rarely done. But in a generative model, we often vary one latent factor while holding the others constant. This verifies whether the model is learning anything meaningful.
Visualizing models can be done easily with TensorBoard. TensorBoard is available for TensorFlow and, through third-party extensions, for other frameworks like PyTorch. Spend some time visualizing your model and you will save far more time in troubleshooting. Equipped with the runtime information of the model, we can start talking about troubleshooting in Part 5: Debug a Deep Learning Network.