Using TensorBoard

TensorFlow is one of the most popular machine learning frameworks today. It describes a model's operations and data flow as a graph: each node represents a math operation, and each connection (graph edge) represents a multidimensional array, known as a tensor.

Its interface is relatively simple and intuitive. However, big and complex models can be hard to debug. Debugging and understanding your program becomes much easier if you can visualize the parameters and the graph structure.

TensorBoard is a TensorFlow tool that allows us to do exactly that.


TensorBoard

TensorBoard is a tool for visualizing statistics of a neural network, such as the training parameters (loss, accuracy and weights), images and even the graph itself. This is very useful for understanding how tensors flow through the graph, and therefore for debugging and optimizing the model.

In this article, we are going to talk about how to use TensorBoard with some practical examples.

Basic Concepts

TensorBoard works by reading event files, which is where TensorFlow writes the summary data (the data to be visualized). It is usually installed together with TensorFlow. To start it, run the command tensorboard --logdir=[dir], where [dir] is the location of the event files. To view the data, open the link displayed in the terminal.

To write an event file, we need a FileWriter instance, which we create by calling its constructor tf.summary.FileWriter([dir], [graph]), where [dir] is the directory for the event files and [graph] is the graph to be visualized. This is the TensorFlow 1.x API; in TensorFlow 2.x the same class is available as tf.compat.v1.summary.FileWriter.

To generate the data to be analyzed, we use the function tf.summary.scalar(name, data). The scalar part can be replaced with histogram, image, audio or text, depending on the type of data to be visualized.

Finally, we call writer.add_summary(summary, step) to write the data to the event file, where writer is a FileWriter instance, summary is the evaluated summary and step is the training step it belongs to.
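To tie these steps together, here is a minimal sketch of the whole workflow. It assumes the TensorFlow 1.x-style API, imported through tensorflow.compat.v1 so that it can also run on TensorFlow 2.x. The log directory, the tag "demo/value" and the fed values are made up for illustration.

```python
import os
import tempfile

import tensorflow.compat.v1 as tf  # assumption: TF 1.x-style API via the compat module

tf.disable_v2_behavior()

# Hypothetical log directory; any writable path works.
log_dir = os.path.join(tempfile.mkdtemp(), "basic")

graph = tf.Graph()
with graph.as_default():
    # Placeholder standing in for a value computed during training.
    value = tf.placeholder(tf.float32, shape=[], name="value")
    # Summary op: the first argument is the tag shown in TensorBoard.
    summary_op = tf.summary.scalar("demo/value", value)

with tf.Session(graph=graph) as sess:
    # Passing the graph also makes it available in the graph view.
    writer = tf.summary.FileWriter(log_dir, graph)
    for step in range(5):
        # Evaluating the summary op returns a serialized summary.
        summary = sess.run(summary_op, feed_dict={value: 1.0 / (step + 1)})
        writer.add_summary(summary, step)
    writer.close()

# The directory now contains one event file for TensorBoard to read.
event_files = os.listdir(log_dir)
print(event_files)
```

Pointing tensorboard --logdir at the printed directory's parent should then show the scalar and the graph.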

Visualizing the graph

The snippet below shows how to create a graph for visualization on TensorBoard:

The FileWriter constructor call mentioned earlier also writes the graph for visualization. However, the generated graph can be difficult to understand, especially if the model is big and complex.

A useful technique is to use intuitive variable names and to work with scopes. TensorBoard then builds a hierarchy of nodes, collapsing all the nodes of a scope into a single node.

Names are set through the optional name parameter present in most TensorFlow functions, and scopes are defined with the function tf.name_scope(name).
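As an illustration of names and scopes, here is a small sketch of a two-layer network. It is not the article's model: the layer sizes and the scope names "hidden" and "output" are invented for the example, and it assumes the TensorFlow 1.x-style API through tensorflow.compat.v1.

```python
import os
import tempfile

import tensorflow.compat.v1 as tf  # assumption: TF 1.x-style API via the compat module

tf.disable_v2_behavior()

log_dir = os.path.join(tempfile.mkdtemp(), "graph")  # hypothetical location

graph = tf.Graph()
with graph.as_default():
    images = tf.placeholder(tf.float32, shape=[None, 784], name="images")

    # Everything created inside a scope is grouped into one node.
    with tf.name_scope("hidden"):
        w1 = tf.Variable(tf.truncated_normal([784, 64], stddev=0.1), name="weights")
        b1 = tf.Variable(tf.zeros([64]), name="biases")
        hidden = tf.nn.relu(tf.matmul(images, w1) + b1, name="activation")

    with tf.name_scope("output"):
        w2 = tf.Variable(tf.truncated_normal([64, 10], stddev=0.1), name="weights")
        b2 = tf.Variable(tf.zeros([10]), name="biases")
        logits = tf.add(tf.matmul(hidden, w2), b2, name="logits")

# The scope becomes a prefix of every name created inside it.
print(hidden.name)  # hidden/activation:0
print(logits.name)  # output/logits:0

# Writing the graph is enough to inspect it in TensorBoard.
tf.summary.FileWriter(log_dir, graph).close()
```

In the graph view, "hidden" and "output" should each appear as a single collapsed node.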

Two graphs representing the same network. The one on the left uses scopes and is easier to read.

Two other characteristics of the graph view also help with visualization: nodes with the same structure have the same color, and it is possible to navigate into a node to see what is inside it.

Visualizing scalars

In the following code, we create a summary for each loss of a multi-task classifier and also for the total loss:

Compute the loss and then create the summary

tf.summary.merge_all() is a useful function that saves you from evaluating and writing every summary separately. It merges all the defined summaries into a single buffer, so you call writer.add_summary() only once per step.
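The idea can be sketched as follows. This is a simplified stand-in rather than the article's network: each classifier is a single linear layer on a made-up feature vector, and FEATURES, NUM_CLASSES (10 classes plus 'blank') and the tag names are assumptions. It uses the TensorFlow 1.x-style API through tensorflow.compat.v1.

```python
import os
import tempfile

import numpy as np
import tensorflow.compat.v1 as tf  # assumption: TF 1.x-style API via the compat module

tf.disable_v2_behavior()

NUM_CHARS = 5     # one classifier per character position
NUM_CLASSES = 11  # assumed: 10 character classes plus 'blank'
FEATURES = 16     # hypothetical size of the shared feature vector
log_dir = os.path.join(tempfile.mkdtemp(), "scalars")

graph = tf.Graph()
with graph.as_default():
    features = tf.placeholder(tf.float32, [None, FEATURES], name="features")
    labels = tf.placeholder(tf.int32, [None, NUM_CHARS], name="labels")

    char_losses = []
    for i in range(NUM_CHARS):
        with tf.name_scope("char_%d" % i):
            w = tf.Variable(
                tf.truncated_normal([FEATURES, NUM_CLASSES], stddev=0.1),
                name="weights")
            logits = tf.matmul(features, w, name="logits")
            loss = tf.reduce_mean(
                tf.nn.sparse_softmax_cross_entropy_with_logits(
                    labels=labels[:, i], logits=logits),
                name="loss")
        # One scalar summary per classifier loss.
        tf.summary.scalar("loss/char_%d" % i, loss)
        char_losses.append(loss)

    total_loss = tf.add_n(char_losses, name="total_loss")
    tf.summary.scalar("loss/total", total_loss)

    # A single op that evaluates every summary defined above.
    merged = tf.summary.merge_all()
    init = tf.global_variables_initializer()

# Random data standing in for a real batch.
rng = np.random.RandomState(0)
batch = {
    features: rng.rand(8, FEATURES).astype(np.float32),
    labels: rng.randint(0, NUM_CLASSES, size=(8, NUM_CHARS)).astype(np.int32),
}

with tf.Session(graph=graph) as sess:
    sess.run(init)
    writer = tf.summary.FileWriter(log_dir, graph)
    for step in range(3):
        summary = sess.run(merged, feed_dict=batch)
        writer.add_summary(summary, step)  # one call covers all six scalars
    writer.close()
```

In a real training loop, the train op would be evaluated in the same sess.run call as merged.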

As I said, you just need to use tf.summary.scalar(name, data) to generate the data. Tracking variables like the loss, the accuracy, the gradient of some layer or even the learning rate can be very useful for checking whether the network is going to converge. Being able to watch these variables live during training helps to identify some problems early.

The last layer gradient and the total loss for three learning rates (0.0005, 0.5 and 1.0)

In the figure above, we can see that with a learning rate of 1, the gradient drops to 0 very quickly while the loss stays high. We can discard this option, because the network is not going to learn any further, and we should decrease the learning rate. With a learning rate of 0.0005, the loss goes down slowly and the gradient is increasing. This could mean that we are on the right track, but we should increase the learning rate so that training goes faster. The learning rate of 0.05 shows a good result.
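One way to get such a side-by-side comparison is to write each configuration to its own subdirectory of the log directory, since TensorBoard shows every subdirectory as a separate run. The sketch below is only an illustration of that idea: the toy quadratic loss, the run names and the learning rates are assumptions, not the article's setup. It uses the TensorFlow 1.x-style API through tensorflow.compat.v1.

```python
import os
import tempfile

import tensorflow.compat.v1 as tf  # assumption: TF 1.x-style API via the compat module

tf.disable_v2_behavior()

log_root = tempfile.mkdtemp()          # hypothetical root log directory
learning_rates = [0.0005, 0.05, 1.0]   # illustrative values

for lr in learning_rates:
    graph = tf.Graph()
    with graph.as_default():
        # Toy problem: minimize the squared norm of a small vector.
        w = tf.Variable([2.0, -3.0], name="w")
        loss = tf.reduce_sum(tf.square(w), name="loss")

        optimizer = tf.train.GradientDescentOptimizer(lr)
        grads_and_vars = optimizer.compute_gradients(loss, [w])
        train_op = optimizer.apply_gradients(grads_and_vars)

        tf.summary.scalar("loss", loss)
        tf.summary.scalar("gradient_norm", tf.norm(grads_and_vars[0][0]))
        merged = tf.summary.merge_all()
        init = tf.global_variables_initializer()

    with tf.Session(graph=graph) as sess:
        sess.run(init)
        # One subdirectory per learning rate, so each becomes its own run.
        run_dir = os.path.join(log_root, "lr_%g" % lr)
        writer = tf.summary.FileWriter(run_dir, graph)
        for step in range(20):
            summary, _ = sess.run([merged, train_op])
            writer.add_summary(summary, step)
        writer.close()

run_dirs = sorted(os.listdir(log_root))
print(run_dirs)
```

Starting TensorBoard with --logdir set to log_root should then plot the three runs on the same charts.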

Visualizing histograms

First we create the weights and then their histograms

Histograms are created with the function tf.summary.histogram(name, data). A histogram shows how a collection of values is distributed, representing each range of values by its frequency or density in the collection. On TensorBoard, histograms are used to visualize the weights over time. This is important because it can give a hint that the weight initialization or the learning rate is wrong.
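A minimal sketch of weight histograms could look like this. The layer shape, the tags and the toy objective (which only exists to make the weights change between steps) are invented for the example, and the code assumes the TensorFlow 1.x-style API through tensorflow.compat.v1.

```python
import os
import tempfile

import tensorflow.compat.v1 as tf  # assumption: TF 1.x-style API via the compat module

tf.disable_v2_behavior()

log_dir = os.path.join(tempfile.mkdtemp(), "histograms")  # hypothetical location

graph = tf.Graph()
with graph.as_default():
    # First create the weights...
    with tf.name_scope("fc1"):
        weights = tf.Variable(
            tf.truncated_normal([64, 32], stddev=0.1), name="weights")
        biases = tf.Variable(tf.zeros([32]), name="biases")

    # ...and then their histograms.
    tf.summary.histogram("histograms/fc1_weights", weights)
    tf.summary.histogram("histograms/fc1_biases", biases)

    # Toy objective: pull the weights to 0 and the biases to 1.
    loss = tf.reduce_sum(tf.square(weights)) + tf.reduce_sum(tf.square(biases - 1.0))
    train_op = tf.train.GradientDescentOptimizer(0.1).minimize(loss)

    merged = tf.summary.merge_all()
    init = tf.global_variables_initializer()

with tf.Session(graph=graph) as sess:
    sess.run(init)
    writer = tf.summary.FileWriter(log_dir, graph)
    for step in range(10):
        # Record the distributions, then update the variables.
        summary, _ = sess.run([merged, train_op])
        writer.add_summary(summary, step)
    writer.close()
```

In the Histograms tab, the weight distribution should narrow around 0 while the biases move towards 1 as the steps advance.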

The last layer histogram for different learning rates

A histogram has three dimensions. The depth (y-dimension) represents the epochs: the deeper (and darker) a slice is, the older its values are. The z-dimension is the density of the values shown along the x-dimension.


The other types (image, audio and text) are not covered here because they only apply to specific networks. Still, they can give us a good intuition of what the network is learning. For example, in an image recognition network, it is possible to see the patterns each layer is learning.

An example of the images after the first convolution (the fourth image is the original one)

The code

The code and the TensorBoard images are from a multi-task classifier that recognizes a character sequence with a maximum length of 5. The dataset was synthetically generated from the notMNIST dataset. The network basically consists of two convolutional layers, two fully connected layers and five classifiers (it has an additional 'blank' class in case the sequence does not have length 5).

The results on the test dataset were:

Total loss: 1.1103163957595825
Global accuracy: 71.62%
Char 0: 90.34%
Char 1: 91.02%
Char 2: 93.16%
Char 3: 94.73%
Char 4: 97.71%
Average: 93.39%

The global accuracy corresponds to the correct sequences, that is, the sequences in which all the characters were correctly classified.
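The difference between the per-character accuracies and the global accuracy can be illustrated with a few lines of NumPy. The labels and predictions below are made up (four sequences of five characters); they are not taken from the test set.

```python
import numpy as np

# Hypothetical labels and predictions: 4 sequences of 5 characters each.
labels = np.array([
    [1, 2, 3, 4, 5],
    [0, 1, 2, 3, 4],
    [5, 5, 5, 10, 10],
    [7, 8, 9, 10, 10],
])
preds = np.array([
    [1, 2, 3, 4, 5],     # fully correct
    [0, 1, 2, 3, 9],     # last character wrong
    [5, 5, 5, 10, 10],   # fully correct
    [7, 3, 9, 10, 10],   # second character wrong
])

correct = preds == labels

# Accuracy of each classifier (one per character position).
char_accuracy = correct.mean(axis=0)            # [1.0, 0.75, 1.0, 1.0, 0.75]
# Mean of the per-character accuracies.
average_accuracy = char_accuracy.mean()         # 0.9
# A sequence only counts if all of its characters are correct.
global_accuracy = correct.all(axis=1).mean()    # 0.5

print(char_accuracy, average_accuracy, global_accuracy)
```

A single wrong character invalidates the whole sequence, which is why the global accuracy is lower than the average of the per-character accuracies.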

You can access the code in the following GitHub repository:


Conclusion

TensorBoard is a great tool for understanding and debugging your network. It lets you see how the loss, the accuracy, the learning rate and the model's weights behave. Furthermore, it is a good way to compare and tune hyperparameters.

A drawback is that your code can get very verbose: creating many scopes and naming variables (so that the visualization is legible) and adding a tf.summary line for every variable can make the code difficult to read.

However, I believe that with good code organization this will not be a problem :)

References:

- TensorFlow: https://www.tensorflow.org/guide/summaries_and_tensorboard
- Ganegedara, Thushan. TensorBoard Tutorial: https://www.datacamp.com/community/tutorials/tensorboard-tutorial
- Mobiny, Aryan. How to Use TensorBoard?: https://itnext.io/how-to-use-tensorboard-5d82f8654496