At the beginning of the contest we didn’t know about TensorBoard, so unfortunately we didn’t use it (we would now). Instead we added our own code for saving plots. We looked mostly at EpLenMean and EpRewMean. In the console logs we also looked at timings and entropy. We also added logging of the mean and std of the actions, as described in the post. And, similar to you, from time to time we visualized the model. We took snapshots of the model and plots every 25 iterations, which is useful in case the machine or process goes down for some reason.
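The snapshot routine mentioned above could look roughly like the sketch below. It is not our actual code: `save_snapshot`, `train`, `fake_iteration` and the file names are hypothetical, the parameters are pickled as a plain dict, and the metric history is written as JSON instead of rendered plots, so the example needs only the standard library.

```python
import json
import os
import pickle
import tempfile

SNAPSHOT_EVERY = 25  # iterations between snapshots, as in the text


def save_snapshot(out_dir, iteration, params, history):
    """Write model parameters and the metric history for one iteration."""
    os.makedirs(out_dir, exist_ok=True)
    model_path = os.path.join(out_dir, "model_%06d.pkl" % iteration)
    with open(model_path, "wb") as f:
        pickle.dump(params, f)
    # Overwrite a single history file so plots can be redrawn after a crash.
    with open(os.path.join(out_dir, "history.json"), "w") as f:
        json.dump(history, f)
    return model_path


def train(out_dir, num_iterations, run_iteration):
    """run_iteration(i) is a hypothetical callback returning (params, metrics)."""
    history = {"EpLenMean": [], "EpRewMean": []}
    saved = []
    for i in range(1, num_iterations + 1):
        params, metrics = run_iteration(i)
        for key in history:
            history[key].append(metrics[key])
        if i % SNAPSHOT_EVERY == 0:
            saved.append(save_snapshot(out_dir, i, params, history))
    return saved


def fake_iteration(i):
    # Stand-in for a real PPO update; returns dummy parameters and metrics.
    return {"w": [0.0, float(i)]}, {"EpLenMean": 10.0 * i, "EpRewMean": 1.0 * i}


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as d:
        for path in train(d, 60, fake_iteration):
            print(os.path.basename(path))
```

Keeping one file per model snapshot makes it possible to go back to an earlier iteration, while the single history file is enough to redraw the EpLenMean and EpRewMean curves.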
For debugging, printing works to some extent. For TF there is tf.Print. At one point I printed all the network weights, but unfortunately the output was unreadable. A small tool to visualize them would be nice. I wanted to see, e.g., which inputs the network reacts to and which it ignores.
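One possible shape for such a tool, as a minimal sketch: take the first-layer weight matrix, compute the mean absolute weight per input, and print a text bar for each input. The function names and the rows-are-inputs layout are assumptions for illustration. Weight magnitude is only a rough proxy for what the network reacts to, since it depends on how the inputs are scaled.

```python
def input_sensitivity(first_layer_weights):
    """Mean absolute outgoing weight per input.

    first_layer_weights: list of rows, one row per input and one column
    per hidden unit (an assumed layout).
    """
    return [sum(abs(w) for w in row) / len(row) for row in first_layer_weights]


def render_bars(names, scores, width=40):
    """Return one text line per input, with a bar scaled to the largest score."""
    top = max(scores) or 1.0  # avoid dividing by zero when all scores are 0
    lines = []
    for name, score in zip(names, scores):
        bar = "#" * int(round(width * score / top))
        lines.append("%-12s %8.4f %s" % (name, score, bar))
    return lines


if __name__ == "__main__":
    # Toy weights: three inputs feeding two hidden units.
    weights = [[1.0, -1.0], [0.0, 0.0], [0.5, 0.5]]
    names = ["input_0", "input_1", "input_2"]
    for line in render_bars(names, input_sensitivity(weights), width=10):
        print(line)
```

An input whose bar stays near zero throughout training is a candidate for one the network ignores.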
And first I would run PPO on an easier, similar environment; e.g. here we used MuJoCo’s Walker2d. I’d see whether I can replicate the results from the paper and see what the plots should look like. After gaining confidence that everything works, I’d try a more complex env.