Simple Reinforcement Learning with Tensorflow: Part 3 - Model-Based RL
Arthur Juliani
95516

Hi Arthur,

This is a great series so far. I am a bit unclear on this part:

tGrad = sess.run(newGrads,feed_dict={observations: epx, input_y: epy, advantages: discounted_epr})

# If gradients become too large, end training process
if np.sum(tGrad[0] == tGrad[0]) == 0:
    break

The gradients get smaller and smaller during training and then become NaN. What is the cause here?
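
For reference, here is how I read that check, as a minimal NumPy sketch. NaN is the only float value that is not equal to itself, so tGrad[0] == tGrad[0] is False exactly where an entry is NaN, and the sum is 0 only when every entry is NaN. The helper names all_nan and any_nan and the sample arrays are my own, purely for illustration; they are not from the tutorial code.

```python
import numpy as np

def all_nan(grad):
    """Same condition as the quoted check: True only if every entry is NaN."""
    # (grad == grad) is False at NaN entries, so the count of True values
    # drops to 0 only when no entry is a regular number.
    return bool(np.sum(grad == grad) == 0)

def any_nan(grad):
    """Stricter variant (my assumption, not the tutorial's): True if any entry is NaN."""
    return bool(np.isnan(grad).any())

# Hypothetical gradient arrays, for illustration only.
healthy = np.array([0.5, -1e-8, 0.0])
partly_nan = np.array([0.5, np.nan])
fully_nan = np.array([np.nan, np.nan])

print(all_nan(healthy), all_nan(partly_nan), all_nan(fully_nan))  # False False True
print(any_nan(healthy), any_nan(partly_nan), any_nan(fully_nan))  # False True True
```

If that reading is right, the check fires once the whole gradient array is NaN rather than when the values are large, and small but finite gradients pass through it unchanged.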