Simple Reinforcement Learning with Tensorflow: Part 3 - Model-Based RL
Arthur Juliani

Hi Arthur,

This is a great series so far. I am a bit unclear on this part:

tGrad =,feed_dict={observations: epx, input_y: epy, advantages: discounted_epr})

# If gradients become too large, end training process
if np.sum(tGrad[0] == tGrad[0]) == 0:

During training the gradients get smaller and smaller, and then they become NaN. What causes this?
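
For reference, here is how I read the check quoted above. NaN is never equal to itself, so `tGrad[0] == tGrad[0]` is False wherever an entry is NaN, and the sum is 0 only when every entry is NaN. Below is a minimal NumPy sketch of that behaviour. The arrays are made-up stand-ins for `tGrad[0]`, and the helper names `all_nan` and `any_nan` are hypothetical, not from the tutorial:

```python
import numpy as np


def all_nan(grad):
    """Same test as the quoted line: True only if every entry is NaN."""
    # NaN != NaN, so (grad == grad) is False exactly at the NaN entries.
    return bool(np.sum(grad == grad) == 0)


def any_nan(grad):
    """Stricter alternative: True as soon as a single entry is NaN."""
    return bool(np.isnan(grad).any())


# Made-up gradient arrays, for illustration only.
healthy = np.array([0.1, -0.2, 0.3])
partly_nan = np.array([0.1, np.nan, 0.3])
fully_nan = np.full(3, np.nan)

for name, grad in [("healthy", healthy),
                   ("partly_nan", partly_nan),
                   ("fully_nan", fully_nan)]:
    print(name, "all_nan:", all_nan(grad), "any_nan:", any_nan(grad))
```

So, as far as I can tell, the condition fires once the whole gradient array has turned into NaN rather than when the values get large, which matches what I am seeing.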