Simple Reinforcement Learning with Tensorflow: Part 2 - Policy-based Agents
Arthur Juliani

Hi and thank you so much!! Like others, I’ve been going through your lessons and they help tremendously.

My question is: why do we update only once per five episodes?

At first I thought it was related to the concept of delayed reward, but I don’t think it is: even if we performed an update at the end of every episode, we would still have ep_history, which stores the states of the entire episode.

My only guess is it has to do with performance?
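
To make the question concrete, here is a toy sketch of the two schedules I am comparing. It is not the tutorial's code: `run_updates`, `grad_buffer`, `update_frequency`, and the made-up gradient values are my own stand-ins, and I am assuming the per-episode gradients are simply summed in a buffer before being applied.

```python
def run_updates(episode_grads, update_frequency, lr=0.1):
    """Sum per-episode gradients in a buffer and apply them every
    `update_frequency` episodes (hypothetical stand-in for the agent's update)."""
    weight = 0.0
    grad_buffer = 0.0
    num_updates = 0
    for episode, grad in enumerate(episode_grads, start=1):
        grad_buffer += grad                 # accumulate this episode's gradient
        if episode % update_frequency == 0:
            weight -= lr * grad_buffer      # one gradient step on the summed gradient
            grad_buffer = 0.0               # reset the buffer after applying it
            num_updates += 1
    return weight, num_updates

# Made-up gradients, one per episode. They are fixed in advance here, whereas in
# the real agent each gradient depends on the weights at the time of the episode.
episode_grads = [0.5, -1.0, 2.0, 0.25, -0.75, 1.5, 1.0, -0.5, 0.0, 2.0]

w_every, n_every = run_updates(episode_grads, update_frequency=1)
w_batched, n_batched = run_updates(episode_grads, update_frequency=5)

print(n_every, n_batched)  # 10 updates vs. 2 updates
```

In this toy version both schedules end at the same weight, because the gradients do not depend on the weights. In the actual agent they do, so the two schedules would presumably diverge, and that difference is the part I am asking about.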