Simple Reinforcement Learning with Tensorflow: Part 2 - Policy-based Agents
Arthur Juliani

Hi, and thank you so much! Like others, I’ve been going through your lessons, and they help tremendously.

My question is: why do we update only once per five episodes?

At first I thought it was related to the concept of delayed reward, but I don’t think it is: even if we performed an update at the end of every episode, we would still have ep_history, which stores the states of the entire episode.

My only guess is that it has to do with performance. Is that right?
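
To make the question concrete, here is a minimal sketch of the pattern I mean, as I understand it: a gradient is computed at the end of each episode, added to a buffer, and only applied to the weights every few episodes. Everything in it (`fake_episode_gradient`, `grad_buffer`, `update_frequency`, the learning rate) is a hypothetical stand-in and not the tutorial's actual code.

```python
import random


def fake_episode_gradient(rng, n_params):
    # Hypothetical stand-in for the gradient computed from one episode's history.
    return [rng.gauss(0.0, 1.0) for _ in range(n_params)]


def train(total_episodes, update_frequency, n_params=3, lr=0.01, seed=0):
    rng = random.Random(seed)
    weights = [0.0] * n_params
    grad_buffer = [0.0] * n_params  # accumulates gradients between updates
    n_updates = 0

    for episode in range(1, total_episodes + 1):
        grad = fake_episode_gradient(rng, n_params)
        grad_buffer = [b + g for b, g in zip(grad_buffer, grad)]

        # Apply the accumulated gradient only every `update_frequency` episodes.
        if episode % update_frequency == 0:
            weights = [w - lr * b for w, b in zip(weights, grad_buffer)]
            grad_buffer = [0.0] * n_params
            n_updates += 1

    return weights, n_updates


weights_every_5, updates_every_5 = train(total_episodes=20, update_frequency=5)
weights_every_1, updates_every_1 = train(total_episodes=20, update_frequency=1)
print(updates_every_5, updates_every_1)
```

In this toy the gradients do not depend on the current weights, so both schedules end at the same weights and differ only in how many updates are applied. In a real agent the gradient does depend on the weights, so the two schedules would not be equivalent, which is why I am asking what the five-episode choice buys us.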
