Sep 4, 2018 · 1 min read
Hey Thomas, great post! I just wanted to confirm: in the algorithm, does Gt stand for the immediate reward we receive when we take an action sampled from the probability distribution output by the neural network?
Or
is Gt the mean of the total cumulative reward received from time-step t to the end of the game, which would mean that, at every time-step t, all actions taken at and after t receive a gradient update?
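To make the two readings concrete, here is a rough sketch of the two quantities I am contrasting. The function names, the `gamma` discount parameter, and the example rewards are my own assumptions for illustration, not taken from the post, and I have left out any averaging over episodes.

```python
def immediate_rewards(rewards):
    """Reading 1 (hypothetical): Gt is just the reward received at step t."""
    return list(rewards)


def rewards_to_go(rewards, gamma=1.0):
    """Reading 2 (hypothetical): Gt is the (optionally discounted) sum of
    rewards from step t to the end of the episode."""
    returns = [0.0] * len(rewards)
    running = 0.0
    # Walk backwards so each step reuses the return of the step after it.
    for t in reversed(range(len(rewards))):
        running = rewards[t] + gamma * running
        returns[t] = running
    return returns


# Made-up rewards for a three-step episode.
episode_rewards = [1.0, 0.0, 2.0]
print(immediate_rewards(episode_rewards))  # [1.0, 0.0, 2.0]
print(rewards_to_go(episode_rewards))      # [3.0, 2.0, 2.0]
```

Under the second reading, the action at t = 0 is weighted by everything that happens afterwards, not only by its own reward.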