Shivan Nawal
Sep 4, 2018 · 1 min read

Hey Thomas, great post! I just wanted to confirm: in the algorithm, Gt stands for the immediate reward we receive when we take an action based on the probability distribution provided to us by the neural network, right?

OR

Gt is the mean of the total cumulative reward received from that time-step to the end of the game, which would mean that at every time-step t, all actions taken at and after t receive a gradient update?
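
To make the two readings concrete, here is a minimal sketch of what I mean. The function names, the example rewards, and the `gamma` discount parameter are made up for illustration and are not taken from the post. Note that the second function computes a sum of rewards from t onward rather than a mean, which is an assumption about what the cumulative version would look like.

```python
from typing import List


def immediate_rewards(rewards: List[float]) -> List[float]:
    # Reading 1: Gt is just the reward received at time-step t.
    return list(rewards)


def rewards_to_go(rewards: List[float], gamma: float = 1.0) -> List[float]:
    # Reading 2: Gt is the (optionally discounted) sum of rewards
    # from time-step t to the end of the episode.
    returns = [0.0] * len(rewards)
    running = 0.0
    # Walk backwards so each step reuses the return of the step after it.
    for t in reversed(range(len(rewards))):
        running = rewards[t] + gamma * running
        returns[t] = running
    return returns


# Hypothetical three-step episode.
episode_rewards = [1.0, 0.0, 2.0]
print(immediate_rewards(episode_rewards))  # [1.0, 0.0, 2.0]
print(rewards_to_go(episode_rewards))      # [3.0, 2.0, 2.0]
```

Under the first reading, the action at t=1 gets no credit for the reward at t=2. Under the second, it does.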