Feb 23, 2017
Hi Arthur — love this series. Learning RL from scratch by following it.
In the code, where the experience at the end of an episode is added to myBuffer, there’s a comment:
#Get all experiences from this episode and discount their rewards.
But it looks like this code discounts future rewards at training time, when calculating targetQ, rather than before training (as in Part 2, where you backfilled the reward series with discounted rewards).
Is the comment thus misplaced?
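For concreteness, here is a minimal sketch of the two approaches I'm contrasting, as I understand them. The function names, the `GAMMA` value, and the signatures are my own illustration and are not taken from the tutorial code.

```python
import numpy as np

GAMMA = 0.99  # assumed discount factor, for illustration only


def discount_rewards(rewards, gamma=GAMMA):
    """Backfill style: before training, replace each reward with the
    discounted sum of rewards from that step to the end of the episode."""
    discounted = np.zeros(len(rewards), dtype=np.float64)
    running = 0.0
    for t in reversed(range(len(rewards))):
        running = rewards[t] + gamma * running
        discounted[t] = running
    return discounted


def td_target(reward, next_q_values, done, gamma=GAMMA):
    """Training-time style: the buffer stores the raw reward, and the
    discount is applied when the one-step target is computed:
    r + gamma * max_a' Q(s', a'), with no bootstrap on terminal steps."""
    bootstrap = 0.0 if done else float(np.max(next_q_values))
    return reward + gamma * bootstrap
```

If the second function is roughly what the targetQ calculation does, then the rewards stored in myBuffer would be the raw ones, and nothing is discounted at the point where the comment appears.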
