Aedan Pope
Feb 23, 2017 · 1 min read

Hi Arthur — love this series. Learning RL from scratch by following it.

In the code, where the episode's experiences are added to myBuffer at the end of an episode, there’s a comment:

#Get all experiences from this episode and discount their rewards.

But in this code the discounting seems to happen at training time, when targetQ is calculated, rather than before training (as in Part 2, where you backfilled the reward series with discounted rewards).

Is the comment thus misplaced?
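
To make the distinction concrete, here is a minimal sketch of the two approaches as I understand them. This is not the code from either post: the function names, the toy rewards, and the gamma value are my own assumptions for illustration.

```python
# Hypothetical illustration; names and numbers are not from the tutorial code.

def backfill_discounted_returns(rewards, gamma):
    """Part 2 style (as I read it): before training, replace each reward
    with the discounted sum of rewards from that step to the episode's end."""
    returns = [0.0] * len(rewards)
    running = 0.0
    for t in reversed(range(len(rewards))):
        running = rewards[t] + gamma * running
        returns[t] = running
    return returns


def one_step_target(reward, next_q_values, done, gamma):
    """Q-learning style: store the raw reward in the buffer and apply the
    discount to the bootstrapped next-state value when building the target."""
    bootstrap = 0.0 if done else max(next_q_values)
    return reward + gamma * bootstrap


episode_rewards = [0.0, 0.0, 1.0]
print(backfill_discounted_returns(episode_rewards, gamma=0.5))  # [0.25, 0.5, 1.0]
print(one_step_target(1.0, [0.5, 2.0], done=False, gamma=0.5))  # 2.0
```

In the second case nothing is discounted when the episode is added to the buffer, which is why the comment looks out of place to me.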
