Deriving Policy Gradients and Implementing REINFORCE

Chris Yoon
Dec 30, 2018 · 4 min read


Some Definitions
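A minimal sketch of the notation commonly used for this derivation. These symbols are assumptions based on the standard policy-gradient setup, not taken from the post itself: a trajectory $\tau$, a policy $\pi_\theta$ with parameters $\theta$, a discount factor $\gamma$, and the reward $r_t$ received after taking action $a_t$ in state $s_t$.

```latex
\begin{aligned}
\tau &= (s_0, a_0, r_0, s_1, a_1, r_1, \dots, s_{T-1}, a_{T-1}, r_{T-1}, s_T) \\
p_\theta(\tau) &= p(s_0) \prod_{t=0}^{T-1} \pi_\theta(a_t \mid s_t)\, p(s_{t+1} \mid s_t, a_t) \\
R(\tau) &= \sum_{t=0}^{T-1} \gamma^{t} r_t \\
J(\theta) &= \mathbb{E}_{\tau \sim p_\theta}\big[ R(\tau) \big]
\end{aligned}
```

Under these definitions, the goal is to find the parameters $\theta$ that maximize the expected return $J(\theta)$.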

Deriving the Policy Gradient
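A compact version of the usual log-derivative ("likelihood ratio") argument, written as a sketch under standard assumptions: $p_\theta(\tau)$ is the probability of trajectory $\tau$ under policy $\pi_\theta$, $R(\tau)$ is its return, and $J(\theta) = \mathbb{E}_{\tau \sim p_\theta}[R(\tau)]$. The key observation is that the initial-state distribution and the environment dynamics do not depend on $\theta$, so their gradients vanish.

```latex
\begin{aligned}
\nabla_\theta J(\theta)
  &= \nabla_\theta \int p_\theta(\tau)\, R(\tau)\, d\tau
   = \int p_\theta(\tau)\, \nabla_\theta \log p_\theta(\tau)\, R(\tau)\, d\tau \\
  &= \mathbb{E}_{\tau \sim p_\theta}\big[ \nabla_\theta \log p_\theta(\tau)\, R(\tau) \big] \\[4pt]
\nabla_\theta \log p_\theta(\tau)
  &= \nabla_\theta \Big[ \log p(s_0)
     + \sum_{t=0}^{T-1} \log \pi_\theta(a_t \mid s_t)
     + \sum_{t=0}^{T-1} \log p(s_{t+1} \mid s_t, a_t) \Big]
   = \sum_{t=0}^{T-1} \nabla_\theta \log \pi_\theta(a_t \mid s_t) \\[4pt]
\nabla_\theta J(\theta)
  &= \mathbb{E}_{\tau \sim p_\theta}\Big[ \sum_{t=0}^{T-1}
     \nabla_\theta \log \pi_\theta(a_t \mid s_t)\, R(\tau) \Big]
\end{aligned}
```

In practice $R(\tau)$ is often replaced by the reward-to-go $G_t = \sum_{k=t}^{T-1} \gamma^{k-t} r_k$, which leaves the estimate unbiased in the undiscounted case and typically lowers its variance.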

Implementing the REINFORCE algorithm

Pseudo code from UToronto lecture slides
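The REINFORCE loop can be sketched in plain Python as: sample an episode with the current policy, compute the discounted return from each step, then move the parameters along the return-weighted gradient of the log-probabilities. This is a minimal illustration, not the post's implementation. The four-state corridor environment, the tabular softmax policy, and the names `run_episode` and `reinforce` are all hypothetical choices made to keep the example free of external dependencies.

```python
import math
import random


def softmax(prefs):
    """Convert a list of action preferences into probabilities."""
    m = max(prefs)  # subtract the max for numerical stability
    exps = [math.exp(p - m) for p in prefs]
    total = sum(exps)
    return [e / total for e in exps]


def discounted_returns(rewards, gamma):
    """G_t = r_t + gamma * G_{t+1}, computed backwards over one episode."""
    returns = [0.0] * len(rewards)
    running = 0.0
    for t in reversed(range(len(rewards))):
        running = rewards[t] + gamma * running
        returns[t] = running
    return returns


def run_episode(theta, rng, n_states=4, max_steps=50):
    """Hypothetical toy corridor: start in state 0; action 1 moves right,
    action 0 moves left (never below 0). Reaching the last state gives
    reward 1 and ends the episode; every other step gives reward 0."""
    state = 0
    states, actions, rewards = [], [], []
    for _ in range(max_steps):
        probs = softmax(theta[state])
        action = 0 if rng.random() < probs[0] else 1
        next_state = min(state + 1, n_states - 1) if action == 1 else max(state - 1, 0)
        done = next_state == n_states - 1
        states.append(state)
        actions.append(action)
        rewards.append(1.0 if done else 0.0)
        state = next_state
        if done:
            break
    return states, actions, rewards


def reinforce(n_episodes=500, alpha=0.1, gamma=0.9, seed=0):
    """Tabular softmax REINFORCE; returns the learned preferences and episode lengths."""
    rng = random.Random(seed)
    n_states, n_actions = 4, 2
    theta = [[0.0] * n_actions for _ in range(n_states)]
    lengths = []
    for _ in range(n_episodes):
        states, actions, rewards = run_episode(theta, rng, n_states)
        returns = discounted_returns(rewards, gamma)
        # Accumulate sum_t G_t * grad log pi(a_t | s_t) for the whole episode.
        grad = [[0.0] * n_actions for _ in range(n_states)]
        for s, a, g in zip(states, actions, returns):
            probs = softmax(theta[s])
            for b in range(n_actions):
                # For a softmax policy: d log pi(a|s) / d theta[s][b] = 1{a == b} - pi(b|s)
                grad[s][b] += g * ((1.0 if b == a else 0.0) - probs[b])
        # Gradient ascent step on the sampled objective.
        for s in range(n_states):
            for b in range(n_actions):
                theta[s][b] += alpha * grad[s][b]
        lengths.append(len(rewards))
    return theta, lengths
```

Calling `theta, lengths = reinforce()` returns the learned action preferences and the length of each episode. Accumulating the gradient over the full episode before updating keeps every log-probability gradient evaluated at the parameters that actually generated the trajectory. A neural-network policy would replace the table `theta` with network weights and the hand-written gradient with automatic differentiation.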
Length of episode (blue) and average length for 10 most recent episodes (orange)
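The smoothed curve in a plot like this one can be computed as a trailing average over the most recent episodes. A small sketch, where the function name `trailing_average` and the choice to average over fewer values at the start (before a full window is available) are assumptions:

```python
def trailing_average(values, window=10):
    """Average of the last `window` values at each position.

    The first few entries average over however many values exist so far.
    """
    averages = []
    running = 0.0
    for i, v in enumerate(values):
        running += v
        if i >= window:
            running -= values[i - window]  # drop the value that left the window
        averages.append(running / min(i + 1, window))
    return averages
```

Passing a list of per-episode lengths gives one smoothed value per episode, which can be plotted alongside the raw lengths.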


