Arthur Juliani
Feb 24, 2017 · 1 min read

Hi David,

The method described in Part 0 wouldn’t work well for CartPole. The problem is that the relationship between states and actions is much more complex in CartPole, and the one-step updating described in Part 0 would fail to learn the dynamics of that space robustly. More advanced methods like DQN (which I describe in Part 4) would help in this case, but would still take a while to learn because their updates are also one-step.
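
To make "one-step updating" concrete, here is a minimal sketch, assuming it refers to the standard tabular Q-learning rule. The function name, the table layout, and the `lr` and `gamma` values are illustrative assumptions, not code taken from Part 0.

```python
def one_step_q_update(q_table, state, action, reward, next_state, done,
                      lr=0.1, gamma=0.99):
    """Apply one one-step Q-learning update to q_table in place.

    q_table is a list of rows, one row of action values per discrete state.
    lr and gamma are illustrative defaults (assumptions), not tuned values.
    """
    # The target looks exactly one step ahead: the immediate reward plus the
    # discounted best value of the next state (nothing if the episode ended).
    if done:
        target = reward
    else:
        target = reward + gamma * max(q_table[next_state])
    # Move the current estimate a fraction lr of the way toward the target.
    q_table[state][action] += lr * (target - q_table[state][action])
    return q_table


# Hypothetical toy problem: 3 discrete states, 2 actions, all values start at 0.
q_table = [[0.0, 0.0] for _ in range(3)]
one_step_q_update(q_table, state=1, action=0, reward=1.0, next_state=2, done=True)
```

Each call moves reward information back by only a single transition, which is why this kind of update can be slow when rewards depend on long sequences of actions. CartPole's observations are also continuous, so a table like this one would not apply to it directly.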

    Written by Arthur Juliani, PhD Student in Cog Neuro
