Hi Arthur,
Arun Kumar

Hi Arun,

You are correct to observe that a simple Q-learning algorithm will fail on CartPole. CartPole's state is made up of continuous values (cart position, cart velocity, pole angle and pole angular velocity), so a basic Q-learning algorithm that keeps one value per state-action pair has great difficulty solving it. In fact, the Q-learning algorithm described here is almost never used for large or continuous state/action spaces. Instead, DQN, with its augmentations to improve robustness, is used, or a policy gradient method, as you mentioned.
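As a rough illustration of why a table-based approach struggles here, the sketch below forces CartPole's four continuous state variables into a Q-table by binning them. The binning scheme, the bin counts, and the helper names (`discretize`, `q_update`, `table_size`) are hypothetical and chosen only for illustration. The sketch shows the standard one-step Q-learning update on a dictionary-backed table and how quickly the table grows with finer bins.

```python
from collections import defaultdict

def discretize(state, low, high, n_bins):
    """Map each continuous state variable to a bin index in [0, n_bins - 1].

    Hypothetical helper: values outside [low, high] are clipped to the edges.
    """
    idx = []
    for x, lo, hi in zip(state, low, high):
        x = min(max(x, lo), hi)                      # clip to the assumed range
        b = int((x - lo) / (hi - lo) * n_bins)       # uniform-width bins
        idx.append(min(b, n_bins - 1))               # x == hi falls in the last bin
    return tuple(idx)

def q_update(Q, s, a, r, s_next, done, alpha=0.1, gamma=0.99, n_actions=2):
    """One-step Q-learning update: Q(s,a) += alpha * (target - Q(s,a))."""
    best_next = 0.0 if done else max(Q[(s_next, b)] for b in range(n_actions))
    td_target = r + gamma * best_next
    Q[(s, a)] += alpha * (td_target - Q[(s, a)])

def table_size(n_bins, n_dims=4, n_actions=2):
    """Number of Q-table entries: one per (discretized state, action) pair."""
    return n_bins ** n_dims * n_actions

# Usage: a single update on an empty table, then the table size for a few bin counts.
Q = defaultdict(float)
s = discretize([0.0, 0.0, 0.0, 0.0], [-1.0] * 4, [1.0] * 4, 10)
s_next = discretize([0.1, 0.0, -0.1, 0.0], [-1.0] * 4, [1.0] * 4, 10)
q_update(Q, s, 1, 1.0, s_next, done=False)
for bins in (10, 20, 50):
    print(bins, table_size(bins))
```

Every extra bin per variable multiplies the table size, and each cell is learned independently, so nothing learned in one cell carries over to its neighbours. A function approximator such as the neural network in DQN avoids both problems by generalizing across nearby states.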
