Aug 25, 2017
Those bounds are set because the observation space of the CartPole-v0 environment is infinite for x_dot (index 1) and theta_dot (index 3). The state_to_bucket function, which “bucketizes” continuous observations into discrete steps, wouldn’t work with infinite bounds, because it needs a finite range to divide into buckets.
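To make the point concrete, here is a minimal sketch of what a bucketizing function along these lines might look like. It is not necessarily the exact state_to_bucket from the original code. The bound values and bucket counts below are assumptions chosen for illustration, and the linear mapping is just one reasonable way to discretize.

```python
import math

# Assumed (low, high) bounds per observation dimension. The x_dot and
# theta_dot entries are the manually chosen finite bounds; these particular
# values are illustrative, not taken from the environment.
STATE_BOUNDS = [
    (-4.8, 4.8),                            # x
    (-0.5, 0.5),                            # x_dot (index 1), manually bounded
    (-0.418, 0.418),                        # theta, in radians
    (-math.radians(50), math.radians(50)),  # theta_dot (index 3), manually bounded
]

# Assumed number of buckets per dimension.
NUM_BUCKETS = (1, 1, 6, 3)


def state_to_bucket(state, bounds=STATE_BOUNDS, num_buckets=NUM_BUCKETS):
    """Map a continuous observation to a tuple of discrete bucket indices."""
    indices = []
    for value, (low, high), n in zip(state, bounds, num_buckets):
        if value <= low:
            idx = 0            # clamp to the first bucket
        elif value >= high:
            idx = n - 1        # clamp to the last bucket
        else:
            # Linear map from [low, high] onto [0, n - 1].
            # This needs a finite (high - low); with infinite bounds the
            # ratio is inf / inf = nan and rounding it raises ValueError.
            idx = int(round((n - 1) * (value - low) / (high - low)))
        indices.append(idx)
    return tuple(indices)
```

With finite bounds, out-of-range velocities are simply clamped to the first or last bucket. With bounds of negative and positive infinity, the width of the range is infinite, so there is no meaningful way to split it into equal steps.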