Deep Reinforcement Learning Demysitifed (Episode 2) — Policy Iteration, Value Iteration and Q…
Moustafa Alzantot

Nice article, but I am getting an error in both.

‘TimeLimit’ object has no attribute ‘P’ at env.P[s][a]

‘TimeLimit’ object has no attribute ‘nS’

I guess you can no longer access the observation and action spaces like this.

Any solution to the first error?
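
A possible workaround, sketched under some assumptions: gym.make appears to return the environment wrapped in a TimeLimit object, so attributes such as P have to be read from the underlying environment via env.unwrapped, and the state and action counts can be read from observation_space.n and action_space.n instead of nS and nA. The helper name tabular_model below is hypothetical, and the sketch assumes the base environment exposes a transition table P, as the FrozenLake environment does.

```python
def tabular_model(env):
    """Return (P, n_states, n_actions) for a discrete Gym-style environment.

    Assumption: `env.unwrapped` is the base environment (Gym wrapper
    convention) and the base environment exposes a transition table `P`.
    """
    # Strip TimeLimit and any other wrappers; fall back to env itself
    # if the object has no `unwrapped` attribute.
    base = getattr(env, "unwrapped", env)

    # P[s][a] is a list of (probability, next_state, reward, done) tuples.
    P = base.P

    # Discrete spaces expose their size as `.n`; these replace env.nS / env.nA.
    n_states = env.observation_space.n
    n_actions = env.action_space.n
    return P, n_states, n_actions
```

With this in place, something like P, nS, nA = tabular_model(gym.make(...)) should let the code index P[s][a] where it previously used env.P[s][a]. I have not checked this against every gym version, so treat it as a starting point.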
