Jul 22, 2017 · 1 min read
Interesting examples. Arthur, I am having trouble replicating your FrozenLake example (the Q-table part). The code runs, but it gives me weird results: the reward is always zero, because the “step” method always returns a reward of zero:
s1,r,d,_ = env.step(a)
This gives r = 0 every time, no matter whether a is 0, 1, 2, or 3. Could you please tell me if I am missing something?
Many thanks.
