Simple Reinforcement Learning with Tensorflow Part 0: Q-Learning with Tables and Neural Networks
Arthur Juliani

Hello Arthur,

Nice post.

I'd like to know what the action indices mean (for example, does 0 mean that the action to be taken is ‘up’?). My current understanding is: 0 = ‘up’, 1 = ‘down’, 2 = ‘left’, 3 = ‘right’. The problem is that when I derive a policy from the Q-table once the 2000 episodes are done (selecting the action with the max Q-value in each state), it says to move left in state 0. What does that mean, given that a left movement isn't possible from there? There are similar inconsistencies at other locations.
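
A minimal sketch of the policy extraction described above, assuming the Q-table is indexed as `Q[state][action]`. The helper name `greedy_policy`, the toy values, and the action labels are hypothetical; in particular, the labels are only the mapping assumed in this question, which may not match the environment's actual action encoding.

```python
# Hypothetical mapping: the one assumed in the question, NOT confirmed
# by the post or the environment. Swap it out if the encoding differs.
ASSUMED_ACTION_NAMES = {0: "up", 1: "down", 2: "left", 3: "right"}


def greedy_policy(q_table, action_names=ASSUMED_ACTION_NAMES):
    """Return the label of the highest-valued action for each state.

    q_table is any sequence of rows (lists or NumPy rows), one row per
    state and one entry per action. Ties go to the lowest action index.
    """
    policy = []
    for row in q_table:
        best_action = max(range(len(row)), key=lambda a: row[a])
        policy.append(action_names[best_action])
    return policy


# Toy Q-table with made-up values, three states by four actions.
q_example = [
    [0.1, 0.2, 0.9, 0.0],
    [0.5, 0.1, 0.1, 0.3],
    [0.0, 0.0, 0.0, 0.7],
]

print(greedy_policy(q_example))
```

If the labels printed for a state look impossible, one thing to check is whether the index-to-direction mapping passed in as `action_names` really matches the one the environment uses.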

Please reply.
