I just want to know what the action indices mean (for example, does 0 mean that the action to be taken is ‘up’?). My current understanding is: 0 = ‘up’, 1 = ‘down’, 2 = ‘left’, 3 = ‘right’. The problem is that when I derive a policy from the Q table after 2000 episodes (selecting the action with the max Q value in each state), it says to move left in state 0. What does that mean, given that no left movement is possible there? There are similar inconsistencies at other locations as well.
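To make the question concrete, here is a minimal sketch of the policy extraction I am describing. It assumes the Q table is stored as one row of Q values per state; the `ACTION_NAMES` mapping is only my guess (it is exactly the part I am asking about), and the numbers in `q_table` are made up for illustration, not my real results.

```python
# Assumed index-to-action mapping; this is my guess, not a confirmed encoding.
ACTION_NAMES = {0: "up", 1: "down", 2: "left", 3: "right"}

def greedy_policy(q_table):
    """For each state, return the index of the action with the largest Q value."""
    policy = []
    for row in q_table:
        # max() with a key returns the first maximal index when values tie.
        best_action = max(range(len(row)), key=lambda a: row[a])
        policy.append(best_action)
    return policy

# Hypothetical 3-state, 4-action Q table (illustrative values only).
q_table = [
    [0.1, 0.2, 0.5, 0.3],
    [0.4, 0.1, 0.0, 0.2],
    [0.0, 0.0, 0.1, 0.9],
]

policy = greedy_policy(q_table)
for state, action in enumerate(policy):
    print(f"state {state}: action {action} ({ACTION_NAMES[action]})")
```

If the mapping is wrong, only the labels change, not the indices: the same index 2 in state 0 would read as a different direction under a different encoding. Also note that a row of identical values (for example, a state that was never updated) always reports index 0 here, because ties go to the first index.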