JUNGYEON BAEK
Sep 1, 2018 · 1 min read

Thank you for the detailed explanation! It really helped me understand model-free learning.

While studying, I came up with a question. If we don't know exactly which next state s' follows when we take action a in state s, how can we apply the Q-learning/SARSA algorithms? For example, in the maze problem, we can easily tell how the environment changes when an action a (right, left, up, or down) is taken in state s. But consider another example, such as a stock estimation problem. Even if we choose the best action a in state s by the greedy method, we cannot be sure what the next state will be, because there are other random factors that the agent does not control. Could you explain this a little more?