Reinforcement Learning — Part 1
- Reinforcement learning is the problem faced by an agent that must learn behaviour through trial-and-error interactions with a dynamic environment
- It is appropriately thought of as a class of problems, rather than as a set of techniques.
- There are two main strategies for solving Reinforcement Learning problems. The first is to search in the space for behaviours in order to find one that performs well in the environment. This approach has been taken by work in genetic algorithms and genetic programming.
- The second is to use statistical techniques and dynamic programming methods to estimate the utility of taking actions in state of the world.
- In the standard RL model, on each step of interaction the agent receives as input i, some indication of the current state, S, of the environment. The agent chooses an action, a, to generate an output.
- The action changes the state of the environment and the value of this state transition is communicated to the agent through a scalar reinforcement signal, r.
- It should choose actions that tend to increase the long-run sum of values of the reinforcement signal.
- It can learn to do this overtime by systematic trial and error.
An intuitive way to understand the relation between the agent and its environment is with the following example:
Environment : You are in state 65. You have 4 possible actions.
Agent: I’ll take action 2.
Environment: You received a reinforcement of 7 units. You are in state 15. You have 2 possible actions.
The agent’s job is to find a policy Π, mapping states to actions, that maximizes some long run measure of reinforcement. We assume the environment is stationary.
Important ideas in Reinforcement Learning that came up -
- Exploration — you have to try unknown actions to get information.
- Exploitation — eventually you have to use what you know.
- Regret — even if you learn intelligently you make mistakes.
- Sampling — because of chance you have to try things repeatedly.
- Difficulty — learning can be much harder than solving a known MDPs.
Alright that’s it for now! Thank you for spending your time. Cheers!