(Shallow?) Reinforcement Learning
Recent feats such as AlphaGo’s victory over the world’s best Go player have brought reinforcement learning (RL) into the spotlight. But what exactly is RL, and how does it achieve such remarkable results?
In this first article, we will explore the Monte Carlo Control Method (not the deep kind) which, despite being elegantly simple, is the basis upon which some of the most advanced RL methods are built.
The Basics
RL problems consist of (at least) two entities: the agent and the environment, as shown in the figure below. The environment gives the agent a state (also called an observation). The agent then chooses an action based on the provided state and applies it to the environment. The environment replies to the action by giving the agent a reward (a score for the action).
For example, consider a kid (the agent) playing a game (the environment) for the first time. The kid starts by seeing the game screen containing all its elements (the state) and decides on an action to take. The game then scores his move (the reward), and the process repeats until the game ends (we call environments with a clear termination point episodic). After enough repetitions, the kid will start to understand how his actions influence the environment and (assuming he is a competitive child) choose the actions that maximize his score.
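The agent-environment loop described above can be sketched in a few lines of code. This is a minimal, hypothetical illustration (the `CoinFlipEnv` class and its `reset`/`step` interface are made up for this sketch, loosely mirroring common RL library conventions), with a random agent standing in for a real learning algorithm:

```python
import random

class CoinFlipEnv:
    """A toy episodic environment (hypothetical, for illustration):
    the agent guesses heads (0) or tails (1) for a fixed number of rounds."""
    def __init__(self, rounds=5):
        self.rounds = rounds

    def reset(self):
        self.step_count = 0
        return self.step_count  # the state: which round we are on

    def step(self, action):
        # reward of 1 if the guess matches the coin flip, else 0
        reward = 1 if action == random.randint(0, 1) else 0
        self.step_count += 1
        done = self.step_count >= self.rounds  # episodic: clear termination
        return self.step_count, reward, done

# one episode of the agent-environment loop
env = CoinFlipEnv()
state = env.reset()
total_reward = 0
done = False
while not done:
    action = random.choice([0, 1])           # agent picks an action given the state
    state, reward, done = env.step(action)   # environment replies with a reward
    total_reward += reward
print("episode return:", total_reward)
```

Here the agent ignores the state entirely; the point of RL algorithms like Monte Carlo Control is to replace that random choice with one informed by accumulated rewards.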