(Shallow?) Reinforcement Learning

Paulo Carvalho
Published in The Startup
6 min read · Apr 15, 2020

Recent feats such as AlphaGo’s victory over the world’s best Go player have brought reinforcement learning (RL) into the spotlight. But what is RL, and how does it achieve such remarkable results?

In this first article, we will explore the Monte Carlo Control Method (not the deep kind), which, despite its elegant simplicity, is the foundation on which some of the most advanced RL is built.

The Basics

RL problems consist of (at least) two entities: the agent and the environment, as shown in the figure below. The environment gives the agent a state (also called an observation). The agent then chooses an action based on that state and applies it to the environment. The environment replies by giving the agent a reward (a score for the action).

Photo by Kelly Sikkema on Unsplash
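
To make this loop concrete, here is a minimal Python sketch of one agent interacting with one environment. Everything in it is hypothetical and chosen only for illustration. `GuessingGame` is a toy environment whose state is a random digit and whose reward is 1 when the agent correctly names the digit's parity. `random_agent`, `parity_agent` and `interact` are names made up for this example; they are not part of any RL library.

```python
import random


class GuessingGame:
    """Hypothetical toy environment: the state is a random digit, and the
    agent is rewarded for naming its parity ("even" or "odd")."""

    def __init__(self, seed=0):
        self.rng = random.Random(seed)
        self.digit = 0

    def observe(self):
        # Environment -> agent: a fresh state (observation).
        self.digit = self.rng.randint(0, 9)
        return self.digit

    def apply(self, action):
        # Environment -> agent: a reward (score) for the chosen action.
        correct = "even" if self.digit % 2 == 0 else "odd"
        return 1 if action == correct else 0


def random_agent(state, rng):
    # Ignores the state and picks an action at random.
    return rng.choice(["even", "odd"])


def parity_agent(state, rng):
    # Uses the state to pick the action that earns a reward.
    return "even" if state % 2 == 0 else "odd"


def interact(env, agent, steps, seed=0):
    """Run the state -> action -> reward loop and return the total reward."""
    rng = random.Random(seed)
    total = 0
    for _ in range(steps):
        state = env.observe()        # environment gives the agent a state
        action = agent(state, rng)   # agent chooses an action from that state
        total += env.apply(action)   # environment replies with a reward
    return total


print(interact(GuessingGame(), parity_agent, steps=100))  # → 100
```

`parity_agent` earns every reward because it actually uses the state it is given. `random_agent` ignores the state and should score about half as much on average. Real RL libraries split the loop somewhat differently; Gymnasium's `step`, for example, returns the next observation together with the reward. The state → action → reward cycle is the same.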

For example, consider a kid (the agent) playing a game (the environment) for the first time. The kid starts by seeing the game screen with all its elements (the state) and decides on an action to take. The game then scores that action (the reward), and the process repeats until the game ends (environments with a clear termination like this are called episodic). After enough repetitions, the kid will start to understand how his actions influence the environment and (assuming he is a competitive child) choose the actions that maximize his score.
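
The kid's learn-by-repetition process can also be sketched in Python. This is a deliberately simplified toy, not the Monte Carlo Control Method itself. The hypothetical game has three made-up actions (`jump`, `duck`, `run`) with assumed scoring probabilities, and every episode terminates after a fixed number of turns. The agent learns only which single action to repeat. It plays many complete episodes with each action, averages the total scores, and then picks the action with the highest average. This illustrates the Monte Carlo idea of estimating values from finished episodes.

```python
import random

# Hypothetical game: three actions, each with an assumed chance of scoring
# a point on any given turn.
SCORE_PROB = {"jump": 0.2, "duck": 0.5, "run": 0.8}
ACTIONS = list(SCORE_PROB)


def play_episode(action, rng, turns=10):
    """Play one episode that terminates after `turns` turns, repeating the
    same action every turn, and return the total score for the episode."""
    score = 0
    for _ in range(turns):
        if rng.random() < SCORE_PROB[action]:
            score += 1
    return score


def learn_best_action(episodes_per_action=200, seed=0):
    """Estimate each action's average episode score, then pick the best one."""
    rng = random.Random(seed)
    totals = {action: 0 for action in ACTIONS}
    for _ in range(episodes_per_action):
        for action in ACTIONS:  # keep trying every action
            totals[action] += play_episode(action, rng)
    # Monte Carlo estimate: the mean score over complete episodes.
    averages = {a: totals[a] / episodes_per_action for a in ACTIONS}
    best = max(averages, key=averages.get)
    return best, averages


best, averages = learn_best_action()
print(best)  # → run
```

Averaging whole-episode scores like this relies on episodes actually ending. This is why Monte Carlo methods are usually described for episodic tasks.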

Want to chat about startups, consulting or engineering? Just send me an email at paulo@avantsoft.com.br.