A hitchhiker’s guide to Deep Reinforcement Learning Part 1

data_datum
Aug 31, 2018

When you start studying a topic in machine learning, the material available on the web (YouTube videos, MOOCs, blog posts, Coursera, Udemy) can be overwhelming.

This blog post introduces some important concepts and definitions in Deep Reinforcement Learning.

Definition

Reinforcement learning is a subfield of machine learning that addresses the problem of automatically learning optimal decisions over time. Many machine learning problems have a hidden time dimension that is frequently overlooked (1).

Reinforcement learning provides a mathematical framework well suited to solving games. The central mathematical concept is that of the Markov decision process, a tool for modeling AI agents that interact with environments that offer rewards upon completion of certain actions (2).
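To make the idea of a Markov decision process a little more concrete, here is a minimal sketch in Python. The two states ("cold", "warm"), the two actions, and all probabilities and rewards are invented for illustration; they do not come from the cited references. The point is only that the next state and the reward depend on the current state and the chosen action, and nothing else.

```python
import random

# Hypothetical two-state MDP, invented purely for illustration.
# transitions[state][action] -> list of (probability, next_state, reward)
transitions = {
    "cold": {
        "wait": [(1.0, "cold", 0.0)],
        "heat": [(0.8, "warm", 1.0), (0.2, "cold", 0.0)],
    },
    "warm": {
        "wait": [(0.5, "warm", 1.0), (0.5, "cold", 0.0)],
        "heat": [(1.0, "warm", 0.5)],
    },
}


def step(state, action, rng):
    """Sample one transition; the outcome depends only on (state, action)."""
    outcomes = transitions[state][action]
    weights = [probability for probability, _, _ in outcomes]
    _, next_state, reward = rng.choices(outcomes, weights=weights, k=1)[0]
    return next_state, reward


# Walk through a few steps, picking actions uniformly at random.
rng = random.Random(0)
state = "cold"
for _ in range(5):
    action = rng.choice(list(transitions[state]))
    next_state, reward = step(state, action, rng)
    print(f"{state} --{action}--> {next_state} (reward {reward})")
    state = next_state
```

A real RL problem would have far more states and usually unknown transition probabilities, but the shape of the data is the same.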

Reinforcement Learning and other machine learning paradigms

Supervised and unsupervised learning are well-defined and widely discussed concepts in machine learning.

Differences between Supervised, Unsupervised and Reinforcement Learning. (3)

Reinforcement learning (RL) is the third camp and lies somewhere in between supervised and unsupervised learning. RL uses many well-established methods of supervised learning, such as deep neural networks for function approximation and stochastic gradient descent, to learn data representations, but it applies them in a different way (2).

Some examples of RL applications are: 1) flying stunt manoeuvres in a helicopter, 2) defeating the world champion at Backgammon, 3) managing an investment portfolio, 4) controlling a power station, 5) making a humanoid robot walk, and 6) playing many different Atari games better than humans (4).

RL formalisms and relations

RL entities and their communications

Reward. In RL, it’s just a scalar value we obtain periodically from the environment. The purpose of the reward is to tell our agent how well it has behaved. The reward is local: it reflects the success of the agent’s recent activity, not all the successes the agent has achieved so far.
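The difference between a local reward and the success accumulated so far can be shown with a few lines of Python. The reward values below are hypothetical numbers chosen for illustration, not output from any real environment.

```python
from itertools import accumulate

# Hypothetical per-step rewards from one run (values invented for illustration).
rewards = [0.0, 1.0, -1.0, 2.0, 0.5]

# Each reward is local: it only scores the most recent step.
latest_reward = rewards[-1]

# The success achieved so far is a separate quantity: the running sum of rewards.
running_totals = list(accumulate(rewards))

print(latest_reward)    # 0.5
print(running_totals)   # [0.0, 1.0, 0.0, 2.0, 2.5]
```

Note that the third step has a negative reward even though the run as a whole ends with a positive total, which is why a single reward says little about overall performance.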

An agent is somebody or something that interacts with the environment by executing certain actions, taking observations, and receiving eventual rewards for doing so.

The environment is everything outside of the agent.

Actions are things that an agent can do in the environment, for example the moves allowed by the rules of a game. Actions can be discrete or continuous. Discrete actions form a finite set of mutually exclusive things an agent could do, such as moving left or right. Continuous actions carry a numeric value, such as the angle of a steering wheel.
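As a small illustration of the two kinds of actions, the sketch below samples one of each. The `Move` enum, the helper names, and the range from -1.0 to 1.0 are assumptions made for this example; they are not part of any particular RL library.

```python
import random
from enum import Enum


class Move(Enum):
    """Discrete actions: a finite set of mutually exclusive choices."""
    LEFT = 0
    RIGHT = 1


def sample_discrete(rng):
    # Pick exactly one of the allowed moves.
    return rng.choice(list(Move))


def sample_continuous(rng, low=-1.0, high=1.0):
    # A continuous action is a real number within a range (hypothetical bounds).
    return rng.uniform(low, high)


rng = random.Random(0)
print(sample_discrete(rng))
print(sample_continuous(rng))
```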

Observations are pieces of information that the environment provides to the agent, telling it what’s going on around it (1).

This is the first in a series of posts about Deep Reinforcement Learning.
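The entities described above (agent, environment, actions, observations, and reward) can be wired together in a few lines of plain Python. This is only a sketch under stated assumptions: the coin-flip environment, the class and method names, and the random agent are all hypothetical and chosen for brevity. No learning happens here; the code only shows who sends what to whom.

```python
import random


class CoinFlipEnvironment:
    """Hypothetical toy environment: the agent guesses each coin flip."""

    def __init__(self, max_steps=10, seed=0):
        self.rng = random.Random(seed)
        self.max_steps = max_steps
        self.steps_taken = 0
        self.last_flip = self.rng.choice(["heads", "tails"])

    def observe(self):
        # Observation: the result of the previous flip.
        return self.last_flip

    def act(self, action):
        # Flip the coin and return a scalar reward for this step only.
        self.last_flip = self.rng.choice(["heads", "tails"])
        self.steps_taken += 1
        return 1.0 if action == self.last_flip else 0.0

    def is_done(self):
        return self.steps_taken >= self.max_steps


class RandomAgent:
    """Agent that ignores its observations and guesses at random."""

    def __init__(self, seed=1):
        self.rng = random.Random(seed)
        self.total_reward = 0.0

    def step(self, env):
        observation = env.observe()  # a learning agent would use this
        action = self.rng.choice(["heads", "tails"])
        reward = env.act(action)
        self.total_reward += reward


env = CoinFlipEnvironment()
agent = RandomAgent()
while not env.is_done():
    agent.step(env)
print(f"Total reward: {agent.total_reward}")
```

An agent that actually learns would replace the random guess with a choice based on the observation and on the rewards it has received so far.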

References

  1. Deep Reinforcement Learning Hands-On. http://bit.ly/2wosxGD
  2. TensorFlow for Deep Learning: From Linear Regression to Reinforcement Learning. https://oreil.ly/2NyFx2R
  3. Reinforcement Learning, Georgia Tech, Udacity. http://bit.ly/2wv9OsR
  4. Reinforcement Learning videos, DeepMind course, available on YouTube: http://bit.ly/2wu0fuq. Slides: http://bit.ly/2wuAYAl
