Reinforcement Learning: A surface-level explanation

Alice Yang
Published in Analytics Vidhya
4 min read, Aug 18, 2020

Artificial Intelligence (AI) has been a huge buzzword for the past five years or more, and more and more people are becoming familiar with artificial neural networks, which are commonly trained in one of two ways: supervised learning or unsupervised learning. However, there is a third approach that doesn’t really fall under either of these categories, and it is called reinforcement learning.

Reinforcement learning is often applied to already established neural network models to encourage specific behaviors and achieve more of a favored outcome. The term is currently used as a buzzword, and in those cases it is simply treated as a black box.

In this article, I want to give a surface-level explanation of what reinforcement learning is by opening up this “black box” that gets thrown around and is expected to do amazing things.

How It Works

What Is What?

The agent is basically the decision-maker. It can make smart decisions using an artificial neural network, or it can be a simple decision-maker that is only a little more advanced than an if-else statement. Either way, it decides which action to take based on the condition or state it is currently in.

The environment is the “place” that the agent interacts with. It provides the agent with varying conditions or states.

The interpreter provides the agent with feedback based on the action the agent chose in the condition or state that the environment exposed it to.
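To make the three roles above a bit more concrete, here is a minimal sketch of how they might fit together in a loop. Everything in it is an assumption made up for illustration: the class names, the two weather states, the two actions, and the +1/-1 feedback values are hypothetical and do not come from any particular library.

```python
import random


class Environment:
    """Hypothetical environment: exposes one of a few states to the agent."""
    STATES = ["sunny", "rainy"]

    def __init__(self, seed=0):
        self._rng = random.Random(seed)

    def observe(self):
        # Hand the agent a (randomly chosen) current state.
        return self._rng.choice(self.STATES)


class Interpreter:
    """Hypothetical interpreter: turns a (state, action) pair into feedback."""
    GOOD_CHOICES = {("sunny", "walk"), ("rainy", "stay_in")}

    def feedback(self, state, action):
        # +1 for a favored outcome, -1 otherwise (values chosen arbitrarily).
        return 1 if (state, action) in self.GOOD_CHOICES else -1


class Agent:
    """A simple decision-maker, only slightly more advanced than an if-else."""
    ACTIONS = ["walk", "stay_in"]

    def __init__(self, seed=0):
        self._rng = random.Random(seed)
        self.score = {}  # (state, action) -> total feedback received so far

    def act(self, state):
        # Pick the action with the best total feedback; break ties at random.
        best = max(self.score.get((state, a), 0) for a in self.ACTIONS)
        candidates = [a for a in self.ACTIONS
                      if self.score.get((state, a), 0) == best]
        return self._rng.choice(candidates)

    def learn(self, state, action, reward):
        key = (state, action)
        self.score[key] = self.score.get(key, 0) + reward


def run(steps=50, seed=0):
    env, interpreter, agent = Environment(seed), Interpreter(), Agent(seed)
    for _ in range(steps):
        state = env.observe()                         # environment exposes a state
        action = agent.act(state)                     # agent decides what to do
        reward = interpreter.feedback(state, action)  # interpreter gives feedback
        agent.learn(state, action, reward)            # agent adjusts its behavior
    return agent
```

Calling `run(steps=200)` returns an agent whose `act("sunny")` picks `"walk"` and whose `act("rainy")` picks `"stay_in"`, because those are the choices this toy interpreter rewards. A real agent would usually need a more careful learning rule than a running total, but the observe, act, feedback, learn cycle is the same.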

Linking It To Real (Known) Models

So let’s use some models that we’re familiar with: social media platforms such as Facebook, Instagram, LinkedIn, and any other platform that encourages users to produce and share content with others, mainly over the web.

So basically, each of these social media platforms is itself a model in which users have options (in this case, actions): they choose which topics, images, and other content to share on the platform. However, sharing and posting alone does not really encourage someone to produce content, especially if they don’t know whether their audience (friends, family, colleagues, and maybe even strangers) enjoys what they are sharing. Hence the introduction of the “like” button, which is used to encourage users to share more.

Users treat this “like” button as a form of feedback. The more likes someone gets for sharing or posting a specific topic or type of content, the more of that content they will find and create. However, if a user shares content that does not get as many likes as other, more interesting content, then the user will either share less of it or stop sharing it altogether.

This way of deciding whether to post (or do) more or less of a specific kind of content is what reinforcement learning is. Based on the actions taken (in this case, the content shared), the agent (user) determines whether to do more or less of each action according to the rewards it gets.

So in a nutshell, reinforcement learning encourages the actions that let the agent collect as much reward as possible.
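The “like” button analogy can be sketched in a few lines of code. This is only an illustration under stated assumptions: the function name `simulate_posting`, the topic names, and the like rates are all hypothetical. The decision rule used here, usually posting the topic with the best average likes so far and occasionally trying a random one, is a common simple strategy known as epsilon-greedy. It is one possible choice, not how any real platform or user actually behaves.

```python
import random


def simulate_posting(like_rates, rounds=5000, epsilon=0.1, seed=0):
    """Simulate a user (the agent) learning which topic earns the most likes.

    like_rates: hypothetical chance that a post on each topic gets a like.
    epsilon:    fraction of posts where the user tries a random topic.
    """
    rng = random.Random(seed)
    topics = list(like_rates)
    posts = {t: 0 for t in topics}        # how often each topic was posted
    avg_likes = {t: 0.0 for t in topics}  # average reward seen per topic

    for _ in range(rounds):
        if rng.random() < epsilon:
            topic = rng.choice(topics)    # explore: try something at random
        else:
            # Exploit: post the topic with the best average so far.
            topic = max(topics, key=lambda t: avg_likes[t])

        # Reward from the audience: 1 if the post is liked, else 0.
        reward = 1 if rng.random() < like_rates[topic] else 0

        # Update the running average of likes for this topic.
        posts[topic] += 1
        avg_likes[topic] += (reward - avg_likes[topic]) / posts[topic]

    return posts, avg_likes


# Made-up like rates, purely for illustration.
posts, avg_likes = simulate_posting(
    {"politics": 0.1, "memes": 0.5, "cats": 0.8}
)
print(posts)      # how many times each topic was posted
print(avg_likes)  # the average likes the user observed per topic
```

Over many rounds the simulated user ends up posting mostly about the topic that earns the most likes and only rarely about the others, which mirrors the behavior described above: more of what gets rewarded, less of what does not.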
