Reinforcement Learning Explained in Simple Words for Beginners

Lakshmi Prakash
Design and Development
5 min read · Dec 2, 2022

What is reinforcement learning? Once you step into the world of artificial intelligence and machine learning, you will soon hear this popular term: “reinforcement learning”. What does it mean?

The more frequently you do something, the better you get at it. Yes, here we are talking about humans, not machines. Take any skill as an example: typing on your keyboard, learning a new language, or making a pizza. Think about it and you’d agree it’s true, because that’s how we humans (and, for that matter, animals) learn. What exactly makes us better at something the more we do it? How does “practice makes a man perfect” work? It works because we learn the dos and don’ts of a process (typing, communicating, or cooking) from our mistakes. If you overcook your food, you burn it, making it inedible or unpleasant to eat, and if you undercook it, the result is just as bad: unhealthy or not tasty.

Learning to become better at something by learning from mistakes is the principle behind the machine learning technique called “reinforcement learning”.

How is Reinforcement Learning Different from Supervised Learning and Unsupervised Learning?

We already know what supervised learning and unsupervised learning are. Does reinforcement learning fall under “supervised learning” or “unsupervised learning”? Neither. In supervised learning, as the name implies, we teach a machine to recognize patterns from a labelled training dataset, so that when faced with inputs outside the training data, the machine can use what it has learned to produce the right outputs. This kind of learning depends on the labelled information we feed it. With reinforcement learning, though, the goal is to make the machine learn on its own, from the consequences of its own actions.
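To make the supervised side of this comparison concrete, here is a minimal sketch of learning from labelled data. It uses a nearest-neighbour rule, which is only one of many supervised methods, and the tiny dataset, the feature meanings, and the `predict` helper are all made up for illustration.

```python
# Labelled training data (invented for illustration):
# (hours of practice per week, typos per page) -> skill label
training_data = [
    ((1.0, 9.0), "beginner"),
    ((2.0, 8.0), "beginner"),
    ((9.0, 1.0), "expert"),
    ((8.0, 2.0), "expert"),
]

def squared_distance(a, b):
    """Squared Euclidean distance between two feature tuples."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def predict(point):
    """Return the label of the closest labelled example (1-nearest neighbour)."""
    _, label = min(training_data, key=lambda item: squared_distance(item[0], point))
    return label

# A new, unlabelled input that was not part of the training data.
print(predict((7.5, 2.5)))
```

Notice that every training example comes with the right answer attached. That is exactly what a reinforcement learner does not get.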

What is Reinforcement Learning in Artificial Intelligence?

Wait, if reinforcement learning helps a machine learn on its own, without a supervisor or subject matter expert to teach the machine, then doesn’t that mean that reinforcement learning is another form of unsupervised learning? No, not really. To understand how reinforcement learning is different from both supervised machine learning and unsupervised machine learning, we must understand how reinforcement learning works.

How Does Reinforcement Learning Work?

Well, we know that reinforcement learning is, in layman’s terms, about learning from practice or repeated attempts, but how does this happen? Once again, think of how we, as human beings, learn through practice. Imagine a child trying to get a parent’s attention or a junior trying to impress a senior at work. How would these people go about it? They know what their goal is, but in the beginning they know little about which actions would work in their favor, which would be fruitless, and which would backfire. They gather this information through observation and practice, trying different strategies to figure out which works best.

Basically, it’s a matter of trial and error. Keep trying repeatedly, and in the process, you’d understand which actions can be rewarding and which actions can be punishing. This is how reinforcement learning works: it is based on a concept of rewards.
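The trial-and-error idea above can be sketched in a few lines. This is a minimal illustration, not a standard library routine: it assumes a made-up "slot machine" with three levers (the payout probabilities are invented), and it uses a common strategy called epsilon-greedy, where the learner mostly repeats what has paid off so far but occasionally tries something at random. The names `run_bandit` and `epsilon` are my own choices.

```python
import random

def run_bandit(true_reward_probs, steps=2000, epsilon=0.1, seed=0):
    """Learn by trial and error which action pays off most often."""
    rng = random.Random(seed)
    n_actions = len(true_reward_probs)
    counts = [0] * n_actions        # how many times each action was tried
    estimates = [0.0] * n_actions   # average reward seen so far per action

    for _ in range(steps):
        if rng.random() < epsilon:
            # Explore: try a random action to see what happens.
            action = rng.randrange(n_actions)
        else:
            # Exploit: repeat the action that looks best so far.
            action = max(range(n_actions), key=lambda a: estimates[a])

        # The environment answers with a reward (1) or nothing (0).
        reward = 1.0 if rng.random() < true_reward_probs[action] else 0.0

        # Update the running average for the action that was tried.
        counts[action] += 1
        estimates[action] += (reward - estimates[action]) / counts[action]

    return estimates, counts

# Hypothetical payout probabilities, unknown to the learner.
estimates, counts = run_bandit([0.2, 0.5, 0.8])
best_arm = max(range(len(estimates)), key=lambda a: estimates[a])
print("estimated payouts:", estimates)
print("times each action was tried:", counts)
print("action that looks best:", best_arm)
```

Nobody tells the learner which lever is best. It only sees rewards, and over many attempts its estimates tend to drift toward the true payout rates.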

In this way, reinforcement learning is analogous to what psychology calls “operant conditioning”, a theory of behavior modification based on “rewards” and “punishments” that is well supported by decades of experiments.

Coming back to the difference between unsupervised learning and reinforcement learning: while neither type of learning is based on labelled data, unsupervised learning is about a machine figuring out patterns in the data on its own. It does not involve the machine learning from rewards or punishments.

In reinforcement learning, the machine develops what is called a “policy”: a rule for how to act in each situation, learned from a “reward signal”. The goal of the machine is to improve this policy so that the rewards it collects over time are as large as possible. In each step, the machine chooses a move from all the available moves, makes the move, observes the reward, and uses that feedback to update its policy.
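Here is a small sketch of how a policy can emerge from a reward signal. It uses tabular Q-learning, which is one well-known reinforcement learning algorithm among many, not the only way to do it. The environment is a made-up corridor with five positions where only reaching the last position gives a reward. The names `step` and `train`, and the settings `alpha`, `gamma`, and `epsilon`, are assumptions chosen for this example.

```python
import random

N_STATES = 5          # positions 0..4 in a corridor; position 4 is the goal
ACTIONS = [-1, +1]    # index 0 = move left, index 1 = move right

def step(state, action):
    """Hypothetical environment: reward 1 for reaching the goal, else 0."""
    next_state = min(max(state + action, 0), N_STATES - 1)
    done = next_state == N_STATES - 1
    reward = 1.0 if done else 0.0
    return next_state, reward, done

def train(episodes=500, alpha=0.5, gamma=0.9, epsilon=0.2, seed=0):
    rng = random.Random(seed)
    # q[state][a] estimates how good move a is in that state.
    q = [[0.0, 0.0] for _ in range(N_STATES)]
    for _ in range(episodes):
        state, done = 0, False
        while not done:
            # Choose a move: random when exploring or undecided, else the best known.
            if rng.random() < epsilon or q[state][0] == q[state][1]:
                a = rng.randrange(2)
            else:
                a = 0 if q[state][0] > q[state][1] else 1
            next_state, reward, done = step(state, ACTIONS[a])
            # Learn from this single step: nudge the estimate toward
            # the reward plus the discounted value of what comes next.
            target = reward + (0.0 if done else gamma * max(q[next_state]))
            q[state][a] += alpha * (target - q[state][a])
            state = next_state
    return q

q_table = train()
# The learned policy: the preferred move in each non-goal position.
policy = ["left" if row[0] > row[1] else "right" for row in q_table[:-1]]
print(policy)
```

The table `q_table` is what the machine learns, and the policy is simply "pick the move with the highest number in the current row". In this corridor the learned policy ends up being "right" in every position, because that is the only way to reach the reward.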

Since these moves happen in steps and the outcome of each move can be random (drawn from a probability distribution), this is a discrete-time, stochastic decision-making process. That is, a reinforcement learning problem is, mathematically speaking, usually modelled as a Markov decision process, in which what happens next depends only on the current state and the chosen move.
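To show what such a decision process looks like when written down, here is a minimal sketch. The robot, its two battery states, the actions, and every probability and reward in the table are invented for illustration. `sample_step` is a hypothetical helper, not part of any library.

```python
import random

# Hypothetical two-state decision process: a robot whose battery is
# "high" or "low".
# transitions[(state, action)] -> list of (probability, next_state, reward)
transitions = {
    ("high", "search"):   [(0.7, "high", 2.0), (0.3, "low", 2.0)],
    ("high", "wait"):     [(1.0, "high", 1.0)],
    ("low", "search"):    [(0.6, "low", 2.0), (0.4, "high", -3.0)],
    ("low", "recharge"):  [(1.0, "high", 0.0)],
}

def sample_step(state, action, rng):
    """Draw one random outcome for taking `action` in `state`.

    The outcome depends only on the current state and action,
    not on anything that happened earlier.
    """
    outcomes = transitions[(state, action)]
    weights = [prob for prob, _, _ in outcomes]
    _, next_state, reward = rng.choices(outcomes, weights=weights, k=1)[0]
    return next_state, reward

rng = random.Random(0)
state = "high"
for _ in range(5):
    next_state, reward = sample_step(state, "search", rng)
    print(state, "-> search ->", next_state, "reward", reward)
    state = next_state
```

Time moves in discrete steps (the loop), the outcome of each move is random (the probabilities), and the table never needs to know the history, only where the robot is now and what it does.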

In the short term, the goal is to seek high rewards and avoid low ones, and the long-term goal is to develop the policy that gives the best total return, learning more along the way. Of course, this is a very simple explanation of what reinforcement learning is and which sub-processes are involved; there is a lot more to it.
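The difference between a quick reward and the best long-term return can be shown with a little arithmetic. A common convention, assumed here and not spelled out above, is to add up rewards with a "discount factor" (called `gamma` below) so that a reward received later counts a bit less than the same reward received now. The two reward sequences are made up for illustration.

```python
def discounted_return(rewards, gamma=0.9):
    """Total return: r0 + gamma*r1 + gamma^2*r2 + ... for one sequence of rewards."""
    total = 0.0
    # Work backwards so each step adds its reward to the discounted future.
    for reward in reversed(rewards):
        total = reward + gamma * total
    return total

# Hypothetical reward sequences over four steps.
grab_now = [1.0, 0.0, 0.0, 0.0]   # small reward immediately, nothing after
be_patient = [0.0, 0.0, 0.0, 5.0] # nothing at first, larger reward at the end

print("grab now:  ", discounted_return(grab_now))
print("be patient:", discounted_return(be_patient))
```

With these numbers, waiting gives the higher return (about 3.645 versus 1.0), even though the first step looks worse. A good policy is one that picks moves for their long-term return, not just the next reward.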

Practical Applications of Reinforcement Learning

Reinforcement learning is as popular as it is because it can be applied to a multitude of real-world problems. It has proven highly effective in several of them, so let us see a few examples.

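AlphaGo, built by Google’s DeepMind, is one of the most interesting examples of reinforcement learning. It learned first from human expert games and then by playing against itself, went on to defeat top professional players such as Lee Sedol in 2016, and is considered a significant milestone in the evolution of artificial intelligence.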

OpenAI, often considered a tough competitor for DeepMind, is another artificial intelligence research company that makes heavy use of reinforcement learning.

“soon you will be able to have helpful assistants that talk to you, answer questions, and give advice. later you can have something that goes off and does tasks for you. eventually you can have something that goes off and discovers new knowledge for you.” — Sam Altman

Several applications in natural language understanding use reinforcement learning, too. ChatGPT, for instance, was fine-tuned using reinforcement learning from human feedback. Here is an actual poem on Elon Musk and Twitter written by ChatGPT. Can you believe that a machine wrote this? It’s fun, isn’t it?

Reinforcement learning is considered one of the most powerful types of machine learning because it can learn in an uncertain environment, and you can see how productive it is from the examples shown. This is a field that is growing rapidly, with applications that are more and more advanced and stunning. I hope this write-up gives you a basic idea of what reinforcement learning is. If it did, give me a “reward” and give this a like. Just kidding!

Lakshmi Prakash
Design and Development

A conversation designer and writer interested in technology, mental health, gender equality, behavioral sciences, and more.