Reinforcement Learning, can’t get any easier than this, Part 1

Hrithik Rai Saxena
5 min read · Mar 22, 2023


Just look around you. What do you see?

An environment. A place with tangible things around you. Suppose you are in a room, reading this article on your laptop and I barge into your environment, running toward you with a knife. What will you do?

I bet you will try to defend yourself or run within the confined boundaries of the room. This is you interacting with the environment, aiming to earn a reward (in this case, successfully defending yourself from me) and avoid a penalty (a nasty stab wound). This ability to interact with the environment to fulfill a task, riding through a series of penalties and rewards, comes from our biological gift called the brain, which, in the case of humans, is densely packed with neurons, giving us some serious thinking capacity.

Now think…

Instead of you, there is a robot in the room. I am in the room with a screwdriver, ready to take the whole robot apart. But now the condition is that I will be standing still, at a fixed place in the room. The robot needs to find a safe route out of the room without running into me. How will it achieve this goal?

This is where the idea of Reinforcement Learning kicks in. In this article, we are going to come across the very fundamentals of RL and get you started with the thought process that goes into solving a problem using RL.

So, Reinforcement Learning is a field of artificial intelligence built on one intuition: how should you act in an environment to maximize a given reward?

Reinforcement learning algorithms observe the behavior of a subject in an environment and learn to optimize that behavior. Easy enough so far? Now let's look at some buzzwords you are going to come across frequently in this subject. Some of the most common ones are Markov Decision Process, discounting, policy, value functions, Q-learning, DQNs, policy gradients, the Bellman equation, and some complex-looking math.

Do not get intimidated by all these words. Everything builds up on top of one another and is quite easy. You will soon understand why every buzzword is called what it is called and how the definition of Reinforcement learning will become more complex as we proceed. So, let’s eat the whole whale one bite at a time.

Why Reinforcement Learning?

So, before starting with our very first algorithm, let's understand why it has been one of the most talked-about machine learning trends of the last few years. The most relevant example for us is industrial automation: in industry, reinforcement learning-based robots are used to perform various tasks. Apart from being more efficient than human beings at some jobs, they can also perform tasks that would be dangerous for people. A great example is DeepMind's use of AI agents to cool Google's data centers, which reportedly cut the energy used for cooling by around 40%, with the cooling system now largely running without human intervention. Other use cases include autonomous driving, stock price prediction, gaming, large-scale production systems, healthcare, and much more.

So, all in all, RL leverages fast-paced trial and error to develop robust solutions. Let’s look at a basic algorithm to understand this better.

Markov Decision Process:

This is the bedrock for understanding Reinforcement Learning. Markov Decision Process gives us a way to formalize the process of sequential decision-making.

There is a decision-maker called the agent. This agent interacts with the environment it’s placed in. These interactions occur sequentially over time.

At each time step, the agent gets some representation of the environmental state. Given this representation, the agent selects an action to take. After the action is taken, the environment moves on to the next state and the agent gets a reward as a consequence of its previous action.

[Figure: the Markov Decision Process, i.e. the agent-environment interaction loop]

So now you know that during training, the agent depends purely on the feedback signal (the reward) to get an idea of how good its action was.
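The loop described above (observe a state, pick an action, receive a reward and the next state) can be sketched in a few lines of code. The environment below is a made-up toy, a 1-D corridor where the agent must reach the exit; all names and reward values here are illustrative, not from any particular library.

```python
import random

class CorridorEnv:
    """A toy environment: the agent walks a 1-D corridor of cells.
    Reaching the rightmost cell (the exit) yields reward +1;
    every other step costs -0.1. Purely illustrative."""

    def __init__(self, length=5):
        self.length = length
        self.state = 0

    def reset(self):
        self.state = 0
        return self.state

    def step(self, action):
        # action: 0 = move left, 1 = move right
        if action == 1:
            self.state = min(self.state + 1, self.length - 1)
        else:
            self.state = max(self.state - 1, 0)
        done = self.state == self.length - 1
        reward = 1.0 if done else -0.1
        return self.state, reward, done

# The agent-environment loop: at each time step the agent observes
# the state, selects an action, and gets a reward in return.
random.seed(0)
env = CorridorEnv()
state = env.reset()
total_reward = 0.0
for t in range(50):
    action = random.choice([0, 1])   # a random policy, just for illustration
    state, reward, done = env.step(action)
    total_reward += reward
    if done:
        break
print(f"ran {t + 1} steps, total reward = {total_reward:.1f}")
```

Here the "agent" is just a random policy; the whole point of RL is to replace that `random.choice` with something that learns from the rewards.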

All of this is backed by some complex-looking probability and math, but let's skip that for now and stick to the basics.

Training an agent vs. using it afterward

Now the big question arises: how long should I train the agent, and when do I start using it? Once you have created an environment and a reinforcement learning agent, you can train the agent in that environment. Training terminates automatically when the stopping criteria you specify are satisfied. For now, think of a stopping criterion as a monitor that watches the training and ends it once the agent performs as well as we need.
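One common stopping criterion is "end training once the average reward over the last few episodes clears a threshold". Below is a minimal, self-contained sketch of that idea using tabular Q-learning on a tiny corridor world; the corridor, the window size of 20, and the 0.5 threshold are all assumptions made up for this example, not standard values.

```python
import random

random.seed(0)

N, GOAL = 5, 4                 # 1-D corridor; cell 4 is the exit
Q = {(s, a): 0.0 for s in range(N) for a in (0, 1)}
alpha, gamma, eps = 0.5, 0.9, 0.2

def step(s, a):
    """Move left (0) or right (1); +1 at the exit, -0.1 otherwise."""
    s2 = min(s + 1, N - 1) if a == 1 else max(s - 1, 0)
    done = s2 == GOAL
    return s2, (1.0 if done else -0.1), done

# Stop criterion: terminate once the average reward over the last
# 20 episodes exceeds 0.5 (or after a hard cap of 500 episodes).
recent, episodes = [], 0
while episodes < 500:
    s, ep_reward, done = 0, 0.0, False
    while not done:
        if random.random() < eps:            # explore
            a = random.choice((0, 1))
        else:                                # exploit current estimates
            a = 1 if Q[(s, 1)] >= Q[(s, 0)] else 0
        s2, r, done = step(s, a)
        best_next = 0.0 if done else max(Q[(s2, 0)], Q[(s2, 1)])
        Q[(s, a)] += alpha * (r + gamma * best_next - Q[(s, a)])
        s, ep_reward = s2, ep_reward + r
    recent.append(ep_reward)
    episodes += 1
    if len(recent) >= 20 and sum(recent[-20:]) / 20 > 0.5:
        break   # stop criterion satisfied: the agent performs well enough

print(f"training stopped after {episodes} episodes")
```

The monitor here is just the `if` at the bottom of the loop; real RL libraries wrap the same idea in a configurable object.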

At this point, we save the agent along with the experience it has gained from all the training and then release it into an actual working environment. Once the agent has been put to work after training, it always picks the action that promises the highest reward; in this phase, it no longer learns.
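"No longer learns" simply means the deployed agent runs pure exploitation: it looks up its learned value estimates and takes the best action, with no more updates. A tiny sketch, using a hypothetical, hand-written Q-table:

```python
# A hypothetical Q-table produced by training: it maps
# (state, action) pairs to estimated values. The states and
# actions here are invented for illustration.
Q = {
    ("door_near", "walk"): 0.9, ("door_near", "wait"): 0.1,
    ("door_far", "walk"): 0.4,  ("door_far", "wait"): 0.2,
}

def act(state):
    """Deployment-time policy: pure exploitation, no learning updates."""
    actions = [a for (s, a) in Q if s == state]
    return max(actions, key=lambda a: Q[(state, a)])

print(act("door_near"))  # -> walk (the highest-value action for that state)
```

Contrast this with training, where the agent would sometimes deliberately pick a worse-looking action just to explore.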

Later, when the environment changes, we need to retrain the agent so it can adapt to the new environment. That is why the experiences our agent goes through during its working day should be saved as data for future training!

This will be a three-part series on getting started with Reinforcement Learning. In later articles we will talk more about expected return, discounting, policies, and value functions, but as I said before, just keep up with me, and even if some of the above went a little over your head, skip it. These ideas will come up again and again in the following articles.

Till then,

Happy Learning.😜
