Study on RL Algorithms using Snake Game Implementation — Part 2

Vasanthkumar Velayudham | Published in Analytics Vidhya | Apr 24, 2023

This article is the second in a series on building the Snake game using Reinforcement Learning (RL) algorithms. If you have not read the first part, please read it here:

To quickly summarize: in the previous part we built a Snake RL environment and trained agents with the DQN and DDQN algorithms. Most of that article focused on describing the code and the environment; we did not go deeper into the algorithms or analyze the results.

In this article, we will train the agent and compare its performance across the following four variations of the DQN algorithm:

→ DQN

→ DDQN

→ DDQN with Prioritized Experience Replay

→ Duelling DQN

We will also discuss each algorithm briefly while analyzing the results.

Let's get started!

Agent Training and Results Overview:

Running multiple DQN algorithms is a breeze with the framework: the user just needs to select the appropriate algorithm and start the execution, as shown below:

Code snippet from the source code

To run the agent with a different algorithm, update the ‘agent_objs’ array with the respective algorithm’s class name and start the execution. Right now, the framework does not support running multiple agent objects in a single execution, so please run it with one algorithm at a time.
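As a rough sketch (the module and class names below are assumptions, not necessarily those used in the repository), the selection step might look like this:

```python
# Hypothetical sketch of the algorithm-selection step; the class name
# DDQNPERAgent and the train() signature are illustrative assumptions.
from agents import DDQNPERAgent  # swap in DQNAgent, DDQNAgent, DuelingDQNAgent, etc.

# The framework currently runs one agent object per execution,
# so keep exactly one entry in the array.
agent_objs = [DDQNPERAgent()]

for agent in agent_objs:
    agent.train(episodes=200)  # assumed training entry point
```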

After completing the execution across all four of the mentioned algorithms, this is how the results look:

Snake agent’s score across 200 episodes

To quickly recap the execution parameters: we run the agent for 200 episodes, with the first 50 episodes acting fully at random. The agent has to find the fruit, and each time it does, it receives 10 points. As you can see, the agents score more than 500 points within such a short run of 200 episodes.
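For reference, the run amounts to a configuration roughly like the following (the parameter names are illustrative, not the framework's exact ones):

```python
# Illustrative run configuration; the actual parameter names may differ.
config = {
    "episodes": 200,        # total training episodes
    "random_episodes": 50,  # the first 50 episodes act fully at random
    "fruit_reward": 10,     # points awarded each time the snake eats a fruit
}
```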

When we plot the average score of the agents from episode 51 to 200, this is how the scores look:

Avg score plot of Snake Agent across algorithms

As you can observe, the ‘DDQN with Prioritized Experience Replay’ algorithm has performed extremely well compared with the other algorithms.

Now let's understand what the ‘DDQN with PER’ algorithm is and how it helps the agent achieve such good results.

DQN Family of Algorithms

Deep Q Network (DQN) based algorithms are built on the Bellman equation and approximate the action-value (Q) function with a neural network.

For more details about how the agent learns in DQN, please refer to the article below:

To quickly summarize, in DQN the network tries to solve the Bellman equation below at every step:

Q-learning target: y = r + γ · max_{a′} Q(s′, a′)

where r is the reward, γ (gamma) is the discount factor, s′ is the next state, and a′ is the action with the highest Q-value in the next state.

As you can see, the ‘max’ operator in the above equation is the source of ‘maximization bias’, which causes the DQN algorithm to overestimate Q-values.

You can find the impact of maximization bias and how ‘Double DQN (DDQN)’ addresses this problem in the article below:
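To make the difference concrete, here is a minimal PyTorch-style sketch (the function and variable names are illustrative, not the article's code) of how the two targets are typically computed:

```python
import torch

def dqn_target(target_net, rewards, next_states, dones, gamma=0.99):
    # Vanilla DQN: the target network both selects and evaluates the next
    # action, so the max operator tends to overestimate Q-values.
    next_q = target_net(next_states).max(dim=1).values
    return rewards + gamma * next_q * (1 - dones)

def ddqn_target(q_net, target_net, rewards, next_states, dones, gamma=0.99):
    # Double DQN: the online network selects the best next action and the
    # target network evaluates it, reducing the maximization bias.
    best_actions = q_net(next_states).argmax(dim=1, keepdim=True)
    next_q = target_net(next_states).gather(1, best_actions).squeeze(1)
    return rewards + gamma * next_q * (1 - dones)
```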

Another notable variant of DDQN is ‘DDQN with Prioritized Experience Replay’. As shown in the results above, DDQN with PER gave us the best results.

Assume you are training a DQN agent and have collected around 10K steps so far. Each step yields a reward based on the action taken, and a corresponding Q-value is computed. All of these experiences are stored in an experience replay buffer, and during learning the system randomly selects a batch of them to train the underlying agent.

With this random selection, there is a possibility that the chosen samples are not very useful for the network to learn from. If there were a mechanism to select the most informative samples from the available set, the network could learn much more effectively. Prioritized Experience Replay does exactly that, with the help of the ‘td-error’ (temporal difference error).
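As a simplified illustration (a bare-bones proportional prioritization scheme sketched from scratch, not the framework's implementation), a prioritized buffer samples transitions with probability proportional to their TD error:

```python
import numpy as np

class SimplePrioritizedReplay:
    """Minimal proportional PER sketch; the real implementation may differ."""

    def __init__(self, capacity=10_000, alpha=0.6, eps=1e-5):
        self.capacity, self.alpha, self.eps = capacity, alpha, eps
        self.buffer, self.priorities = [], []

    def add(self, transition, td_error):
        # New transitions get a priority proportional to |td_error|.
        if len(self.buffer) >= self.capacity:
            self.buffer.pop(0)
            self.priorities.pop(0)
        self.buffer.append(transition)
        self.priorities.append((abs(td_error) + self.eps) ** self.alpha)

    def sample(self, batch_size=64):
        # Sample indices with probability proportional to stored priorities.
        probs = np.array(self.priorities) / np.sum(self.priorities)
        idx = np.random.choice(len(self.buffer), size=batch_size, p=probs)
        return [self.buffer[i] for i in idx], idx

    def update_priorities(self, idx, td_errors):
        # After a learning step, refresh priorities with the new TD errors.
        for i, err in zip(idx, td_errors):
            self.priorities[i] = (abs(err) + self.eps) ** self.alpha
```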

Please refer to the article below to better understand the ‘td-error’ and how PER addresses the overfitting problem associated with it.

With that, we are at the end of the article, where we have seen how to train the ‘Snake’ game agent with multiple variants of the DQN algorithm and analyzed the results.

Please let me know if you have any questions!

Also, please follow me if you would like to get notifications about my future articles.

Thanks! Happy Reading!
