Study on RL Algorithms with Snake Game implementation
Reinforcement Learning (RL) is an emerging area in the field of AI, and its usage in mainstream business applications is increasing at a breathtaking pace.
Conceptually, RL algorithms can be broadly grouped into the following categories, and under each category there are multiple algorithms that help in optimizing the performance of the agent:
→ DQN Agents
→ Policy Gradient Agents
→ Hierarchical Agents
→ Actor Critic Agents
I am going to write a series of articles in this space, where I will explain the ‘implementation’ aspects of these algorithms and help readers gain a deeper understanding of the concepts.
With that aim, I will use the ‘Snake Game’ problem as my environment and build an agent with different RL algorithms. In this article, I have trained the snake agent with the DQN and DDQN algorithms, and the results are quite impressive even with just 200 episodes of training. You can observe the results in the animations below.
Functioning code for this application is available at this GitHub repo.
You can find part 2 of the series here, where I have compared the performance of the Snake agent across multiple DQN algorithms.
Most of us are familiar with the Snake game and have played it extensively during our school days, so I will summarize it quickly and then get into the implementation aspects.
Before that, if you are interested in RL concepts, you can refer to my previous articles, where I have written extensively on building RL agents across various use cases.
Snake Game — Overview & Environment definition
Snake is a simple game where the snake needs to find food by navigating across the screen. If the snake finds the food, it grows in size. If the snake crashes into the boundaries or into itself, the game ends. Fairly simple. Let’s frame this problem in RL terms.
Environment:
We are going to use Pygame to define our environment. It will be a window of 320x320 pixels, and we will build the environment in such a way that the game ends when the snake crashes into the boundary or into itself.
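As a rough illustration, the window setup might look along these lines (the block size and caption here are assumptions for the sketch, not taken from the repo):

```python
import pygame

# Illustrative sketch of the environment window setup. The 320x320 size is
# from the article; BLOCK_SIZE and the caption are assumptions.
WINDOW_SIZE = 320
BLOCK_SIZE = 20  # assumed size of one snake segment / food cell

pygame.init()
screen = pygame.display.set_mode((WINDOW_SIZE, WINDOW_SIZE))
pygame.display.set_caption("Snake RL Environment")
clock = pygame.time.Clock()  # used to control the frame rate while rendering
```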
State:
The state describes the current status of the snake in the environment, and we are going to define the ‘state’ as the following 11-element boolean vector (a construction sketch follows the list):
→ Is there an immediate danger next to the snake (right, left or straight)
→ Direction of the snake movement (up, down, left or right)
→ Location of food with respect to snake’s head (up, down, left or right)
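To make this concrete, here is a minimal sketch of how such an 11-element vector could be assembled; the helper flags and argument names are assumptions for illustration, not the exact code from the repo:

```python
import numpy as np

# Illustrative construction of the 11-element boolean state vector described
# above. The danger flags and the direction/food/head arguments are assumed
# to come from the environment's internal bookkeeping.
def get_state(head, direction, food, danger_straight, danger_right, danger_left):
    state = [
        # 1-3: immediate danger relative to the snake's current heading
        danger_straight,
        danger_right,
        danger_left,
        # 4-7: current direction of movement
        direction == "up",
        direction == "down",
        direction == "left",
        direction == "right",
        # 8-11: food location relative to the snake's head
        food[1] < head[1],   # food is above (pygame's y axis grows downwards)
        food[1] > head[1],   # food is below
        food[0] < head[0],   # food is to the left
        food[0] > head[0],   # food is to the right
    ]
    return np.array(state, dtype=int)
```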
Actions:
The snake can take 3 actions to decide its course: go straight, turn right, or turn left. Care needs to be taken in the implementation to account for the snake’s current direction while applying the action. If the snake is pointing right and goes straight, its x-value increases; whereas if it is pointing left and goes straight, its x-value decreases. So we need to update the position of the snake considering the direction of the head along with the action it takes, as sketched below.
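A minimal sketch of such direction-aware action handling is below; the action encoding (0 = straight, 1 = turn right, 2 = turn left) and the clockwise ordering of directions are assumptions for this example:

```python
# Illustrative direction-aware movement update.
CLOCKWISE = ["up", "right", "down", "left"]

def apply_action(head, direction, action, block_size=20):
    idx = CLOCKWISE.index(direction)
    if action == 1:          # turn right: next direction clockwise
        direction = CLOCKWISE[(idx + 1) % 4]
    elif action == 2:        # turn left: next direction counter-clockwise
        direction = CLOCKWISE[(idx - 1) % 4]
    # action == 0: keep going straight, direction unchanged

    x, y = head
    if direction == "right":
        x += block_size      # moving right increases x
    elif direction == "left":
        x -= block_size      # moving left decreases x
    elif direction == "down":
        y += block_size      # pygame's y axis grows downwards
    else:
        y -= block_size
    return (x, y), direction
```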
Reward:
We are going to keep a very simple reward structure: the snake gets a reward of +10 when it finds food and a reward of -10 when it crashes. This reward structure could be further fine-tuned, and we can try variations in subsequent experiments.
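As a sketch, the reward assignment inside the environment’s step update could look like this (the flag names are assumptions; they would be computed from the snake’s new head position):

```python
# Minimal sketch of the reward logic described above.
def compute_reward(food_eaten: bool, crashed: bool) -> int:
    if crashed:
        return -10   # episode ends with a penalty
    if food_eaten:
        return 10    # snake found the food and grows
    return 0         # no reward for an ordinary step
```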
RL Library:
As I said above, my intention is to use the Snake game as an experiment to study different RL algorithms. I have come across this library, which I found to be highly comprehensive with regard to various RL algorithm implementations. The author has built a framework around the algorithm implementations in such a way that you can train multiple RL agents with a single command. So, big thanks to the author for creating such a wonderful library and helping me study RL algorithms in a better way.
This library has been implemented to support the various environments in the OpenAI Gym library. I have made a few changes in the library to support a custom environment, as in this case.
Snake Environment Class Overview:
In the repository, you can find SnakeAgent as a Python class, which contains the various methods required for the agent to learn, as described below (a minimal skeleton follows the list):
__init__ — This is the init method, where I define the various environment variables.
reset — This method is called at the end of every episode and resets the snake to the starting position.
step — This is a very important method, which updates the snake’s position based on the action received from the agent. Depending on the direction of the snake, each action (go straight, turn right or turn left) changes the head position of the snake and its direction. Also, as the snake game itself is implemented as part of this code, the corresponding game updates need to be done here along with the ‘food eaten’ and ‘crash’ checks.
If the food has been eaten, the snake grows in size and the agent gets a positive reward. If the snake crashes, the game ends along with a negative reward.
_get_obs — This method generates the ‘state’ value of the snake’s position as per the above definition.
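Putting these together, a minimal gym-style skeleton of the class could look as below; the bodies are placeholders, and the actual implementation in the repo is more detailed:

```python
import numpy as np

# Skeleton only: method names follow the description above, bodies are stubs.
class SnakeAgent:
    def __init__(self, window_size=320, block_size=20):
        # environment variables: window, snake body, food position, score ...
        self.window_size = window_size
        self.block_size = block_size
        self.reset()

    def reset(self):
        # place the snake back at the starting position and respawn the food
        self.snake = [(160, 160)]
        self.direction = "right"
        self.food = (80, 80)
        self.score = 0
        return self._get_obs()

    def step(self, action):
        # move the head according to the action and current direction,
        # then run the 'food eaten' and 'crash' checks
        reward, done = 0, False
        # ... update self.snake / self.direction / reward / done here ...
        return self._get_obs(), reward, done, {}

    def _get_obs(self):
        # build the 11-element boolean state vector described earlier
        return np.zeros(11, dtype=int)
```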
Configuration
There are 2 sets of configuration here: one associated with the training run and the other with the algorithm.
As far as training is concerned, we will train the agent for 200 episodes, and for the first 50 episodes we will let the agent learn via exploration (random actions).
With regard to the algorithm, the following are the configurations:
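For illustration, assuming a Config-style object along the lines the library uses, the algorithm configuration could look roughly like this (the module path, key names and values here are placeholders, not the exact hyperparameters from my runs):

```python
# Illustrative algorithm configuration; all names and values are assumptions.
from utilities.data_structures.Config import Config  # assumed module path

config = Config()
config.seed = 1
config.num_episodes_to_run = 200
config.hyperparameters = {
    "DQN_Agents": {
        "learning_rate": 0.001,
        "batch_size": 64,
        "buffer_size": 40000,
        "discount_rate": 0.99,
        "linear_hidden_units": [64, 64],
        "update_every_n_steps": 1,
    }
}
```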
As I said above, this library has been built with a host of RL algorithms wrapped around an execution framework. You can train the agent with different algorithms in a single command, along the lines below:
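Here is a sketch of what that invocation might look like, assuming a Trainer-style entry point that takes the config and a list of agent classes (the import paths and class names are my assumptions; the working script is in the repo):

```python
# Hypothetical training invocation, continuing from the config sketch above.
from agents.DQN_agents.DQN import DQN
from agents.DQN_agents.DDQN import DDQN
from agents.Trainer import Trainer

config.environment = SnakeAgent()   # the custom Snake environment
AGENTS = [DQN, DDQN]                # algorithms to train and compare

trainer = Trainer(config, AGENTS)
trainer.run_games_for_agents()      # trains each agent and records results
```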
Here, I have trained the agent with DQN and DDQN algorithms.
Results Overview:
In order to measure whether the agent is learning or not, I would like to use 2 basic metrics, as below:
→ Score: Number of food items eaten by the snake
→ Number of steps: Number of steps that the snake took during the episode without crashing.
The performance of the Snake agent over 200 episodes looks as below:
As you see, with the DDQN algorithm the agent is able to score well from around episode 60 onwards, whereas with the DQN algorithm the agent starts scoring only after around episode 120.
Also, you can observe that during the initial 50 episodes (the exploration phase), the agent did not survive for long and crashed early. After 50 episodes, the agent learns gradually and its performance keeps improving.
The agent has scored up to around 500 points and has persisted for more than 2500 steps in an episode. The agent is able to do so well with just 200 episodes of training, which is really good.
Summary and next steps:
With that, we are at the end of the article, and we have seen in detail how to build an RL agent for the Snake game. In this article, I have trained the agent with DQN and DDQN and presented the results. Keep following this space for future articles on how the agent can be trained using other RL algorithms and how the results look.
Thanks and happy learning!!!