Exploring the World of Pokémon Red with Reinforcement Learning

Kunwar Vikrant
4 min read · Nov 6, 2023

This post is based entirely on the work of Peter Whidden, who posted an excellent tutorial on YouTube explaining the steps. All credit goes to him for this excellent experiment.

In this post, we delve into the fascinating world of reinforcement learning and use it to train an AI to play Pokémon Red. The AI starts with no knowledge of the game but learns and adapts over five years of simulated game time. We’ll explore the AI’s journey, its successes, and its failures, drawing parallels with human experiences. We’ll also discuss the technical details behind this project and how you can download and run the program yourself.

The AI interacts with the game just like a human player. It takes in screen images and selects which buttons to press to play the game optimally. Reinforcement learning, specifically proximal policy optimization, is used for training the AI. Instead of explicit instructions, the AI learns through trial and error, guided by high-level feedback.
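
For readers new to proximal policy optimization, its core idea is a clipped objective that limits how far a single update can move the policy. The sketch below computes that per-sample term in plain Python. The function name and the clip range of 0.2 are illustrative assumptions, not values taken from Peter's project.

```python
def ppo_clipped_term(ratio, advantage, clip_eps=0.2):
    """Per-sample PPO clipped surrogate term (the quantity to maximize).

    ratio:     new_policy_prob / old_policy_prob for the action taken.
    advantage: estimate of how much better the action was than average.
    clip_eps:  clip range; 0.2 is an assumed value for this sketch.
    """
    # Keep the probability ratio inside [1 - clip_eps, 1 + clip_eps].
    clipped_ratio = max(1.0 - clip_eps, min(ratio, 1.0 + clip_eps))
    # Taking the minimum removes any incentive to push the ratio
    # beyond the clip range in the direction the advantage favors.
    return min(ratio * advantage, clipped_ratio * advantage)
```

In a real training loop this term would be averaged over a batch of experience and combined with value and entropy terms, which are left out here for brevity.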

Let the Games Begin

At the start, the AI randomly presses buttons. To provide meaningful feedback, a gentle curriculum of rewards is created. An important objective is to encourage exploration. The AI is rewarded for discovering new areas, which promotes curiosity and novelty-seeking behavior.
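
A novelty reward of this kind could be sketched as follows, assuming screens are compared pixel by pixel against everything seen so far. The class name, the flat-tuple screen format, and the `min_changed_pixels` threshold are all hypothetical choices for illustration, not details from the original project.

```python
class NoveltyTracker:
    """Rewards screens that differ enough from every screen seen so far."""

    def __init__(self, min_changed_pixels):
        # Hypothetical threshold: how many pixels must differ to count as new.
        self.min_changed_pixels = min_changed_pixels
        self.seen_screens = []

    def reward(self, screen):
        # `screen` is a flat tuple of pixel values standing in for a game frame.
        for old in self.seen_screens:
            changed = sum(1 for a, b in zip(old, screen) if a != b)
            if changed < self.min_changed_pixels:
                return 0.0  # too similar to a screen already recorded
        self.seen_screens.append(screen)
        return 1.0  # sufficiently different from everything seen before
```

The threshold matters a great deal in a setup like this: if it is too low, small on-screen changes count as "new" and the agent can collect reward without actually going anywhere.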

Exploration, Distraction

The AI’s early exploration behavior is somewhat paradoxical. It becomes fixated on certain areas, such as Pallet Town, because on-screen animations trigger the novelty reward again and again even though the AI has not gone anywhere new. This behavior reflects the human tendency to be lured by distractions during the quest for novelty.

Level Up

As the AI gains experience, it starts battling, capturing Pokémon, and leveling them up. The reward structure is adjusted to incentivize leveling, improving its ability to win battles and progress in the game.

PC Trauma

A significant issue arises when the AI encounters Pokémon Centers. It starts avoiding them after a traumatic experience: the loss of a Pokémon during a PC deposit. This highlights how a single traumatic event can influence behavior. The reward function is adjusted to prevent further avoidance of Pokémon Centers.
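
This post does not spell out how the reward was changed, so the following is only one plausible sketch. It assumes the leveling reward is based on the total level of the party: a naive version punishes a deposit because the total drops, while a "high-water mark" version only pays out for new gains. Both names are hypothetical.

```python
def naive_level_reward(prev_levels, curr_levels):
    """Reward = change in total party level; goes negative after a deposit."""
    return sum(curr_levels) - sum(prev_levels)


class HighWaterLevelReward:
    """Only pays out when the total party level exceeds its previous best."""

    def __init__(self):
        self.best_total = 0

    def reward(self, curr_levels):
        total = sum(curr_levels)
        # Losses are ignored rather than punished.
        gain = max(0, total - self.best_total)
        self.best_total = max(self.best_total, total)
        return gain
```

With the naive version, depositing a level 8 Pokémon produces a reward of -8 in a single step. With the second version the same deposit yields 0, so one unlucky action no longer teaches the agent to stay away from the building.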

Exploring Further

The AI makes progress, exploring new areas and even buying a Magikarp to gain levels efficiently. This behavior parallels human instincts related to scarce resources and adaptability.

Mount Moon Challenge

The AI encounters difficulties in Mount Moon due to the repetitive environment, leading to navigation issues. Despite its progress, it fails to complete this part of the game.

Map Visualizations

Various visualizations provide insights into the AI’s navigation behavior, highlighting its preference for walking counterclockwise on edges of the map. This demonstrates how simple strategies can help with limited memory and planning.

RNG Manipulation

The AI learns to manipulate the game’s deterministic behavior to catch Pokémon more efficiently. This reflects the advantage of understanding and exploiting deterministic elements within randomized systems.
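
To see why determinism can be exploited, consider a toy model in which a pseudo-random state advances once per frame and a catch succeeds only if the state is low enough at the moment the button is pressed. The generator below is made up for illustration and is not the game's real random number routine; the function names and the threshold are likewise assumptions.

```python
def toy_rng_next(state):
    """One step of a tiny made-up generator (not the game's real RNG)."""
    return (state * 5 + 1) % 256


def toy_catch_succeeds(start_state, frames_waited, threshold=64):
    """The outcome is fully determined by the start state and the wait."""
    state = start_state
    for _ in range(frames_waited + 1):
        state = toy_rng_next(state)
    return state < threshold


def first_winning_wait(start_state, max_wait=256):
    """Smallest wait (in frames) that makes the toy catch succeed, or None."""
    for wait in range(max_wait):
        if toy_catch_succeeds(start_state, wait):
            return wait
    return None
```

Because the same start state and the same timing always give the same result, an agent that stumbles on a winning sequence of inputs can repeat it reliably. A learned policy does not search explicitly like `first_winning_wait`; it simply gets rewarded whenever its timing happens to work.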

Metrics & Visualization

To understand the AI’s behavior, a variety of metrics and visualizations are used. These include mapping the AI’s movement on the game map, tracking Pokémon caught, and analyzing the AI’s behavior over training iterations.
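
As a rough idea of how a movement map can be built, the sketch below counts how often each tile is visited and renders the counts as text. It assumes the agent's position is logged as (x, y) tile coordinates; the function names and the text rendering are illustrative, not the visualizations used in the original project.

```python
from collections import Counter


def visit_counts(positions):
    """Count how often each (x, y) tile appears in a recorded trajectory."""
    return Counter(positions)


def render_heatmap(counts, width, height):
    """Render counts as text rows; '.' marks unvisited tiles, digits cap at 9."""
    rows = []
    for y in range(height):
        row = ""
        for x in range(width):
            n = counts.get((x, y), 0)
            row += "." if n == 0 else str(min(n, 9))
        rows.append(row)
    return "\n".join(rows)
```

For example, `render_heatmap(visit_counts(trajectory), 20, 18)` would give a small grid where frequently visited tiles stand out, assuming `trajectory` is a list of logged positions.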

Future Improvements

Several potential improvements can be explored, such as transfer learning, environment modeling, and hierarchical reinforcement learning. These could enhance the efficiency and capabilities of reinforcement learning algorithms in the future.

To run the AI on your own machine, this post provides a step-by-step guide. It involves downloading the repository, obtaining the Pokémon Red ROM, installing necessary dependencies, and running the pre-trained model or training from scratch. The post also explains how to interact with and control the AI during gameplay.

Reinforcement learning, as demonstrated in this project, offers valuable insights into both AI and human behavior. As we journey through the world of Pokémon Red, we gain a deeper understanding of how an AI learns, adapts, and sometimes struggles, reflecting our own experiences and challenges. This project serves as a fascinating exploration of the possibilities and limitations of reinforcement learning in a complex gaming environment.

Why don’t Ghost-type Pokémon ever make good secret agents? Because they’re always ‘haunting’ the scene! :D

Thank you for reading, awesome reader! 🎉 If you found my article useful, do give it some claps! Also, subscribe to my newsletter at kunwarvikrant.substack.com to get your regular dose of mind-bending visuals delivered straight to your inbox. Let’s embark on this visual adventure together! 🚀
