Mastering Atari Games with Q-Learning 🎮

Atari games have been a benchmark in the world of reinforcement learning (RL) since the groundbreaking work of DeepMind's AI agents, which demonstrated superhuman performance across a range of these classic video games. One of the key algorithms behind this success is Q-learning, a fundamental RL technique that learns to make decisions by estimating the value of taking a particular action in a given state.

Understanding Q-Learning 🧠

Q-learning is a model-free reinforcement learning algorithm that learns an optimal policy by iteratively updating its action-value function Q(s, a). This function estimates the expected cumulative reward of taking action a in state s and then following the optimal policy thereafter. The update rule for Q-learning is based on the Bellman equation:

Q(s, a) ← Q(s, a) + α [r + γ max_a' Q(s', a') - Q(s, a)]

where α is the learning rate, γ is the discount factor, r is the observed reward, and s' is the next state.
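
As a minimal worked example of this rule, the sketch below applies a single tabular update with NumPy. The table shape, the q_update helper, and the toy numbers are illustrative assumptions, not taken from the article's code:

import numpy as np

def q_update(q_table, state, action, reward, next_state, alpha=0.2, gamma=0.96):
    # One Q-learning step: move Q(s, a) toward r + gamma * max_a' Q(s', a')
    best_next = np.max(q_table[next_state])
    td_target = reward + gamma * best_next
    q_table[state, action] += alpha * (td_target - q_table[state, action])
    return q_table

# Toy table with 5 states and 3 actions, all values starting at zero
q = np.zeros((5, 3))
q = q_update(q, state=0, action=1, reward=1.0, next_state=2)
print(q[0, 1])  # 0.2 = alpha * (1.0 + gamma * 0 - 0)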

Q-Learning on Atari Games 🚀

Applying Q-learning to Atari games involves representing the game state and actions, updating the Q-table, and selecting actions based on an exploration-exploitation strategy.

(Screenshot: the Assault game window)

Let's look at an example implementation of Q-learning on the Atari game "Assault":

import ale_py
import numpy as np
import pygame
import matplotlib.pyplot as plt

# Initialize Pygame
pygame.init()

# Define the path to your ROM file
rom_path = r"D:\RL\environments\assault.bin"

# Create an ALEInterface instance
env = ale_py.ALEInterface()

# Load the ROM file
env.loadROM(rom_path)

# Get the set of legal actions and its size
legal_actions = env.getLegalActionSet()
num_actions = len(legal_actions)

# Q-learning parameters
alpha = 0.2            # learning rate
gamma = 0.96           # discount factor
epsilon = 1.0          # exploration rate (starting value)
epsilon_min = 0.1      # minimum exploration rate
epsilon_decay = 0.995  # decay rate for exploration

# Screen dimensions
SCREEN_WIDTH, SCREEN_HEIGHT = 160, 210  # Assault game screen size

# Set up the display
screen = pygame.display.set_mode((SCREEN_WIDTH, SCREEN_HEIGHT))
pygame.display.set_caption("Assault Game")

# Number of epochs
epochs = 50

# Initialize Q-table (one value per legal action)
q_table = np.zeros(num_actions)

# List to store Q-values for plotting
q_values = []

# Main loop
for epoch in range(epochs):
    env.reset_game()
    total_reward = 0

    while not env.game_over():
        # Get the current frame and draw it in the Pygame window
        frame = env.getScreenRGB()  # shape: (210, 160, 3)
        surface = pygame.surfarray.make_surface(np.transpose(frame, (1, 0, 2)))
        screen.blit(surface, (0, 0))
        pygame.display.flip()
        pygame.event.pump()  # keep the window responsive

        # Epsilon-greedy policy for action selection
        if np.random.rand() < epsilon:
            action_index = np.random.randint(num_actions)
        else:
            action_index = int(np.argmax(q_table))

        # Take action and observe the reward
        reward = env.act(legal_actions[action_index])
        total_reward += reward

        # Update the Q-value with the Q-learning rule
        next_state_max_q_value = np.max(q_table)
        q_table[action_index] += alpha * (reward + gamma * next_state_max_q_value - q_table[action_index])

        # Decay epsilon
        epsilon = max(epsilon_min, epsilon * epsilon_decay)

    # Record the largest Q-value after each epoch for plotting
    q_values.append(np.max(q_table))

    print("Epoch:", epoch + 1, "Total Reward:", total_reward)

pygame.quit()

# Plot how the maximum Q-value evolves over training
plt.plot(q_values)
plt.xlabel("Epoch")
plt.ylabel("Max Q-value")
plt.title("Q-value progression on Assault")
plt.show()

# Save Q-table
np.save('q_table.npy', q_table)
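
The saved file can later be reloaded. Below is a short sketch of one way to reuse it for a purely greedy evaluation episode; this reuse code is an assumption, not part of the original script, and it indexes actions the same way as the training loop above:

import ale_py
import numpy as np

# Reload the learned Q-table and recreate the environment (same ROM path assumed)
q_table = np.load('q_table.npy')
env = ale_py.ALEInterface()
env.loadROM(r"D:\RL\environments\assault.bin")
legal_actions = env.getLegalActionSet()

# Play one episode greedily, with no exploration and no learning
env.reset_game()
total_reward = 0
while not env.game_over():
    action_index = int(np.argmax(q_table))  # always pick the highest-valued action
    total_reward += env.act(legal_actions[action_index])

print("Greedy episode reward:", total_reward)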

Understanding the Code 🔍

  • Initialization: The code initializes the Arcade Learning Environment (ALE) and sets up the necessary parameters such as the learning rate (α), discount factor (γ), and exploration rate (ε).
  • Main Loop: The main loop runs for a predefined number of epochs. In each epoch, the agent interacts with the environment. At each step, the agent selects an action based on an epsilon-greedy strategy, updates the Q-table, and decays the exploration rate.
  • Action Selection: The agent selects actions using an epsilon-greedy policy, which balances exploration (taking random actions) and exploitation (taking actions based on current knowledge).
  • Q-Value Update: After taking an action and observing the reward, the agent updates the Q-value using the Q-learning update rule (in this simplified example the table is indexed by action only; a state-indexed variant is sketched after this list).
  • Epsilon Decay: The exploration rate (ε) decays over time to gradually shift the agent's focus from exploration to exploitation.
  • Saving Q-table: Finally, the Q-table is saved for future use.
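
Note that the Q-table in the example above is indexed by action only, which is a strong simplification; full tabular Q-learning keys the table by state as well. The sketch below shows one possible way to do that with a dictionary keyed by a coarsely downsampled frame. The discretize helper, the downsampling factors, and the action count are illustrative assumptions, not part of the original code:

from collections import defaultdict
import numpy as np

num_actions = 18  # size of the legal action set (assumed)
q_table = defaultdict(lambda: np.zeros(num_actions))  # maps state key -> action values

def discretize(frame):
    # Illustrative state key: grayscale, heavy downsampling, a few intensity bins
    gray = frame.mean(axis=2).astype(np.uint8)  # (210, 160) grayscale image
    small = gray[::21, ::16] // 64              # 10 x 10 grid with 4 intensity levels
    return small.tobytes()                      # hashable key for the dictionary

def update(prev_frame, action_index, reward, next_frame, alpha=0.2, gamma=0.96):
    s, s_next = discretize(prev_frame), discretize(next_frame)
    best_next = np.max(q_table[s_next])
    q_table[s][action_index] += alpha * (reward + gamma * best_next - q_table[s][action_index])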

Conclusion 🎉

Q-learning provides a powerful framework for training agents to play Atari games. By iteratively updating Q-values based on observed rewards, the agent learns to make better decisions in complex environments. This example demonstrates the core concepts of Q-learning and how they can be applied to Atari games. With further refinement and optimization, Q-learning continues to be a cornerstone of reinforcement learning research and application.

Here's the GitHub link for the code and the Assault environment: https://github.com/Tejas-358/Q-Learning-Assault
