A Game-playing Agent

6 min readApr 24, 2018

Research Review

The field of artificial intelligence is continually changing and advancing. To be an AI Engineer at the cutting edge of your field, you need to be familiar with good works published:

Title: ** Mastering the game of Go with deep neural networks and tree search **

0.1 Personal Reason to choose AlphaGod:

I have chosen AlphaGo by DeepMind Team because it is about game of Go and a computer program never defeated a human professional player in a full size game of Go.

0.2 Introduction:

The game of Go has long been viewed as the most challenging of classic games for artificial intelligence owing to its enormous search space and the difficulty of evaluating board positions and moves. Deep Mind team introduced a new approach to computer Go that uses ‘value networks’ to evaluate board positions and ‘policy networks’ to select moves.

0.3 Game of Go Difficulties:

Infeasible exhaustive search: In large games, such as chess (b35, d80) and especially Go (b250, d150), exhaustive search is infeasible.

0.4 DeepMind Approach:

DeepMind introduce a new approach to computer Go that uses ‘value networks’ to evaluate board positions and ‘policy networks’ to select moves. These deep neural networks are trained by a novel combination of supervised learning from human expert games, and reinforcement learning from games of self-play.

Deep Mind also introduce a new search algorithm that combines Monte Carlo simulation with value and policy networks.

0.5 Methods used by Deep Mind

1. Monte Carlo tree search: estimate the value of each state in a search tree. As more simulations are executed, the search tree grows larger and the relevant values become more accurate. The policy used to select actions during search is also improved over time, by selecting children with higher values.

2. Policy network: Used Convolutional Neural Network to read the board as an image and finds the positions on the board, then using supervised learning from human moves to learn the game following by Reinforcement learning to improve the network.

3. Value network: evaluate board position.

0.6 Results:

1. ** By combining tree search with policy and value networks, AlphaGo has finally reached a professional level in go, providing hope that human-level performance can now be achieved in other seemingly intractable artificial intelligence domains **

2. ** AlphaGo achieved a 99.8% winning rate against other Go programs, and defeated the human European Go champion by 5 games to 0. This is the first time that a computer program has defeated a human professional player in the full-sized game of Go, a feat previously thought to be at least a decade away**

Synopsis

In this project, I have develop an adversarial search agent to play the game “Isolation”.

Isolation is a deterministic, two-player game of perfect information in which the players alternate turns moving a single piece from one cell to another on a board. Whenever either player occupies a cell, that cell becomes blocked for the remainder of the game. The first player with no remaining legal moves loses, and the opponent is declared the winner. These rules are implemented in the isolation.Board class provided in the repository.

This project uses a version of Isolation where each agent is restricted to L-shaped movements (like a knight in chess) on a rectangular grid (like a chess or checkerboard). The agents can move to any open cell on the board that is 2-rows and 1-column or 2-columns and 1-row away from their current position on the board. Movements are blocked at the edges of the board (the board does not wrap around), however, the player can “jump” blocked or occupied spaces (just like a knight in chess).

Additionally, agents will have a fixed time limit each turn to search for the best move and respond. If the time limit expires during a player’s turn, that player forfeits the match, and the opponent wins.

Instructions

Following functions have been implemented:

MinimaxPlayer.minimax(): Minimax search
AlphaBetaPlayer.alphabeta(): Minimax search with alpha-beta pruning
AlphaBetaPlayer.get_move(): Iterative deepening search
custom_score(): Alternative best position evaluation heuristic
custom_score_2(): Alternate position evaluation heuristic
custom_score_3(): Alternate position evaluation heuristic

The Project Assistant sandbox for this project places some restrictions on the modules available and blocks calls to some of the standard library functions. In general, standard library functions that introspect code running in the sandbox are blocked, and the PA only allows the following modules random, numpy, scipy, sklearn, itertools, math, heapq, collections, array, copy, and operator. (Modules within these packages are also allowed, e.g., numpy.random.)

Coding

Tournament

The tournament.py script is used to evaluate the effectiveness of custom heuristics. The script measures relative performance of my agent in a round-robin tournament against several other pre-defined agents.

The performance of time-limited iterative deepening search is hardware dependent (faster hardware is expected to search deeper than slower hardware in the same amount of time). The script controls for these effects by also measuring the baseline performance of an agent called “ID_Improved” that uses Iterative Deepening and the improved_score heuristic defined in sample_players.py. The goal is to develop a heuristic such that my agent outperforms ID_Improved.

The tournament opponents are listed below. (See also: sample heuristics and players defined in sample_players.py)

Random: An agent that randomly chooses a move each turn.
MM_Open: MinimaxPlayer agent using the open_move_score heuristic with search depth 3
MM_Center: MinimaxPlayer agent using the center_score heuristic with search depth 3
MM_Improved: MinimaxPlayer agent using the improved_score heuristic with search depth 3
AB_Open: AlphaBetaPlayer using iterative deepening alpha-beta search and the open_move_score heuristic
AB_Center: AlphaBetaPlayer using iterative deepening alpha-beta search and the center_score heuristic
AB_Improved: AlphaBetaPlayer using iterative deepening alpha-beta search and the improved_score heuristic

Game Visualization

You can follow the instruction on GitHub.

Results :

heuristic_analysis

Your analysis should conclude with a comparison of the different heuristics and your reasoning for choosing the heuristic you ultimately use in your submitted agent.

This script evaluates the performance of the custom_score evaluation function against a baseline agent using alpha-beta search and iterative deepening (ID) called AB_Improved. The three AB_Custom agents use ID and alpha-beta search with the custom_score functions defined in game_agent.py.

Each code Calculate the heuristic value of a game state from the point of view of the given player.

def custom_score(game, player):
    loser_winner(player,game)
    player_move = len(game.get_legal_moves(player))
    opponent_moves = len(game.get_legal_moves(game.get_opponent(player)))
    blank_spaces = len(game.get_blank_spaces())
    return float((blank_spaces-opponent_moves)*player_move-opponent_moves)
def custom_score_2(game, player):
    loser_winner(player,game)
    my_moves = len(game.get_legal_moves(player))
    opponent_moves = len(game.get_legal_moves(game.get_opponent(player)))
    blank_spaces = len(game.get_blank_spaces())
    return float(my_moves*(blank_spaces-1) - opponent_moves*(blank_spaces))
def custom_score_3(game, player):
    loser_winner(player,game)
    my_moves = len(game.get_legal_moves(player))
    opponent_moves = len(game.get_legal_moves(game.get_opponent(player)))
    return float(my_moves - 3*opponent_moves)

Tournament:

For each of your three custom heuristic functions, evaluate the performance of the heuristic using the included tournament.py scrip

The tournament opponents are listed below.

Random: An agent that randomly chooses a move each turn.
MM_Null: CustomPlayer agent using fixed-depth minimax search and the null_score heuristic
MM_Open: CustomPlayer agent using fixed-depth minimax search and the open_move_score heuristic
MM_Improved: CustomPlayer agent using fixed-depth minimax search and the improved_score heuristic
AB_Null: CustomPlayer agent using fixed-depth alpha-beta search and the null_score heuristic
AB_Open: CustomPlayer agent using fixed-depth alpha-beta search and the open_move_score heuristic
AB_Improved: CustomPlayer agent using fixed-depth alpha-beta search and the improved_score heuristic