How I Coded an AI to Never Lose a Game of Tic Tac Toe: MiniMax Explained:
A couple weeks back, I had the assignment of creating a “terminal game” in my python course. A terminal game is simply a game that is run on a terminal, such as Command Prompt or Git Bash. Up to this point the course covered the fundamentals of python, including classes and methods. My idea was to create an artificial intelligence to never lose a game of tic tac toe (view the code here).
Tic tac toe is a board game that has two players, let’s name the players Alice and Bob. The board itself consists of a 3x3 grid. Play begins with Alice, who places an “X” in one square of her choice. Then Bob places an “O” in another square of his choice. Play continues until there are either no more spaces to put any markers or a player successfully creates “three-in-a-row,” as demonstrated in this gif.
I wanted to not only write a tic tac toe AI that was unbeatable, I wanted to better understand the algorithms used to solve games in general. This algorithm is what we call MiniMax.
Setting everything up:
In my code, I created a class called Tictactoe
. In this class, I needed to do the following:
- Create a 3x3 grid
- Grab a player input
- Calculate the optimal move using AI and play it
- Determine if there are any winners
I first started with the grid. I decided to use nested arrays for this step to organize and differentiate between columns and rows. I also created a “simulation grid” that will be used later on by the AI. In order to represent my grid to the player, I needed to format the grid to look like a grid. This is what I settled with:
I had the idea to add a dynamic difficulty system into the AI, and in order to best solve this, I added a parameter of anger
to the initialization function of class Tictactoe
. I then added different methods to print the grid, count how many squares are empty, and list all possible moves. I then also added a randomizer on who should go first using random.sample(['X', 'O'], 2)
which assigns the computer with X
50% of the time.
I then needed to grab a player input, which means I needed to decide on the input system for each square. I ultimately labeled the columns as A, B, and C, and labeled the rows as 1, 2, and 3. If the player wished to play the middle box, he would enter b2
into the terminal. I then parsed the string and made sure the move was valid. If it was, then it would be put in the main grid and displayed to the player.
Building the AI:
I now needed to create the AI that can never be beaten. To do this, I needed to employ the MiniMax algorithm. MiniMax essentially simulates every single possible game until a tie or a winner. Let the number of empty squares be E
: if the game is a tie (E=1
), the score is set to 0. If the computer wins, then the score is set to E+1
. Finally, if the player wins, the score is set to -(E+1)
. Let’s suppose a part of the branch looks like this:
For each game the AI will choose which game is more favorable for the computer. This is done with two different approaches:
- If it is currently the computer’s turn, the AI will choose the branch with a greater total.
- If it is currently the player’s turn (in the simulation, of course), the AI will choose the branch with a smaller total
Using these two rules, MiniMax makes the best choice for the game assuming that players play perfect. In the tree above, play starts with the computer and ends one turn later. Therefore, looking at the bottom-most layer, the computer will choose the smaller of the two values, since it is the player’s turn. On the left hand side the computer picks 3
as the smaller total, and on the right the computer picks 2
as the smallest total.
Backing up one step, it is now the computer’s turn, so the computer is looking for the largest total. Therefore, the computer chooses 3
. If this was the top of the tree, then the computer would ultimately return that path as being the optimal path and play that move, whatever that move may be. Of course, tic tac toe has only tens of thousands of possible games and is not too hard of a game for a computer to handle, but this method of searching is used all the time in much more complex games, such as Connect4, Chess, and Backgammon.
def minimax(self, player): #where the magic happens...
max_player = self.computer #computer is always the max player
other_player = 'O' if player == 'X' else 'X'
if self.find_winner_simul() == other_player:
#winner is found
return {'position': None, 'score': 1 * (self.get_empties() + 1) if other_player == max_player else -1 * (self.get_empties() + 1)}
elif self.get_empties() == 0:
#tie
return {'position': None, 'score': 0}
#reset best dictionary
if player == max_player:
best = {'position': None, 'score': -math.inf}
else:
best = {'position': None, 'score': math.inf}
for i in self.get_inputs_simul():
#simulate every game
self.input_move_simul(player, i[0], i[1])
sim_score = self.minimax(other_player) #recursion
self.grid_simul[i[0]][i[1]] = ' '
sim_score['position'] = i
#is our move better than the current best?
if player == max_player:
if sim_score['score'] > best['score']:
best = sim_score
else:
if sim_score['score'] < best['score']:
best = sim_score
return best
After implementing the AI, play looks like this:
Putting it all together:
Now that I had the AI that could never lose a tic tac toe game, I first programmed an option to play the AI with no hindrances, having the AI choose the best move every time. That wasn’t that much fun on its own, however. I needed something that you can occasionally beat, something with varying difficulty…
Implementing difficulty was very simple. Both the player and computer have 100 hp to begin with. The computer then calculates the difficulty level by subtracting the amount of hp from 100 (e.g. if the computer was at half health, the difficulty would be 100–50 or 50). Then the computer randomly generates a number between 0 and 99 inclusive using the following code:
def rand_chance(self):
randints = range(100)
randchance = random.choice(randints)
return randchance
During play, every turn will call this function. If the randomly generated number is greater than the overall difficulty level, then instead of using MiniMax, the computer will just make a random move. If it’s smaller, then the computer will run the MiniMax function.
If the player wins a game, the computer will lose 10 hp and the player will gain 10 hp back (unless the player hp would exceed 100, then it would just be set to 100 instead). However, if the player loses, they would lose a staggering 34 hp! If a tie does occur, the computer loses 2 hp, but the player loses 10 hp. Whoever’s hp is 0 or less loses the entire game. View the whole code here:
In conclusion, MiniMax searches every single possibility a game can undergo and, assuming both players play perfectly, chooses the move that will maximize its win potential. This AI algorithm is used, not only in tic tac toe, but in other more complex games, such as chess, go, or backgammon. I hope you enjoyed this read, have a wonderful rest of your day/evening!