In my previous post, I mentioned that the AI of my Tic Tac Toe game relied on the function
choose-best-space to make a move. I also mentioned that the function simply chose a random number between 0 and 8. If that space was empty, it returned the random number. If it wasn’t, it tried again.
The great thing about having abstracted the AI logic out of the namespace responsible for selecting the AI move is that I don’t have to change a single line of code within that namespace in order to improve the AI. The
choose-best-space entry point remains the same. But instead of calling
choose-random-empty-space, it will now implement the minimax algorithm to pick an ideal move.
In essence, the minimax algorithm treats a game as a tree. Every node represents a possible move, and every level represents a different turn, alternating between the AI and the human player. It will traverse the tree until it reaches a leaf: the end of a game. Once it does, it will score the game. If the AI wins, it gives the move a positive score; if the human wins, it gives the move a negative score; and if there is a tie, it gives the move a score of zero. When the parent node is an AI move, that node will choose the child with the largest score. When it is a human move, it will choose the child with the lowest score. It behaves this way because it assumes that the human player will behave logically and also aim to win.
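Before any game code, that min/max rule can be sketched on its own with hypothetical scores (none of these numbers come from the game yet):

```clojure
;; Hypothetical leaf scores, two moves deep: each inner vector holds the
;; human's possible replies to one AI move.
(def human-replies [[1 0] [0 -1]])

(->> human-replies
     (map #(apply min %)) ; at a human node, take the lowest child score
     (apply max))         ; at an AI node, take the highest
;; => 0
```

The AI ends up with 0: it avoids the branch where the human could reach -1.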
The most difficult thing for me to grasp about writing minimax in a functional language was that I wouldn’t be able to store or iterate through the scores in order to pick the highest or lowest one. Additionally, when breaking the algorithm up into manageable components, it didn’t feel completely intuitive where the recursion should happen. However, the approach to creating the algorithm became clearer to me as I began to pseudocode each piece.
At its core, the minimax algorithm returns a score for every possible move. The score alone isn’t too useful. You’ll also need to know the corresponding space on the board in order to pick either the highest or the lowest scoring space. Since we’ll be mapping spaces to scores, we can assume that our algorithm will return a map, from which we’ll want to select the index with the highest score.
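The original gist isn’t embedded here, but the selection might look like the following sketch, where SOME-MAP (and the scores inside it) is a placeholder for whatever minimax ends up returning:

```clojure
;; SOME-MAP stands in for the {space score} map the algorithm will build.
;; The spaces and scores here are hypothetical placeholders.
(def SOME-MAP {0 -1, 4 1, 8 0})

(defn choose-best-space []
  (->> SOME-MAP
       (sort-by val >) ; sort the pairs by score, descending
       first           ; the pair with the highest value
       key))           ; return the space from that pair
```

On the placeholder map above, `(choose-best-space)` returns 4, the highest-scoring space.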
I’m going to say this a few times in this post, but what’s happening inside
SOME-MAP doesn’t really matter yet. Once we know what to expect in return, we can begin to act on that hypothetical return value. In the snippet above, we’re going to take the map that’s going to be the end result of the entire minimax algorithm, and we’re going to:
- Sort it in descending order
- Select the pair with the highest value (i.e. the first pair)
- Return the key from that pair
Now that we know what we’re doing to our map, we can start creating our map. Basically, we want the empty space indices as keys and their scores as values.
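A sketch of that construction follows. The vector board representation, the nil-for-empty convention, and the helper names are assumptions on my part, and score-moves is stubbed out until we write it for real below:

```clojure
;; Assumed representation: a board is a vector of 9 entries, nil when empty.
(def board [:x nil :o nil :x nil nil :o nil])

(defn empty-indices [board]
  (keep-indexed (fn [i v] (when (nil? v) i)) board))

;; Stub: every move scores 0 for now; real scoring comes later.
(defn score-moves [board]
  (repeat (count (empty-indices board)) 0))

(zipmap (empty-indices board) (score-moves board))
;; => {1 0, 3 0, 5 0, 6 0, 8 0}
```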
The zipmap function takes two arguments: a collection of keys and a collection of values. The
empty-indices make up the keys. We know that the values will have to be a collection of scores for every possible move. For now, it’s not important to know how we’re going to score every individual move. It’s just important to know that we need to score multiple moves, so we’ll name our function score-moves. Since
score-moves is going to apply a scoring function to every possible move, we should immediately be thinking about mapping. It’s still not important to know exactly how we’re going to score every move, but we do need to think about the prerequisite to scoring: simulating a move.
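Here’s a sketch of score-moves. The line numbers mentioned in the surrounding prose refer to the original gist; in this version, board/mark-board and the % placeholder live inside the anonymous function passed to map, and win is score-move’s final argument. The local helpers (and the stubbed score-move) are stand-ins so the sketch compiles on its own:

```clojure
(def win 1) ; explained further down

;; Local stand-ins for the post's board namespace.
(defn mark-board [board space marker] (assoc board space marker))
(defn empty-indices [board]
  (keep-indexed (fn [i v] (when (nil? v) i)) board))

;; Stub so the sketch compiles; the real scorer is developed below.
(defn score-move [board marker score] 0)

(defn score-moves [board marker]
  (map
   #(score-move (mark-board board % marker) ; simulate one possible move
                marker
                win)
   (empty-indices board)))
```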
The function is going to simulate marking the board at a different available space each time. In an object-oriented language, you might instantiate a board identical to the one passed in, simulate the move, score the move, delete the simulated move, and start over until you’ve covered every possible move. In a functional language like Clojure, you can’t mutate the board. Instead, you’ll be passing in a new board (the result of
board/mark-board) to each invocation of
score-move. This keeps us from having to worry about resetting the board to its original state every time we try to score a different possible move. Our second argument to the
map function is
empty-indices, which will replace the
% during each “iteration.”
I’ll return to the meaning of the
win argument a little further down.
In order to understand how to score a move, we have to clarify what we want the AI to do. We want the AI to win the game, so we’re not going to stop checking the status of a game until we get to the end of the game. This should sound like recursion with a base case because it is!
In this function, the end of the game is the base case. There are three possible outcomes to any game of Tic Tac Toe: you win, you lose, or you tie. The function evaluating these results must score each accordingly.
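A sketch of that evaluation follows. The :win/:loss/:tie result keywords are an assumption about how the game’s status is represented, and win and loss are defined properly a little further down:

```clojure
(def win 1)   ; placeholder defs; see below
(def loss -1)
(defn evaluate-result [result score]
  (cond
    (= result :win)  score           ; the current player's win: return the score passed in
    (= result :loss) (* score loss)  ; a loss: flip the sign of the score
    (= result :tie)  0))             ; a tie is always worth zero
```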
In evaluate-result, we finally make use of the
win we passed in to
score-move. In the win branch, we return the score passed in as an argument if the game results in a win for the current player. Why not just return the number 1? Well, we don’t always want a positive score when the current player wins. Remember, the purpose of minimax is to minimize the opponent’s maximum possible outcome.
We don’t want to give the human opponent’s simulated win a positive score. Instead, we want a positive return value on an AI’s simulated win and a negative return value on a human player’s simulated win. Earlier, we fed in
win (or 1) because the first simulated move will always be the AI’s move. If that simulated move results in a win for the AI, we want to score that move positively, so we return the score that was passed in. If it results in a loss, we want to score the move negatively, so we multiply the score by negative 1 (or
loss). If the game results in a tie, we can always score it at zero regardless of whose turn is being simulated.
To keep our code DRY (and to allow the precise scoring to be tweaked in one place), we’ll define
win and loss with the scores above.
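The defs themselves are tiny; these names are now the one place to tweak the scoring (the values follow the discussion above):

```clojure
(def win 1)   ; score for a simulated win
(def loss -1) ; score (and sign-flipping multiplier) for a simulated loss
```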
Now, what if the game isn’t over after a simulated move? Then we’ll have to simulate the next turn and score each move in that turn.
If the base case isn’t met, our function will call itself recursively, but the next time it does, it will “think” like the human. In order to think like a human, it’ll have to change its marker (e.g. from O to X) and alter how it will score a human win (from 1 to -1).
We’ll want the move simulation to return a new board for us to score, since
score-move is expecting a board as its first parameter. Luckily,
board/mark-board returns a new board.
As above, we need to ensure that we use the opponent’s marker to mark the board during their simulated move, so we’ll once again call
get-opp-marker when we mark the board.
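get-opp-marker itself isn’t shown in this post; assuming keyword markers, a minimal version would be:

```clojure
;; Given one player's marker, return the other player's.
(defn get-opp-marker [marker]
  (if (= marker :x) :o :x))
```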
After we’ve simulated the opponent’s move, we’ll want to score that move, keeping in mind that we’ll want to negatively score their victory and positively score their loss. We can achieve that goal by multiplying the current score by a negative number (
loss in the snippet below). Because the function will be called recursively, it will alternate between scoring victories (and losses) positively and negatively.
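A sketch of simulate-next-move along those lines. The min/max choice at the end follows the rule described at the top of the post (the human minimizes, the AI maximizes); the helpers and the stubbed score-move are stand-ins so the sketch compiles alone, and the real mutual recursion appears in the full listing:

```clojure
(def win 1)
(def loss -1)
(defn get-opp-marker [marker] (if (= marker :x) :o :x))
(defn mark-board [board space marker] (assoc board space marker))
(defn empty-indices [board]
  (keep-indexed (fn [i v] (when (nil? v) i)) board))
(defn score-move [board marker score] 0) ; stub for the mutual recursion

(defn simulate-next-move [board marker score]
  (let [opp    (get-opp-marker marker)
        scores (map #(score-move (mark-board board % opp)
                                 opp
                                 (* score loss)) ; flip the sign one level down
                    (empty-indices board))]
    (if (= score win)
      (apply min scores)    ; the human replies next and minimizes
      (apply max scores)))) ; the AI replies next and maximizes
```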
Putting it all together, that leaves us with the code below:
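The original gist isn’t embedded here, so the listing below is a reconstruction along the lines described above. The board representation (a vector with nil for empty spaces), the winning-lines check, and the exact helper names are my assumptions; the overall shape (score-moves mapping a memoized score-move over the empty spaces, a forward-declared simulate-next-move, and the min/max alternation) follows the post:

```clojure
(def win 1)
(def loss -1)

(def winning-lines
  [[0 1 2] [3 4 5] [6 7 8]   ; rows
   [0 3 6] [1 4 7] [2 5 8]   ; columns
   [0 4 8] [2 4 6]])         ; diagonals

(defn empty-indices [board]
  (keep-indexed (fn [i v] (when (nil? v) i)) board))

(defn mark-board [board space marker]
  (assoc board space marker))

(defn get-opp-marker [marker]
  (if (= marker :x) :o :x))

(defn winner? [board marker]
  (boolean (some (fn [line] (every? #(= marker (board %)) line))
                 winning-lines)))

(defn game-result [board marker]
  (cond
    (winner? board marker)                  :win
    (winner? board (get-opp-marker marker)) :loss
    (empty? (empty-indices board))          :tie
    :else                                   :ongoing))

(defn evaluate-result [result score]
  (cond
    (= result :win)  score
    (= result :loss) (* score loss)
    (= result :tie)  0))

;; score-move references simulate-next-move before it's defined, hence declare.
(declare simulate-next-move)

(def score-move
  (memoize
   (fn [board marker score]
     (let [result (game-result board marker)]
       (if (= result :ongoing)
         (simulate-next-move board marker score)
         (evaluate-result result score))))))

(defn simulate-next-move [board marker score]
  (let [opp    (get-opp-marker marker)
        scores (map #(score-move (mark-board board % opp) opp (* score loss))
                    (empty-indices board))]
    ;; When score is win, the marker that just moved was the AI's, so the
    ;; human replies and minimizes; otherwise the AI replies and maximizes.
    (if (= score win)
      (apply min scores)
      (apply max scores))))

(defn score-moves [board marker]
  (map #(score-move (mark-board board % marker) marker win)
       (empty-indices board)))

(defn choose-best-space [board marker]
  (->> (score-moves board marker)
       (zipmap (empty-indices board))
       (sort-by val >)
       first
       key))
```

With the AI playing :o, `(choose-best-space [:o :o nil :x :x nil nil nil nil] :o)` takes the winning space 2, and `(choose-best-space [:x :x nil nil :o nil nil nil nil] :o)` blocks the human at space 2.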
There are a few things to note. Because
simulate-next-move sits inside
choose-best-space’s recursive call chain, it gets referenced before it’s defined, which would cause a compilation error, so we have to forward declare it with declare. Additionally, we define
score-move with memoize in order to take advantage of memoization to speed up our algorithm. Without memoizing the algorithm, you’d likely see a delay of a few seconds during the AI’s first move (when it has to traverse 8 levels of turns on the tree).
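In outline, those two pieces look like this (with the scoring body stubbed out):

```clojure
;; Forward declaration: score-move references simulate-next-move,
;; which is defined later in the namespace.
(declare simulate-next-move)

;; Defining score-move as a memoized fn caches each [board marker score]
;; result, so repeated positions in the game tree are scored only once.
(def score-move
  (memoize
   (fn [board marker score]
     ;; base case / recursive case elided in this sketch
     0)))
```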
If you made it this far, here’s a picture of a cat to make up for nothing but code gists so far.