Ruby Tic Tac Toe — Negamax with Alpha Beta Pruning

Dan Pelensky
7 min read · Jun 15, 2017


One of my challenges this week was to redo Tic Tac Toe in Ruby, this time implementing Alpha Beta Pruning to speed it up.

When discussing it with one of my mentors, I mentioned that I’d recently read that Alpha Beta Pruning was easier to implement using a derivative of Minimax called Negamax. Negamax is a bit simpler than Minimax because it only focuses on the point of view of the player using it. If you are interested, here’s my post on implementing Minimax in Java.
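The trick that makes this work is a small identity from zero-sum games, easy to check in Ruby (the names below are just for illustration):

```ruby
# Negamax rests on a simple fact about zero-sum games: one player's
# gain is exactly the other's loss, so
#   max(a, b) == -min(-a, -b)
# and both players can share a single "maximise" routine.
values = [3, -5, 2]
negated_min = -(values.map { |v| -v }.min)
# negated_min and values.max are both 3
```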

These algorithms work by analysing every possible move that could occur from a given board and choosing the move that minimises the chances of losing the game.

As you can probably imagine, analysing every single possible move in a game of tic tac toe is quite time consuming. Here’s where Alpha Beta Pruning comes in. Alpha Beta Pruning speeds things up by abandoning a node as soon as it finds that the node cannot beat an outcome it has already calculated. If you’ve got 50 minutes, I would strongly encourage you to watch this video.

All credit goes to Rabea for recommending this video.

I’ve implemented Minimax twice now, and Negamax is a simplified version of Minimax, so I figured it would be quick and easy. I was wrong.

There are countless blogs that explain Minimax and how people have implemented it, and I found them extremely helpful.

I found extremely few resources on Negamax. The vast majority of my implementation came from the pseudocode in the Wikipedia entry, as well as the pseudocode in this helpful resource.

The blogs that I had used to come to understand Minimax provide real world code explanations. The Negamax resources used pseudocode in an abstract situation. It was significantly more difficult, but the victory of getting it to work is so much sweeter.

Here is my implementation in Ruby:

class PerfectComputer

  attr_reader :marker

  def initialize(marker)
    @marker = marker
  end

  def choose_space(game)
    @best_score = {}
    negamax(game)
    @best_score.max_by { |key, value| value }[0]
  end

  private

  def negamax(game, depth = 0, alpha = -1000, beta = 1000, color = 1)
    return color * score_scenarios(game, depth) if game.game_over?

    max = -1000

    game.board.check_available_spaces.each do |space|
      game.board.place_marker(space, game.current_player.marker)
      game.change_turns
      negamax_value = -negamax(game, depth + 1, -beta, -alpha, -color)

      game.board.place_marker(space, space)
      game.change_turns

      max = [max, negamax_value].max
      @best_score[space] = max if depth == 0
      alpha = [alpha, negamax_value].max
      return alpha if alpha >= beta
    end

    max
  end

  def score_scenarios(game, depth)
    return 0 if game.game_tied?
    return 1000 / depth if game.game_won_by?(self)
    return -1000 / depth
  end

end

1 initialize

attr_reader :marker

def initialize(marker)
  @marker = marker
end

We instantiate the class with a marker (generally “X” or “O”) and store it as an instance variable, accessed later through an attribute reader.

2 choose_space

def choose_space(game)
  @best_score = {}
  negamax(game)
  @best_score.max_by { |key, value| value }[0]
end

This is the polymorphic call that comes from the game class.

  • We create an empty hash and save it to an instance variable called best_score.
  • We call negamax passing in the current game object. This will check every possible outcome of the game.
  • When our recursive calls have finished, we look in the best_score hash for the highest value and return its key, choosing the best possible move so our opponent cannot win.
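To make the selection step concrete, here is a standalone sketch with a made-up best_score hash (the contents are invented, but mirror the shape negamax leaves behind):

```ruby
# Board spaces mapped to the scores negamax recorded for them.
best_score = { 1 => -333, 5 => 1000, 9 => 0 }

# max_by yields the [key, value] pair with the highest value, and [0]
# extracts the key: the space we should play.
best_space = best_score.max_by { |_space, score| score }[0]
# best_space is 5
```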

3.1 negamax — Initial Method Call

def negamax(game, depth = 0, alpha = -1000, beta = 1000, color = 1)

When we call negamax, we pass in:

  • the current game object
  • a depth of how far we are into our recursive algorithm (starting at 0)
  • an integer called alpha, starting at -1000
  • an integer called beta, starting at 1000
  • an integer called color, starting at 1

Alpha and beta represent lower and upper bounds for child node values at a given tree depth. The color indicates the current player, and will either be 1 for us, or -1 for our opponent.

3.2 negamax — Return a score when the game is over

    return color * score_scenarios(game, depth) if game.game_over?

At the very start of our recursive function, we have our base case, which is to return the result of score_scenarios if the game is over. If the game is not over we will move on.

NOTE: This is multiplied by the current value of color, so if it is us making the move, it is multiplied by 1 (making no change), and if it is our opponent, it is multiplied by -1, turning a positive value negative and a negative value positive.
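Here is that sign flip in isolation, with made-up values:

```ruby
# The base case's sign flip (values invented for illustration):
our_win_score = 1000    # score_scenarios says we won
color = -1              # but this node belongs to our opponent
flipped = color * our_win_score
# flipped is -1000: our win looks like a loss from the opponent's side
```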

4 score_scenarios — Calculate the outcome of our game

def score_scenarios(game, depth)
  return 0 if game.game_tied?
  return 1000 / depth if game.game_won_by?(self)
  return -1000 / depth
end

Here’s where it gets fun.

  • If our game is tied, we return 0. We have neither won nor lost.
  • If we have won, we return 1000 divided by the current depth. Remember, the depth is the number of times the recursive algorithm has been called, meaning the number of turns since we called it. If we can win on the next move we score 1000; if it is in 3 moves, 333.
  • Our final return is exactly the same, just negative. The only time this line is reached is when our opponent wins (we only enter this method if the game is over, and we have already checked for a tie and for our own win).
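The depth scaling is easy to check on its own; note that Ruby’s integer division is what turns 1000 / 3 into 333:

```ruby
# Depth-scaled win scores, as in score_scenarios' win branch.
win_scores = (1..4).map { |depth| 1000 / depth }
# win_scores is [1000, 500, 333, 250]: sooner wins score higher
```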

Still following? Let’s jump back into our negamax method.

3.3 negamax — Set Max

max = -1000

max is a variable we will use going forward. In the pseudocode I’d looked at, it was negative infinity, but I decided -1000 would be easier to work with. This value will change very soon.
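For what it’s worth, Ruby can express the pseudocode’s negative infinity directly; -1000 only works here because tic tac toe scores never fall below it:

```ruby
# Ruby's literal for negative infinity.
neg_inf = -Float::INFINITY
# Any real score beats it in a .max comparison.
raise unless [neg_inf, -1000].max == -1000
```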

3.4 negamax — Try out every space on the board

game.board.check_available_spaces.each do |space|
  game.board.place_marker(space, game.current_player.marker)
  game.change_turns
  negamax_value = -negamax(game, depth + 1, -beta, -alpha, -color)

  game.board.place_marker(space, space)
  game.change_turns

  max = [max, negamax_value].max
  @best_score[space] = max if depth == 0
  alpha = [alpha, negamax_value].max
  return alpha if alpha >= beta
end

I’ll break this down, but I want to show it all in one place at the start so you don’t need to keep scrolling up.

For every available space on our board, we perform the following steps.

3.4.1 negamax — Emulate a move

game.board.place_marker(space, game.current_player.marker)
game.change_turns

We place a marker in one of the available spaces, and then change the turns. Exactly like a real player taking a turn.

3.4.2 negamax — Recursive call

      negamax_value = -negamax(game, depth+1, -beta, -alpha, -color)

The state of the board has now changed, so we call negamax again. This time, however, we do it a bit differently. Notice the following:

  • - sign in front of negamax, meaning we change the sign of the result
  • depth + 1 — we are adding one to our depth value, as we are going a layer deeper
  • - sign in front of both alpha and beta — we are changing the signs
  • alpha and beta have switched places; in our next call, our old alpha will act as beta and vice versa
  • - sign in front of color — meaning this is a negative number, which makes sense as it is our opponent’s move
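All of this sign flipping can be seen in miniature on a tree of hard-coded scores. The sketch below is illustrative only (the tree, names and scores are made up, not the article’s classes):

```ruby
# A toy negamax over a hard-coded two-level tree. Leaves hold scores
# from the root player's point of view; each level negates and maximises.
def toy_negamax(node, color = 1)
  return color * node unless node.is_a?(Array) # leaf: score it
  node.map { |child| -toy_negamax(child, -color) }.max
end

# The root maximises; the negation makes the opponent's level act as
# a minimiser: min(3, 5) = 3 and min(2, 9) = 2, so the root picks 3.
tree = [[3, 5], [2, 9]]
result = toy_negamax(tree)
# result is 3
```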

3.4.3 negamax — Put our board back to its original state

game.board.place_marker(space, space)
game.change_turns

This happens once the recursive call has returned; we haven’t actually made any moves, so we need to put the board back to its original state, space by space. We also make sure that it is still the correct player’s turn.
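The place_marker(space, space) line makes more sense with a tiny stand-in board in view; the sketch below assumes, as the article’s board seems to, that an untaken space holds its own number:

```ruby
# The mutate-recurse-undo pattern on a stand-in board: an array where
# an untaken space holds its own index, which is why writing the space
# number back restores it.
board = [0, "X", 2]
space = 0
board[space] = "O"    # emulate the move
# ...the recursive call would inspect the board here...
board[space] = space  # undo: write the space number back
# board is [0, "X", 2] again
```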

3.4.4 negamax — Compare max and negamax_value

      max = [max, negamax_value].max

We set our max variable to whichever is higher: our current max value (initially -1000) or our negamax_value.

3.4.5 negamax — Save our score outcome to our best_score hash

      @best_score[space] = max if depth == 0

This is where the magic happens. At the top level of the recursion (depth 0), we add a key-value pair to our best_score hash: the space we were testing and its score outcome. We draw on this back in choose_space, when we want to pick our best score. This is where those scores are set.

3.4.6 negamax — Alpha beta pruning

alpha = [alpha, negamax_value].max
return alpha if alpha >= beta

Here is where we get our huge speed gains. We save the higher of alpha and our negamax_value to the alpha variable. If alpha is now greater than or equal to beta, we return alpha and exit the method early.

This is where we check whether this node can still beat an outcome we have already found — if it can’t, we stop calculating the node and move on.
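To see the pruning pay off, here is a toy negamax with alpha beta over the same kind of hard-coded tree, counting how many leaves get evaluated (again illustrative only, not the article’s classes):

```ruby
# A toy alpha-beta negamax over a hard-coded tree, counting leaf visits.
def ab_negamax(node, alpha, beta, color, visits)
  unless node.is_a?(Array)
    visits[:count] += 1
    return color * node
  end
  max = -Float::INFINITY
  node.each do |child|
    value = -ab_negamax(child, -beta, -alpha, -color, visits)
    max = [max, value].max
    alpha = [alpha, value].max
    return alpha if alpha >= beta # prune the remaining siblings
  end
  max
end

visits = { count: 0 }
result = ab_negamax([[3, 5], [2, 9]], -Float::INFINITY, Float::INFINITY, 1, visits)
# result is 3 and visits[:count] is 3: the leaf 9 was never evaluated,
# because once the opponent found 2 in the second subtree, that subtree
# could no longer beat the 3 the first subtree already guarantees.
```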

3.5 negamax — Return max at the end of our method

  max

What a sad way to finish an exciting algorithm, though this is where the negamax_value we’ve been referring to inside the loop comes from. This is the value that gets stored in the key-value pairs inside the best_score hash.

Breaking it down into these tiny steps, it wasn’t that difficult, and everything was relatively obvious. I find recursive algorithms extremely difficult to troubleshoot, though. At one stage I drew out by hand exactly what was happening for one of my tests while it was failing.

I’m really happy that I didn’t give up and go back to Minimax which was very tempting, and am extremely grateful that one of the crafters here at 8th Light, Jarkyn, spent two hours of her valuable time pairing with me and helping me troubleshoot my code.

I genuinely hope this blog post helps someone. Please reach out if you want to discuss.
