AZFour: Connect Four Powered by the AlphaZero Algorithm

In this post, I give an overview of AZFour.com, an online AI app that plays Connect Four. The AI was trained using AlphaZero, the game-playing algorithm created by DeepMind.

Overview

What exactly is AlphaZero? AlphaZero is an algorithm developed by DeepMind as a generalization of AlphaGo, the famous program that beat the world’s best Go player. AlphaZero is particularly interesting because it can learn two-player perfect-information games such as Chess, Go, Reversi, and, you guessed it, Connect Four. It learns to play these games without any knowledge of game strategy: all it needs is the game rules (well, and maybe a little parameter tweaking). For the most part, the details of how to actually play the game are figured out by having the AI play thousands or even millions of games against itself.

As part of our open source release of GraphPipe, we provided pre-trained models for Connect Four. These are the models that constitute the AI behind AZFour.

If you are familiar with Connect Four, you should be able to start playing right away, but there are some additional controls on the page that are designed to help you explore the progression of learning throughout AlphaZero training.

Game Controls

Let’s look more closely at each of these game control components.

Position Evaluation

  • Policy Output: In AlphaZero, the policy output is the algorithm’s evaluation of the current move choices. In the AZFour UI, higher percentages are associated with moves that the algorithm thinks are best.
  • Value Output: This is the neural network’s belief about what the game outcome will be. To make this value a bit more tangible for users, the UI represents this output as text (like ‘I am 71% confident yellow will win’).
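As a rough illustration of how a raw value output could become a sentence like the one above, here is a minimal sketch. It assumes the value is a number in [-1, 1] from the point of view of the player to move, and that it maps linearly onto a win confidence. Both the function name `describe_value` and that mapping are assumptions for illustration, not the actual AZFour code.

```python
def describe_value(value, player_to_move="yellow", opponent="red"):
    """Turn a value output in [-1, 1] into a confidence sentence.

    Assumption: +1 means the player to move wins, -1 means the opponent wins.
    """
    if not -1.0 <= value <= 1.0:
        raise ValueError("value must be in [-1, 1]")
    # Map [-1, 1] linearly onto [0, 1] as a rough win confidence.
    p_win = (value + 1.0) / 2.0
    if p_win >= 0.5:
        favored, confidence = player_to_move, p_win
    else:
        favored, confidence = opponent, 1.0 - p_win
    return f"I am {round(confidence * 100)}% confident {favored} will win"


print(describe_value(0.42))
```

Under these assumptions, a value of 0.42 with Yellow to move corresponds to 71% confidence in a Yellow win.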

Model Selection

Skill

But what exactly does this slider do? What does ‘Skill’ mean?

When an AlphaZero network evaluates a position, its policy output generates a probability distribution across all possible moves. During typical competitive game play, the computer selects the argmax of this policy distribution as the next move.

However, to make AZFour more fun for humans (so that play varies from game to game, and so that the AI can show some weakness), the app is configured to select its next move probabilistically, but still based on the weights of the policy distribution. This is controlled by a setting called “Skill”, which mimics the temperature parameter (τ) of AlphaZero and reshapes the policy distribution before a move is selected. To illustrate, here is the plot of the reshaped policy distribution after various Skill values have been applied to the policy output:

As you can see, the higher the Skill setting, the more Move Choice resembles argmax.
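To make the reshaping concrete, here is a minimal sketch of one way a Skill-like setting could work. It assumes Skill acts as an exponent on each policy probability (playing the role of 1/τ), followed by renormalization. The function names, the example policy, and the exact formula are assumptions for illustration, not the app's actual implementation.

```python
import random


def reshape_policy(policy, skill):
    """Raise each probability to the power `skill`, then renormalize.

    skill = 1 leaves the distribution unchanged; larger values push
    the probability mass toward the argmax move.
    """
    if skill <= 0:
        raise ValueError("skill must be positive")
    top = max(policy)
    if top <= 0:
        raise ValueError("policy must contain a positive probability")
    # Divide by the maximum first so large exponents do not underflow to all zeros.
    powered = [(p / top) ** skill for p in policy]
    total = sum(powered)
    return [p / total for p in powered]


def choose_move(policy, skill, rng=random):
    """Sample a column index from the reshaped policy."""
    weights = reshape_policy(policy, skill)
    return rng.choices(range(len(weights)), weights=weights, k=1)[0]


# Hypothetical policy output over the seven columns.
policy = [0.02, 0.08, 0.15, 0.45, 0.20, 0.07, 0.03]
for skill in (1, 2, 5, 20):
    print(skill, [round(p, 3) for p in reshape_policy(policy, skill)])
```

In this sketch, a low Skill leaves plenty of room for weaker moves to be sampled, while a high Skill makes `choose_move` behave almost exactly like argmax.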

Auto-Play

You can play against the computer as either player. You can even select Auto-play for both players and watch the computer battle itself!

Hide Hints

Example

Using the Generation 3 model, Yellow has determined that its best move (with 29% certainty) is the center position. This is clearly a losing move: if Red then plays in either column 2 or 5, it will have two ways to win.
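The “two ways to win” idea can be checked mechanically. The sketch below counts the columns in which a player could complete four in a row on their next move. The board is a hypothetical open-ended three-in-a-row position, not the exact position from the example, and the columns here are 0-indexed, which may not match the numbering used in the AZFour UI.

```python
ROWS, COLS = 6, 7
EMPTY, RED, YELLOW = ".", "R", "Y"


def drop(board, col, piece):
    """Return a new board with `piece` dropped in `col`, or None if the column is full."""
    for row in range(ROWS - 1, -1, -1):  # the last row is the bottom of the board
        if board[row][col] == EMPTY:
            new_board = [list(r) for r in board]
            new_board[row][col] = piece
            return new_board
    return None


def has_four(board, piece):
    """Check every horizontal, vertical, and diagonal line of four cells."""
    for r in range(ROWS):
        for c in range(COLS):
            for dr, dc in ((0, 1), (1, 0), (1, 1), (1, -1)):
                cells = [(r + i * dr, c + i * dc) for i in range(4)]
                if all(0 <= rr < ROWS and 0 <= cc < COLS and board[rr][cc] == piece
                       for rr, cc in cells):
                    return True
    return False


def winning_columns(board, piece):
    """List the columns where `piece` would win immediately."""
    wins = []
    for col in range(COLS):
        after = drop(board, col, piece)
        if after is not None and has_four(after, piece):
            wins.append(col)
    return wins


# Hypothetical position: Red holds an open-ended three on the bottom row.
board = [
    ".......",
    ".......",
    ".......",
    ".......",
    "..YY...",
    "..RRR..",
]
print(winning_columns(board, RED))
```

When a player has two winning columns like this, the opponent can block only one of them, which is why allowing an open-ended three in a row loses.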

By generation 7, the model evaluates the position like so:

Now, the model correctly identifies that it needs to block the red player from constructing an open-ended 3-in-a-row position, but thinks that it does not have an advantage. Now let’s see what our best model thinks about the position:

Yellow still identifies that it should choose position 2 or 5, and has nearly eliminated any other move as a possible play. Further, it now believes that Yellow will win this game. Indeed, this is a winning board position for Yellow (after 19 moves).

Learn More

Want to see more escapades with the AlphaZero algorithm? Check out these posts:

Lessons From AlphaZero

  • Part 2: ConnectFour+AlphaZero
  • Part 3: Parameter Tweaking
  • Part 4: Improving the AlphaZero training target
  • Part 5: AlphaZero Training Optimization
  • Part 6: AlphaZero Hyperparameter Tuning
