AZFour: Connect Four Powered by the AlphaZero Algorithm

Overview

In the process of developing GraphPipe at work, we selected a variety of use cases to help us better understand the characteristics of distributed deep learning workflows. A particularly interesting test case was AlphaZero, which we chose to implement because of its need for large-scale distributed computation.

Game Controls

AZFour lets you play against a selection of neural networks that have had varying amounts of AlphaZero training. The UI provides the controls described below.

Position Evaluation

The area immediately below the game board shows the neural network’s evaluation of the position from the perspective of the current player. Famously, the AlphaZero algorithm trains its network with two outputs, Policy and Value, which are displayed in the app as follows:

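  • Policy Output: This is the neural network’s estimate of which moves are most promising, shown in the UI as the move percentage hints.

  • Value Output: This is the neural network’s belief about what the game outcome will be. To make this value more tangible for users, the UI presents it as text (like ‘I am 71% confident yellow will win’).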
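The post doesn't say how the raw value output becomes a sentence like the one quoted above. The following is a minimal sketch under two assumptions: the value is a single number in [-1, 1], as with a tanh output head, from the current player's perspective, and it is mapped linearly onto a win probability. The function name `value_to_text` is hypothetical.

```python
def value_to_text(value: float, current_player: str, opponent: str) -> str:
    """Turn a value-head output into a confidence sentence (hypothetical helper).

    Assumes value is in [-1, 1] from the current player's perspective:
    +1 means a certain win for the current player, -1 a certain loss.
    """
    if not -1.0 <= value <= 1.0:
        raise ValueError("value must be in [-1, 1]")
    # Assumed linear mapping of [-1, 1] onto a win probability in [0, 1].
    p_win = (value + 1.0) / 2.0
    # Phrase the sentence around whichever side the network favors.
    if p_win >= 0.5:
        favored, confidence = current_player, p_win
    else:
        favored, confidence = opponent, 1.0 - p_win
    return f"I am {round(confidence * 100)}% confident {favored} will win"


print(value_to_text(0.42, "yellow", "red"))  # prints "I am 71% confident yellow will win"
```

In this simple mapping, a value of exactly 0 reads as 50%. A likely draw therefore looks the same as a coin flip, which is one limitation of collapsing the value into a single win percentage.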

Model Selection

In the Model dropdown, there is a selection of models labeled Generation 1 through 50, where higher generations correlate with better playing ability. For example, a model trained on only one batch of data (i.e., 1 Generation) plays more or less randomly, while a model trained for 50 Generations is quite strong. You can select a different model for each player using the Model dropdown.

Skill

You can control the skill of a particular model using the Skill slider.

Auto-Play

If you select the Auto-Play checkbox, the computer will automatically play moves for that color. By default, Auto-Play is checked for Red, so when the page loads you are playing as Yellow.

Hide Hints

If you think you can beat the computer without any AI assistance, feel free to Hide Hints. This will hide the move percentage hints, and you’ll be on your own!
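The post doesn't spell out how the move percentage hints are computed. One plausible approach, assumed here rather than taken from AZFour, is to take the policy output's probability for each column, zero out columns that are already full, and renormalize over the remaining legal moves. The helper name `move_hints` and the per-column policy layout are hypothetical.

```python
from typing import List, Sequence


def move_hints(policy: Sequence[float], legal: Sequence[bool]) -> List[int]:
    """Turn policy probabilities into whole-number move percentages.

    Assumes one policy entry per board column (7 for standard Connect Four)
    and a matching mask marking which columns still have room.
    """
    if len(policy) != len(legal):
        raise ValueError("policy and legal mask must have the same length")
    # Zero out full columns so the hints only cover playable moves.
    masked = [p if ok else 0.0 for p, ok in zip(policy, legal)]
    total = sum(masked)
    if total <= 0:
        raise ValueError("no probability mass on legal moves")
    # Renormalize over legal moves and express each as a rounded percentage.
    return [round(100 * p / total) for p in masked]


policy = [0.1, 0.2, 0.3, 0.2, 0.1, 0.05, 0.05]
legal = [True, True, False, True, True, True, True]  # middle-left column is full
print(move_hints(policy, legal))  # prints [14, 29, 0, 29, 14, 7, 7]
```

Because each entry is rounded independently, the displayed percentages may not always sum to exactly 100.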

Example

To illustrate how the network’s beliefs about board positions change as training progresses, consider the following position:

Learn More

To see exactly how I deployed the AZFour production site, read my next post.

Lessons From AlphaZero

  • Part 1: AlphaZero Basics
  • Part 2: ConnectFour+AlphaZero
  • Part 3: Parameter Tweaking
  • Part 4: Improving the AlphaZero training target
  • Part 5: AlphaZero Training Optimization
  • Part 6: AlphaZero Hyperparameter Tuning
