AZFour: Connect Four Powered by the AlphaZero Algorithm


Game Controls

Position Evaluation

  • Policy Output: In AlphaZero, the policy output is the algorithm’s evaluation of the current move choices. In the AZFour UI, higher percentages are associated with moves that the algorithm thinks are best.
  • Value Output: This is the neural network’s belief about what the game outcome will be. To make this value a bit more tangible for users, the UI represents this output as text (like ‘I am 71% confident yellow will win’).

Model Selection



Hide Hints


Learn More

Lessons From AlphaZero

  • Part 1: AlphaZero Basics
  • Part 2: ConnectFour+AlphaZero
  • Part 3: Parameter Tweaking
  • Part4: Improving the AlphaZero training target
  • Part5: AlphaZero Training Optimization
  • Part6: AlphaZero Hyperparameter Tuning



Get the Medium app

A button that says 'Download on the App Store', and if clicked it will lead you to the iOS App store
A button that says 'Get it on, Google Play', and if clicked it will lead you to the Google Play store