The Tic-Tac-Toe Game as the First Step in ML: A Tutorial for Very Beginners

9 min read · Jun 9, 2024

I just started my journey in the ML world, and after completing several courses on Udemy, I decided to train my first model. I chose Tic-Tac-Toe as the task because it’s a simple game with many possible board combinations. This game is an excellent fit for learning how ML works in practice and for going through all the steps from scratch to a working project.

In this article I will describe the steps I took, the problems I ran into, and how I solved them. I hope it will be interesting for people who are also just starting out in this area, and for experienced engineers who can point out where I went wrong.

So, let’s start. We can break down the task into three main steps:

Preparing the dataset
Training phase
Inference phase

Data preparation phase

The best way to solve this is to find a suitable dataset on the web. I found several datasets but wasn’t satisfied with them. They weren’t clear enough, so I decided it would be good practice for my Python skills to generate it myself. My idea was to train a model step by step to choose the best next move for every stage of the game. Therefore, I needed not only the game result but also intermediate positions with the best next steps.

To achieve this, let’s create our dataset. Of course, we won’t play a thousand games manually but will generate a dataset of possible games automatically. The hardest part is calculating the best next move that will lead the computer to win. To solve this, I used the Minimax algorithm from Game Theory.

In short, the algorithm recursively evaluates every possible continuation from the current position, assuming both players play as well as they can, and picks the move with the best guaranteed outcome: a win if possible, otherwise at least a draw.
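
To make the recursion concrete before the full implementation, here is a minimal sketch of Minimax on a hand-written toy game tree rather than on a real Tic-Tac-Toe board. The tree, its scores, and the function name minimax are assumptions made for illustration only.

```python
from typing import List, Union

# A hypothetical game tree: a leaf is a final score from the computer's
# point of view, an inner node is the list of positions reachable in one move.
Tree = Union[int, List["Tree"]]


def minimax(node: Tree, computer_to_move: bool) -> int:
    # Leaf: the game is over, so return its score.
    if isinstance(node, int):
        return node
    # Evaluate every child position with the other player to move.
    scores = [minimax(child, not computer_to_move) for child in node]
    # The computer maximizes the score, the opponent minimizes it.
    return max(scores) if computer_to_move else min(scores)


# The computer has two options; the opponent replies in each branch.
tree = [
    [10, -10],  # move A: the opponent can steer the game to a loss (-10)
    [0, 10],    # move B: the opponent can hold us to a draw at best (0)
]

print(minimax(tree, computer_to_move=True))  # 0: move B is the safer choice
```

Move A looks tempting because it contains a win, but Minimax assumes the opponent picks the reply that is worst for the computer, so the draw guaranteed by move B scores higher.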

I worked out the algorithm and implemented my own variation of it in a small React.js game, to try it out and prove the concept of using Minimax for this task.

If anybody is interested, there are the POC project and the algorithm in TypeScript.

Technically, this code already solves the main problem: you can play against the computer, and it is difficult (even impossible) to beat it. But our goal is not just to make a bot for the game but to train a model, so let’s continue :)

Here is the full Minimax implementation in Python:

tic_tac_toe.py

import random
from typing import List, Union, Dict

# players
computer = "c"
human = "h"

# board types
Point = Union[int, str]
Board = List[Point]

# board
default_board: Board = [
    0, 1, 2,
    3, 4, 5,
    6, 7, 8
]


# check if player won
def is_win(board: Board, player: str) -> bool:
    winning_combinations = [
        [0, 1, 2], [3, 4, 5], [6, 7, 8], # Rows
        [0, 3, 6], [1, 4, 7], [2, 5, 8], # Columns
        [0, 4, 8], [2, 4, 6]             # Diagonals
    ]
    for combination in winning_combinations:
        if all(board[i] == player for i in combination):
            return True
    return False


# check if board is full
def is_full_board(board: Board) -> bool:
    return all(point == computer or point == human for point in board)


def is_game_finished(board: Board):
    return is_full_board(board) or is_win(board, computer) or is_win(board, human)


# Minimax recursive algorithm to calculate value of a step
def calculate_value(position: int, board: Board, player: str, deep: int) -> int:
    expected_board: Board = board[:]
    expected_board[position] = player

    if is_win(expected_board, computer):
        return 10 - deep

    if is_win(expected_board, human):
        return deep - 10

    if is_full_board(expected_board):
        return 0

    next_player = human if player == computer else computer
    next_step = calculate_position_values(expected_board, next_player, deep + 1)
    entries = list(next_step.values())

    if next_player == human:
        return min(entries)
    else:
        return max(entries)


# calculate the value of every possible step
def calculate_position_values(board: Board, player: str, deep: int = 0) -> Dict[int, int]:
    result = {}
    for position, point in enumerate(board):
        if point == computer or point == human:
            continue
        result[position] = calculate_value(position, board, player, deep)
    return result


# choose the step with the highest score
def choose_best_position(best_positions: Dict[int, int], random_best_position: bool) -> int:
    max_value = max(best_positions.values())
    best_positions_list = [position for position, value in best_positions.items() if value == max_value]

    # pick randomly among equally good steps only when asked to
    if random_best_position:
        return random.choice(best_positions_list)
    else:
        return best_positions_list[0]


# get next best move
def get_tic_tac_toe_best_step(board: Board, random_best_position: bool = False) -> dict:
    best_positions = calculate_position_values(board, computer)
    best_position = choose_best_position(best_positions, random_best_position)
    return {"board": board, "bestPosition": best_position}

Now we have the code that helps to calculate the next step for every case on the game board. Let’s also create a function to generate a suitable dataset representing different game stages. We need random unique game boards at various stages and then use them with the function from the previous file.

generate_random_game_dataset.py

import pandas as pd
import random

from tic_tac_toe import human, computer, default_board, is_game_finished, get_tic_tac_toe_best_step


def generate_random_board():
    while True:
        # create base board
        board = default_board[:]

        # define count of human and computer steps
        num_h = random.randint(0, 4)  # random count for human
        num_c_options = [num_h, num_h - 1]  # possible count for computer before next step
        num_c_options = [x for x in num_c_options if 0 <= x <= 4]  # filter wrong steps
        num_c = random.choice(num_c_options)

        # fill game board with player steps
        positions = random.sample(range(9), num_h + num_c)
        for i in range(num_h):
            board[positions[i]] = human
        for i in range(num_h, num_h + num_c):
            board[positions[i]] = computer

        # skip boards where the game is already over (full board or a win)
        if is_game_finished(board):
            continue

        return board


def generate_unique_boards(num_boards):
    unique_boards = set()

    while len(unique_boards) < num_boards:
        board = tuple(generate_random_board())
        unique_boards.add(board)

    unique_boards = [list(board) for board in unique_boards]

    # create DataFrame from unique boards
    columns = ['point_1', 'point_2', 'point_3', 'point_4', 'point_5', 'point_6', 'point_7', 'point_8', 'point_9']
    df = pd.DataFrame(unique_boards, columns=columns)

    return df


def generate_random_game_dataset(count=1000):
    df = generate_unique_boards(count)

    best_steps = []
    for _, row in df.iterrows():
        board = row.tolist()
        best_step_result = get_tic_tac_toe_best_step(board, False)
        best_step = best_step_result['bestPosition']
        best_steps.append(best_step)

    df['best_step'] = best_steps

    df.to_csv("random_step_tic_tac_toe_games_dataset.csv")

    return df

Calling generate_random_game_dataset() now gives us a dataset of 1000 unique rows. Next, we should look at the data to choose a suitable machine learning algorithm. The dataset has the following structure.

The columns point_1 to point_9 are the cells of the game board, and the best_step column holds the expected move. The value c marks a cell taken by the computer and h a cell taken by the human opponent; a free cell keeps its index.
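
As a rough illustration of that structure, the sketch below maps one hand-written board onto the dataset columns and checks that it is the computer's turn. The helper names board_to_row and is_computer_turn are hypothetical, and the board is an example rather than a row from the real dataset.

```python
from typing import Dict, List, Union

Board = List[Union[int, str]]

# Column names used by the generator script: point_1 ... point_9.
COLUMNS = [f"point_{i}" for i in range(1, 10)]


def board_to_row(board: Board) -> Dict[str, Union[int, str]]:
    # Map the flat 9-cell board onto the named dataset columns.
    return dict(zip(COLUMNS, board))


def is_computer_turn(board: Board) -> bool:
    # The computer is about to move, so the human has made either the same
    # number of moves as the computer or exactly one more.
    return board.count("h") - board.count("c") in (0, 1)


# Hand-written example board (illustrative, not taken from the real dataset):
# free cells keep their index, occupied cells hold "c" or "h".
board: Board = ["h", 1, 2,
                3, "c", 5,
                6, 7, "h"]

print(board_to_row(board))
print(is_computer_turn(board))  # True: two human moves, one computer move
```

The turn check mirrors the rule in generate_random_board, where the computer's move count is either equal to the human's or one less.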

Training phase

There are three types of training for models: supervised, unsupervised, and reinforcement. As our dataset has features (current board stage) and labels (best moves), it will be supervised learning for our case.

It’s essential to choose the right training algorithm for your dataset. There’s no one-size-fits-all solution; you need to analyze your data and try different algorithms. Let’s start with RandomForestClassifier because it's relatively simple to implement for beginners. We just need to convert our data to integers and define the features and target for training.
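
The training script below uses LabelEncoder for the conversion to integers. As an alternative sketch, not the method used in this article, the same conversion can be written as a fixed mapping that does not depend on which values happen to appear in a column. The names CELL_CODES and encode_board and the chosen codes are assumptions.

```python
from typing import List, Union

Board = List[Union[int, str]]

# Assumed encoding for this sketch: 0 = free cell, 1 = computer, 2 = human.
CELL_CODES = {"c": 1, "h": 2}


def encode_board(board: Board) -> List[int]:
    # Any cell that is not "c" or "h" still holds its index, i.e. it is free.
    return [CELL_CODES.get(str(cell), 0) for cell in board]


print(encode_board(["h", 1, 2, 3, "c", 5, 6, 7, "h"]))
# [2, 0, 0, 0, 1, 0, 0, 0, 2]
```

LabelEncoder assigns codes in sorted order of the values it sees, so when a column contains a free cell, c, and h it should end up with the same numbers. Whichever encoding is used for training, the same one has to be applied at inference time.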

The function below trains the model and checks its accuracy.

train.py

import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import LabelEncoder
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score

def train():
    df = pd.read_csv("random_step_tic_tac_toe_games_dataset.csv", index_col=0)
    print(df.shape)

    # convert string data to integer
    label_encoders = {}
    for col in df.columns[:-1]:  # all columns apart from 'best_step'
        le = LabelEncoder()
        df[col] = le.fit_transform(df[col].astype(str))
        label_encoders[col] = le  # save LabelEncoder for every column

    # define the features (X) and the target variable (y)
    X = df.drop(columns=['best_step'])
    y = df['best_step']

    # separate data to learning and test samples
    X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

    # create and train model
    model = RandomForestClassifier(n_estimators=100, random_state=42)
    model.fit(X_train, y_train)

    # predict in test sample
    y_pred = model.predict(X_test)

    # evaluate model accuracy
    accuracy = accuracy_score(y_test, y_pred)
    print(f"Accuracy: {accuracy:.2f}")

Pretty easy, but the accuracy of our model is only 0.56. This indicates there are some issues with the data or the training algorithm.

It is very important to fully understand your data and the problem in any ML task. After the first training run, I realized that I hadn’t prepared a proper best_step for every game board. Often there isn’t just one optimal move on a board but several. I mistakenly stored only the first optimal step, which made the measured accuracy misleading: the trained model may choose one of the equally good moves, but the test rejects it because we expected a different one.
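
The effect on the metric can be sketched with a few made-up predictions. The two function names are hypothetical, and the numbers are illustrative rather than taken from the real model.

```python
from typing import List, Sequence


def exact_accuracy(predicted: Sequence[int], labels: Sequence[int]) -> float:
    # A prediction counts only if it equals the single stored label.
    return sum(p == y for p, y in zip(predicted, labels)) / len(labels)


def any_optimal_accuracy(predicted: Sequence[int],
                         optimal_moves: Sequence[List[int]]) -> float:
    # A prediction counts if it is any of the equally good moves.
    return sum(p in moves for p, moves in zip(predicted, optimal_moves)) / len(optimal_moves)


# Made-up predictions and optimal move lists for four boards.
predicted = [0, 4, 8, 2]
optimal_moves = [[0, 2, 6, 8], [4], [6, 8], [1]]
first_optimal = [moves[0] for moves in optimal_moves]  # [0, 4, 6, 1]

print(exact_accuracy(predicted, first_optimal))        # 0.5
print(any_optimal_accuracy(predicted, optimal_moves))  # 0.75
```

On the third board the model picks cell 8, which is just as good as cell 6, yet the exact comparison treats it as a mistake.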

To address this, let’s add a new function get_tic_tac_toe_best_step_list to include all possible best moves in the dataset:

....

# choose all steps with the highest score
def choose_best_position(best_positions: Dict[int, int]) -> List[int]:
    max_value = max(best_positions.values())
    best_positions_list = [position for position, value in best_positions.items() if value == max_value]
    return best_positions_list


# get next best step
def get_tic_tac_toe_best_step(board: Board, random_best_position: bool = False) -> dict:
    best_positions = calculate_position_values(board, computer)
    best_positions = choose_best_position(best_positions)

    # the single-step variant keeps the original "bestPosition" key
    if random_best_position:
        return {"board": board, "bestPosition": random.choice(best_positions)}
    else:
        return {"board": board, "bestPosition": best_positions[0]}

# get next best steps
def get_tic_tac_toe_best_step_list(board: Board) -> dict:
    best_positions = calculate_position_values(board, computer)
    best_positions = choose_best_position(best_positions)

    return {"board": board, "bestPositions": best_positions}

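Now, if generate_random_game_dataset from generate_random_game_dataset.py uses the new function, the best_step column holds the list of all best moves for each board, which gives us a new data structure. The training script below reads this dataset from optimal_steps_tic_tac_toe_games_dataset.csv.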
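
Because a CSV file stores everything as text, a list of moves has to be parsed back into a list after loading. The following sketch shows the round trip with the standard library only; the row is made up for illustration, and its move list was not computed by Minimax.

```python
import csv
import io
import json

# Illustrative row: nine board cells followed by the list of best moves.
row = ["h", 1, 2, 3, "c", 5, 6, 7, "h", json.dumps([2, 6])]

# Write the row to an in-memory CSV; the list ends up as the text "[2, 6]".
buffer = io.StringIO()
csv.writer(buffer).writerow(row)

# Read it back: every field is a string until we parse it.
buffer.seek(0)
fields = next(csv.reader(buffer))
best_steps = json.loads(fields[-1])

print(repr(fields[-1]))  # '[2, 6]'
print(best_steps)        # [2, 6]
```

The training script does the same parsing with json.loads on the best_step column. This relies on the column being written in a form that is also valid JSON, which holds for plain lists of integers.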

Next, let’s modify our training script to account for all possible best steps:

train.py

import json
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import LabelEncoder
from sklearn.ensemble import RandomForestClassifier

def train():
    df = pd.read_csv("optimal_steps_tic_tac_toe_games_dataset.csv", index_col=0)
    print(df.shape)

    # convert string data to integer
    label_encoders = {}
    for col in df.columns[:-1]:  # all columns apart from 'best_step'
        le = LabelEncoder()
        df[col] = le.fit_transform(df[col].astype(str))
        label_encoders[col] = le  # save LabelEncoder for every column

    # convert 'best_step' from string to list
    df['best_step'] = df['best_step'].apply(json.loads)

    # Expand the dataset to include all possible best steps
    expanded_data = []
    original_indices = []
    for idx, row in df.iterrows():
        for step in row['best_step']:
            new_row = row.drop('best_step').to_dict()
            new_row['best_step'] = step
            expanded_data.append(new_row)
            original_indices.append(idx)

    expanded_df = pd.DataFrame(expanded_data)
    expanded_df['original_index'] = original_indices

    # define the features (X) and the target variable (y)
    X = expanded_df.drop(columns=['best_step', 'original_index'])
    y = expanded_df['best_step']
    original_indices = expanded_df['original_index']

    # separate data to learning and test samples
    X_train, X_test, y_train, y_test, train_indices, test_indices = train_test_split(
        X, y, original_indices, test_size=0.2, random_state=42)

    # create and train model
    model = RandomForestClassifier(n_estimators=100, random_state=42)
    model.fit(X_train, y_train)

    # predict in test sample
    y_pred = model.predict(X_test)

    # value model accuracy from the list of possible proper steps
    y_test_original = df.loc[test_indices]['best_step'].tolist()
    accuracy = sum([1 if pred in actual else 0 for pred, actual in zip(y_pred, y_test_original)]) / len(y_test)
    print(f"Accuracy: {accuracy:.2f}")

WOW! Now we have 0.84 accuracy with the same dataset size. The model fits the data better, so training on the full list of best steps instead of only the first one was a move in the right direction. Great success!
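
One caveat may be worth checking here. After the expansion, the same board appears in several rows, and a row-wise split can put copies of one board into both the training and the test sample, which may make the accuracy look better than it is. A possible remedy, sketched below under that assumption, is to split by original board index instead; scikit-learn offers GroupShuffleSplit for the same purpose. The function name split_by_group and the sample indices are hypothetical.

```python
import random
from typing import List, Sequence, Tuple


def split_by_group(groups: Sequence[int], test_size: float = 0.2,
                   seed: int = 42) -> Tuple[List[int], List[int]]:
    # Shuffle the distinct board ids, then assign whole boards to one side.
    unique_groups = sorted(set(groups))
    random.Random(seed).shuffle(unique_groups)
    n_test = max(1, round(len(unique_groups) * test_size))
    test_groups = set(unique_groups[:n_test])

    train_rows = [i for i, g in enumerate(groups) if g not in test_groups]
    test_rows = [i for i, g in enumerate(groups) if g in test_groups]
    return train_rows, test_rows


# original_index of each expanded row: board 0 has 3 rows, board 1 has 1, ...
original_indices = [0, 0, 0, 1, 2, 2, 3, 4, 4, 4]
train_rows, test_rows = split_by_group(original_indices)

train_boards = {original_indices[i] for i in train_rows}
test_boards = {original_indices[i] for i in test_rows}
print(train_boards & test_boards)  # set(): no board appears on both sides
```

The returned row positions could then be used to select the training and test rows of the expanded DataFrame.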

It makes sense to think about the learning algorithm we chose and see if we can find a better one for our task. For example, we can try Gradient Boosting. These algorithms often show high accuracy by constructing a sequence of decision trees, where each subsequent tree corrects the errors of the previous ones.
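
The idea of each tree correcting the previous ones can be sketched in a few lines. This is a deliberately simplified toy, not how XGBoost is implemented: it uses squared error on made-up one-dimensional data and one-split trees (stumps), and all names in it are hypothetical.

```python
from typing import List, Tuple

Stump = Tuple[float, float, float]  # (threshold, value if x <= threshold, value otherwise)


def fit_stump(xs: List[float], residuals: List[float]) -> Stump:
    # Try every x as a threshold and keep the split with the lowest squared error.
    best: Stump = (xs[0], 0.0, 0.0)
    best_error = float("inf")
    for threshold in xs:
        left = [r for x, r in zip(xs, residuals) if x <= threshold]
        right = [r for x, r in zip(xs, residuals) if x > threshold]
        left_value = sum(left) / len(left)
        right_value = sum(right) / len(right) if right else 0.0
        error = sum((r - (left_value if x <= threshold else right_value)) ** 2
                    for x, r in zip(xs, residuals))
        if error < best_error:
            best, best_error = (threshold, left_value, right_value), error
    return best


def predict_stump(stump: Stump, x: float) -> float:
    threshold, left_value, right_value = stump
    return left_value if x <= threshold else right_value


def boost(xs: List[float], ys: List[float], n_trees: int = 20,
          learning_rate: float = 0.5) -> List[float]:
    # Start from a prediction of 0 and let each stump fit what is still wrong.
    predictions = [0.0] * len(xs)
    errors = []
    for _ in range(n_trees):
        residuals = [y - p for y, p in zip(ys, predictions)]
        stump = fit_stump(xs, residuals)
        predictions = [p + learning_rate * predict_stump(stump, x)
                       for p, x in zip(predictions, xs)]
        errors.append(sum((y - p) ** 2 for y, p in zip(ys, predictions)) / len(ys))
    return errors


# Made-up 1D data, only to show the error shrinking tree by tree.
xs = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
ys = [1.0, 1.0, 2.0, 4.0, 4.0, 5.0]
errors = boost(xs, ys)
print(f"MSE after first tree: {errors[0]:.3f}, after last tree: {errors[-1]:.3f}")
```

Each stump is fitted to the residuals left by the trees before it, so the training error never increases from one tree to the next. A classifier such as XGBClassifier applies the same principle with a classification loss and deeper trees.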

Let’s change the following line in our train.py:

model = RandomForestClassifier(n_estimators=100, random_state=42)

to this code, adding "from xgboost import XGBClassifier" to the imports at the top of the file:

model = XGBClassifier(n_estimators=100, random_state=42, use_label_encoder=False, eval_metric='mlogloss')

and then check what we get now.

It’s amazing! Now we have 0.94 accuracy. It is not perfect, but it is pretty good for a first attempt, so it makes sense to save our trained model. The following lines go at the end of train(), with "import joblib" added at the top of the file, and save the model together with the label encoders.

    # Save the model and label encoders
    joblib.dump(model, 'tic_tac_toe_model.pkl')
    joblib.dump(label_encoders, 'label_encoders.pkl')

And now it is done! We are ready to predict the best game steps on the real board.

Inference phase

To use the trained model, we can wrap loading and prediction in a function:

import joblib


def load_model_and_predict(new_board):
    # Load the model and label encoders
    model = joblib.load('tic_tac_toe_model.pkl')
    label_encoders = joblib.load('label_encoders.pkl')

    # Convert new board state to encoded form
    new_board_encoded = []
    for i, item in enumerate(new_board):
        col_name = f'point_{i + 1}'
        if col_name in label_encoders:
            encoded_item = label_encoders[col_name].transform([str(item)])[0]
            new_board_encoded.append(encoded_item)

    # Predict the best next step
    best_step_prediction = model.predict([new_board_encoded])
    return best_step_prediction[0]

Now we can call load_model_and_predict with the current game board and get the next best step for the computer. To see how it works, let's create app.py and make a simple web app to play against the computer using the trained model. I will not add that code here and just leave the link to the file.
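
Since the model is not perfectly accurate, it can occasionally predict a cell that is already taken. One possible safeguard, sketched here as an assumption rather than as part of the original project, is to rank the cells by predicted probability and take the best one that is still free. In the real function the two inputs could come from model.classes_ and model.predict_proba([new_board_encoded])[0]; the helper name pick_legal_move and the probabilities below are made up.

```python
from typing import List, Sequence, Union

Board = List[Union[int, str]]


def pick_legal_move(board: Board, classes: Sequence[int],
                    probabilities: Sequence[float]) -> int:
    # Keep only the classes (cell indices) that are still free on the board.
    free_cells = {i for i, cell in enumerate(board) if cell not in ("c", "h")}
    candidates = [(p, int(c)) for c, p in zip(classes, probabilities)
                  if int(c) in free_cells]
    if not candidates:
        raise ValueError("no free cell among the predicted classes")
    # The free cell with the highest predicted probability wins.
    return max(candidates)[1]


board: Board = ["h", 1, 2, 3, "c", 5, 6, 7, "h"]
classes = list(range(9))
# Made-up probabilities: the top class (cell 4) is already occupied.
probabilities = [0.05, 0.05, 0.30, 0.05, 0.35, 0.05, 0.10, 0.05, 0.0]

print(pick_legal_move(board, classes, probabilities))  # 2
```

This keeps the game playable even when the single most likely prediction is not a legal move.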

Looks nice and works correctly. The task is finally done. It was a very interesting experience and good practice for beginners in this field. You can find all the project code here.

And of course, here is the link to the deployed game.

Thank you for reading.

Written by Artur Rieznik