Tech for Good: Advancing Math Education Through AI and Enhanced UX

Leveraging Machine Learning and AI to Transform Educational Experiences and Outcomes

Yu-Cheng Tsai
Sage Ai
12 min read · Jul 3, 2024


Coauthors: Bil Arikan, Daphne Turner, Sarah Castorillo, Juma Zevick, Lo Cianflone, LeTisha Campbell, and Anne Seidel

The Team

The North America Hackathon team

Introduction to the Hackathon

The Sage Foundation organized an exhilarating three-day hackathon aimed at enhancing STEM education through improvements to the STACK system (System for Teaching and Assessment using a Computer Algebra Kernel), an education technology tool. Participants with diverse skill sets and backgrounds came together with a common vision: to tackle pressing challenges in mathematics education and deliver viable solutions that can significantly benefit students and educators alike.

First Challenge: Develop an Adaptive Learning Algorithm

The goal was to craft a personalized learning experience that meets students at their level, using practice and feedback within the STACK system to gradually improve learning and make it more engaging. Ultimately, students should be able to feel their progress while educators see the results of tailored education strategies.

This challenge involved the creation of a machine learning algorithm capable of adapting a student’s learning plan. The algorithm should recommend STACK questions tailored to individual students based on their historical learning curves.

Second Challenge: Design a User-Friendly User Interface for Educators

The second objective was to develop a streamlined UX interface that would give lecturers and teachers a quick, insightful overview of how students engage with the STACK system. The interface needed to be intuitive enough for educators to gain actionable insights without the burden of navigating complex data extraction, analysis, and visualization processes.

The ideal solution would provide educators with a dashboard that highlights key metrics of student engagement, grades, and areas needing improvement, thereby enabling timely feedback and adjustment of learning strategies.

Overview of STACK

STACK is a tool designed to enhance teaching and learning in STEM education. It streamlines the creation and assessment of mathematics questions by providing automated grading and immediate feedback to students.

STACK’s features include:

  1. Automatic Grading: Uses a computer algebra system (CAS) to automatically grade student responses against the answers teachers expect.
  2. Immediate Feedback: Provides instant feedback, helping students learn from their mistakes in real time. The feedback can be either specific to the mistakes a student made or general comments, depending on how it is set up.
  3. Question Customization: Allows for highly customizable questions tailored to various learning objectives, including practice and mastery exercises, assessment quizzes, and content-driven quizzes of various types.
  4. Integration with Moodle: STACK integrates as a plug-in with the Moodle Learning Management System, making it accessible for educators and students.

In African undergraduate classes, where the student-to-lecturer ratio is high, particularly in STEM disciplines, STACK is pivotal in addressing the challenge of regular assessment and providing timely feedback. STACK is also used in contexts beyond African universities, showcasing its broad adoption across global institutions and its ability to meet diverse local learning needs.

Despite its extensive use, STACK faces significant limitations in its current form:

  1. Lack of individualization: STACK provides the same learning materials (e.g., questions, videos, textbooks) to all learners, regardless of individual differences in learning needs. This can lead to cognitive overload for students who find the materials too hard, or to lower engagement for students who find the work too easy.
  2. Inefficiency: The varying pace of learners is not considered. Some learners may need more time to grasp certain concepts, while others could advance faster if provided with more challenging material.
  3. Absence of personalized feedback: There is no mechanism within STACK technology to offer personalized learning plans or feedback based on individual student profiles.

Proposed Solution

To address these challenges, we propose the development of a holistic solution that includes two key components: a user-friendly Graphical User Interface (GUI) and a machine learning model. The integration of these elements would provide a tailored educational experience for each student, thereby enhancing learning efficiency and effectiveness, and transforming STACK into a more adaptive and responsive education tool.

User Experience Research and Proto-Persona Development

To truly understand the needs of educators and ensure our solution effectively addresses the challenges they face when using the Moodle and STACK systems, we needed to create a problem statement and proto-persona representing our target user group. The time limitations of the hackathon required us to be agile and create an ad-hoc proto-persona rather than a full-blown user persona.

By creating a proto-persona and empathy map, our team was able to keep our target user at the center of the design process. These tools gave us a deeper understanding of user goals, pain points, and workflows, which allowed us to make informed decisions about the design and functionality of our solution.

How did we create a proto-persona? We took what we already understood about our users and used that information to curate a list of interview questions. We then connected with educators currently using the system as well as educators using other LMS platforms. These interviews gave us valuable insights into our educators' day-to-day workflows: which data matters most to them, what they believe would be useful, and their level of experience with current systems and competitors.

We conducted empathy mapping sessions to step into the shoes of our users, which gave us a deeper understanding of their perspective. We also performed a competitive analysis of alternative LMS platforms better suited to our use case and visualization needs. Finally, we analyzed all of our research data and created our final user profile.

Problem Statement & Research Planning

User profile

Meet Danilo: Proto Persona & Empathy Mapping

Danilo is a math teacher who struggled to analyze and understand how students interact with e-assessment tasks in the STACK system. This limitation hindered his ability to identify patterns and trends in student performance, which is crucial for making informed decisions to improve his teaching methods and strategies. There was therefore a pressing need for an intuitive GUI or dashboard that simplifies data analysis and presentation, empowering him to gain valuable insights into student performance and adapt his instructional approaches accordingly.

Proto-persona & Empathy Mapping

Designing a User-Friendly Graphic User Interface

One of the major challenges educators face is that, even though the data is available to them, analyzing it and extracting relevant information takes an extensive amount of time. Moreover, busy educators may not have the technical know-how to perform data analysis. The goal, then, is to provide them with a tool that performs these tasks automatically and easily, with very little training required.

Understanding the User Experience

Armed with our proto-persona, we started mapping out the educator’s interaction flow with the Moodle and STACK systems. We used our interview data and our empathy mapping information and logged into Moodle. As we navigated the site, we identified critical touch points and the accompanying sentiments and areas for improvement.

A typical workflow of a teacher using Moodle:

  1. Login: Teachers start by logging into their custom Moodle site and accessing the dashboard, which offers an overview of classes, course units, and recent activities.
  2. Viewing Student Quizzes: From the dashboard, teachers can access a course and navigate to the units, which then take them to the quizzes. This section includes tools for quiz management and tracking student submissions.
  3. Viewing Performance Reports: Teachers can use the reports section to view detailed reports on students' overall performance on quizzes. These reports provide performance metrics that help identify trends and areas where students can improve.

Prototype with Interactive Demo

We created a prototype with Figma to illustrate the user interface’s future version. This prototype includes an interactive demo that demonstrates the complete user journey from the teacher’s point of view. It enables educators to navigate seamlessly inside and outside the Moodle classroom, accessing all the necessary data and tools in one integrated interface.

How educators can benefit from the new Graphical User Interface (GUI):

  1. Ease of Use: Simplifies data access and interpretation, significantly reducing the amount of time educators spend searching for and compiling information.
  2. Actionable Insights: Delivers clear, helpful insights into student performance that enable educators to give timely feedback and adjust learning strategies.
  3. Enhanced Engagement: Improves the way educators interact with the STACK system's assessment data, fostering a better teaching and learning environment.

Proposed User-Friendly Graphical User Interface

Starting from the Moodle Classroom, we’ve added a link to quickly access the quiz analytics dashboard.

Entry point of GUI

Teachers would then upload the quiz data from the course unit, and QuizIQ would generate a list of charts as well as actionable insights.

(Left) User uploads quiz data (Right) Insights are being generated

The default view shows the ‘Quiz Analysis’ section. This section provides a comprehensive analysis of how the quiz difficulty and participation relate to each other, and which questions were the most challenging for students.

Various metrics are presented in the dashboard

The teacher could also navigate to the student analysis section and find personalized recommendations for each of their students, along with details of how those recommendations were generated.

Personalized recommendation for each student

By presenting this future state through a detailed Figma prototype, we were able to demonstrate how our proposed user-friendly GUI could transform the educational experience for both teachers and students. The interactive demo emphasized the potential for seamless integration and improved usability, ensuring that educators can focus more on teaching and less on navigating complex systems.

Data Exploration

A quiz attempt by a student

In our journey to develop a machine learning model that personalizes quiz recommendations for students, we started with an in-depth exploration of the available dataset. We carefully reviewed it, looking for features that would inform our modeling approach.

The dataset comprises records of students' exam attempts, each uniquely identified by the student's email address. Each exam consists of several questions, along with the student's responses and scores for those questions. To ensure fairness and diversity in testing, each question is tagged with a seed_id, which changes with every new attempt. The dataset also includes the time each student spends on an attempt, the current status of the exam (completed or still in progress), and the student's overall grade for that attempt.

For each question, we analyzed the distribution of scores across students, which helped us determine the question's difficulty level. This information was critical for our learning-plan recommendations.
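
As a rough illustration of this step, a per-question difficulty summary can be computed with pandas. This is a sketch rather than the exact hackathon code; the frame and column names (df_melted_scores, seed_id) follow the naming in the training snippet later in this post, and the maximum score of 9.00 follows the grading scale described below.

import pandas as pd

# Sketch: summarize the score distribution per question to estimate difficulty.
# Assumes a long-format frame with one row per (student, question, score);
# column names mirror the training snippet below and are assumptions.
def question_difficulty(df_melted_scores, score_col, max_score=9.0):
    stats = df_melted_scores.groupby("seed_id")[score_col].agg(
        mean_score="mean",
        full_score_rate=lambda s: (s >= max_score).mean(),
    )
    # Bucket questions into three difficulty bands by mean score.
    stats["difficulty"] = pd.cut(
        stats["mean_score"], bins=3, labels=["hard", "medium", "easy"]
    )
    return stats.sort_values("mean_score")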

(Left) Grade distribution among students for a single exam. (Right) Score distribution for one question among all students. This is an easy question, considering the high number of students receiving a full score.
Distribution of Time Taken (mins) vs. Grades

We observed that the time taken for each exam did not correlate with students' grades. This insight led us to exclude time when determining the difficulty level of exams.
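
For reference, this check boils down to a simple correlation between time spent and grade; a minimal sketch, with the frame and column names assumed:

# Minimal sketch (attempts_df and its column names are assumptions):
# a Pearson correlation near zero supports dropping time from the
# difficulty estimate.
correlation = attempts_df["time_taken_mins"].corr(attempts_df["grade"])
print(f"Correlation between time taken and grade: {correlation:.2f}")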

Heat-map of grades for a pair of student ID and quiz ID

To gain a clearer understanding of how each student performed on different quizzes, we employed a heat-map as a visualization tool. The heat-map displays the grade for each pair of student ID and quiz ID, providing a representation of performance patterns and highlighting areas where students excel and where they struggle. We noticed that more than 85% of the student ID and quiz ID pairs were empty. Building on this visualization, our recommendation engine would later step in to enhance the learning process by predicting scores for the empty white spaces on the heat-map; these represent quiz and student pairings that have not previously occurred.
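
A compact sketch of how such a heat-map and the sparsity figure can be produced (the pivot reuses df_melted_scores and grade_column from the training snippet below; seaborn is an assumption, and any plotting library would do):

import seaborn as sns
import matplotlib.pyplot as plt

# Build the student x quiz utility matrix (names follow the snippet below).
utility = df_melted_scores.pivot_table(
    index="Email address", columns="seed_id", values=grade_column
)
sparsity = utility.isna().mean().mean()  # fraction of empty (student, quiz) pairs
print(f"Utility matrix is {sparsity:.0%} empty")

# Heat-map of grades; empty cells are the pairs the engine will predict.
sns.heatmap(utility, cmap="viridis")
plt.xlabel("Quiz ID")
plt.ylabel("Student ID")
plt.show()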

Recommendation Engine

(Left) Flow diagram of the design of the recommendation engine (Right) Final demo built with Streamlit

Our primary objective was to enhance student learning by personalizing the educational experience. We aimed to achieve this by recommending questions tailored to each student, based on scores predicted by the recommendation engine. The engine was trained on data comprising student emails, question identifiers (seed_id), and the corresponding scores, graded on a scale of 9.00.

from surprise import Dataset, Reader
from surprise import SVD
from surprise.model_selection import train_test_split
from surprise import accuracy

reader = Reader(rating_scale=(1, 10))
grade_column = [col for col in combined_df.columns if col.startswith("Grade/")][0]
data = Dataset.load_from_df(
    df_melted_scores[["Email address", "seed_id", f"{grade_column}"]], reader
)
trainset, testset = train_test_split(data, test_size=0.25, random_state=42)

# Use SVD (Singular Value Decomposition)
model = SVD()
# Fit the model to the training data
model.fit(trainset)

# Predict ratings for the testset
predictions = model.test(testset)

# Compute and print Root Mean Squared Error
accuracy.rmse(predictions)


# Function to predict the score for a given user_id and item_id
def predict_score(model, user_id, item_id):
    prediction = model.predict(user_id, item_id)
    print(f"Predicted score for user {user_id} and item {item_id}: {prediction.est}")

To predict these scores, we used Singular Value Decomposition (SVD), an algorithm known for its effectiveness in extracting latent features from datasets. The grades for each pair of student ID and quiz ID formed the utility matrix familiar from the recommendation literature, which we split into training and testing sets. Root mean squared error (RMSE) between predicted and actual grades served as the loss. With 900 students and 150 quizzes, we obtained an average test RMSE of 1.88.

For example, consider a scenario where we need to predict the performance for the student with the email abc@mail.com on question 1. Our model successfully predicted a score of approximately 4.95.
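
Using the predict_score helper defined above, that prediction looks like this (the email and question ID are illustrative):

predict_score(model, "abc@mail.com", 1)
# Predicted score for user abc@mail.com and item 1: 4.95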

Our recommendation strategy didn't simply deliver the top n questions a student is likely to perform well on; it also ensured a diverse set of challenges. This diversity is crucial because it exposes students to a variety of questions that holistically test and improve their knowledge and skills across different areas.

import random
from collections import defaultdict


# Function to get top N recommendations for each user
def get_top_n(predictions, n=3):
    # First map the predictions to each user.
    top_n = defaultdict(list)
    for uid, iid, true_r, est, _ in predictions:
        top_n[uid].append((iid, est))

    # Then sort the predictions for each user and retrieve the n highest ones.
    for uid, user_ratings in top_n.items():
        user_ratings.sort(key=lambda x: x[1], reverse=True)
        top_n[uid] = user_ratings[:n]
    return top_n


# Function to get diverse recommendations
def get_diverse_recommendations(predictions, n=3, num_intervals=3):
    # Group predictions by user
    user_predictions = defaultdict(list)
    for uid, iid, _, est, _ in predictions:
        user_predictions[uid].append((iid, est))

    top_n = defaultdict(list)
    for uid, user_items in user_predictions.items():
        # Sort items by estimated score and divide the range into intervals
        sorted_items = sorted(user_items, key=lambda x: x[1])
        interval_size = len(sorted_items) // num_intervals

        for i in range(num_intervals):
            if len(top_n[uid]) < n:
                # Select items from each interval
                start = i * interval_size
                end = start + interval_size if i != num_intervals - 1 else len(sorted_items)
                # Pick a random item from this interval if available
                if sorted_items[start:end]:
                    selected_item = random.choice(sorted_items[start:end])
                    top_n[uid].append(selected_item)
                    # Remove the selected item to prevent re-selection
                    sorted_items.remove(selected_item)

        # Ensure this user has exactly 'n' recommendations; fill from the
        # user's own remaining items if needed (done per user so one student's
        # leftovers never fill another student's list).
        remaining_needed = n - len(top_n[uid])
        if remaining_needed > 0:
            top_n[uid].extend(sorted_items[:remaining_needed])

    return top_n

In the end, we built a web-based app using Streamlit that allows educators to upload the relevant CSV files and download per-student quiz recommendations in CSV format.
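
A minimal sketch of such a Streamlit app, assuming the model-training and recommendation helpers shown above are available (widget labels, file names, and the rec_df columns are illustrative):

import pandas as pd
import streamlit as st

st.title("QuizIQ: Personalized Quiz Recommendations")

# Educators upload the quiz export as a CSV file.
uploaded = st.file_uploader("Upload quiz data (CSV)", type="csv")
if uploaded is not None:
    quiz_df = pd.read_csv(uploaded)
    # ... build `predictions` from the uploaded data as in the SVD snippet above ...
    recommendations = get_diverse_recommendations(predictions, n=3)

    # Flatten the per-student recommendations into a downloadable table.
    rec_df = pd.DataFrame(
        [
            (uid, iid, est)
            for uid, items in recommendations.items()
            for iid, est in items
        ],
        columns=["Email address", "seed_id", "predicted_score"],
    )
    st.dataframe(rec_df)
    st.download_button(
        "Download recommendations (CSV)",
        rec_df.to_csv(index=False),
        file_name="recommendations.csv",
    )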

Visualize Clustering of Utility Matrix

We further segmented students into clusters based on their grades. We chose three clusters and fitted a spectral clustering algorithm. To visualize the clusters, we employed t-SNE and computed the Silhouette score, which came out to 0.2. This low score suggests that students' grades alone may not provide enough signal for clustering. This finding implies we could further improve the clustering and recommendations with additional features that were not available to us during the hackathon but could be collected in the STACK system.
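
A condensed sketch of this analysis with scikit-learn, assuming the student-by-quiz utility matrix from the heat-map step (zero-filling missing grades here is itself a modeling assumption):

from sklearn.cluster import SpectralClustering
from sklearn.manifold import TSNE
from sklearn.metrics import silhouette_score
import matplotlib.pyplot as plt

# Students x quizzes grade matrix; zero-filling missing grades is an assumption.
X = utility.fillna(0).to_numpy()

labels = SpectralClustering(n_clusters=3, random_state=42).fit_predict(X)
print(f"Silhouette score: {silhouette_score(X, labels):.2f}")

# Project the grade profiles to 2-D with t-SNE to visualize the clusters.
embedded = TSNE(n_components=2, random_state=42).fit_transform(X)
plt.scatter(embedded[:, 0], embedded[:, 1], c=labels)
plt.title("t-SNE projection of student grade profiles")
plt.show()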

Conclusion

In this blog post, we shared our journey of advancing mathematics education through AI and user research. We developed a user-friendly GUI, showcased in an interactive Figma demo, and built a personalized recommendation engine that allows educators to generate quizzes tailored to each student's learning plan. These prototypes reduce educators' workload, letting them focus more on teaching and making education more adaptive and effective. We are delighted to contribute to making the world a better place!


Yu-Cheng Tsai
Sage Ai

A data scientist passionate about innovative AI products. Ph.D. in MAE from Princeton University; works at Sage AI. https://www.linkedin.com/in/yu-cheng-tsai/