NBA Game Prediction System

4 min readJun 29, 2023

Objective

The motive of this article is to explore machine learning techniques to predict the winners of NBA basketball games using historical data.

Currently, it does not involve predicting future game outcomes. The goal is to create a model that can accurately predict the outcome of a game using past box score data. The project will use supervised learning, with the model being trained on labeled data. Feature selection techniques will be employed to discover good predictors, and predictions will be made using a machine-learning algorithm.

Motivation

With a longstanding passion for sports and having held the role of a captain on my high school basketball team, I transitioned into officiating basketball intramurals at Rutgers University as part of recreational activities. Recognizing the potential synergy between my affinity for the sport and my interest in deriving insights from data, it appeared to be a compelling endeavor.

Data Collection

I utilized web scraping methodology to extract the data from the website https://www.basketball-reference.com/. The dataset obtained consists of 17,772 rows and 75 columns.

Establishing Hypothesis

Metrics and Planning

The primary performance metric for this task is expected to be accuracy, a widely utilized metric in classification tasks. The objective is to optimize the model’s accuracy specifically on the test set. To evaluate the model’s performance, we will compare the accuracy of different regression models such as ridge, lasso, KNN, and Perceptron, in alignment with the training plan. Additionally, incorporating rolling averages may be explored as a means to enhance the model’s accuracy.

Establishing Baseline

It is highly likely that a Home team has a greater probability of winning compared to an Away team. I will consider this as a reference point and examine the winning percentages of teams when they play at home versus when they play away. This analysis will establish the benchmark for determining a satisfactory level of accuracy.

Directed Acyclic Graph

Exploratory Data Analysis

Understanding the correlation between features

Comparing Model Prediction Accuracy

Benchmarking Baseline Accuracy

Improving Accuracy over Baseline

Rolling averages are important in improving accuracy as they help smooth out noisy data, identify long-term trends, handle seasonal or cyclical patterns, and enhance forecasting or prediction. By reducing short-term fluctuations, rolling averages provide a more stable estimate or prediction, enabling a clearer understanding of underlying trends and patterns. This technique is valuable for sports analytics and other fields where accuracy is crucial, as it minimizes random variations and enhances the overall accuracy of data analysis and predictions. I will employ it in my ridge classifier model to try and increase my prediction accuracy over the baseline score.

Conclusion

The alternate hypothesis holds true and I was able to increase the accuracy of my ridge classifier model from 54.01% to 63.11%. This is a 16.84% improvement in overall accuracy.

Also, my improved model has surpassed the baseline accuracy score of 57.16%

Application

The proposed model can help basketball teams make better game strategy decisions, which is one way it can be used in practice to improve decision-making. Coaches and team managers, for example, can use the model’s predictions to identify areas of weakness in their team’s performance and modify their strategy accordingly. In future iterations, the model can also help predict future game outcomes which can be useful in the context of sports betting or prediction markets, where individuals or organizations are interested in making informed decisions about the outcome of NBA games based on historical data.