The Research Lab Presents: NBA Prediction App v1.0 Beta

The Research Lab
Oct 24, 2023

nba champion newspaper Headline in the style of manga — Midjourney

Intro

Welcome back, and to the newcomers, welcome to The Research Lab. Last February we released an article, with some code, discussing how to build an NBA regular season home-team win/loss model. You can find that here.

In a continuation of that work, we are happy to share our beta NBA prediction Streamlit application. With the regular season starting tomorrow, this app allows us to share the model’s performance throughout the coming season. In addition to sharing what we’ve created, we’ll also review the architecture along with some insights as to what the future holds.

For those who may be unfamiliar, this application and its predictions are being developed and shared for educational purposes only. We are not responsible for any bad ideas that you might have 😂.

What We Created

What It Is

This beta Streamlit application is straightforward in its current capabilities, delivering a view of historic model performance and predictions for the next 15 upcoming games.

Because the regular season has not yet started, all we currently see is the previous year's performance. More specifically, this shows the accuracy and precision of the win/loss model that was trained on 2020–21 and 2021–22 regular season data, reported against the resulting prediction outcomes for the 2022–23 season. Interestingly, the accuracy reflects the long-term baseline for the home-team advantage. The precision line represents the share of the model's predicted home-team wins that turned out to be actual home-team wins.
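To make the two metrics concrete, here is a minimal sketch of how accuracy and precision could be computed for a home-team win/loss model. The function name and the sample labels are illustrative assumptions, not values from the app; 1 stands for a home-team win and 0 for a home-team loss.

```python
def accuracy_and_precision(actual, predicted):
    """Return (accuracy, precision) for binary home-win labels (1 = home win)."""
    if len(actual) != len(predicted):
        raise ValueError("actual and predicted must have the same length")
    if not actual:
        raise ValueError("at least one game is required")

    pairs = list(zip(actual, predicted))
    correct = sum(a == p for a, p in pairs)
    true_home_wins = sum(a == 1 and p == 1 for a, p in pairs)
    predicted_home_wins = sum(p == 1 for p in predicted)

    # Accuracy: share of all games where the prediction matched the result.
    accuracy = correct / len(actual)
    # Precision: share of predicted home wins that really were home wins.
    # Defined as 0.0 here when the model never predicts a home win.
    precision = true_home_wins / predicted_home_wins if predicted_home_wins else 0.0
    return accuracy, precision


# Hypothetical outcomes for five games (not real results).
actual = [1, 0, 1, 1, 0]
predicted = [1, 1, 1, 0, 0]
acc, prec = accuracy_and_precision(actual, predicted)
print(f"accuracy={acc:.2f} precision={prec:.2f}")  # accuracy=0.60 precision=0.67
```

In a Scikit-Learn workflow, `accuracy_score` and `precision_score` from `sklearn.metrics` would report the same quantities; the hand-rolled version is shown only to expose the arithmetic.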

The upcoming games table is also straightforward, nothing fancy: it captures upcoming dates, matchups, and current model predictions. One caveat is that the model is trained on features based on rolling measures. Therefore, until the necessary number of games has been played during the season, previous-season games are used to calculate model features. This is done so that we can deliver predictions for the beginning of the season, but this naïve approach is unable to account for off-season changes, which affects the predicted outcomes.
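The previous-season fallback can be sketched as follows. This is an assumption about the mechanics, not the app's actual code: the window size of 5, the function name, and the points values are all hypothetical. The idea is that a team's rolling average borrows the most recent games of the prior season until the current season has enough games to fill the window.

```python
def rolling_feature(prev_season, current_season, window=5):
    """Mean of the last `window` games played before the next game.

    Current-season games are used first; if there are fewer than `window`
    of them, the most recent previous-season games fill the gap.
    """
    if window <= 0:
        raise ValueError("window must be positive")
    missing = max(window - len(current_season), 0)
    # Take only as many previous-season games as are needed to fill the window.
    carry_over = prev_season[-missing:] if missing else []
    recent = carry_over + current_season[-window:]
    if not recent:
        raise ValueError("no games available to compute the feature")
    return sum(recent) / len(recent)


# Hypothetical points scored per game, oldest first.
prev_season = [100, 110, 105, 120, 115, 108]
print(rolling_feature(prev_season, []))          # opening night: previous season only
print(rolling_feature(prev_season, [112, 98]))   # two new games, three carried over
```

With Pandas, a similar effect could be had by concatenating both seasons per team and applying `rolling(window).mean()` after a `shift(1)`, so that each game only sees games played before it.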

How We Did It

The application code and other supporting scripts are located here. As discussed in the previous article, the NBA_API repository is used as the data source. It provides a mixture of live and historic NBA data at various granularities: play-by-play, game stats, season, career, etc. We are pulling game stats, team, and player data from its available endpoints into a PlanetScale MySQL database.

After the initial loading of stats, we proceed to feature engineering with Pandas/NumPy and use Scikit-Learn for model training and inference. To deliver features to the application for model inference, we created an API endpoint using FastAPI and are hosting our endpoints using OnRender. In case you’re wondering, we are not making the API endpoints available at the moment, but depending on end-user interest we are open to setting up a feature store API for use.

The app uses Streamlit for the current front-end presentation, with the plan to move to a JavaScript-based framework in the future. Lastly, to automate the data refreshes we use Airplane.Dev, a cloud-based scheduling tool.

Thoughts and What’s Coming?

The early phases of this project are still in motion and our backlog includes more work to close the loop on model performance evaluation, model enhancements, data processing enhancements, and upgrades to the front-end.

  • JavaScript-based framework front-end: The goal was to deliver the application using one of the JavaScript frameworks; however, as the story of development goes, our timeline has been pushed back. Here are some current screenshots, and these views are subject to change.
This page displays fake data for front-end design purposes.
  • Live Model Performance: Visualizing the current season with accuracy and precision.
  • Record Versioning: Check previous game stats for retrospective adjustments and update the database/model.
  • Error Clustering: This was also mentioned in the previous article, and is still fairly high on the priority list. The idea is to group the games the model got wrong into clusters of similar games for further analysis.
  • Live Update Model: Include a model that is updated throughout the season to compare against the model trained only on the previous 2 seasons.
  • Model API Endpoint: Currently the Streamlit app hosts the model; however, this will slow the app down as new models are added, so being able to pass model features to a model endpoint will be imperative to maintain performance.
  • Application Testing Scripts: With the application’s small codebase, maintenance was manageable without testing; however, this will not be viable going forward, so testing scripts need to be created.
  • More Models, Features, and Analytics: Ultimately, the current plan is to continue to build on what we’ve created, further developing the application and capturing different levels of analytics with regard to general stats and model predictions.

Conclusion

Thank you for taking the time to review this article and our NBA prediction application. We hope you enjoy tracking the model throughout the season to observe its performance. As always, we welcome your feedback at https://www.linkedin.com/company/the-research-lab, and we wish you a happy NBA season.
