Introduction to Our Project

1. Information Background

The National Basketball Association (NBA) is the third-largest professional sports league in the world by revenue. At present, the association has 30 teams, split into two conferences of three divisions each: Atlantic, Central and Southeast in the Eastern Conference, and Northwest, Pacific and Southwest in the Western Conference. Each division consists of 5 teams. The regular season runs from October to the following April, and together with the playoffs and finals that follow, it attracts basketball fans from all over the world. The following figures illustrate the level of public interest in the NBA:

In this project, we mainly consider the games of each NBA regular season. During a regular season, each team plays 82 games: 41 at home and 41 away. As stated in [1], each team faces the four other teams in its own division four times a season. Of the ten teams in the other two divisions of its conference, it plays six of them four times and the remaining four three times. Finally, each team plays every team in the other conference twice.
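As a quick sanity check, the schedule breakdown above can be tallied to confirm that it adds up to 82 games against 29 opponents. The sketch below assumes the structure described earlier (two conferences, each with three divisions of five teams); the variable names are illustrative only and are not taken from the project code.

```python
# Tally one team's regular-season schedule from the breakdown above.
# Assumption: 2 conferences, each with 3 divisions of 5 teams (30 teams total).
TEAMS_PER_DIVISION = 5
DIVISIONS_PER_CONFERENCE = 3
TEAMS_PER_CONFERENCE = TEAMS_PER_DIVISION * DIVISIONS_PER_CONFERENCE  # 15

# group -> (number of opponents, games played against each opponent)
schedule = {
    "own division": (TEAMS_PER_DIVISION - 1, 4),
    "same conference, played four times": (6, 4),
    "same conference, played three times": (4, 3),
    "other conference": (TEAMS_PER_CONFERENCE, 2),
}

games_by_group = {group: n * g for group, (n, g) in schedule.items()}
total_games = sum(games_by_group.values())
total_opponents = sum(n for n, _ in schedule.values())

for group, games in games_by_group.items():
    print(f"{group}: {games} games")
print(f"total: {total_games} games against {total_opponents} opponents")
```

The four groups contribute 16, 24, 12 and 30 games respectively, which matches the 82-game season.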

2. Research Background

Published work on NBA games mainly covers Economic Value and Policy [2], Injury Risk Analysis [3], Outcome Prediction, Travelling Costs and Player Fatigue [4], etc. Our project focuses on predicting the point difference of each game. Several similar studies can be found. [5] makes three attempts to model point spreads in college basketball: Heuristics, Bi-Directional Stepwise, and Least Angle Regression. [6] implements multiple machine learning techniques to predict the number of points scored by each team in an attempt to beat the spread, including Linear Regression, Support Vector Machine, Boosting, and a modified model (Polynomial Regression + SVM + Naive Bayes). [7] focuses on predicting whether a team wins or loses a game with a maximum-entropy-based model; its prediction accuracy reaches up to 84.8%, although the number of games it can predict is limited. [8] uses ML.NET to predict scores. [9] compares the performance of Linear Regression, Logistic Regression, SVM with RBF kernel, SVM with linear kernel, Decision Tree, Random Forest, Extra Trees, and Gradient Boosting. [10] considers Neural Network Regression (NNR); its results show that NNR outperforms Linear Regression and SVM.

3. Plans for further investigation

4. Definitions[11]

We define the statistics used in our project in two tables. The first table lists the basic statistics, with their abbreviations and full names. The second table lists the advanced statistics, with their abbreviations, full names and a brief definition of each.

5. Outline of our project

5.1 Data Acquisition

5.2 Data Exploration and Understanding

5.3 Data Munging and Wrangling (e.g. Calculating Elo Scores)

5.4 Data Visualisation

5.5 Statistical Analysis and Modelling

5.6 Model Evaluation and Selection

5.7 Ensemble Learning


[2] Jerry A. Hausman and Gregory K. Leonard, “Superstars in the National Basketball Association: Economic Value and Policy,” Journal of Labor Economics 15, no. 4 (October 1997): 586–624.