How to win the NBA MVP

Jd Donohue
May 2, 2018

By Michael Horenstein, Zach Esber, and JD Donohue

The NBA MVP is an extremely contentious topic among NBA fans and pundits. What should determine it? Wins above replacement? Points per game? Success in the playoffs? There is no clear set of rules, so the MVP is very hard to predict. In this article, we describe our attempt to use regression and artificial intelligence to better understand which metrics have historically determined who wins the NBA MVP. We then apply our model to this season's stats and attempt to predict the 2018 NBA MVP.

Methodology

How did we go about this? We built several AI regression models, each using a different combination of statistics, to see which combination best predicts the MVP. The combinations included basic stats (PPG, RPG, APG, SPG, BPG, FG%, All-Star Binary, Playoff Binary), advanced stats (PER, OWS, DWS, WS/48, VORP, OBPM, DBPM, EFGPCT, All-Star Binary, TopWS), conspiracy stats (USGPCT, MP, Market Size), and every possible combination of these stats. The target we tried to predict was the percentage of total MVP votes a player received (his MVP vote share). After running each regression, we wrote a CSV file comparing the actual MVP vote share with the predicted vote share.
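The article does not say exactly how the combinations were enumerated. A minimal sketch, assuming "every possible combination" means every non-empty union of the three stat groups (seven feature sets in total) and that the data columns are named after the acronyms listed at the end of this article:

```python
from itertools import combinations

# Assumption: "every possible combination" means every non-empty union of
# the three stat groups below; column names follow the article's acronyms.
STAT_GROUPS = {
    "basic": ["PPG", "RPG", "APG", "SPG", "BPG", "FG%",
              "All-Star Binary", "Playoff Binary"],
    "advanced": ["PER", "OWS", "DWS", "WS/48", "VORP", "OBPM", "DBPM",
                 "EFGPCT", "All-Star Binary", "TopWS"],
    "conspiracy": ["USGPCT", "MP", "Market Size"],
}


def feature_sets(groups):
    """Yield (name, columns) for every non-empty combination of stat groups."""
    names = sorted(groups)
    for k in range(1, len(names) + 1):
        for combo in combinations(names, k):
            columns = []
            for name in combo:
                for col in groups[name]:
                    # All-Star Binary is in two groups; keep it only once.
                    if col not in columns:
                        columns.append(col)
            yield "+".join(combo), columns


for name, cols in feature_sets(STAT_GROUPS):
    print(f"{name}: {len(cols)} features")
```

Each yielded column list would then feed one regression model; de-duplicating keeps All-Star Binary from being counted twice when the basic and advanced groups are combined.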

To compile this data, we gathered statistics from several online NBA sources and combined them in Excel. The combined dataset was then loaded into Python to build the models.
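We did the merge itself in Excel. For readers who would rather do that step in Python, here is a hedged sketch of the same idea using only the standard library: a left join of per-season stats onto MVP voting results. The column names `Player` and `Season`, the function `left_join`, and the example rows are hypothetical; players missing from the voting table get a vote share of 0.

```python
def left_join(left, right, key=("Player", "Season"), fill=None):
    """Attach the columns of `right` to each row of `left` that shares `key`.

    Rows in `left` with no match get every right-hand column set to `fill`.
    """
    index = {tuple(row[k] for k in key): row for row in right}
    right_cols = sorted({c for row in right for c in row if c not in key})
    joined = []
    for row in left:
        match = index.get(tuple(row[k] for k in key))
        extra = {c: (match.get(c, fill) if match is not None else fill)
                 for c in right_cols}
        joined.append({**row, **extra})
    return joined


# Hypothetical rows; the numbers are made up for illustration.
season_stats = [
    {"Player": "Player A", "Season": 2017, "PPG": 25.0},
    {"Player": "Player B", "Season": 2017, "PPG": 18.0},
]
mvp_votes = [{"Player": "Player A", "Season": 2017, "Share": 0.4}]

for row in left_join(season_stats, mvp_votes, fill=0.0):
    print(row)
```

A left join (rather than an inner join) matters here: players who received no MVP votes are still valid training examples with a vote share of 0.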

After preliminary tests with the AI regression models, we fit a least-squares regression line of MVP vote share on each individual statistic and computed its R² value. We recorded the statistics with the highest and lowest R² values, and used those with the highest R² values to build our final model.
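The per-stat screening step can be sketched in plain Python. This is an illustration, not our original code: `fit_line` and `rank_stats` are hypothetical names, the data is assumed to be column-oriented (one list per stat), and the toy columns below are made up.

```python
def fit_line(xs, ys):
    """Least-squares fit of y = b0 + b1 * x; returns (b0, b1, r_squared)."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("a constant stat cannot be regressed on")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    b1 = sxy / sxx
    b0 = mean_y - b1 * mean_x
    ss_res = sum((y - (b0 + b1 * x)) ** 2 for x, y in zip(xs, ys))
    ss_tot = sum((y - mean_y) ** 2 for y in ys)
    return b0, b1, 1 - ss_res / ss_tot


def rank_stats(columns, target="Share"):
    """Rank every stat column by the R² of its one-variable fit to `target`."""
    scores = [(stat, fit_line(values, columns[target])[2])
              for stat, values in columns.items() if stat != target]
    return sorted(scores, key=lambda pair: pair[1], reverse=True)


# Hypothetical data: one list per stat, one entry per player-season.
columns = {
    "stat_1": [1, 2, 3, 4],
    "stat_2": [1, 3, 2, 4],
    "Share": [0.1, 0.2, 0.3, 0.4],
}
ranking = rank_stats(columns)
print("most predictive:", ranking[0], "least predictive:", ranking[-1])
```

The head of `ranking` gives the candidates for the final model and the tail gives the least predictive stats.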

Our Model

We used a decision tree regressor to predict what percentage of the possible MVP votes each player would receive. First, we extracted the key stats for each player based on the model we wanted to use (e.g., Traditional, Conspiracy). We then trained the model on a randomly chosen 80% of the dataset and tested it on the remaining 20%. Finally, for every model we ran a regression of predicted versus actual MVP vote share to measure how accurate it was.
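Our original code is not reproduced here, so the following is only an illustrative sketch of the technique described above: a CART-style regression tree that picks each split to minimize squared error, plus a random 80/20 train/test split. In practice one would likely reach for scikit-learn's `DecisionTreeRegressor`; the from-scratch version just makes the mechanics visible. `max_depth=3`, `min_leaf=2`, the seed, and the toy data are assumptions, not values from our models.

```python
import random


def _sse(values):
    """Sum of squared deviations from the mean: the split criterion."""
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values)


def build_tree(X, y, max_depth=3, min_leaf=2):
    """Grow a regression tree. A leaf is a float (mean vote share of its rows);
    an internal node is a tuple (feature, threshold, left, right)."""
    if max_depth == 0 or len(y) < 2 * min_leaf:
        return sum(y) / len(y)
    best = None  # (sse, feature index, threshold)
    for f in range(len(X[0])):
        values = sorted({row[f] for row in X})
        for lo, hi in zip(values, values[1:]):
            thr = (lo + hi) / 2
            left = [t for row, t in zip(X, y) if row[f] <= thr]
            right = [t for row, t in zip(X, y) if row[f] > thr]
            if len(left) < min_leaf or len(right) < min_leaf:
                continue
            sse = _sse(left) + _sse(right)
            if best is None or sse < best[0]:
                best = (sse, f, thr)
    if best is None or best[0] >= _sse(y):  # no split reduces the error
        return sum(y) / len(y)
    _, f, thr = best
    left_idx = [i for i, row in enumerate(X) if row[f] <= thr]
    right_idx = [i for i, row in enumerate(X) if row[f] > thr]
    left_tree = build_tree([X[i] for i in left_idx], [y[i] for i in left_idx],
                           max_depth - 1, min_leaf)
    right_tree = build_tree([X[i] for i in right_idx], [y[i] for i in right_idx],
                            max_depth - 1, min_leaf)
    return (f, thr, left_tree, right_tree)


def predict(tree, row):
    """Follow the splits from the root down to a leaf's predicted share."""
    while isinstance(tree, tuple):
        f, thr, left, right = tree
        tree = left if row[f] <= thr else right
    return tree


def split_80_20(X, y, seed=0):
    """Shuffle row indices, then train on the first 80% and test on the rest."""
    idx = list(range(len(y)))
    random.Random(seed).shuffle(idx)
    cut = int(0.8 * len(idx))
    train, test = idx[:cut], idx[cut:]
    return ([X[i] for i in train], [y[i] for i in train],
            [X[i] for i in test], [y[i] for i in test])


# Toy data (made up): one feature, vote share jumps from 0 to 1.
X = [[float(v)] for v in range(10)]
y = [0.0] * 5 + [1.0] * 5
X_train, y_train, X_test, y_test = split_80_20(X, y)
tree = build_tree(X_train, y_train)
predicted = [predict(tree, row) for row in X_test]
print(list(zip(y_test, predicted)))
```

The held-out `predicted` values can then be regressed against `y_test` to score each feature set.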

Which Stats were Most Predictive?

To answer this question, we used linear regression to compute the R² value between each individual stat and MVP vote share, which measures how much of the variation in vote share that stat explains.
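For reference, these are the standard formulas for a one-variable least-squares fit of vote share y on a stat x over n player-seasons, and for its R². With a single predictor, R² equals the square of the Pearson correlation r between x and y.

```latex
\hat{y}_i = \beta_0 + \beta_1 x_i, \qquad
\beta_1 = \frac{\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}{\sum_{i=1}^{n} (x_i - \bar{x})^2}, \qquad
\beta_0 = \bar{y} - \beta_1 \bar{x}

R^2 = 1 - \frac{\sum_{i=1}^{n} (y_i - \hat{y}_i)^2}{\sum_{i=1}^{n} (y_i - \bar{y})^2} = r_{xy}^2
```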

This data is surprising because traditional stats like Points per Game and Assists per Game, although they still have a correlation of about 0.6, were beaten out by newer statistics. This suggests that the NBA has done a good job in recent years of creating statistics that accurately represent the success of its players. It was also surprising to see how closely the traditional stats performed to the conspiracy stats, which we expected to have very little correlation, if any.

These are the stats we will use to create our final prediction model for the 2018 MVP. The top ten contains some predictable and some surprising statistics. Value Over Replacement Player (VORP) makes sense as the number 1 stat because it measures how much a player contributes relative to a replacement-level player, roughly the kind of player a team could sign for the minimum salary. Offensive Win Shares estimate the number of wins a player contributes to his team through his offense. That also makes sense, because a player's value is reflected in how successful his team is (i.e., wins). We were surprised to see that free throws made were more predictive than points scored and field goals, since free throws get far less coverage in the media.
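To make VORP concrete: Basketball-Reference computes it from box plus/minus (BPM), treating a BPM of -2.0 as replacement level, scaling the margin over that level by the share of team minutes the player was on the floor, and prorating to an 82-game season. A minimal sketch of that published formula follows; the function and parameter names are ours, and the example ignores overtime minutes as a simplifying assumption.

```python
REPLACEMENT_BPM = -2.0  # box plus/minus of a replacement-level player


def vorp(bpm, minutes_played, team_game_minutes, team_games):
    """Value Over Replacement Player, following Basketball-Reference's formula:
    [BPM - (-2.0)] * (share of team minutes played) * (team games / 82).
    """
    share_of_minutes = minutes_played / team_game_minutes
    return (bpm - REPLACEMENT_BPM) * share_of_minutes * (team_games / 82)


# Hypothetical season: +8.0 BPM, on the floor for 2,952 of the team's
# 3,936 minutes (82 games x 48 minutes, no overtime).
print(vorp(8.0, 2952, 82 * 48, 82))  # 7.5
```

Because VORP multiplies per-minute impact by playing time, it rewards exactly the combination of quality and availability that MVP voters tend to look for.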

Which Stats were the Least Predictive?

We correctly predicted that one of our conspiracy stats, Market Size, would be among the least predictive. Many of these statistics are largely out of the players' hands, so it makes sense that they are less correlated with who becomes the MVP. When these stats are entered into our model, the correlation with MVP vote share is only 19.1%, indicating that they have little predictive value.

Who will win the 2018 MVP Award?

To predict the 2018 NBA MVP, we chose to use our set of advanced stats rather than the top 10 stats identified by linear regression, because the advanced stats produced a higher correlation between predicted and actual vote share when fit to the model.
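The model-selection step and the final prediction can be sketched as follows, under loose assumptions: the comparison metric is taken to be the Pearson correlation between predicted and actual vote share on held-out data, the function names are hypothetical, and the numbers and player names are placeholders, not our results.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Pearson correlation between actual and predicted vote shares."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)


def choose_feature_set(results):
    """`results` maps a feature-set name to (actual, predicted) test-set shares;
    return the name whose predictions track the actual shares most closely."""
    return max(results, key=lambda name: pearson_r(*results[name]))


def rank_candidates(predicted_share):
    """Sort this season's players by predicted vote share, highest first."""
    return sorted(predicted_share.items(), key=lambda item: item[1], reverse=True)


# Placeholder numbers, not real model output.
results = {
    "advanced": ([0.1, 0.5, 0.9], [0.2, 0.4, 0.8]),
    "top10": ([0.1, 0.5, 0.9], [0.5, 0.1, 0.3]),
}
print(choose_feature_set(results))  # advanced
print(rank_candidates({"Player A": 0.3, "Player B": 0.6})[0])  # ('Player B', 0.6)
```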

Stat Acronyms

  • FG: Field Goals
  • PPG: Points per Game
  • RPG: Rebounds per Game
  • APG: Assists per Game
  • SPG: Steals per Game
  • BPG: Blocks per Game
  • FG%: Field Goal Percentage
  • All-Star Binary: Did the player make the All-Star Game?
  • Playoff Binary: Did the player make the playoffs?
  • PER: Player Efficiency Rating
  • OWS: Offensive Win Shares
  • DWS: Defensive Win Shares
  • WS/48: Win Shares per 48 minutes
  • VORP: Value Over Replacement Player
  • OBPM: Offensive Box Plus Minus
  • DBPM: Defensive Box Plus Minus
  • EFGPCT: Effective Field Goal Percentage
  • TopWS: Highest Win Share on Team
  • USGPCT: Usage Percentage
  • MP: Minutes Played
  • WS: Win Shares
  • FT: Free Throws made
  • FTA: Free Throw Attempts
  • FTPCT: Free Throw Percentage
  • FTr: Free Throw Attempt Rate
  • STLPCT: Steal Percentage
  • BLKPCT: Block Percentage
  • TOVPCT: Turnover Percentage
  • 3PPCT: Three Point Percentage
  • 3PAr: Three-Point Attempt Rate
  • ORBPct: Offensive Rebound Percentage
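Several of the rate stats in this list are simple functions of season totals. As a hedged illustration using the standard Basketball-Reference definitions (the function names and the example totals are hypothetical):

```python
def shooting_rates(fg, fga, fg3, fg3a, ft, fta):
    """Rate stats from season totals, using Basketball-Reference definitions."""
    return {
        "FG%": fg / fga,
        "EFGPCT": (fg + 0.5 * fg3) / fga,  # a made three counts as 1.5 field goals
        "3PPCT": fg3 / fg3a,
        "3PAr": fg3a / fga,  # share of field-goal attempts taken from three
        "FTPCT": ft / fta,
        "FTr": fta / fga,    # free-throw attempts per field-goal attempt
    }


def ws_per_48(ws, mp):
    """Win Shares per 48 minutes played."""
    return ws * 48 / mp


# Hypothetical totals, chosen only to make the arithmetic easy to check.
print(shooting_rates(fg=10, fga=20, fg3=4, fg3a=8, ft=6, fta=8))
print(ws_per_48(12.0, 2880))
```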

Interactive Regression Tool: https://zesber.shinyapps.io/mvp_regression/

