Why Everyone Is Wrong About the NBA MVP Race

Robert Mepham
9 min read · Dec 12, 2022


Giannis Antetokounmpo drops his shoulder to start a driving cut to the basket. (AP Photo/Marta Lavandier)

The NBA MVP (Most Valuable Player) is a highly prestigious award given to the player deemed to have had the greatest impact on his team’s success during the regular season. Because basketball is a team sport and the regular season is long and unpredictable, the winner is difficult to forecast. The criteria also vary from year to year and from voter to voter, adding to the uncertainty. Ultimately, the MVP is determined by a panel of sportswriters and broadcasters who vote for the player they believe is most deserving, so predicting the winner is a subjective process influenced by many factors.

Debates over who “should” win the MVP often lean on cherry-picked statistics that favor a pre-determined favorite, and rarely involve genuinely unbiased analysis. I quickly tire of these conversations, as media personalities often champion logically flawed arguments thought up by Twitter trolls on national TV.

Machine learning has had a significant impact on sports in recent years, and its use is becoming increasingly widespread. Machine learning algorithms can analyze vast amounts of data and make predictions or recommendations based on it. In sports, that means everything from analyzing game footage to identify and track players and the ball, to predicting the outcome of games or individual plays, to recommending strategic decisions. Machine learning can also improve player performance by identifying strengths and weaknesses and suggesting training regimens or adjustments to playing style.

An augmented-reality visualization of an NBA contest between the Oklahoma City Thunder and the Los Angeles Clippers, with each offensive player annotated with their relative ShotQuality score and with passing lanes and defensive zones highlighted.
Ken Kerschbaumer, Editorial Director — Sports Video Group

The impact of machine learning extends to predicting the NBA’s end-of-season awards. Many people’s failure to understand what criteria actually drive the voting for these awards made me think: “Could I come up with a way to produce the most unbiased and statistically valid prediction of who will win the various end-of-season NBA awards?”

As a Data Scientist, I jumped right into gathering any sort of publicly available data I could on the topic to see if there was a pattern at hand. I settled on the following list of individual NBA awards to predict:

  • Most Valuable Player (MVP)
  • Defensive Player of the Year (DPOY)
  • Rookie of the Year (ROTY)
  • 6th Man of the Year (6MOY)
  • Most Improved Player (MIP)

As well as the following list of NBA honors that are awarded to multiple players each season:

  • All Star (ASG)
  • All-NBA (ANBA)
  • All-Defense (ADEF)

I’d like to preface this summary of my research with two key points. First, as any Data Scientist knows, no model will ever be perfect; I’m not claiming that the probability estimates these models produce should be taken to Vegas. Second, as any astute observer will notice, the probabilities for an individual award like MVP (of which there can be only 1 per season) often don’t sum to 1. They aren’t true probabilities that could be converted into useful betting info; they’re an algorithmic estimate of how “deserving” each player is of a given award, based on their statistical output and historical voting patterns.

Without further ado, here’s a quick rundown of the methodology behind these rankings, followed by a snapshot of where the award races stand for the ‘22-’23 season as of early December.

Data were compiled from a series of public sources such as NBA.com, ESPN.com, Basketball-Reference.com, and Wikipedia into a series of cleaned datasets. All players that qualified for consideration for each award (i.e., played over 50 games in a given season and met other ancillary criteria, like being a rookie for ROTY consideration or a bench player for 6MOY consideration) from the 2010–11 season through the 2021–22 season were arranged into randomly split training and testing sets (an 85/15 split).
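
To make that concrete, here’s a minimal sketch of the filtering-and-splitting step in Python. The file name and every column name (games_played, is_rookie, won_mvp, and the feature list) are illustrative placeholders, not the actual schema:

```python
import pandas as pd
from sklearn.model_selection import train_test_split

# Hypothetical file of per-player season averages, 2010-11 to 2021-22
seasons = pd.read_csv("player_seasons_2010_2022.csv")

# Keep only award-eligible player-seasons (over 50 games played, plus
# any award-specific criteria such as rookie status for ROTY)
eligible = seasons[seasons["games_played"] > 50]
roty_pool = eligible[eligible["is_rookie"]]

feature_cols = ["pts", "reb", "ast", "stl", "blk", "ts_pct", "team_wins"]
X = eligible[feature_cols]
y = eligible["won_mvp"]  # 1 for each season's MVP, 0 otherwise

# Random 85/15 split, stratified so the rare positive class shows up
# in both halves
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.15, stratify=y, random_state=42
)
```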

Shai Gilgeous-Alexander brings the ball up the floor for the Oklahoma City Thunder.
Brian Fluharty-USA TODAY Sports

Due to the sparseness of award winners (12 MVP winners in a dataset of over 5,000 player-seasons), the models were trained with heavily biased class weights to ensure they picked up on the right trends, and were optimized for precision and recall instead of pure accuracy.
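
Continuing the sketch above, here’s roughly how that class weighting looks with scikit-learn’s LogisticRegression. The 400:1 weight is an illustrative stand-in, not the tuned value:

```python
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import precision_score, recall_score, roc_auc_score

# Up-weight the rare positive class (~12 winners vs. 5,000+ player-
# seasons) so predicting "no award" for everyone stops being optimal
model = LogisticRegression(class_weight={0: 1, 1: 400}, max_iter=1000)
model.fit(X_train, y_train)  # X_train/y_train from the earlier split

probs = model.predict_proba(X_test)[:, 1]
preds = (probs > 0.5).astype(int)
print("precision:", precision_score(y_test, preds))
print("recall:   ", recall_score(y_test, preds))
print("AUC:      ", roc_auc_score(y_test, probs))
```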

Due to time and hardware constraints (and admittedly a bit of laziness on my part), I only used Logistic Regression models to evaluate the problem at hand, and did not dive deeper into bagging/boosting models, or even further into recurrent neural networks. I think you’ll see that the precision, accuracy, and AUC returned by this simplistic approach are more than adequate for the original task of predicting NBA award winners.

Instead of evaluating the models in the traditional logistic-regression sense, where we expect probabilities >0.500 for members of the positive class and <0.500 for the negative class, we instead determine whether the model made the right “vote” for a given award in each year. This is because the model will often have 5–15 players in a given season with predicted MVP probabilities >0.500, a direct result of the class weights used in training. However, we can still determine which player the model says is “most likely” to be awarded the MVP (and every other award) by looking at the player with the highest predicted probability in each year.
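
In code, that “vote” is just an argmax per season. Here’s a small sketch, again with assumed column names:

```python
import pandas as pd

def model_votes(preds: pd.DataFrame) -> pd.DataFrame:
    """Pick the model's MVP 'vote' for each season.

    Expects columns: season, player, mvp_prob, won_mvp.
    """
    picks = preds.loc[preds.groupby("season")["mvp_prob"].idxmax()].copy()
    picks["correct"] = picks["won_mvp"] == 1
    return picks[["season", "player", "mvp_prob", "correct"]]

# The hit rate is then the share of seasons where the model's top
# player actually won the award:
# model_votes(predictions)["correct"].mean()
```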

5 Tables Detailing The Results of The MVP, DPOY, ROTY, 6MOY, and MIP models year-by-year (2010–2022).

The MVP model has picked 11 of the last 12 MVPs (91.67%), with only Steph Curry’s ’15 MVP being unforeseen according to the model. The Defensive Player of the Year model has predicted 10 of the past 12 winners (83.33%), with only Tyson Chandler in ’12 and Marcus Smart in ’22 being ‘surprise’ winners according to the model. The 6th Man of the Year model has missed only once in the past 12 seasons (91.67%), giving the ’11 6MOY award to Jamal Crawford instead of Lamar Odom. Even more impressive is the Rookie of the Year model, which has not missed in its last 12 predictions (100.00%).

The Most Improved Player model has selected only 8 of the previous 12 MIP winners. It is likely less accurate than the other models because the dataset on which it was trained is fundamentally different: it contains the per-game, per-36, and per-100 changes in a player’s performance from their previous season. Even as the least accurate of this set of individual-award algorithms, it still predicts winners at a 66.7% rate.
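
For reference, here’s a sketch of how those year-over-year deltas can be built. The stat columns stand in for the per-game, per-36, and per-100 figures:

```python
import pandas as pd

def add_mip_deltas(seasons: pd.DataFrame, stat_cols: list[str]) -> pd.DataFrame:
    """Append each player's change from the prior season for every stat."""
    seasons = seasons.sort_values(["player", "season"]).copy()
    for col in stat_cols:
        seasons[f"{col}_delta"] = seasons.groupby("player")[col].diff()
    # First-year players have no prior season to improve on, so they
    # drop out of the MIP pool here
    return seasons.dropna(subset=[f"{c}_delta" for c in stat_cols])
```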

When the models do miss, they still place the actual winner within the top 3 of the respective award ladder, meaning the model isn’t far off in its assessment. This is already a much better alternative to the biased takes presented ad nauseam both on sports entertainment shows and on NBA Twitter.

3 Tables Detailing The Results of The All-Star, All-NBA, and All-DEF models year-by-year (2010–2022).
The All-Star model has predicted 283 of 288 ASG selections (non-injury replacements) over the past 12 years. The All-NBA model has predicted 176 of 180 All-NBA team honorees, and the All-Defensive model has predicted 116 of 120 All-Defensive team honorees (2010–2022).

The group award models have predicted 98.26% of the All-Star selections, 97.78% of the All-NBA honorees, and 96.67% of the All-Defensive honorees. This outcome is remarkable considering both that a very simple ML approach is being used and that precision exceeds 95% for every group award.

That brings us to this year’s award races. The dataflow for these models allows updated predictions to be generated every day after the night’s NBA games conclude, so players’ chances of winning each award change over time as their statistical profiles evolve throughout the season. The models themselves were trained on end-of-season averages, so as players play more games, their current averages regress toward their end-of-season averages, and their probability of winning a given award likewise regresses toward its end-of-season value.
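
A nightly update can then be as simple as recomputing season-to-date averages and re-scoring them with the frozen model. The function below is a hypothetical sketch of that loop:

```python
def nightly_update(model, game_logs, feature_cols):
    """Re-score every player from their current season-to-date averages."""
    # Per-game averages over all games played so far this season
    current = game_logs.groupby("player")[feature_cols].mean()
    current["mvp_prob"] = model.predict_proba(current[feature_cols])[:, 1]
    # As the sample of games grows, these averages (and the derived
    # probabilities) drift toward their end-of-season values
    return current.sort_values("mvp_prob", ascending=False)
```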

A line graph showing the top 5 Rookie of the Year candidates and their associated model probabilities over the 5-day window from December 7th to December 12th, 2022. Jaden Ivey currently leads the model’s predictions, but the gap between him and 2nd place (Paolo Banchero) has almost halved over the window.

For example, here is how this season’s Rookie of the Year race has changed over the past 5 days while this article has been sitting in my drafts (writing about Data Science is always less exciting than doing the actual legwork). As you can see, the original gap between #1 (Jaden Ivey) and #2 (Paolo Banchero) has narrowed as more games have been played.

These continually updated odds from the live models can show how a player’s chances of winning an award or being selected for an end-of-season honor change over certain windows of time.

As of 12/12/2022, this is how the models see the race for each award:

  • MVP — Giannis Antetokounmpo
  • DPOY — Anthony Davis
  • 6MOY — Bennedict Mathurin
  • ROTY — Jaden Ivey
  • MIP — Nikola Jokic (yes… this model is the worst one)

Remember, a probability >.500 does not mean a player is predicted to win an individual award; only the top player in each individual award ladder is predicted to win at any given moment.

For the group awards (ASG, ANBA, and ADEF), more than just the top player in each ladder is projected to be selected. For the ASG model, the top 12 players in each conference are projected to be All-Stars. For the ANBA model, the top 6 guards, 6 forwards, and 3 centers are projected to be All-NBA team selections. And finally, for the ADEF model, the top 4 guards, 4 forwards, and 2 centers are projected to be All-Defensive team selections.
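
Expressed in code, those cutoffs are just top-k selections by conference or position group. The quotas below match the rules above, while the DataFrame layout is assumed:

```python
def project_all_stars(ladder):
    """Top 12 players in each conference by predicted probability."""
    return (
        ladder.sort_values("asg_prob", ascending=False)
              .groupby("conference")
              .head(12)
    )

def project_all_nba(ladder):
    """Top 6 guards, 6 forwards, and 3 centers (use 4/4/2 for All-Defense)."""
    quotas = {"G": 6, "F": 6, "C": 3}
    return (
        ladder.sort_values("anba_prob", ascending=False)
              .groupby("position", group_keys=False)
              .apply(lambda g: g.head(quotas[g.name]))
    )
```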

Again, this model doesn’t purport to be 100% accurate, nor does it claim to give discernible advice when it comes to finding “value” in the probabilities (and, by derivation, the odds) of a player winning a given award. However, it does provide predictions that have proven to be over 90% accurate (except for MIP and DPOY) over the previous 12-year window.

Nikola Jokic (15) drives past a Toronto Raptors defender
Tom Szczerbowski/USA TODAY Sports

Furthermore, if your favorite player ranks lower in the eyes of the model, it doesn’t mean that they won’t win a specific award; it just means that if they did win, it would be very dissimilar to the ways in which the voters have chosen the award in previous years. A perfect example of this is Marcus Smart’s DPOY in ‘21-’22. He ranked 3rd in our DPOY ladder, mostly because, as a guard, his stat line was so dissimilar to those of the previous 12 award winners (none of whom were guards).

Also, the trend-analysis portion of these models is key to understanding which NBA players are rising or falling in the rankings. If your favorite player sits lower than you’d expect in a certain award ranking despite a killer stretch of 5 or 10 games, check out how their odds have changed over time. What you will likely see is that they have been steadily rising in the rankings, and they likely haven’t peaked yet, since the model is a bit conservative in how it adjusts for hot and cold streaks.

In summary, it is possible to algorithmically determine who is most likely to win the various individual NBA awards at over 83% accuracy (save for the MIP), and the group NBA honors at over 95% accuracy on average, with a fairly simplistic machine learning approach. While the models aren’t perfect, one thing is certain: they sure are better than listening to mindless debate on Twitter or on TV about who is most “valuable” to their team, with stats cherry-picked harder than LaMelo’s 90-point game in high school. (Does the fact that that game was almost 6 whole years ago make anyone else feel old?)


Robert Mepham

Data Scientist | Amateur Sports Analyst | Leveraging AI In The World of Sports