Predicting the 2022–2023 NBA MVP Using Machine Learning

Atharv Joshi
Dec 27, 2022

Background

After every hard-fought 82-game NBA regular season, the elusive Most Valuable Player award goes to the player who demonstrates the highest level of personal achievement and team success (although the specific criteria are never spelled out). Since the 1980–1981 season, a media panel from the United States and Canada has been selected each year to cast ballots for the award. First-place votes are worth 10 points, second-place votes are worth seven, third-place votes are worth five, fourth-place votes are worth three, and fifth-place votes are worth one. The player with the highest award share (points amassed / points possible, not to be confused with the win shares statistic) receives the award. For example, Nikola Jokić won the MVP award last year with an award share of 0.875, easily beating out Joel Embiid (0.706) and Giannis Antetokounmpo (0.595).
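The points system above can be sketched in a few lines of Python. The function name `award_share` and the ballot counts in the example are hypothetical and chosen only for illustration; they are not real voting data.

```python
# Points per ballot slot, as described above: 1st through 5th place.
POINTS_BY_PLACE = (10, 7, 5, 3, 1)


def award_share(votes_by_place, num_ballots):
    """Return points amassed divided by the maximum points possible.

    votes_by_place: counts of 1st- through 5th-place votes for one player.
    num_ballots: total number of ballots cast that year.
    """
    points = sum(v * p for v, p in zip(votes_by_place, POINTS_BY_PLACE))
    # The maximum is reached when every ballot ranks the player first.
    max_points = num_ballots * POINTS_BY_PLACE[0]
    return points / max_points


# Hypothetical ballot counts, NOT real voting data.
example_share = award_share([60, 30, 10, 0, 0], num_ballots=100)
print(f"{example_share:.3f}")  # 0.860
```

A unanimous winner would score exactly 1.0, which is why the share is easy to compare across years with different panel sizes.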

Nikola Jokić holding last year’s MVP trophy (source: NBA.com)

Considering the level of bias and uncertainty that is present within the decision system, most NBA fans have come to believe that the MVP award is impossibly difficult to predict this early in the season. After all, narratives can change and teams can self-destruct. However, given current season averages and past data, I wanted to see if a regression-based machine learning model in Python could crack the code (assuming no injuries and sustained performance).

Data

The statistics used in this model were a combination of basic counting stats (simple frequency-based numbers) and more advanced metrics (formula calculations).

Counting stats: games played, minutes played per game, points per game, assists per game, rebounds per game, steals per game, blocks per game, percent of shots made, percent of three-pointers made, percent of free throws made, shots attempted per game, three-pointers attempted per game, free throws attempted per game, team win percentage

Advanced stats:

Player Efficiency Rating: as the name suggests, a rating of the player’s performance

True shooting percentage: efficiency metric that takes three-pointers and free-throws into account

Usage percentage: estimated share of his team's plays that a player uses while he is on the court

Box Plus/Minus: points per 100 possessions above an average player, adjusted for an average team

Win shares: estimated number of wins contributed by given player

Win shares per 48 minutes: win shares per full-length game (48 minutes)

Further explanation of these metrics can be found at the holy grail of all basketball statistics: Basketball Reference.
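Of the advanced metrics above, true shooting percentage is simple enough to compute by hand. Here is a minimal sketch using the formula published on Basketball Reference, TS% = PTS / (2 × (FGA + 0.44 × FTA)). The function name and the sample box score are made up for illustration.

```python
def true_shooting_pct(points, fga, fta):
    """True shooting percentage: points per scoring attempt, halved.

    points: total points scored
    fga: field goals attempted (two- and three-pointers)
    fta: free throws attempted
    The 0.44 factor estimates how many free throws end a possession.
    """
    return points / (2 * (fga + 0.44 * fta))


# Made-up single-game line: 30 points on 20 shots and 8 free throws.
ts = true_shooting_pct(points=30, fga=20, fta=8)
print(f"{ts:.3f}")  # 0.638
```

Because three-pointers and free throws both feed into the points total, a player who scores efficiently from either source is rewarded, which plain field goal percentage misses.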

Fortunately, the Kaggle user @danchyy created a dataset with these statistics for every MVP candidate from 1980–1981 to 2017–2018, as well as the MVP shares that the player received.

Process

First, I wanted to see the approximate relationship between award share and specific statistics that are commonly believed to influence the voting. In the following charts, yellow dots signify that the player won MVP in the given year:

Right off the bat, it seems that win shares, points per game, and win percentage all have some influence on award share, but this doesn’t help us predict exact numbers.

Because award share is a continuous number rather than a category, this is a regression problem. By fitting trendlines to the training data, a regression model should be able to estimate a given player’s award share.

I initialized a multiple linear regression model with the following code:

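# scikit-learn's linear_model module provides ordinary least squares regression
import sklearn.linear_model as lm
# Unfitted multiple linear regression model with default settings
regr = lm.LinearRegression()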
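To show how such a model is trained and then queried, here is a small sketch on invented numbers. The two feature columns, the five training rows, and the "new candidate" are all assumptions for illustration; the real model was trained on the Kaggle dataset with many more rows and features.

```python
import numpy as np
import sklearn.linear_model as lm

# Made-up training rows, NOT real player data.
# Columns: team win percentage, win shares.
X_train = np.array([
    [0.75, 15.0],
    [0.60, 12.0],
    [0.55, 9.0],
    [0.70, 13.5],
    [0.50, 8.0],
])
# Made-up award shares for the same five candidates.
y_train = np.array([0.90, 0.40, 0.05, 0.60, 0.01])

regr = lm.LinearRegression()
regr.fit(X_train, y_train)  # learns one coefficient per column plus an intercept

# Predict the award share of a hypothetical new candidate.
X_new = np.array([[0.68, 14.0]])
predicted_share = float(regr.predict(X_new)[0])
print(round(predicted_share, 3))
```

An unconstrained linear model can return values below 0 or above 1, so its outputs are best read as relative scores for ranking candidates rather than literal vote shares.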

However, before jumping into predictions for this year’s MVP, I wanted to see the model’s statistical take on past MVPs. How would it judge controversial seasons, like 2011 Derrick Rose or 2006 Steve Nash? How many MVPs would all-time greats have if this model were the decision-maker?

Steve Nash being awarded the 2006 MVP award (source: AZCentral)

Analysis

Using the original 20-metric list, the model identified the actual MVP in just 21 of the 38 tested seasons (55.3%). However, some of the metrics hampered the model’s success: several had regression coefficients that suggested almost no relationship with award share, which should not be the case for stats that voters clearly care about. After significantly trimming the metric list, the model predicted 26 of the 38 seasons correctly (68.4%). Surprisingly, stats like PER, true shooting percentage, usage percentage, and even points per game had to be removed. Here are the final seven metrics used in the model:

Assists per game

Box Plus/Minus

Field goals attempted per game

Total rebounds per game

Team win percentage

Win shares

Win shares per 48 minutes
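The season-level accuracy figures quoted above (21 of 38, then 26 of 38) imply a scoring rule along the following lines: within each season, take the candidate with the highest predicted share and check whether he also had the highest actual share. This is a sketch of that rule under my own assumptions about the data layout; the row format, function name, and sample rows are hypothetical.

```python
from collections import defaultdict


def season_hit_rate(rows):
    """rows: iterable of (season, player, predicted_share, actual_share).

    Returns the fraction of seasons in which the player with the highest
    predicted share is also the player with the highest actual share.
    """
    by_season = defaultdict(list)
    for season, player, predicted, actual in rows:
        by_season[season].append((player, predicted, actual))

    hits = 0
    for candidates in by_season.values():
        predicted_winner = max(candidates, key=lambda c: c[1])[0]
        actual_winner = max(candidates, key=lambda c: c[2])[0]
        hits += predicted_winner == actual_winner
    return hits / len(by_season)


# Placeholder rows with invented names and numbers.
rows = [
    ("Season 1", "Player A", 0.62, 0.80),
    ("Season 1", "Player B", 0.48, 0.55),
    ("Season 2", "Player C", 0.51, 0.30),
    ("Season 2", "Player D", 0.44, 0.70),
]
print(season_hit_rate(rows))  # 0.5
```

Scoring by season rather than by row matters here: the model only needs to rank the eventual winner first, not to match every candidate's share exactly.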

After all revisions, the model’s pick finished in the top three of the actual MVP voting in 37 of the 38 seasons (97%). The only exception? 2005 Kevin Garnett, who finished 11th in MVP voting. “The Big Ticket” is widely believed to be one of the most underrated players of all time, and this result seems to support that conclusion. In the 2004–2005 season, KG put up 22.2 points, 13.5 rebounds, 5.7 assists, 1.5 steals, and 1.4 blocks per game, a monster statline in any era, along with the highest win shares and a tie for the highest win shares per 48 minutes. Ultimately, Garnett’s downfall was his team’s record (44–38), which put him at a major disadvantage against the winner Steve Nash (62–20) and runner-up Shaquille O’Neal (59–23).

Kevin Garnett on the Timberwolves (source: Nathaniel S. Butler/Getty Images)

A major part of every MVP selection is the “narrative.” If a candidate has a positive story attached to their name, they might be more likely to win the award. For example, prior to the 2010–2011 season, no one thought that Derrick Rose’s Chicago Bulls would make any noise in the regular season or playoffs; the story was all about the Miami Heat and Boston Celtics. However, Rose’s humility and leadership shone as the Bulls won 62 games that year, securing the top seed in the East. The narrative was enhanced by the fact that Rose, a Chicago native, had been drafted first overall by his hometown Bulls, making him an instant fan favorite. Also, unlike other superteams (cough, cough), Rose’s Bulls were almost completely “built, not bought”: their core players were all drafted to the team, so they were a stark contrast to the Miami “Heatles.”

Derrick Rose dribbling the ball, tightly guarded by LeBron James (source: NBC Sports)

Derrick Rose’s spectacular play and inspiring narrative helped him win the MVP award in 2011. However, my regression model was unable to capture the story behind each player, so it gave the award to LeBron.

Another example of narratives playing a role can be found in the 2007–2008 season. The late, great Kobe Bryant came into that year having put up all-time numbers, including a monster 35.4 PPG in the 2005–2006 season. However, despite Bryant’s historic play from 2005–2007, the award went to playmaker Steve Nash in 2006 and to Dirk Nowitzki in 2007. Practically everyone in the basketball community believed that Kobe was due for the award. However, that season, an undersized guard by the name of Chris Paul took the league by storm, leading the league in assist percentage, steal percentage, offensive win shares, win shares, and win shares per 48. Many NBA fans believe that Paul bested Bryant in this season, but fell short of the MVP award because of Kobe’s narrative. Once again, since narratives cannot be captured in a regression model, Paul was awarded the MVP in this simulation, even though Kobe received the award in real life.

Chris Paul and Kobe Bryant (source: The New York Times)

I would argue that a significant portion of the model’s incorrect predictions could have been avoided if there were a way to quantify the powerful narrative.

After manually inputting MVP candidates’ stats from 2019–2022, the model went 3/4 — it wrongly predicted James Harden in 2019 (Giannis Antetokounmpo was the actual winner in this controversial race) but correctly predicted Giannis in 2020 and Nikola Jokić in 2021 and 2022.

So how many MVPs would all-time greats have if this model were the decision-maker? Well:

This makes for some incredible “what if?” story material.

Results

The latest edition of the Kia MVP Ladder (published by NBA.com on December 9) ranks the candidates as follows:

1. Jayson Tatum (Boston Celtics)

2. Giannis Antetokounmpo (Milwaukee Bucks)

3. Luka Dončić (Dallas Mavericks)

4. Nikola Jokić (Denver Nuggets)

5. Ja Morant (Memphis Grizzlies)

6. Devin Booker (Phoenix Suns)

T-7. Kevin Durant (Brooklyn Nets)

T-7. Donovan Mitchell (Cleveland Cavaliers)

9. Anthony Davis (Los Angeles Lakers)

10. Zion Williamson (New Orleans Pelicans)

The remaining candidates are listed in alphabetical order:

Jaylen Brown (Boston Celtics), Stephen Curry (Golden State Warriors), Joel Embiid (Philadelphia 76ers), De’Aaron Fox (Sacramento Kings), Shai Gilgeous-Alexander (Oklahoma City Thunder)

Using this player pool, the model produced the following top-five MVP ranking (no narratives, just statistics):

1. Nikola Jokić (predicted award share: 0.495)

2. Luka Dončić (0.434)

3. Jayson Tatum (0.331)

4. Joel Embiid (0.320)

5. Giannis Antetokounmpo (0.314)

This is as close as an MVP race gets.
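To put a number on "close," the short sketch below sorts the five predicted shares listed above and prints the gap between each pair of neighbors, next to last year's actual first-to-second margin from the Background section. The variable names are my own; the values come straight from the article.

```python
# Predicted shares from the model's top five, as listed above.
predicted_shares = {
    "Nikola Jokić": 0.495,
    "Luka Dončić": 0.434,
    "Jayson Tatum": 0.331,
    "Joel Embiid": 0.320,
    "Giannis Antetokounmpo": 0.314,
}

# Rank candidates from highest to lowest predicted share.
ranking = sorted(predicted_shares.items(), key=lambda item: item[1], reverse=True)

# Gap between each candidate and the next one down the list.
margins = [
    round(ranking[i][1] - ranking[i + 1][1], 3)
    for i in range(len(ranking) - 1)
]
print(margins)  # [0.061, 0.103, 0.011, 0.006]

# Last year's actual margin between Jokić (0.875) and Embiid (0.706).
last_year_margin = round(0.875 - 0.706, 3)
print(last_year_margin)  # 0.169
```

The predicted gap between first and second is roughly a third of last year's actual margin, and third through fifth are separated by less than 0.02.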

Back to back…to back? (source: NBC)

Thoughts

While Jokić has delivered a legendary statline so far in the year, it’s still unlikely that he ends up winning the award, due to a phenomenon known as “voter fatigue.” Since NBA media and fans are the decision-makers, they often get tired of the same player winning the award over and over again. As alluded to earlier, Jokić is the defending back-to-back MVP, which puts him at a major disadvantage in a race as close as this one — the talent pool in today’s NBA has never been better.

The next man up is Luka Dončić, who is similarly averaging out-of-this-world numbers. Unfortunately for Luka, a lack of team success greatly hampers his MVP case — unless his supporting cast allows him to compete with the best teams in the league, I don’t see him winning the award, either. In the past, players have won the award with average win percentages, but this was usually due to their record-breaking numbers (see 2017 Russell Westbrook) or relative lack of competition. Ultimately, the level of competition for the award this year means that the winner must be virtually perfect on all accounts.

Which brings us to Jayson Tatum. As of right now, Tatum boasts the best team record in the NBA, a top-3 model score, no past MVP awards (rendering him immune to voter fatigue), and a favorable narrative (beloved by one of the largest markets in the league). Assuming no injuries or significant changes, which is never a given, Jayson Tatum is my pick for the 2022–2023 MVP (although I would be a fool to count Giannis out).

Source: SportingNews

Atharv Joshi

High school student 📚 | Economics major 💵 | Data science enthusiast 💻