NBA Player Value Models: XGB-PM, Predicting RAPM with Box Stats and More

John Chen
Jun 2, 2023


Jokic over Embiid? Caruso? I’ll be diving into the hows, whys, pros and cons in this post.

In NBA Metrics: Calculating RAPM, I covered RAPM as the all-in-one metric for NBA player evaluation and how it’s calculated. Now, let’s re-create the approach behind popular public metrics by regressing Offensive RAPM (ORAPM) and Defensive RAPM (DRAPM) against box score stats, where overall RAPM = ORAPM + DRAPM. The vast majority of popular public metrics (DARKO, LEBRON, EPM, PIPM, BPM, RAPTOR, etc.) use this same methodology. The main differences lie in how stints are processed to calculate RAPM (for example, whether or not luck adjustment is used) and in which player stats are used as inputs when predicting RAPM (for example, season-end stats versus rolling exponentially weighted stats, or whether to include data such as distance to the shooter on contests).

The Data

I used all the box score stats on NBA.com (including traditional, advanced, scoring, misc, hustle, etc.) and filtered some of them out for the simpler methods, which have a stronger tendency to overfit. For more modern methods it’s less of a concern, particularly if we use incremental RAPM and exponential moving averages of player stats instead of season-end RAPM and season-end player stats. For now, we’ll just use season-end values for convenience, but that significantly reduces the training set size (only ~450 players a season get more than near-zero minutes) and probably hurts performance.
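A minimal sketch of the exponentially weighted alternative, assuming a per-game box score table with hypothetical PLAYER_ID and GAME_DATE columns (one row per player-game). The 20-game half-life is an arbitrary illustrative default, not a value from this post:

```python
import pandas as pd


def ewma_player_stats(
    games: pd.DataFrame, stat_cols: list[str], halflife_games: float = 20.0
) -> pd.DataFrame:
    """Exponentially weighted average of each player's stats up to each game.

    Assumes one row per player-game with PLAYER_ID and GAME_DATE columns
    (hypothetical names). Recent games get more weight than older ones.
    """
    games = games.sort_values(["PLAYER_ID", "GAME_DATE"])
    # Smooth each player's history separately so players never mix.
    smoothed = games.groupby("PLAYER_ID")[stat_cols].transform(
        lambda g: g.ewm(halflife=halflife_games).mean()
    )
    # Attach the smoothed stats back to the player/date keys.
    return games[["PLAYER_ID", "GAME_DATE"]].join(smoothed)
```

Each row then describes a player "as of" that game, which is what allows pairing stats with incremental RAPM rather than with a single season-end value.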

More specifically, the stats I used, grouped by their NBA.com categories, are:

Traditional: PLAYER_AGE, GP, GS, MINS_PER_GAME, FG2_PCT, FG3_PCT, FT_PCT

Traditional per possession: FG2M, FG2A, FG3M, FG3A, FTM, FTA, OREB, DREB, REB, AST, STL, BLK, TOV, PF, PTS

Advanced: E_OFF_RATING, E_DEF_RATING, E_NET_RATING, AST_PCT, AST_TOV, AST_RATIO, OREB_PCT, DREB_PCT, REB_PCT, TM_TOV_PCT, EFG_PCT, TS_PCT, E_USG_PCT, PIE

Misc per possession: PTS_OFF_TOV, PTS_2ND_CHANCE, PTS_FB, PTS_PAINT, OPP_PTS_OFF_TOV, OPP_PTS_2ND_CHANCE, OPP_PTS_FB, OPP_PTS_PAINT, BLK, BLKA, PF, PFD

Scoring: PCT_PTS_2PT, PCT_PTS_2PT_MR, PCT_PTS_3PT, PCT_PTS_FB, PCT_PTS_FT, PCT_PTS_OFF_TOV, PCT_PTS_PAINT, PCT_AST_2PM, PCT_UAST_2PM, PCT_AST_3PM, PCT_UAST_3PM

Hustle stats per possession: CONTESTED_SHOTS, CONTESTED_SHOTS_2PT, CONTESTED_SHOTS_3PT, DEFLECTIONS, CHARGES_DRAWN, SCREEN_ASSISTS, SCREEN_AST_PTS, OFF_LOOSE_BALLS_RECOVERED, DEF_LOOSE_BALLS_RECOVERED, LOOSE_BALLS_RECOVERED, OFF_BOXOUTS, DEF_BOXOUTS, BOX_OUT_PLAYER_TEAM_REBS, BOX_OUT_PLAYER_REBS, BOX_OUTS

Some of these features only started being recorded in the middle of the 2015–2016 season, so I used data from the 2016–2017 through 2022–2023 seasons.
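Several of the buckets above are per-possession rates. Here is a minimal sketch of that normalization, assuming a player-season table of raw totals plus a hypothetical POSS column that counts the possessions each player was on the floor. Scaling to per 100 possessions is a common convention; the post does not say which scale it used:

```python
import pandas as pd


def per_possession(
    totals: pd.DataFrame,
    count_cols: list[str],
    poss_col: str = "POSS",
    per: float = 100.0,
) -> pd.DataFrame:
    """Convert raw counting stats to rates per `per` possessions.

    `poss_col` is a hypothetical column holding each player's on-court possessions.
    """
    rates = totals.copy()
    # Divide each counting stat row-wise by that player's possessions.
    rates[count_cols] = rates[count_cols].div(rates[poss_col], axis=0) * per
    return rates
```

Rates make high- and low-minutes players comparable, although very small possession counts still produce noisy rates.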

The Models

Two simple models commonly used in regression settings are (a) linear regression as a baseline and (b) boosted decision trees (e.g. xgboost, catboost, lightgbm). We could do some manual feature engineering, such as feature crosses or adding the square and square root of each feature as additional features, but I decided to omit those for now.
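For reference, the omitted square and square-root features could look like the sketch below. This is an illustration only, not something used in this post. The square root is taken as a signed square root, an assumption made here so that features that can be negative (such as E_NET_RATING) stay defined. Feature crosses could be generated similarly with scikit-learn's `PolynomialFeatures(degree=2, interaction_only=True)`.

```python
import numpy as np
import pandas as pd


def add_nonlinear_features(X: pd.DataFrame) -> pd.DataFrame:
    """Append squared and signed-square-root copies of every feature."""
    squared = (X**2).add_suffix("_SQ")
    # Signed square root: sqrt(|x|) keeps the sign of x, so negatives are allowed.
    signed_sqrt = (np.sign(X) * np.sqrt(X.abs())).add_suffix("_SQRT")
    return pd.concat([X, squared, signed_sqrt], axis=1)
```

This triples the feature count, which mainly helps the linear model. Tree models like xgboost can already capture these monotone transforms through their splits.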

For the train/test split: to avoid models memorizing players, I held out 10% of player IDs from 2016–2023 as the test set and used the rest for training, instead of training on 2016–2022 and testing on 2022–2023 (credit to v-zero on APBR for the discussion).
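A minimal sketch of this player-level split using scikit-learn's `GroupShuffleSplit`, assuming a player-season table with a PLAYER_ID column. The random seed is arbitrary:

```python
import pandas as pd
from sklearn.model_selection import GroupShuffleSplit


def split_by_player(df: pd.DataFrame, test_size: float = 0.1, seed: int = 0):
    """Hold out a fraction of *players* (all of their seasons) as the test set."""
    splitter = GroupShuffleSplit(n_splits=1, test_size=test_size, random_state=seed)
    # Grouping by PLAYER_ID keeps every season of a player on the same side.
    train_idx, test_idx = next(splitter.split(df, groups=df["PLAYER_ID"]))
    return df.iloc[train_idx], df.iloc[test_idx]
```

Because every season of a held-out player is unseen during training, test error measures generalization to new players rather than to new seasons of familiar ones.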

I filtered out players who played fewer than 15 games and 5 min/game to remove outliers for linear regression. This filtering had limited effect on xgboost.
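A minimal sketch of that filter. It reads the rule as keeping only player-seasons with at least 15 games played and at least 5 minutes per game, using the GP and MINS_PER_GAME columns listed above:

```python
import pandas as pd


def filter_low_minutes(
    df: pd.DataFrame, min_games: int = 15, min_mpg: float = 5.0
) -> pd.DataFrame:
    """Drop low-sample player-seasons whose per-possession stats are noisy."""
    keep = (df["GP"] >= min_games) & (df["MINS_PER_GAME"] >= min_mpg)
    return df[keep]
```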

Linear regression

a) I standardized the features and used linear regression with L2 regularization (i.e. ridge regression) with alpha=1000.
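A minimal sketch of this baseline with scikit-learn, fitting one model for ORAPM and one for DRAPM and summing their predictions into overall RAPM. X is assumed to be a numeric feature matrix built from the stats above; only alpha=1000 comes from the post:

```python
from sklearn.linear_model import Ridge
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler


def fit_ridge_rapm(X, y_orapm, y_drapm, alpha: float = 1000.0):
    """Fit separate standardized ridge models for ORAPM and DRAPM."""
    # The scaler lives inside each pipeline, so test data reuses training means/stds.
    o_model = make_pipeline(StandardScaler(), Ridge(alpha=alpha)).fit(X, y_orapm)
    d_model = make_pipeline(StandardScaler(), Ridge(alpha=alpha)).fit(X, y_drapm)
    return o_model, d_model


def predict_rapm(o_model, d_model, X):
    """Return predicted ORAPM, DRAPM, and overall RAPM = ORAPM + DRAPM."""
    o_pred = o_model.predict(X)
    d_pred = d_model.predict(X)
    return o_pred, d_pred, o_pred + d_pred
```

Because the features are standardized, each entry of `o_model.named_steps["ridge"].coef_` is the change in predicted ORAPM per one-standard-deviation change in that feature, which makes coefficients comparable across features.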

Top 15 results for the 2022–2023 season:

Player            | Pred ORAPM | Pred DRAPM | Pred RAPM
---------------------------------------------------------------------
Nikola Jokic | 4.61 | 1.99 | 6.56
Joel Embiid | 3.79 | 1.04 | 4.82
Boban Marjanovic | 2.13 | 2.42 | 4.55
James Harden | 4.18 | 0.36 | 4.55
Damian Lillard | 5.95 | -1.55 | 4.40
Jayson Tatum | 3.85 | 0.22 | 4.07
Jimmy Butler | 3.38 | 0.62 | 4.00
LeBron James | 3.06 | 0.83 | 3.89
Kevin Durant | 3.71 | 0.13 | 3.84
Jarrett Allen | 1.71 | 2.12 | 3.82
Anthony Davis | 1.86 | 1.92 | 3.78
Steven Adams | 1.14 | 2.61 | 3.75
Stephen Curry | 3.78 | -0.04 | 3.74
Luka Doncic | 3.91 | -0.21 | 3.70
Brook Lopez | 1.22 | 2.47 | 3.69

The results seem pretty reasonable, and some common talking points show up here, e.g. Lillard and Luka being strong offensively but not defensively. Boban is a surprise entry: he is a low-to-medium-minutes outlier who was not filtered out.

Top 10 feature coefficients by absolute value for ORAPM:

Feature name    |  coefficient
--------------------------------
PCT_PTS_3PT | 1.37
FTM | 1.24
PTS | 1.20
OREB | 0.93
REB_PCT | 0.69

OREB_PCT | -0.71
E_USG_PCT | -0.78
FG2A | -0.78
DREB | -0.93
FTA | -1.13

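Some of these coefficients seem a little counterintuitive. Surely OREB_PCT and FTA should have a positive relationship with offensive RAPM. This is likely explained by collinearity with other features. For example, FTA is highly correlated with FTM (+1.24), so the negative FTA coefficient largely offsets the FTM term. Taken together, the two terms roughly reward free throws made for a given number of attempts. Similarly, OREB_PCT overlaps heavily with OREB (+0.93).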
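One quick check before reading coefficients is to list strongly correlated feature pairs. A minimal sketch with pandas; the 0.9 threshold is an arbitrary assumption:

```python
import pandas as pd


def high_corr_pairs(X: pd.DataFrame, threshold: float = 0.9) -> list[tuple[str, str, float]]:
    """Return feature pairs whose absolute Pearson correlation is >= threshold."""
    corr = X.corr().abs()
    cols = list(corr.columns)
    pairs = []
    for i, a in enumerate(cols):
        for b in cols[i + 1:]:  # upper triangle only, so each pair appears once
            r = float(corr.loc[a, b])
            if r >= threshold:
                pairs.append((a, b, r))
    # Strongest correlations first.
    return sorted(pairs, key=lambda p: p[2], reverse=True)
```

Coefficients of features that appear together in this list should be interpreted jointly rather than one at a time.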

Feature coefficients for DRAPM:

Feature name                |  coefficient
---------------------------------------------
BOX_OUTS | 0.67
FG3A | 0.53
PTS_PAINT | 0.52
REB_PCT | 0.35
BOX_OUT_PLAYER_TEAM_REBS | 0.32
GS | 0.32

PCT_PTS_PAINT | -0.39
PCT_PTS_2PT | -0.47
DEF_BOXOUTS | -0.85
PCT_PTS_3PT | -0.88

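Again, there are some counterintuitive coefficients, such as DEF_BOXOUTS being strongly negative while BOX_OUTS is strongly positive. Because BOX_OUTS is the total of OFF_BOXOUTS and DEF_BOXOUTS, these features are collinear, and their individual coefficients are hard to interpret on their own.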
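Exact linear relationships like this one (and REB = OREB + DREB, or PTS = 2·FG2M + 3·FG3M + FTM) can be detected by comparing the number of columns with the rank of the feature matrix. A minimal sketch with NumPy:

```python
import numpy as np
import pandas as pd


def rank_deficiency(X: pd.DataFrame) -> int:
    """Number of columns that are exact linear combinations of other columns.

    A value above 0 means some coefficients are not separately identifiable:
    ridge still returns a solution, but how weight is split among the
    dependent columns is decided by the penalty, not by the data.
    """
    return X.shape[1] - int(np.linalg.matrix_rank(X.to_numpy(dtype=float)))
```

Dropping one column from each exact relationship (e.g. keeping OFF_BOXOUTS and DEF_BOXOUTS but not BOX_OUTS) brings this back to 0 without losing information.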

MSE and MAE of predicted vs. true ORAPM and DRAPM for 2022–2023:

ORAPM MSE: 0.972 MAE: 0.77. DRAPM MSE: 1.00 MAE: 0.80.
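For reference, the two error metrics are the mean squared and mean absolute differences between predicted and true values. A minimal NumPy sketch:

```python
import numpy as np


def mse_mae(y_true, y_pred) -> tuple[float, float]:
    """Mean squared error and mean absolute error of predictions."""
    err = np.asarray(y_pred, dtype=float) - np.asarray(y_true, dtype=float)
    return float(np.mean(err**2)), float(np.mean(np.abs(err)))
```

For zero-mean, normally distributed errors, MAE is about sqrt(2/π) ≈ 0.80 times the root mean squared error. An MSE near 1.0 paired with an MAE near 0.8 is therefore consistent with roughly normal errors without many extreme misses.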

Xgboost (XGB-PM)

b) Feed all features into xgboost (which I’ll call XGB-Plus Minus, or XGB-PM, for convenience).

Top 15 results for 2022–2023 season:

Player                  | Pred ORAPM | Pred DRAPM | Pred RAPM (XGB-PM)
---------------------------------------------------------------------
Nikola Jokic | 4.59 | 2.19 | 6.78
Kawhi Leonard | 4.47 | 0.61 | 5.08
Joel Embiid | 2.80 | 2.13 | 4.93
Stephen Curry | 5.36 | -0.59 | 4.77
Alex Caruso | 1.13 | 3.62 | 4.76
Giannis Antetokounmpo | 3.42 | 1.24 | 4.66
Jayson Tatum | 4.39 | 0.14 | 4.52
Kevin Durant | 4.03 | 0.45 | 4.48
LeBron James | 3.92 | 0.50 | 4.42
Damian Lillard | 5.39 | -1.18 | 4.21
Paul George | 2.83 | 1.25 | 4.08
Jimmy Butler | 3.94 | 0.09 | 4.03
Jrue Holiday | 2.77 | 1.16 | 3.93
James Harden | 4.02 | -0.13 | 3.89
Anthony Davis | 1.86 | 1.98 | 3.84

Again, the results seem pretty reasonable, and from looking at them it’s not immediately clear whether this approach is superior. Kawhi and Curry rank a little higher here, but so does Caruso.

Top 10 feature importances for ORAPM:

Feature name          |  relative feature importance
--------------------------------
E_OFF_RATING | 0.070
PIE | 0.041
LOOSE_BALLS_RECOVERED | 0.040
TM_TOV_PCT | 0.037
FT_PCT | 0.031
PTS_2ND_CHANCE | 0.028
PLAYER_AGE | 0.027
FG3M | 0.027
PCT_PTS_PAINT | 0.025
MINS_PER_GAME | 0.025

It’s no surprise that offensive rating is the most important feature for predicting offensive RAPM. Several other top features relate to creating or preserving possessions. Recovering a loose ball, avoiding a turnover, or grabbing an offensive rebound (which leads to second-chance points) creates a scoring opportunity out of nothing, and that tends to be worth a lot. For example, even assuming below-average efficiency of about 1 point per possession, recovering a loose ball is worth roughly 1 extra point, whereas a 40% 3pt shooter taking a 3 instead of a 35% 3pt shooter is worth only 3 × (0.40 - 0.35) = 0.15 points.
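The arithmetic behind this comparison, as a small sketch. The 1.0 points per possession is an assumed below-average efficiency, not a measured league figure:

```python
# Assumed value of one extra possession at below-average efficiency
# (points per possession); an illustrative assumption, not a measured figure.
POINTS_PER_POSSESSION = 1.0


def extra_possession_value(n_possessions: int = 1, ppp: float = POINTS_PER_POSSESSION) -> float:
    """Expected points from possessions created out of nothing (e.g. loose-ball recoveries)."""
    return n_possessions * ppp


def shot_quality_gap(pct_a: float, pct_b: float, shot_points: int = 3) -> float:
    """Expected-points difference per shot between two shooters' make rates."""
    return shot_points * (pct_a - pct_b)


print(extra_possession_value())                 # 1.0
print(round(shot_quality_gap(0.40, 0.35), 2))   # 0.15
```

Under this assumption, one extra possession is worth more than six 3-point attempts' worth of the shooting gap between a 40% and a 35% shooter.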

Top 10 feature importances for DRAPM:

Feature name          |  relative feature importance
--------------------------------
GS | 0.058
E_DEF_RATING | 0.053
DEFLECTIONS | 0.053
GP | 0.044
MINS_PER_GAME | 0.041
E_NET_RATING | 0.034
PLAYER_AGE | 0.029
OPP_PTS_2ND_CHANCE | 0.028
CHARGES_DRAWN | 0.027
AST_RATIO | 0.025

Again, it’s no surprise that defensive rating is a top feature, and features associated with the traditional notion of good defense, like deflections and charges drawn, are also important. Games started, games played, and minutes per game being top features suggests there is likely not enough signal in the input stats to accurately predict defense, so the model leans on coaches’ and teams’ informed decisions about who to start and play. Games started being a useful feature is not necessarily a bad thing, since much of what happens on the court goes unrecorded in stats; however, it being the single most important feature is problematic.

MSE and MAE of predicted vs. true ORAPM and DRAPM on the test set:

ORAPM MSE: 1.07 MAE: 0.81. DRAPM MSE: 0.91 MAE: 0.76.

Interestingly, xgboost and linear regression perform similarly on MSE and MAE. Better performance could probably be attained with manual feature engineering. One practical advantage of xgboost is that it handles low-minutes players as-is, whereas linear regression required filtering to give reasonable results.

And that’s it! Not too shabby.


John Chen

Senior MLE @ Meta, Rice U PhD, Ex-Microsoft, Ex-Advance Scout @ Wyoming, Ex-Analytics @ Central Michigan. Find me on LinkedIn: linkedin.com/in/john-c/