NBA Player Value Models: Calculating RAPM, the backbone of BPM, RPM, DARKO and most publicly available stats

John Chen
3 min read · May 25, 2023

Excerpt from NBA Metrics: What is RAPM and its limitations:

“The majority of popular publicly available metrics (such as DARKO, LEBRON, EPM, BPM) rely on first calculating RAPM (Regularized Adjusted Plus Minus) and then building a model that predicts RAPM.

RAPM is typically calculated by taking the last three seasons of all play by play data, weighting the latest season the most, and solving for a linear system of equations where every row of that system are the 5 offensive and defensive players on the floor between every substitution of every game and the resulting plus minus (also called a stint). We hope to find the plus minus contribution of every player by solving the linear system.”
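As a rough sketch of the linear system described in the excerpt, the regularized problem can be written as a weighted ridge regression. The notation is an assumption for illustration and does not come from the original article: y_i is the plus minus per possession of stint i, x_i is the indicator vector of the players on the floor during that stint, w_i is the stint's weight, and alpha is the regularization strength.

```latex
\hat{\beta}
  = \arg\min_{\beta} \sum_{i=1}^{n} w_i \left( y_i - x_i^{\top} \beta \right)^2
    + \alpha \lVert \beta \rVert_2^2
\qquad\Longrightarrow\qquad
\hat{\beta} = \left( X^{\top} W X + \alpha I \right)^{-1} X^{\top} W y,
\quad W = \operatorname{diag}(w_1, \dots, w_n)
```

Each entry of the solution is one player's estimated contribution. The penalty term shrinks players with few possessions toward zero, which is what the "Regularized" in RAPM refers to.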

The Exact Methodology

More specifically, I calculated 3-season RAPM using the following methodology:

1. Download NBA play-by-play data. Split it into possessions at defensive rebounds, turnovers, expirations of the clock, and made shots that are not and-1s, then convert the possessions to stints.

2. Filter out zero-possession stints and stints involving non-NBA teams.

3. Adjust for home-away advantage (add roughly 0.02 points per possession, or PPP, to away offensive possessions), rubber-band effects (as much as ±0.07 PPP when a team is up or down by 10 or more in the second half of the fourth quarter), FT% (set to the shooter's season average), and 3PT% (set to the shooter's season average).

4. Replace players who played fewer than 1% of their team's possessions with a single shared id, and filter out the stints that contain only these players.

5. Double and flip each stint so that offensive and defensive RAPM are calculated separately. Remove the replacement-level players (those with the single shared id) and use the value -0.02 for them. Recompute the PPP of each stint by subtracting the league-average PPP and clipping the result to [-4, 4].

6. Fit a sparse L2-regularized (ridge) regression with sample weight = number of possessions in the stint × season weight. The season weights are latest = 3/6, next latest = 2/6, earliest = 1/6, so that recent seasons count more. I tried alpha = 500, 1000, and 1500. Center the predicted ORAPM and DRAPM to mean 0.

7. Print out and sanity check (alpha=1000): https://docs.google.com/spreadsheets/d/e/2PACX-1vQFz80gv9soGtEfGZ-TEQvWkDPCLeyEV_nFrvy5NkNol-SURjmc4ouWsIktTR72e0WYuOvehB_qfg8k/pubhtml
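Steps 4 and 5 above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the author's code. The stint dictionary layout, the `REPLACEMENT_ID` constant, and the function names `find_low_usage` and `double_and_flip` are all hypothetical. The toy data uses two players per side and a 10% threshold so the example stays small, where the article uses five players and 1%. Each player is assumed to appear for only one team, and the fixed -0.02 value for replacement-level players is left out.

```python
from collections import Counter

REPLACEMENT_ID = "REPL"  # hypothetical shared id for low-usage players


def find_low_usage(stints, min_share=0.01):
    """Step 4: players below `min_share` of their team's possessions.

    Simplification: a player is assumed to appear for one team only.
    """
    team_poss, player_poss, player_team = Counter(), Counter(), {}
    for s in stints:
        poss = s["home_poss"] + s["away_poss"]
        for side in ("home", "away"):
            team = s[side + "_team"]
            team_poss[team] += poss
            for p in s[side]:
                player_poss[p] += poss
                player_team[p] = team
    return {p for p, n in player_poss.items()
            if n < min_share * team_poss[player_team[p]]}


def double_and_flip(stints, low_usage, league_ppp, clip=4.0):
    """Step 5: one row per team on offense, centered on league PPP and clipped."""
    rows = []
    for s in stints:
        home = tuple(REPLACEMENT_ID if p in low_usage else p for p in s["home"])
        away = tuple(REPLACEMENT_ID if p in low_usage else p for p in s["away"])
        if all(p == REPLACEMENT_ID for p in home + away):
            continue  # stint contains only pooled players
        for off, dfn, pts, poss in (
            (home, away, s["home_pts"], s["home_poss"]),
            (away, home, s["away_pts"], s["away_poss"]),  # the "flipped" copy
        ):
            if poss == 0:
                continue  # no possessions on this side of the stint
            y = pts / poss - league_ppp
            rows.append({"offense": off, "defense": dfn,
                         "y": max(-clip, min(clip, y)), "possessions": poss})
    return rows


# Toy data (made up): two players per side instead of five.
stints = [
    {"home_team": "AAA", "away_team": "BBB",
     "home": ("a1", "a2"), "away": ("b1", "b2"),
     "home_pts": 12, "home_poss": 10, "away_pts": 9, "away_poss": 10},
    {"home_team": "AAA", "away_team": "BBB",
     "home": ("a1", "a3"), "away": ("b1", "b2"),
     "home_pts": 0, "home_poss": 0, "away_pts": 2, "away_poss": 1},
]
low_usage = find_low_usage(stints, min_share=0.1)
rows = double_and_flip(stints, low_usage, league_ppp=1.1)
```

Here player `a3` falls under the threshold and is pooled into the shared id, and the second stint contributes only one row because the home team had no possessions in it.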

Top 15 RAPM calculated between the 2020–2023 seasons.
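The regression in step 6 of the methodology can be sketched as a weighted ridge regression solved in closed form with NumPy. This is an illustration under assumptions rather than the author's implementation. The function name `fit_rapm`, the row layout, and the toy data are hypothetical. A dense matrix is used for brevity, where real play-by-play data would need a sparse one. `alpha=1.0` only suits the toy data; the article uses 500 to 1500. Flipping the sign of DRAPM so that positive means better defense is a convention assumed here, and the handling of replacement-level players is omitted.

```python
import numpy as np


def fit_rapm(rows, alpha, season_weight):
    """Weighted ridge with separate offensive and defensive coefficients."""
    players = sorted({p for r in rows for p in r["offense"] + r["defense"]})
    col = {p: i for i, p in enumerate(players)}
    n = len(players)

    # Columns 0..n-1 are offensive indicators, n..2n-1 defensive indicators.
    X = np.zeros((len(rows), 2 * n))  # dense only because the example is tiny
    y = np.empty(len(rows))
    w = np.empty(len(rows))
    for i, r in enumerate(rows):
        for p in r["offense"]:
            X[i, col[p]] = 1.0
        for p in r["defense"]:
            X[i, n + col[p]] = 1.0
        y[i] = r["y"]
        # sample weight = possessions in the stint * season weight
        w[i] = r["possessions"] * season_weight[r["season"]]

    # Solve (X'WX + alpha*I) beta = X'Wy.
    XtW = X.T * w
    beta = np.linalg.solve(XtW @ X + alpha * np.eye(2 * n), XtW @ y)

    # Center both halves to mean 0.
    orapm = beta[:n] - beta[:n].mean()
    # Assumed convention: flip the sign so positive DRAPM means better defense.
    drapm = -(beta[n:] - beta[n:].mean())
    return dict(zip(players, orapm)), dict(zip(players, drapm))


# Toy rows (made up): "y" is the stint PPP minus the league-average PPP.
rows = [
    {"offense": ("a", "b"), "defense": ("c", "d"),
     "y": 0.2, "possessions": 10, "season": 2023},
    {"offense": ("c", "d"), "defense": ("a", "b"),
     "y": -0.1, "possessions": 10, "season": 2023},
    {"offense": ("a", "c"), "defense": ("b", "d"),
     "y": 0.1, "possessions": 5, "season": 2022},
]
season_weight = {2023: 3 / 6, 2022: 2 / 6, 2021: 1 / 6}
orapm, drapm = fit_rapm(rows, alpha=1.0, season_weight=season_weight)
```

In this toy example player `a` is on offense for the two above-average stints and ends up with the highest ORAPM, while `d` ends up with the lowest.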

Evaluating Best Adjustments

Afterwards, I evaluated the season-to-season consistency of 3-season RAPM and found that applying all four adjustments (home-away, rubber band, FT%, 3PT%) produced the most consistent RAPM from one season to the next. Some error is expected because players get better or worse, and because other players get more or fewer minutes, which lowers or raises the variance of their estimates. Still, the season-to-season MSE and MAE (~1.4 MSE and ~0.85 MAE going from 3-season weighted 2022 RAPM to 2023 RAPM) are quite high, considering that the difference between No. 30 and No. 50 on DARKO (as of today) is 0.4.
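A season-to-season comparison of this kind can be sketched as follows. The article does not show how its errors were computed, so this assumes the simplest version: MSE and MAE over the players who appear in both seasons. The function name `season_to_season_error` and the numbers in the example are made up for illustration.

```python
def season_to_season_error(prev, curr):
    """Return (MSE, MAE) between two {player: rapm} dicts over shared players."""
    shared = sorted(prev.keys() & curr.keys())
    if not shared:
        raise ValueError("no players appear in both seasons")
    diffs = [curr[p] - prev[p] for p in shared]
    mse = sum(d * d for d in diffs) / len(diffs)
    mae = sum(abs(d) for d in diffs) / len(diffs)
    return mse, mae


# Made-up RAPM values; "c" and "d" are ignored because each appears only once.
rapm_prev = {"a": 2.0, "b": -1.0, "c": 0.5}
rapm_curr = {"a": 1.0, "b": 0.0, "d": 3.0}
mse, mae = season_to_season_error(rapm_prev, rapm_curr)
```

Running the same comparison once per combination of adjustments, and keeping the combination with the lowest error, is one way to rank the adjustments by consistency.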

Next Steps

RAPM is quite noisy, but it is an easier and less noisy target to train on than per-game or per-stint offensive or defensive efficiency. Next, we train a model to predict ORAPM and DRAPM.

John Chen

Senior MLE @ Meta, Rice U PhD, Ex-Microsoft, Ex-Advance Scout @ Wyoming, Ex-Analytics @ Central Michigan. Find me on LinkedIn: linkedin.com/in/john-c/