NBA Player Value Models: What Is RAPM, and What Are Its Limitations?

John Chen
May 25, 2023


Most popular publicly available player-value metrics (such as DARKO, LEBRON, EPM, and BPM) work by first calculating RAPM (Regularized Adjusted Plus-Minus) and then building a model that predicts RAPM.

What is RAPM?

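RAPM is typically calculated from the last three seasons of play-by-play data, with the most recent season weighted most heavily. Each game is split into stints: stretches between substitutions during which the same ten players (five on offense, five on defense) are on the floor. Every stint contributes equations to a linear system, where the unknowns are the players on the floor and the result is the scoring during that stint (its plus-minus). We hope to recover each player's plus-minus contribution by solving this system.

Because stint outcomes are noisy and the unknowns are highly correlated (teammates share the floor for long stretches), the system is not solved exactly. Instead it is fit with ridge regression, which adds a penalty that shrinks every player's estimate toward zero; that penalty is the "regularized" in RAPM.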
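Since stints are defined by substitutions, one way to form them is to walk through possessions in order and start a new stint whenever either five-man unit changes. The sketch below assumes a hypothetical, already-parsed `Possession` record (real play-by-play feeds need far more cleaning) and emits one `(offense, defense, points, possessions)` row per team per stint, the same shape as the equations in the example that follows.

```python
from dataclasses import dataclass
from itertools import groupby


@dataclass(frozen=True)
class Possession:
    """One possession from play-by-play data (hypothetical schema)."""
    home: frozenset        # the five home players on the floor
    away: frozenset        # the five away players on the floor
    home_on_offense: bool  # True if the home team had the ball
    points: int            # points scored on this possession


def build_stint_rows(possessions):
    """Collapse consecutive possessions with the same ten players into stints.

    Returns one (offense, defense, points, possessions) row per stint and
    per direction, i.e. up to two rows per stint.
    """
    rows = []
    # groupby starts a new group whenever either five-man unit changes,
    # which is exactly a substitution boundary.
    for (home, away), group in groupby(possessions, key=lambda p: (p.home, p.away)):
        group = list(group)
        for home_has_ball in (True, False):
            side = [p for p in group if p.home_on_offense == home_has_ball]
            if not side:
                continue  # e.g. a stint in which only one team had the ball
            offense, defense = (home, away) if home_has_ball else (away, home)
            rows.append((offense, defense, sum(p.points for p in side), len(side)))
    return rows


# Toy usage: two possessions, then DLo is replaced by Schroder.
nuggets = frozenset({"Jokic", "Porter", "Gordon", "Murray", "KCP"})
lakers = frozenset({"Lebron", "Davis", "Reaves", "DLo", "Lonnie"})
lakers_after_sub = (lakers - {"DLo"}) | {"Schroder"}
poss = [
    Possession(nuggets, lakers, True, 2),
    Possession(nuggets, lakers, False, 3),
    Possession(nuggets, lakers_after_sub, True, 0),
]
for row in build_stint_rows(poss):
    print(sorted(row[0]), row[2], row[3])
```

Grouping only consecutive possessions means a lineup that returns later in the game starts a fresh stint, which matches the "between every substitution" definition.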

For example, let’s say the Nuggets are playing the Lakers and the following happens:

Jokic | Porter | Gordon | Murray | KCP    | scores 14 points on 10 possessions 
against
Lebron| Davis | Reaves | DLo | Lonnie | scores 12 points on 9 possessions

Brown subs in for Porter, and Schroder subs in for DLo

Jokic | Brown | Gordon | Murray | KCP | scores 10 points on 13 possessions
against
Lebron| Davis | Reaves | Schroder | Lonnie | scores 18 points on 15 possessions

The equations we want to solve are:

Jokic+Porter+Gordon+Murray+KCP-Lebron-Davis-Reaves-DLo-Lonnie = 14/10 = 1.40 PPP
Lebron+Davis+Reaves+DLo+Lonnie-Jokic-Porter-Gordon-Murray-KCP = 12/9 = 1.33 PPP

Jokic+Brown+Gordon+Murray+KCP-Lebron-Davis-Reaves-Schroder-Lonnie = 10/13 = 0.77 PPP
Lebron+Davis+Reaves+Schroder+Lonnie-Jokic-Brown-Gordon-Murray-KCP = 18/15 = 1.20 PPP
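These four equations can be written as a matrix with one column per player (+1 for players on offense, -1 for players on defense) and a target vector of PPP values. The NumPy sketch below does that for the example, and it also surfaces something worth noticing: with a single value per player, each team's equation in a stint is the exact negative of the other team's, yet both targets are positive, so no set of player values can satisfy both. That is one reason for treating offense and defense separately.

```python
import numpy as np

# The two stints from the example, one row per equation:
# (players on offense, players on defense, points, possessions)
nuggets_1 = ["Jokic", "Porter", "Gordon", "Murray", "KCP"]
lakers_1 = ["Lebron", "Davis", "Reaves", "DLo", "Lonnie"]
nuggets_2 = ["Jokic", "Brown", "Gordon", "Murray", "KCP"]
lakers_2 = ["Lebron", "Davis", "Reaves", "Schroder", "Lonnie"]
rows = [
    (nuggets_1, lakers_1, 14, 10),
    (lakers_1, nuggets_1, 12, 9),
    (nuggets_2, lakers_2, 10, 13),
    (lakers_2, nuggets_2, 18, 15),
]

players = sorted({p for off, dfn, _, _ in rows for p in off + dfn})
col = {name: j for j, name in enumerate(players)}

X = np.zeros((len(rows), len(players)))  # one coefficient per player
y = np.zeros(len(rows))
for i, (off, dfn, pts, poss) in enumerate(rows):
    for p in off:
        X[i, col[p]] = 1.0    # offensive players enter with a plus sign
    for p in dfn:
        X[i, col[p]] = -1.0   # defensive players enter with a minus sign
    y[i] = pts / poss         # points per possession for this stint

print(y.round(2))
print(np.linalg.matrix_rank(X))     # 2: the 2nd and 4th rows are the 1st and 3rd negated
print(np.array_equal(X[1], -X[0]))  # True, yet y[0] and y[1] are both positive
```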

The first equation asks: when Jokic, Porter, Gordon, Murray, and KCP are on offense and Lebron, Davis, Reaves, DLo, and Lonnie are on defense, how efficient is the offense?

Each points-per-possession (PPP) value is then centered by subtracting the league-average PPP. We also treat a player on offense and the same player on defense as two separate variables, so that offensive and defensive impact are estimated separately.
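A minimal sketch of both steps, assuming the same `(offense, defense, points, possessions)` rows as in the example: each player gets an offensive column and a separate defensive column, and the target is stint PPP minus the average PPP over all possessions in the data (with a full league's worth of stints, that average is the league average). The function name `build_split_design` is hypothetical.

```python
import numpy as np


def build_split_design(rows):
    """Design matrix with separate offensive and defensive columns per player.

    rows: list of (offense_players, defense_players, points, possessions).
    Column j is player j on offense; column n + j is player j on defense.
    """
    players = sorted({p for off, dfn, _, _ in rows for p in [*off, *dfn]})
    col = {name: j for j, name in enumerate(players)}
    n = len(players)
    X = np.zeros((len(rows), 2 * n))
    for i, (off, dfn, _, _) in enumerate(rows):
        for p in off:
            X[i, col[p]] = 1.0       # offensive variable
        for p in dfn:
            X[i, n + col[p]] = -1.0  # defensive variable, minus sign as in the equations
    ppp = np.array([pts / poss for _, _, pts, poss in rows])
    # Average PPP over every possession in the data; with a full league's
    # stints this is the league-average PPP.
    avg_ppp = sum(r[2] for r in rows) / sum(r[3] for r in rows)
    return X, ppp - avg_ppp, players


nuggets_1 = ["Jokic", "Porter", "Gordon", "Murray", "KCP"]
lakers_1 = ["Lebron", "Davis", "Reaves", "DLo", "Lonnie"]
nuggets_2 = ["Jokic", "Brown", "Gordon", "Murray", "KCP"]
lakers_2 = ["Lebron", "Davis", "Reaves", "Schroder", "Lonnie"]
rows = [
    (nuggets_1, lakers_1, 14, 10),
    (lakers_1, nuggets_1, 12, 9),
    (nuggets_2, lakers_2, 10, 13),
    (lakers_2, nuggets_2, 18, 15),
]
X, y, players = build_split_design(rows)
print(X.shape, np.linalg.matrix_rank(X))  # (4, 24) 4
```

With the split, the two equations in each stint no longer contradict each other: all four rows become linearly independent.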

We write an equation like this for every stint of every game, for both teams, and solve the whole system at once. Each player's fitted offensive value is his estimated impact on his team's PPP (ORAPM), and his fitted defensive value is his estimated impact on the opponent's PPP, signed so that a positive value means fewer points allowed (DRAPM). Adding the two values gives overall RAPM.
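With a split design matrix in hand, the solve is a weighted ridge regression. A minimal NumPy sketch, assuming each stint is weighted by its possessions times a recency factor for its season (the article only says the latest season is weighted most; the factor values and the helper names here are hypothetical) and that the penalty `lam` is a placeholder to be tuned, for example by cross-validation:

```python
import numpy as np


def stint_weights(possessions, seasons, season_factor):
    """Weight each stint by its possessions times a recency factor for its season.

    season_factor is a hypothetical mapping, e.g. {2023: 1.0, 2022: 0.5}.
    """
    return np.array([p * season_factor[s] for p, s in zip(possessions, seasons)], dtype=float)


def ridge_rapm(X, y, weights, n_players, lam):
    """Weighted ridge regression for RAPM.

    Minimizes sum_i w_i * (y_i - X_i @ b)**2 + lam * ||b||**2, where X has
    n_players offensive columns followed by n_players defensive columns.
    """
    Xw = X * weights[:, None]                  # W X: each row scaled by its weight
    A = X.T @ Xw + lam * np.eye(X.shape[1])    # X^T W X + lam * I
    b = np.linalg.solve(A, Xw.T @ y)           # right-hand side is X^T W y
    orapm, drapm = b[:n_players], b[n_players:]
    return orapm, drapm, orapm + drapm         # ORAPM, DRAPM, overall RAPM


# Toy usage: 2 players (4 columns), 3 stints; real rows have five players per side.
X = np.array([[1.0, 0.0, 0.0, -1.0],
              [0.0, 1.0, -1.0, 0.0],
              [1.0, 0.0, 0.0, -1.0]])
y = np.array([0.10, -0.05, 0.02])              # centered PPP per stint
w = stint_weights([10, 8, 12], [2023, 2023, 2022], {2023: 1.0, 2022: 0.5})
orapm, drapm, rapm = ridge_rapm(X, y, w, n_players=2, lam=5.0)
print(orapm, drapm, rapm)
```

The penalty is what makes the system solvable even when there are more unknowns than independent equations, as in the four-equation example, and it pulls players with few possessions toward zero.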

Challenges with RAPM

The widespread use of RAPM as the target is a bit surprising, and it raises two questions:

If all it takes is three years of data, why not just recalculate RAPM every day and use that directly?

Because RAPM, even with three seasons of data, is still very noisy.
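A rough simulation can illustrate where the noise comes from: draw hypothetical true offensive and defensive values for each player, generate stints whose PPP is that signal plus possession-level noise, fit weighted ridge regression, and compare the estimates with the truth. Every number below (player count, impact scale, stint lengths, a Gaussian per-possession noise SD of 1.1, the penalty) is an assumption chosen for illustration. Lineups are also drawn at random; real lineups are far more structured because teammates share the floor constantly, which makes players harder to tell apart than they are here.

```python
import numpy as np


def simulate_rapm_error(n_players=60, n_stints=10000, lam=50.0, seed=0):
    """Fit ridge RAPM on simulated stints and compare it with the known truth.

    Every size and scale here is a hypothetical choice for illustration.
    Returns (RMSE of estimated RAPM vs. true RAPM, SD of true RAPM).
    """
    rng = np.random.default_rng(seed)
    true_off = rng.normal(0.0, 0.03, n_players)  # true offensive impact, in PPP
    true_def = rng.normal(0.0, 0.03, n_players)  # true defensive impact, in PPP
    X = np.zeros((n_stints, 2 * n_players))
    y = np.zeros(n_stints)
    poss = rng.integers(2, 15, size=n_stints)    # stints are only a few possessions long
    for i in range(n_stints):
        ten = rng.choice(n_players, size=10, replace=False)  # random lineups (unrealistic)
        off, dfn = ten[:5], ten[5:]
        X[i, off] = 1.0
        X[i, n_players + dfn] = -1.0
        signal = true_off[off].sum() - true_def[dfn].sum()
        # Assumed Gaussian per-possession noise with SD 1.1, so the stint
        # average has SD 1.1 / sqrt(possessions).
        y[i] = signal + rng.normal(0.0, 1.1 / np.sqrt(poss[i]))
    w = poss.astype(float)                       # weight stints by possessions
    Xw = X * w[:, None]
    b = np.linalg.solve(X.T @ Xw + lam * np.eye(2 * n_players), Xw.T @ y)
    est = b[:n_players] + b[n_players:]
    truth = true_off + true_def
    return float(np.sqrt(np.mean((est - truth) ** 2))), float(np.std(truth))


for n in (1000, 10000):
    rmse, spread = simulate_rapm_error(n_stints=n)
    print(f"{n} stints: RMSE {rmse:.3f} vs. true spread {spread:.3f}")
```

Comparing the RMSE with the spread of the true values, and watching how it changes as stints are added, gives a feel for how much of a RAPM number is signal.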

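If RAPM is still noisy, why use it as the target instead of the ground truth, the original play-by-play data?

The usual answer is that the models built on top of it are simple, such as linear regression with an augmented box score as input. Such a model has very few parameters relative to the number of players it is fit on (it is extremely underparameterized), so it cannot chase the noise in individual RAPM values, and the hope is that the noise evens out.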
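A minimal sketch of that "simple model" approach: ordinary least squares of RAPM on a handful of per-player box-score features. The feature set and the synthetic data are hypothetical (the augmented box score is not specified here); what matters is the parameter count, a few coefficients fit across hundreds of players.

```python
import numpy as np


def fit_box_score_model(features, rapm):
    """Ordinary least squares of RAPM on per-player box-score features.

    features: (n_players, n_features) array of hypothetical box-score stats.
    rapm: (n_players,) array of RAPM values, used as the target.
    Returns (intercept, coefficients).
    """
    design = np.column_stack([np.ones(len(features)), features])  # add intercept
    coef, *_ = np.linalg.lstsq(design, rapm, rcond=None)
    return coef[0], coef[1:]


def predict_rapm(intercept, coefs, features):
    """Box-score estimate of RAPM for new players or new seasons."""
    return intercept + features @ coefs


# Synthetic illustration: 400 players, 3 made-up features, noisy RAPM target.
rng = np.random.default_rng(0)
features = rng.normal(size=(400, 3))
true_coefs = np.array([0.02, -0.01, 0.005])
noisy_rapm = features @ true_coefs + rng.normal(0.0, 0.03, size=400)
intercept, coefs = fit_box_score_model(features, noisy_rapm)
print(intercept, coefs)  # four numbers fit from 400 noisy targets
```

With only four parameters, the fit cannot memorize any one player's noisy RAPM; it can only capture the average relationship between the box score and RAPM.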

The Holy Grail

The fundamental question I want to ask is: what is stopping us from directly predicting the plus-minus of each stint? Can we use deep learning to sidestep RAPM and address this noise problem directly? We would lose interpretability, but perhaps this would open up a new class of more accurate models.

Follow me as I explore these questions.


John Chen

Senior MLE @ Meta, Rice U PhD, Ex-Microsoft, Ex-Advance Scout @ Wyoming, Ex-Analytics @ Central Michigan. Find me on LinkedIn: linkedin.com/in/john-c/