The majority of popular publicly available metrics (such as DARKO, LEBRON, EPM, BPM) rely on first calculating RAPM (Regularized Adjusted Plus Minus) and then building a model that predicts RAPM.
What is RAPM?
RAPM is typically calculated from the last three seasons of play-by-play data, with the most recent season weighted most heavily. The data is arranged as a linear system of equations in which every row is a stint: the 5 offensive and 5 defensive players on the floor between two consecutive substitutions, together with the resulting plus-minus. By solving this system, we hope to find the plus-minus contribution of every player.
For example, let’s say the Nuggets are playing the Lakers and the following happens:
Jokic | Porter | Gordon | Murray | KCP | scores 14 points on 10 possessions
against
Lebron| Davis | Reaves | DLo | Lonnie | scores 12 points on 9 possessions
Brown subs in for Porter, Schroder subs in for DLo
Jokic | Brown | Gordon | Murray | KCP | scores 10 points on 13 possessions
against
Lebron| Davis | Reaves | Schroder | Lonnie | scores 18 points on 15 possessions
The equations we want to solve are:
Jokic+Porter+Gordon+Murray+KCP-Lebron-Davis-Reaves-DLo-Lonnie = 14/10 = 1.40 PPP
Lebron+Davis+Reaves+DLo+Lonnie-Jokic-Porter-Gordon-Murray-KCP = 12/9 = 1.33 PPP
Jokic+Brown+Gordon+Murray+KCP-Lebron-Davis-Reaves-Schroder-Lonnie = 10/13 = 0.77 PPP
Lebron+Davis+Reaves+Schroder+Lonnie-Jokic-Brown-Gordon-Murray-KCP = 18/15 = 1.20 PPP
The first line asks: when Jokic, Porter, Gordon, Murray, and KCP are on offense and Lebron, Davis, Reaves, DLo, and Lonnie are defending, how efficient is the offense?
Each points per possession (PPP) value is then centered by subtracting the league-average PPP. We also treat a player on offense and the same player on defense as two separate variables, so that offensive and defensive impact are estimated separately.
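As a rough sketch of the centering step, here is the arithmetic for the four half-stints in the example. The league average of 1.14 PPP is a placeholder assumed purely for illustration, and the names `LEAGUE_AVG_PPP` and `centered_ppp` are hypothetical.

```python
# Each entry is (team on offense, points scored, possessions used),
# taken from the Nuggets vs. Lakers example.
half_stints = [
    ("Nuggets", 14, 10),
    ("Lakers", 12, 9),
    ("Nuggets", 10, 13),
    ("Lakers", 18, 15),
]

# ASSUMPTION: placeholder league-average efficiency, not a real figure.
LEAGUE_AVG_PPP = 1.14


def centered_ppp(points, possessions, league_avg=LEAGUE_AVG_PPP):
    """Points per possession relative to the league average."""
    return points / possessions - league_avg


targets = [centered_ppp(pts, poss) for _, pts, poss in half_stints]

for (team, pts, poss), target in zip(half_stints, targets):
    print(f"{team}: raw {pts / poss:.2f} PPP, centered {target:+.2f}")
```

A positive centered value means the offense beat the assumed league average on that stint, and a negative value means it fell short.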
We write one such equation for every stint of every game and then solve the whole system. Because the data is noisy and many players almost always share the floor, the solve is regularized (typically with ridge regression), which is the "R" in RAPM. The resulting value for each player is the predicted PPP impact on offense (ORAPM) and on defense (DRAPM). Adding the two values gives overall RAPM.
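To make the solve concrete, here is a minimal sketch of a possession-weighted ridge regression on the two-stint example, using NumPy. It is a toy under stated assumptions, not how any published metric is actually computed: the league average (1.14), the penalty `RIDGE_LAMBDA`, and the choice to weight rows by possessions are all hypothetical, and real RAPM uses seasons of data rather than four rows.

```python
import numpy as np

# ASSUMPTIONS: placeholder league average and ridge penalty.
LEAGUE_AVG_PPP = 1.14
RIDGE_LAMBDA = 10.0

den1 = ["Jokic", "Porter", "Gordon", "Murray", "KCP"]
lal1 = ["Lebron", "Davis", "Reaves", "DLo", "Lonnie"]
den2 = ["Jokic", "Brown", "Gordon", "Murray", "KCP"]
lal2 = ["Lebron", "Davis", "Reaves", "Schroder", "Lonnie"]

# (offensive lineup, defensive lineup, points, possessions)
rows = [
    (den1, lal1, 14, 10),
    (lal1, den1, 12, 9),
    (den2, lal2, 10, 13),
    (lal2, den2, 18, 15),
]

players = sorted({p for off, dfn, _, _ in rows for p in off + dfn})
idx = {p: i for i, p in enumerate(players)}
n = len(players)

# Columns 0..n-1 are offensive variables, columns n..2n-1 defensive ones.
X = np.zeros((len(rows), 2 * n))
y = np.zeros(len(rows))
w = np.zeros(len(rows))
for r, (off, dfn, pts, poss) in enumerate(rows):
    for p in off:
        X[r, idx[p]] = 1.0        # player on offense
    for p in dfn:
        X[r, n + idx[p]] = -1.0   # player on defense (minus sign as in the equations)
    y[r] = pts / poss - LEAGUE_AVG_PPP  # centered PPP target
    w[r] = poss                         # weight each row by its possessions

# Closed-form weighted ridge: (X^T W X + lambda * I) beta = X^T W y
XtW = X.T * w
beta = np.linalg.solve(XtW @ X + RIDGE_LAMBDA * np.eye(2 * n), XtW @ y)

orapm = dict(zip(players, beta[:n]))
drapm = dict(zip(players, beta[n:]))
rapm = {p: orapm[p] + drapm[p] for p in players}

for p in players:
    print(f"{p:9s} O {orapm[p]:+.3f}  D {drapm[p]:+.3f}  total {rapm[p]:+.3f}")
```

With only these four rows, players who never leave the floor separately (Jokic and Gordon, for instance) receive identical estimates, because nothing in the data tells them apart. Only substitutions such as Porter for Brown create separation, which is part of why RAPM needs several seasons of stints.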
Challenges with RAPM
The widespread usage of RAPM as the target is a bit surprising for a few reasons:
If all it takes is 3 years of data, why not just recalculate RAPM every day and use that?
RAPM is still very noisy.
If RAPM is still noisy, why do we use it as a target as opposed to the ground truth of the original play by play data?
A simple model, such as linear regression on an augmented box score, has far too few parameters to fit the noise in RAPM, so the hope is that the noise averages out.
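The averaging-out argument can be illustrated with a small simulation. Everything here is synthetic and assumed: the "box score" features are random numbers, the true impact is defined as exactly linear in them, and the noise level is arbitrary. It shows the mechanism only and says nothing about how well real box-score models recover real impact.

```python
import numpy as np

rng = np.random.default_rng(0)

# ASSUMPTION: synthetic players whose true impact is linear in 3 features.
n_players, n_features = 400, 3
box = rng.normal(size=(n_players, n_features))
true_weights = np.array([1.5, -0.8, 0.5])  # hypothetical
true_impact = box @ true_weights

# "RAPM" here is the true impact plus heavy, independent noise.
noisy_rapm = true_impact + rng.normal(scale=2.0, size=n_players)

# Ordinary least squares with an intercept, fit to the noisy target.
A = np.column_stack([box, np.ones(n_players)])
coef, *_ = np.linalg.lstsq(A, noisy_rapm, rcond=None)
predicted = A @ coef


def rmse(a, b):
    """Root mean squared error between two arrays."""
    return float(np.sqrt(np.mean((a - b) ** 2)))


rapm_error = rmse(noisy_rapm, true_impact)   # how far the noisy target is from truth
model_error = rmse(predicted, true_impact)   # how far the fitted model is from truth

print(f"noisy RAPM vs truth: {rapm_error:.2f}")
print(f"model vs truth:      {model_error:.2f}")
```

With 4 parameters fit to 400 noisy targets, the model cannot chase each player's individual noise, so in this setup its predictions land closer to the true impact than the target it was trained on. The catch is the linearity assumption: whatever real impact the box score cannot express is lost.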
The Holy Grail
The fundamental question I want to ask is: What is stopping us from directly predicting the plus minus of each stint? Can we use deep learning methods to resolve this noise problem directly by sidestepping RAPM? We will lose interpretability but perhaps this will open up a new class of more accurate models.
Follow me as I explore these questions.