NBA Salaries 2018: A Hedonic Pricing Model for the 2018–2019 NBA Season

Sean Holland
Jun 26, 2019


Over the 2017–2018 NBA season we saw the number one draft pick play a grand total of 10 games (and still manage to become the youngest player to record a triple-double), James Harden win his first MVP award after finishing as runner-up in 2014 and 2017, and JR Smith pull off one of the most impressive blunders in NBA Finals history.


Now, none of these events or achievements is why I am looking at data from the 2017–2018 NBA season. Rather, it is the most recent season for which salary data for the following season (in this case, 2018–2019) is available. With the 2017–2018 data, I will analyze how a player’s stats in one season affect his pay in the next.

I will be using data from the 2017–2018 season to compose a hedonic pricing model, a technique often used in real estate valuation to estimate the demand for heterogeneous characteristics in a good. In real estate valuation, the technique deconstructs the price of a building into several demand-generating characteristics such as the number of bedrooms, lot size, and distance from city centers or public transportation hubs.

In this player-salary model, I will deconstruct player salaries for the 2018–2019 season based on characteristics including typical per-game statistics such as points and assists, advanced statistics like win-shares and player-efficiency rating, and biographical information such as age.
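
As a rough sketch of what such a specification could look like, the equation below writes next-season salary as a linear function of current-season characteristics. The particular regressors shown (PPG, APG, RPG, age, and age squared) are an assumption drawn from the variables discussed in this article, not a reproduction of any one fitted model.

```latex
\text{Salary}_{i}^{2018\text{-}19}
  = \beta_0
  + \beta_1\,\text{PPG}_i
  + \beta_2\,\text{APG}_i
  + \beta_3\,\text{RPG}_i
  + \beta_4\,\text{Age}_i
  + \beta_5\,\text{Age}_i^{2}
  + \varepsilon_i
```

Each coefficient is read as the implicit price of one more unit of that characteristic, holding the others fixed, which is the same reading a real estate model gives to an extra bedroom.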

Data

The dataset, which I scraped from Basketball-Reference.com, includes data on 331 players who logged minutes in the 2017–2018 season. Two notable players whom I excluded from this analysis are Jeremy Lin and Gordon Hayward, both of whom sustained season-ending injuries in their season openers.

Salary ($): Mean: 9,315,813 | Median: 6,500,000
Points per game: Mean: 10.5 | Median: 9.3
Games played: Mean: 60.2 | Median: 69
Age: Mean: 26.95 | Median: 27
Win shares: Mean: 3.38 | Median: 2.8

The salary histogram demonstrates that salaries are heavily right-skewed: the mean yearly salary of about $9.3 million sits well above the $6.5 million median, pulled up by salaries like Steph Curry’s $37 million 2018–2019 income.
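
A toy example shows how a handful of very large contracts move the mean far more than the median. The figures below are made-up values in millions of dollars, not the scraped dataset; only the $37 million top figure echoes the article.

```python
from statistics import mean, median

# Hypothetical salaries in $ millions; NOT the scraped dataset.
salaries = [1.5, 2.0, 3.0, 6.5, 8.0, 12.0, 37.0]

with_star = (mean(salaries), median(salaries))

# Drop the single largest contract and recompute both summaries.
without_top = sorted(salaries)[:-1]
no_star = (mean(without_top), median(without_top))

print(f"with top salary:    mean={with_star[0]:.2f}, median={with_star[1]:.2f}")
print(f"without top salary: mean={no_star[0]:.2f}, median={no_star[1]:.2f}")
```

In this sketch, removing one contract cuts the mean from 10.0 to 5.5 while the median only moves from 6.5 to 4.75.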

Like salaries, points per game are right-skewed, although not as heavily. The average player scored 10.5 points per game, with James Harden recording the highest PPG at 30.4.

The plurality of players recorded seasons of nearly 82 games, and almost half of all players played more than 70 games. However, several players missed large swaths of the season. Players like Zach LaVine lost time due to injury, while journeymen like MarShon Brooks were late-season pick-ups.

The average NBA player is about 27 years old, but a few hold-outs like Vince Carter, Manu Ginobili, and Dirk Nowitzki helped skew the distribution slightly to the right.

Win shares are an advanced metric first conceived by Bill James for use in baseball. Justin Kubatko, the founder of Basketball-Reference, has since adapted the concept to basketball. Without getting too deep into the calculation and methodology, win shares can be interpreted as the number of wins a given player has contributed to his team.

In my cursory analysis, I discovered that points per game is the statistic most strongly correlated with salaries. While I expect that a significant level of omitted variable bias obscures the true relationship between PPG and salary, it still makes for a pretty convincing graph. And that blueish blip to the far right of the graph? That’s Steph Curry, who just so happened to be the highest paid player of the 2018–2019 season.

Models

Basic Statistics Models

The above models demonstrate a few key findings in this research. First, it appears that NBA front offices have decided that a player’s “prime” (the period of his career over which he plays his best basketball) ends at about age 30. The squared age term, “Age2,” shows us that a player’s salary will max out at the age of 30, all else held equal.
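
The age at which salary peaks follows from the linear and squared age coefficients: a quadratic b1*age + b2*age^2 with a negative b2 reaches its maximum at -b1 / (2*b2). The sketch below uses hypothetical coefficients, in millions of dollars, chosen only so that the peak lands at 30; they are not the fitted estimates.

```python
def peak_age(b_age: float, b_age_sq: float) -> float:
    """Age at which b_age*age + b_age_sq*age**2 is maximized."""
    if b_age_sq >= 0:
        raise ValueError("no interior maximum unless the squared term is negative")
    return -b_age / (2 * b_age_sq)

# Hypothetical coefficients in $ millions, NOT the article's estimates.
B_AGE, B_AGE_SQ = 3.75, -0.0625

print(peak_age(B_AGE, B_AGE_SQ))  # 30.0
```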

Second, NBA front offices seem to undervalue the importance of defensive statistics. In fairness, part of this can be attributed to the inability of basic defensive statistics to capture the actual impact a player has on the defensive end of the court. Andre Iguodala, a player who earned the 2014–2015 NBA Finals MVP award almost entirely for his ability to lock down LeBron James, only recorded 0.8 steals and 0.6 blocks per game during the 2017–2018 season. Stats like blocks and steals simply fail to capture the impact a player can have by deflecting passes, rotating correctly, and contesting shots.

Defensive Statistics Models

The second group of models I analyzed seeks to quantify the monetary value of advanced defensive statistics. The two defensive statistics I focused on were defensive box plus-minus (DBPM) and defensive win shares (DWS). DBPM is derived from box plus-minus (BPM), which uses a complex regression model to quantify how many points per 100 possessions a player contributes. DBPM is overall BPM minus offensive BPM, so it is essentially the number of opponent points that a player prevents per 100 possessions. In this dataset, the mean DBPM is -0.7 and the median DBPM is -0.2.

Defensive win shares, simply put, use another model to estimate how many wins a player contributes through his defensive work over the course of a season. In this dataset, the mean DWS is 1.53 with a median of 1.4.

However, upon investigation, it does not appear that either DBPM or DWS is a great indicator of a player’s future salary. I attempted to create models that accounted for both ends of the court by combining box-score statistics such as points per game (PPG), assists per game (APG), and total rebounds per game (RPG, which covers both offensive and defensive rebounds) with the advanced defensive statistics. In both models that included PPG, APG, and RPG, the defensive statistics were not statistically significant at the 5% level. The only model in which all indicators were statistically significant was Model 2; however, Model 3 from the basic statistics models had a higher R²-value, indicating that it accounts for more of the variation in salaries. Therefore, I conclude that NBA front offices simply undervalue defensive performance.
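
For readers who want to see what the R² comparison measures, here is a minimal sketch that computes R² as one minus the ratio of residual to total sum of squares. The salaries and fitted values are invented for illustration and do not come from any of the models above.

```python
def r_squared(actual, predicted):
    """Share of variation in `actual` explained by `predicted` (1 - SSR/SST)."""
    mean_actual = sum(actual) / len(actual)
    ss_res = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    ss_tot = sum((a - mean_actual) ** 2 for a in actual)
    return 1 - ss_res / ss_tot

# Hypothetical salaries ($ millions) and fitted values from two made-up models.
actual = [2.0, 5.0, 9.0, 14.0, 20.0]
model_a = [3.0, 5.0, 8.0, 15.0, 19.0]   # tracks the salaries closely
model_b = [6.0, 8.0, 10.0, 12.0, 14.0]  # flatter, misses the extremes

r2_a = r_squared(actual, model_a)
r2_b = r_squared(actual, model_b)
print(f"model A: {r2_a:.3f}, model B: {r2_b:.3f}")
```

One caveat worth keeping in mind: plain R² never falls when regressors are added, so comparisons across models with different numbers of variables are often made with adjusted R² instead.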

Advanced Statistics Models

The final set of models I evaluated attempted to estimate how advanced statistics like usage rate, value over replacement, true shooting percentage, and win shares per 48 minutes relate to player salaries.

First, it is important to understand what these metrics mean. Usage rate estimates the percentage of team possessions that a player uses while on the floor, characterized as ending in a shot, free throws, or a turnover. The 2017–2018 season saw James Harden rank first in usage rate with 36.1 percent of team possessions ending with a Harden shot, free throw, or turnover.
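
The usage rate definition above can be sketched with the formula published in the Basketball-Reference glossary, which weights free throw attempts by 0.44 and scales by the share of team minutes the player was on the floor. The box-score line below is hypothetical, not a real player’s game.

```python
def usage_rate(fga, fta, tov, mp, team_fga, team_fta, team_tov, team_mp):
    """Estimated % of team possessions a player uses while on the floor."""
    player_poss = fga + 0.44 * fta + tov
    team_poss = team_fga + 0.44 * team_fta + team_tov
    return 100 * (player_poss * (team_mp / 5)) / (mp * team_poss)

# Hypothetical single-game numbers, NOT a real box score.
usg = usage_rate(fga=20, fta=10, tov=5, mp=36,
                 team_fga=85, team_fta=25, team_tov=14, team_mp=240)
print(f"{usg:.1f}")  # 35.6
```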

Value over replacement player (VORP) is a “box score estimate of the points per 100 TEAM possessions that a player contributed above a replacement-level (-2.0) player.” LeBron James recorded the highest VORP in the league at 8.9, highlighting his impact on the defensive and offensive sides of the court.

True shooting percentage measures a player’s scoring efficiency while accounting for the point value of each shot, including free throws. The formula is points/(2*true shooting attempts), where true shooting attempts are field goal attempts plus 0.44 times free throw attempts. This formula gives appropriate weight to three-point shots, and in line with that weight, Steph Curry ranked fourth with a true shooting percentage of 67.5 percent.
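
A short sketch makes the three-point weighting concrete. It uses the Basketball-Reference definition, in which the shot total in the denominator is field goal attempts plus 0.44 times free throw attempts; the two shooters are hypothetical.

```python
def true_shooting(pts, fga, fta):
    """TS% = PTS / (2 * (FGA + 0.44 * FTA)), returned as a fraction."""
    return pts / (2 * (fga + 0.44 * fta))

# Hypothetical shooters, each making 5 of 10 attempts with no free throws.
twos_only = true_shooting(pts=10, fga=10, fta=0)    # 5 made two-pointers
threes_only = true_shooting(pts=15, fga=10, fta=0)  # 5 made three-pointers

print(twos_only, threes_only)  # 0.5 0.75
```

Both shooters have the same 50 percent field goal percentage, but the three-point shooter’s true shooting percentage is half again as high.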

Now, turning to the model estimates, there are a few interesting findings. First, perhaps expectedly, Model 1 demonstrates that minutes per game, usage rate, and VORP are positively correlated with salaries. The number of minutes and the amount of usage a player is given reflect the trust of the coaching staff and front office. Players who record high usage and minutes tend to be team leaders and vital role players, and that could provide leverage in contract negotiations.

In contrast, Model 2 raised a few questions. Model 2 demonstrates that, when accounting for usage rate, minutes per game, and VORP, true shooting percentage is not a significant factor in determining salaries. This finding contradicts the notion of the NBA as an efficiency-oriented, Moreyball league. To test whether efficiency matters more for players who rarely have the ball, I generated a model with a dummy variable for players with low usage rates. I defined low usage as a usage rate at least one standard deviation below the league average.
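
The low-usage dummy and its interaction with true shooting percentage could be built along the following lines. This is a sketch under assumptions: the player values are invented, and the use of the sample standard deviation is my choice for illustration, since the article does not say which version was used.

```python
from statistics import mean, stdev

# Hypothetical (usage rate %, true shooting fraction) pairs, NOT the dataset.
players = [(36.0, 0.62), (30.0, 0.58), (24.0, 0.55), (20.0, 0.57),
           (20.0, 0.52), (18.0, 0.54), (16.0, 0.56), (12.0, 0.60)]

usage = [u for u, _ in players]
# Cutoff: one sample standard deviation below the mean usage rate.
cutoff = mean(usage) - stdev(usage)

# Dummy is 1 for low-usage players, 0 otherwise.
low_usage = [1 if u <= cutoff else 0 for u in usage]
# Interaction column: TS% enters only for low-usage players.
ts_x_low = [ts * d for (_, ts), d in zip(players, low_usage)]

print(round(cutoff, 2), low_usage)
```

The dummy and the interaction column would then be added to the regression alongside the other variables, so the interaction coefficient captures any extra salary effect of efficiency for low-usage players.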

The model demonstrates that true shooting is not a significant indicator of salaries even for low-usage players when the model controls for points-per-game. Personally, I found this somewhat surprising. For star players with high usage rates and field goal attempts like James Harden and Russell Westbrook, one might expect that true shooting percentage would be less important than points scored. However, considering low-usage, catch-and-shoot players like Robert Covington, I would expect that the TS% and low-usage interaction term would be a significant indicator of salaries. While future contract years might see a strong relationship between efficient scoring and salaries, the relationship was not robust for 2018–2019 salaries.

Conclusions

After analyzing three different sets of hedonic pricing models, I found myself coming back to the earlier, “basic statistics” models. There is a certain elegance that comes with analyzing a player’s raw stats as an indicator of their value. Using points, assists, and rebounds to predict salaries is an easily interpretable model. On their own, advanced metrics like VORP, win shares, and box plus-minus fail to convey important information to casual fans and luddites in NBA front offices the way points-per-game does. I look forward to analyzing the 2019–2020 salary data to see if advanced metrics, defensive statistics, and efficient scoring assume a stronger relationship with player salaries.

In any case, below is the final model I settled on to estimate player salaries.

Simple can be beautiful

Initially, it was hard to appreciate the simplicity of this model. After all, I went through dozens of models involving advanced metrics, defensive metrics, and dummy variables for each position with several of these yielding insignificant coefficients or poor goodness-of-fit diagnostics.

While the R²-value on this model is slightly lower than the “advanced statistics” models, the coefficients and independent variables are much easier to interpret. With a model like this, it should be easy to see which players are underpaid and which players are overpaid in the market.

With this model, we can see that JJ Redick, who re-signed with the Philadelphia 76ers as an unrestricted free agent, is a steal for the Sixers. Redick signed a deal that earned him $12.25 million for the 2018–2019 season. However, according to the above model, an efficient price for a player like Redick is $14 million, thanks in large part to his 17.1 points-per-game.

In stark contrast, Chris Paul’s contract extension with the Rockets earned him a bloated $35.6 million salary for the 2018–2019 season that makes him one of the most overvalued players in the league. Given Paul’s age and solid, but not earth-shattering, stats, the model estimates that his value is closer to $20.3 million. Even running Paul’s stats through Model 1 of the “advanced stats” models only yields a salary of $21.9 million. Simply put, Chris Paul won that deal.
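
The over- and underpayment comparison boils down to the gap between actual salary and the model’s estimate. The sketch below uses only the figures quoted above for Redick and Paul; the helper name `pay_gap` is my own.

```python
# Actual 2018-19 salary and model estimate, both in $ millions, as quoted above.
contracts = {
    "JJ Redick": {"actual": 12.25, "predicted": 14.0},
    "Chris Paul": {"actual": 35.6, "predicted": 20.3},
}

def pay_gap(actual, predicted):
    """Positive gap: paid above the model estimate; negative: paid below it."""
    return actual - predicted

gaps = {name: pay_gap(**c) for name, c in contracts.items()}
for name, gap in gaps.items():
    label = "overpaid" if gap > 0 else "underpaid"
    print(f"{name}: {gap:+.2f}M ({label})")
```

By this measure Redick is paid about $1.75 million below his estimated value, while Paul is paid roughly $15.3 million above his.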

Most importantly, I should note that there is still plenty left to discover here. While basic statistics do a decent job of helping us understand player salaries, there are other significant factors at play when players and teams make deals. A player’s significance to his team’s city, jersey sales, loyalty to his team, defensive skills, and untapped potential are all variables that these models could not capture, and I hope to analyze those data in research to come.

Source: https://www.basketball-reference.com/


Sean Holland

Consultant - Deloitte Consulting | George Washington University Class of 2020 | International Affairs and Economics