Wings are King- Valuing Player Positions Using Machine Learning for NBA Roster Construction

Published in

han_

11 min read · Apr 23, 2017

It was time for the 2015 NBA draft, and Sam Hinkie had the 3rd overall pick. Ahead of him went Karl-Anthony Towns, the consensus #1 pick, and then D’Angelo Russell, the consensus top guard in the draft, going to the point-guard-starved Lakers. The Philadelphia GM had more than a player choice at hand; it was a philosophical referendum: choose the NCAA-tested, fundamental, post-up, mid-range savant in Jahlil Okafor or the little-known, 3pt-shooting, shot-blocking European center in Kristaps Porzingis.

Hinkie ended up choosing Okafor with the third pick, and the Knicks happily drafted Porzingis with the following pick. In hindsight, that decision may have marked the beginning of the end of Sam Hinkie’s tenure in Philadelphia. Besides the argument that Hinkie should have valued risk and upside (given his high-risk, high-reward tanking manifesto), Hinkie also came down on the wrong side of history in the debate for valuing traditional, post-up bigs versus the new-age multi-dimensional shooting big. Now we can safely say that Porzingis represents the new guard of “unicorn” bigs.

And in that sense, Okafor is the anti-unicorn. He is highly specialized in both his skills and the areas of the floor he occupies. He plays primarily in the paint with his back to the basket, and this slow plodding style transfers to his lackluster defensive ability. He is an age-old relic, more T-rex than unicorn.

How does a player like Jahlil fit into today’s NBA positions and contribute to a team’s performance?

Previously, in part 1 of this analysis, I identified the 11 true types of NBA players that exist. The graphic shows the 11 positions, their attributes, and examples of past and current players that fit into these roles.

Link to part 1 below:

Defining Modern NBA Player Positions - Applying Machine Learning to Uncover Functional Roles in…

Jalen Rose loves to say- player positions were created to help fans understand the game. Point guard, shooting guard…

medium.com

11 true NBA positions by player function.

Today, in Part 2, I will be examining the impact of player positions and roster construction on team performance. In order to tease out this effect, I will use roster construction as a means of predicting team success.

My goal was to:

1. Create a predictive model using roster construction to predict team performance
2. Draw conclusions from the model output to make roster construction recommendations to improve teams

Data Acquisition and Processing

Using clustering techniques, I mapped every player’s season from 2000–2016 to 11 key player positions. For example, below we have the Knicks versus the 76ers 2016 rosters. Porzingis is a position 1, which is an Elite Big man. Contrast this with Okafor, who is a position 7- a Mid-Range Monster.

The Unicorn vs the Dinosaur head to head.

This look gives us a clear picture that Porzingis has been developing into a more useful player than Okafor. In order to see their impact on winning, I used roster construction as a predictor for team performance.

I took each team’s total minutes and found the % of minutes played by each position on that team. This includes aggregating the minutes of multiple players who share the same position. For example, if Porzingis accounts for 20% of the Knicks’ minutes in 2016, then 20% is attributed to position 1 for the 2016 Knicks, since Porzingis is the only 1 on the team. On the Sixers, both Okafor and Jerami Grant are 7’s, or Mid-Range Monsters. I added up the proportion of minutes played by both of these players and attributed it to position 7 for that Sixers team. After doing this for every team in every season, I am left with the proportion of minutes played by each position 1 through 11. This is my feature set.
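A minimal sketch of this aggregation, assuming a pandas table with one row per player-season. The column names, team labels, and minute totals below are hypothetical placeholders, not the article's data:

```python
import pandas as pd

# Hypothetical player-season rows: team labels, positions, and minutes are
# made up for illustration and are not taken from the article's dataset.
players = pd.DataFrame(
    {
        "team": ["X", "X", "X", "Y", "Y", "Y"],
        "season": [2016] * 6,
        "player": ["a", "b", "c", "d", "e", "f"],
        "position": [1, 5, 11, 7, 7, 3],
        "minutes": [2000, 3000, 5000, 1500, 2500, 4000],
    }
)


def position_minute_shares(players: pd.DataFrame, n_positions: int = 11) -> pd.DataFrame:
    """Return one row per (team, season) with the share of minutes per position."""
    # Sum minutes over all players who share a position on the same team-season.
    minutes = (
        players.groupby(["team", "season", "position"])["minutes"]
        .sum()
        .unstack("position", fill_value=0)
        # Make sure every position 1..n_positions has a column, even if unused.
        .reindex(columns=range(1, n_positions + 1), fill_value=0)
    )
    # Divide each row by the team's total minutes so the shares sum to 1.
    return minutes.div(minutes.sum(axis=1), axis=0)


shares = position_minute_shares(players)
print(shares.round(2))
```

In this toy table, team Y has two position-7 players, so their minutes are summed before dividing by the team total, which mirrors the Okafor and Grant example.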

I also found the points scored and points allowed for each team across an entire season. I used www.basketball-reference.com again to find this data. I took the difference to determine the point differential for each team over each season. Point differential is often cited as a better predictor of future team win/loss than a team’s current win/loss rate. This is because close wins and losses can contain a lot of noise, since winning or losing close games has a large component of luck. However, the point margin indicates the true strength or weakness of a team’s performance regardless of the outcome of the game.
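As a small illustration of the point differential calculation, here is a sketch with placeholder season totals. The numbers and column names are hypothetical, and the per-game column is an added convenience rather than something described above:

```python
import pandas as pd

# Hypothetical season totals; the numbers are placeholders, not real results.
teams = pd.DataFrame(
    {
        "team": ["X", "Y"],
        "season": [2016, 2016],
        "games": [82, 82],
        "points_scored": [8610, 8036],
        "points_allowed": [8200, 8856],
    }
)

# Season point differential: points scored minus points allowed.
teams["point_diff"] = teams["points_scored"] - teams["points_allowed"]

# Per-game version, which is easier to compare across seasons (an assumption;
# the season total carries the same sign and therefore the same class label).
teams["point_diff_per_game"] = teams["point_diff"] / teams["games"]

print(teams[["team", "point_diff", "point_diff_per_game"]])
```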

In the N.B.A., There Is a Message in Point Differentials

In the N.B.A. teams are ranked in their conference and division by the percentage of games won. Whether a team is…

offthedribble.blogs.nytimes.com

Modeling the Data

This left me with a dataframe with one row per team per season and one column per position, holding the % of the team’s minutes consumed by that position. This is X, the model features. The y vector is the point differential, which is the target of the model.

2016 Golden State Warriors roster with the proportion of minutes played by the 11 positions. The player corresponding to each position is shown on the right.

For example, above is the 2016 Golden State Warriors team and the breakdown of the season minutes by position. I decided to convert the point differential to a classification: 1 being a positive point differential and 0 being a negative point differential. The sample size I have is ~30 teams over 16 years, so ~500 rows of data. This is not a particularly large dataset for fitting a regression model, so I decided that a classifier made more sense.
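The conversion from point differential to a class label can be sketched as follows. The helper name is hypothetical, and mapping a differential of exactly zero to class 0 is an assumption, since the text only defines the positive and negative cases:

```python
import pandas as pd


def to_binary_target(point_diff: pd.Series) -> pd.Series:
    """Map point differential to 1 (positive) or 0 (otherwise)."""
    # Assumption: a differential of exactly zero is treated as class 0.
    return (point_diff > 0).astype(int)


# Hypothetical per-game point differentials for three team-seasons.
point_diff = pd.Series([5.0, -10.0, 0.0], index=["X", "Y", "Z"])
y = to_binary_target(point_diff)
print(y)
```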

In choosing a model type, I wanted a model that would be able to:

Provide insight into which features were important to the model. This is crucial for us to draw insight on which type of player matters more to team success.
Take account of feature interactions: for example, pairing a type 2 with a type 1 is different from pairing a type 3 with a type 1. The roster interactions are very important to capturing the effect of roster construction.

The first requirement ruled out black-box model types such as neural nets. The second requirement ruled out plain logistic regression, which does not capture interactions unless they are added by hand and is sensitive to collinearity in the feature set. I decided to use the RandomForestClassifier() from the sklearn.ensemble module.

3.2.4.3.1. sklearn.ensemble.RandomForestClassifier - scikit-learn 0.18.1 documentation

class sklearn.ensemble. RandomForestClassifier( n_estimators=10, criterion='gini', max_depth=None, min_samples_split=2…

scikit-learn.org

I used the train_test_split() function from the sklearn.model_selection module to evaluate model performance. My goal here was not to optimize the model parameters or model type, but to create an adequately functioning model and reach some first-order conclusions. The random forest classifier returned decent performance:

I used a 75/25% train/test split to create my train and test sets. Then I used cross_val_score from sklearn.model_selection to validate on the test set with the standard 3-fold cross-validation. That resulted in a 64% model accuracy score. Let’s see what conclusions we can draw from the modeling.
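One way such an evaluation could be wired together is sketched below. The data is synthetic (random minute shares and an invented labelling rule), the hyperparameters are assumptions, and the exact combination of split and cross-validation is one possible reading of the description above, so the printed accuracy says nothing about the real result:

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score, train_test_split

# Synthetic stand-in for the real data: 480 team-seasons with 11 minute shares
# per row that sum to 1. The labelling rule below is invented for the demo.
rng = np.random.default_rng(0)
X = rng.dirichlet(np.ones(11), size=480)
y = (X[:, 0] + X[:, 10] > X[:, 3] + X[:, 6]).astype(int)

# 75/25 split, stratified so both classes appear in each part.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0, stratify=y
)

clf = RandomForestClassifier(n_estimators=100, random_state=0)

# 3-fold cross-validation on the training portion only.
cv_scores = cross_val_score(clf, X_train, y_train, cv=3)

# Fit on all training rows, then score once on the held-out 25%.
clf.fit(X_train, y_train)
test_accuracy = clf.score(X_test, y_test)

print(f"CV accuracy: {cv_scores.mean():.2f}, held-out accuracy: {test_accuracy:.2f}")
```

Keeping the held-out 25% out of the cross-validation loop means the final score is computed on rows the model never saw during fitting.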

Key Takeaways and Conclusions

The benefit of using the random forest classifier is that it includes feature importances. We can begin to understand which player types contribute most to team performance. Then, we can examine the differences in playing time for these player types on winning versus losing teams. Below is a chart highlighting the positions and the average proportion of team minutes played by each position type:

Using feature importances helps identify what positions have the most impact on team performance.

The recommendation I can make based on these results: decreasing minutes for “Non-Scoring Bigs” and “Mid-Range Monsters” and increasing minutes for “Elite Bigs” and “Floor Spacer / 3&D” players has the largest positive effect on team point differential.

Looking at the bottom chart, features 4 (Non-Scoring Bigs) and 7 (Mid-Range Monsters) stand out as the two most important. When we look at the corresponding average minutes played by both those positions, we can see that teams with negative point differentials play 4’s and 7’s significantly more than winning teams. Conversely, winning teams play 1’s (Elite Bigs) and 11’s (Floor Spacer / 3&D) more than losing teams. 1’s make sense: playing Elite Bigs helps a team win. In fact, winning teams play all three of the elite positions more than losing teams. This is a straightforward conclusion: play elite players if you have them.
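The two views used here, feature importances and average minute share by class, can be sketched together. Random forest importances are unsigned, so the comparison of class means is what supplies the direction. The data, column names, and labelling rule below are synthetic assumptions for illustration only:

```python
import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestClassifier

# Synthetic minute shares for 480 team-seasons; column names are hypothetical.
rng = np.random.default_rng(0)
cols = [f"pos_{i}" for i in range(1, 12)]
X = pd.DataFrame(rng.dirichlet(np.ones(11), size=480), columns=cols)

# Invented rule so the demo has signal: "winning" teams lean on 1's and 11's.
y = (X["pos_1"] + X["pos_11"] > X["pos_4"] + X["pos_7"]).astype(int).rename("winning")

clf = RandomForestClassifier(n_estimators=100, random_state=0).fit(X, y)

# How much each position's share contributes to the forest's splits (unsigned).
importances = pd.Series(clf.feature_importances_, index=cols).sort_values(ascending=False)

# Average share per position for losing (0) and winning (1) teams, and the gap.
mean_share = X.groupby(y).mean()
share_gap = mean_share.loc[1] - mean_share.loc[0]

print(importances.head(4).round(3))
print(share_gap.sort_values().round(3))
```

A positive gap means winning teams give that position more minutes on average, and a negative gap means losing teams do.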

More interesting is which players not to play. What are 4’s and 7’s? Those positions correspond to Non-Scoring Bigs and Mid-Range Monsters. This result confirms the prevailing wisdom today. Bigs who are not multidimensional, who cannot stretch their game beyond the paint or cannot score, have become serious team liabilities. We have seen teams like Golden State in 2016 run these types of players off the floor. Non-scoring bigs allow opposing teams to essentially play 5 on 4 on defense and collapse the paint, neutralizing driving lanes. Bigs who can shoot not only score effectively, but also space the floor to the benefit of the whole team.

Non-scoring bigs become even bigger liabilities when they cannot defend effectively after switching onto smaller guards. The extreme version of this is Steph Curry raining death upon any big that dares switch onto him. It is almost comical, the ease with which Curry can toy with bigs after the pick-and-roll switch, like a cat batting around a ball of yarn.

Steph Curry- “May I have this dance?”

Zach Randolph may be one of the last mid-range relics left in the NBA. The classic battles we have witnessed between the Warriors and the Grizzlies, such as in the 2015 playoffs, were an indication of the game evolving into one where multi-dimensional scoring bigs are required. Shooting from frontcourt players is more than a luxury; it is now a necessity. Looking back at the heat map above, it is also clear that these Mid-Range Monsters are low in the “Rim-Protector” and “Sticky Hands” genes. Players like Zach Randolph are typically slow-footed and have trouble playing defense and switching onto guards off the pick-and-roll action. Randolph himself was played off the court as the 2015 playoff series against the Warriors reached its conclusion.

Mid Range Monsters:

Which teams play Mid-Range Monsters the most? Let’s take a look at 2016’s teams with the highest proportion of minutes devoted to Mid-Range Monsters.

What we find here are 5 teams with negative point differentials, including 4 teams that are in the bottom 5 of point differentials in the league [MIL -4.2 || PHO -6.6 || LAL -9.6 || PHI -10.2]. There are the usual suspects here — the two teams with Okafor and Randolph featured on their roster. What we see here is also the complete roster construction for these 5 teams. What stands out is that almost none of these teams feature elite players (except for Milwaukee). Having to give minutes to Mid-Range Monsters while there are no Elite Bigs on a team is — no surprise here — severely detrimental to team success.
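A ranking like this one can be produced by sorting the team table on the position-7 column. The sketch below uses a hypothetical table with made-up team labels and values, so it shows only the mechanics, not the 2016 numbers:

```python
import pandas as pd

# Hypothetical team table: labels, shares, and differentials are made up.
team_table = pd.DataFrame(
    {
        "team": ["A", "B", "C", "D", "E", "F"],
        "pos_7_share": [0.05, 0.22, 0.18, 0.02, 0.30, 0.11],
        "point_diff_per_game": [3.1, -4.0, -6.5, 7.2, -9.8, 0.4],
    }
)

# Teams giving the largest share of minutes to position 7 (Mid-Range Monsters).
top = team_table.nlargest(3, "pos_7_share")

# How many of those teams were outscored over the season.
n_negative = int((top["point_diff_per_game"] < 0).sum())

print(top)
print(f"{n_negative} of {len(top)} have a negative point differential")
```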

Lakers 2016 Roster Construction:

Let’s dive into another team on that bottom 5 list: the 2016 Los Angeles Lakers. I am a life-long Lakers fan and I would like nothing more than to help us return from the doldrums. What type of players are on the team? From the heat map, we see that the player types featured prominently are Secondary Ball Handlers and Mid-Range Monsters.

None of the players on the Lakers roster is elite. What is left is a team full of players that are functionally Secondary Ball Handlers, Role Playing Wings, and Mid-Range Monsters. Kobe has had an amazing career with the Lakers, and a seemingly endless prime, but we can see that in 2016, he finally faded from an elite player to a supporting one. The problem is that, while he was functionally not elite, he tried to play like he was still a superstar.

I have to give him credit: for one game in 2016, the final game of his career, he did put on a show that only Kobe could. I will never forget watching that game live. But I wasn’t really surprised; he always gave the people what they wanted and could always sense the moment.

The price for Kobe’s farewell tour season is that young lead guards D’Angelo Russell and Jordan Clarkson, and veteran supersub Lou Williams, had to make way for Kobe. That led to all four players playing like second-tier guards. Over 40% of the Lakers’ minutes were dedicated to these 4 players, who were functionally Secondary Ball Handlers. This stunted the young Lakers’ development and held back the team’s overall progression. We owed Kobe one season of glory, but boy am I glad that it’s over and we can move into the future with full conviction.

The bigs on the Lakers worry me. The second most prominent position on the Lakers is Mid-Range Monster, including both of the young Laker bigs, Larry Nance Jr. and Julius Randle. Brandon Bass has always been known as a primarily mid-range scorer who rebounds effectively, and he has typically been a backup big man. For the young Lakers bigs to become more effective players, they must expand their games. As I mentioned, the single largest contributor to a negative point differential is giving minutes to these player types. Nance and Randle must move beyond the mid-range and develop an effective 3-point shot. They must rebound and score more to be elite big men. Nance has an amazing athletic profile, and with an improvement in defense, perhaps he can evolve into a Defensive Anchor. Randle has a great handle and vision for a big man; perhaps he can take on more of a facilitating role.

There was a reason the Lakers’ defense was so poor. There were no Defensive Anchors or Floor Spacer / 3&D players on the team; instead, all of the other wings on the Lakers were Role Playing Wings. It looks like Byron Scott’s opposition to 3-point shooting took a serious toll on the team. These are crucial positions that the Lakers need to fill, either by acquiring such players or by developing their current players accordingly. The positional groupings provide an effective framework for evaluating both player development and team building.

The 2017 season just ended last week. In further analysis, I will look at how the players in the 2017 season were grouped positionally. Hopefully, I can also make predictions on playoff performance using results from this analysis. Finally, I will be looking further into the Lakers roster historically, and their 2017 player profiles so that I can make some draft recommendations. The upcoming lottery has dramatic implications for the future of the team, with the pick possibly lost to the 76ers if it does not land in the top 3. If the Lakers do keep their pick, it will be crucial to add players that will complement the current core.

In future work, I will be doing:

An in-depth dive into the Lakers dynasty of the 2000s and their roster evolution to 2017, with implications for draft targets.
A look at the 2017 end of season stats and how those players fall into our modern positions
The translation to the prediction of playoff performance

If you have any suggestions for questions that I can try to answer with this framework, feel free to post in the comments below. Thanks!