What’s Your Basketball Personality?

Using Machine Learning to Redefine NBA Positions

Derek J. Hanson
The Startup
15 min read · Sep 9, 2020

Basketball Court : Photo by sergio souza from Pexels

The game of basketball continues to evolve as time passes. It is hard to imagine, but the game began without dribbling, and the three-point shot is a ‘modern’ invention whose introduction many people alive today can remember. In addition to formal rule changes, informal trends influence the composition of teams, what skills are valued, and how players play the game. In the ‘sprawlball’ era of today, it is clear that the traditional positions of Center, Forward, and Guard are archaic labels which do little to describe the function of players within the game. Today, a Center might bring the ball up the court (where is the point guard?) and a Guard might rebound better than a Forward. While it might sound like positions should just be thrown out and any player should be expected to perform any action at an elite level, it doesn’t seem we’ve entered a ‘positionless’ era. If anything, players have become more specialized (beyond the five classic positions). As such, articles commonly put forth fresh attempts to define new positions which more accurately describe the various ways players specialize within the game.

As in any game of skill, players play the game differently. In a sense, the loose styles of play we recognize players as having (a ‘driving scorer’ or a ‘three-point specialist’) are like personalities of play. Each player is more likely to do some things, and less likely to do others, than the majority of players. These tendencies are like traits, defining how a player plays the game. We notice some of these easily when watching (e.g., from behind the arc James Harden is likely to try a stepback, while Danny Green is more likely to catch and shoot), but there are likely patterns of play which are not readily visible. Machine learning can derive these hidden patterns mathematically from measured data, grouping players into categories of similar play.

Prior Research and Proposed Model

This idea is not a new one. Many articles have appeared in the last twenty years with the intention of clustering players according to an algorithm and giving the resulting set of new positions descriptive labels. Articles published from Medium to academic journals have advocated for anywhere from six to thirteen new positions using a variety of features (player statistics) and clustering algorithms. This paper is different.

First, most previous research has utilized PCA (principal components analysis) followed by k-means clustering, a machine learning algorithm which forces each observation into a single class. However, this method is not probabilistic: it gives no estimate of how well a player fits each class, so one cannot easily use the results to predict the position of new observations. It is a one-shot description of the past. By contrast, this paper utilizes a probabilistic mixture model (a Gaussian Mixture Model, to be precise). A GMM can model non-linear relationships well and provides a class estimate (probability of class membership) for each class for each player. Thus, it outputs the probability that a player fits each new position (e.g., how similar is LeBron to other players in this group?). A GMM also gives estimated parameters for each class, so the model can be used to give position estimates for future data. When looking at draft prospects (or a potential trade/free agent signing), a player’s stats can be fed into the model and estimated class probabilities would be given. This could dramatically change how teams are composed, helping general managers better understand how a new player will ‘fit’ with the others.
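To make the idea of a class estimate concrete, here is a minimal sketch of how a fitted mixture turns one new observation into class probabilities. It is an illustration only: the single feature, the two classes, and every parameter value are made up, and the function names are hypothetical rather than taken from the model in this article.

```python
import math

def normal_pdf(x, mean, var):
    """Density of a 1-D Gaussian with the given mean and variance."""
    return math.exp(-(x - mean) ** 2 / (2 * var)) / math.sqrt(2 * math.pi * var)

def class_probabilities(x, weights, means, variances):
    """Posterior probability that observation x belongs to each mixture class."""
    # Weight each class density by how common the class is, then normalize.
    joint = [w * normal_pdf(x, m, v) for w, m, v in zip(weights, means, variances)]
    total = sum(joint)
    return [j / total for j in joint]

# Hypothetical two-class model on one made-up feature (assists per 36 minutes).
weights = [0.5, 0.5]
means = [2.0, 8.0]
variances = [1.0, 4.0]

new_player_assists = 5.0  # illustrative value, not real data
probs = class_probabilities(new_player_assists, weights, means, variances)
print([round(p, 3) for p in probs])
```

The same calculation extends to many features by swapping the 1-D density for a multivariate one. The key point is that the output is a probability for every class, not a single hard label.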

Second, previous research utilizes a limited number of features (statistics) and is overly dependent upon traditional box score metrics. With advanced player tracking software keeping tabs on player and ball movements, we now have new metrics which measure how players interact and play the game. Why not use them? As long as features provide additional information which can explain variation, and are not overly correlated with one another, they can help us find differences in style of play not otherwise noticeable to the human eye (or too mundane for us to actively track, like screens leading to made field goals). This paper incorporates a much larger feature set than previous research, while maintaining the integrity of each feature (metrics are not combined) to aid in understanding the patterns the model produces.

So, instead of another simple descriptive attempt to redefine player positions from past seasons this paper provides a coherent set of new positions which can be easily understood in terms of their differences (what players in the class do more/less often than others) and can be used to predict class membership with new data.

Methodology

Data were compiled from NBA.com regular-season totals for the most recent five seasons (2015–16 through 2019–20) on 150 player-level variables. Statistics were converted to comparable metrics (per 36 minutes) to allow for even comparisons across different usage levels. To prevent those with limited minutes from muddling the data, only players appearing in at least 10 games and averaging a minimum of 5 minutes per game were kept. These preprocessing steps led to a final dataset used for analysis containing 2,234 observations.
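The preprocessing described above can be sketched in a few lines. This is a rough illustration under stated assumptions: the rows are invented, and the field names (`games`, `minutes`, `points`, `assists`) are hypothetical rather than the actual NBA.com column names.

```python
def per_36(total, minutes):
    """Scale a season total to a per-36-minute rate."""
    return total * 36.0 / minutes

def keep_player(games, minutes, min_games=10, min_mpg=5.0):
    """Playing-time filter: at least 10 games and at least 5 minutes per game."""
    return games >= min_games and minutes / games >= min_mpg

# Made-up season rows; field names are hypothetical, not NBA.com's schema.
rows = [
    {"player": "A", "games": 70, "minutes": 2450.0, "points": 1400, "assists": 350},
    {"player": "B", "games": 8, "minutes": 160.0, "points": 90, "assists": 10},
    {"player": "C", "games": 30, "minutes": 120.0, "points": 40, "assists": 12},
]

stat_columns = ["points", "assists"]
cleaned = [
    {"player": r["player"], **{c: per_36(r[c], r["minutes"]) for c in stat_columns}}
    for r in rows
    if keep_player(r["games"], r["minutes"])
]
print(cleaned)
```

Player B falls short on games and player C on minutes per game, so only player A survives the filter in this toy example.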

Unique to this study is the sheer breadth of this feature set. Variables are included which measure both ends of the floor (offense, defense), movement (distance, speed), location (where shots take place, where defense takes place), passing (passes sent, passes received), possession (time with the ball), hustle (deflections, loose ball recoveries), and more. The inclusion of nearly 50 features more than many previous analyses allows for a finer-grained system of defining how players actually play the game. By including location and movement data, the true style of play can be envisioned in more detail than traditional box score metrics (rebounds, assists, etc.) can provide.

To reduce the feature set (many of the 150 features are essentially redundant), a correlation matrix was constructed and pairs of variables with an absolute correlation greater than .8 were examined in greater detail. Variables which were essentially duplicates were dropped (e.g., player height and player weight are strongly correlated; keeping just player height retains this information in the model and reduces multicollinearity among features). Trimming features through this step, together with the application of human intuition (knowledge of how the game is played and whether a variable might be useful), left a final set of 80 features.
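As a sketch of the screening step, the snippet below flags every pair of features whose absolute Pearson correlation exceeds .8. It only flags pairs for review, since the decision about which variable to drop was a judgment call. The feature names and values are invented for illustration.

```python
import math
from itertools import combinations

def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length lists."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sd_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    sd_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (sd_x * sd_y)

def flag_correlated_pairs(features, threshold=0.8):
    """Return (name_a, name_b, r) for every pair with |r| above the threshold."""
    flagged = []
    for a, b in combinations(features, 2):
        r = pearson(features[a], features[b])
        if abs(r) > threshold:
            flagged.append((a, b, r))
    return flagged

# Made-up values for three hypothetical features across six players.
features = {
    "height_in": [75, 78, 81, 84, 79, 73],
    "weight_lb": [190, 215, 240, 265, 220, 180],
    "deflections": [2.1, 1.6, 2.4, 1.7, 1.9, 2.0],
}

flagged = flag_correlated_pairs(features)
for a, b, r in flagged:
    print(f"{a} vs {b}: r = {r:.2f}")
```

Here only height and weight are flagged, which mirrors the example in the text: one of the two would be kept and the other dropped.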

The 80 features used to estimate position profiles.

These 80 features were then modeled across the 2,234 observations with model-based clustering. Specifically, an expectation-maximization algorithm was used to fit Gaussian finite mixture models to the 80-dimensional dataset. No specific number of clusters was predicted ahead of time, so models were fitted with every number of classes from three to eleven.
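For readers unfamiliar with expectation-maximization, here is a deliberately small sketch of the algorithm on a single feature. It is not the code used for this analysis: it works in one dimension rather than 80, uses a simple deterministic starting point, and runs on made-up data. The function name `fit_gmm_1d` and its parameters are hypothetical.

```python
import math

def normal_pdf(x, mean, var):
    """Density of a 1-D Gaussian with the given mean and variance."""
    return math.exp(-(x - mean) ** 2 / (2 * var)) / math.sqrt(2 * math.pi * var)

def fit_gmm_1d(data, k, n_iter=100, min_var=1e-6):
    """Fit a k-component 1-D Gaussian mixture with plain EM (toy version)."""
    n = len(data)
    ordered = sorted(data)
    # Deterministic start: spread the initial means evenly over the sorted data.
    means = [ordered[(2 * j + 1) * n // (2 * k)] for j in range(k)]
    overall_mean = sum(data) / n
    overall_var = sum((x - overall_mean) ** 2 for x in data) / n
    variances = [max(overall_var, min_var)] * k
    weights = [1.0 / k] * k
    for _ in range(n_iter):
        # E-step: probability of each class for each observation.
        resp = []
        for x in data:
            joint = [w * normal_pdf(x, m, v)
                     for w, m, v in zip(weights, means, variances)]
            total = sum(joint)
            resp.append([j / total for j in joint])
        # M-step: re-estimate each class from its weighted observations.
        # (A production version would guard against empty classes.)
        for j in range(k):
            nj = sum(r[j] for r in resp)
            weights[j] = nj / n
            means[j] = sum(r[j] * x for r, x in zip(resp, data)) / nj
            var = sum(r[j] * (x - means[j]) ** 2 for r, x in zip(resp, data)) / nj
            variances[j] = max(var, min_var)
    log_lik = sum(
        math.log(sum(w * normal_pdf(x, m, v)
                     for w, m, v in zip(weights, means, variances)))
        for x in data
    )
    return weights, means, variances, log_lik

# Made-up values of one feature with two obvious groups.
data = [1.8, 2.0, 2.1, 2.3, 1.9, 2.2, 7.6, 8.1, 7.9, 8.4, 8.0, 7.8]
weights, means, variances, log_lik = fit_gmm_1d(data, k=2)
print([round(m, 2) for m in means], [round(w, 2) for w in weights])
```

The search over class counts described above amounts to repeating a fit like this for each candidate number of classes and comparing the resulting models.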

Results

Evaluating model fit indices (which can be done with GMM but not k-means) revealed that a ten-cluster model fit the data best. But how confident is the model in its predictions of class membership? Very confident. Nearly all players were predicted to belong to their class with greater than 90% probability. Given a player’s statistics on the 80 variables describing how they play, this model can confidently place them into a single position out of the 10 identified.
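The two checks in this paragraph can be sketched as follows. The fit index is not named above, so the Bayesian information criterion (BIC) is used here purely as an assumption; it is one common choice for mixture models, and sign conventions vary between packages. All numbers in the example are invented, and the helper names are hypothetical.

```python
import math

def bic(log_likelihood, n_params, n_obs):
    """Bayesian information criterion; lower is better under this sign convention."""
    return n_params * math.log(n_obs) - 2.0 * log_likelihood

def share_confident(posteriors, cutoff=0.9):
    """Fraction of rows whose largest class probability exceeds the cutoff."""
    confident = sum(1 for row in posteriors if max(row) > cutoff)
    return confident / len(posteriors)

# Made-up candidates: (number of classes, log-likelihood, free parameters).
candidates = [(3, -5200.0, 40), (4, -5050.0, 54), (5, -5040.0, 68)]
n_obs = 2234
scores = {k: bic(ll, p, n_obs) for k, ll, p in candidates}
best_k = min(scores, key=scores.get)
print("best number of classes:", best_k)

# Made-up class probabilities for four players across three classes.
posteriors = [
    [0.97, 0.02, 0.01],
    [0.05, 0.93, 0.02],
    [0.55, 0.40, 0.05],
    [0.01, 0.01, 0.98],
]
print("share above 90%:", share_confident(posteriors))
```

The penalty term is what stops the index from always preferring more classes: in the toy numbers, moving from four to five classes improves the likelihood too little to pay for the extra parameters.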

So what do these positions look like? Looking at the three traditional positions for these players (as classified on NBA.com), now split into the ten new positions, we can see how the model appears to follow the traditional idea (groupings tend to have more of one traditional position than another) but also finds more specific ways to differentiate players (members of each traditional position can be found in nearly every one of the ten new positions).

10 New Positions

To get a grasp of what patterns the model discovered in the math we need to look at what each position does significantly better, and significantly worse, than the league average. For the charts that follow (one for each new position), of the 80 features included in the model, each position’s top 10 and bottom 10 features are plotted, with the values representing standard deviations above and below the mean (a league average player). Like taking a personality inventory, this helps us to visualize what traits make up each of our ten new positions. For a quick reference point, the table below includes a new name and selection of players currently classified under each position.

The ten new positions and a selection of players classified as such.
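The position profiles described above can be sketched like this: for each feature, compare the position’s average to the average across all players, express the gap in standard deviations, and keep the largest and smallest values. This is an assumption about how such a profile could be computed, not the exact charting code. The players, labels, and feature names are made up.

```python
import statistics

def position_profile(players, labels, position, top_n=10):
    """Z-score each feature's position mean against all players.

    Returns (top features, bottom features), each a list of (name, z) pairs.
    """
    profile = {}
    for name in players[0]:
        league = [p[name] for p in players]
        group = [p[name] for p, lab in zip(players, labels) if lab == position]
        league_mean = statistics.fmean(league)
        league_sd = statistics.pstdev(league)
        profile[name] = (statistics.fmean(group) - league_mean) / league_sd
    ranked = sorted(profile.items(), key=lambda kv: kv[1], reverse=True)
    return ranked[:top_n], ranked[-top_n:]

# Made-up per-36 values for four players and three hypothetical features.
players = [
    {"iso_points": 6.0, "def_distance": 0.9, "screens": 0.5},
    {"iso_points": 5.0, "def_distance": 1.0, "screens": 0.7},
    {"iso_points": 1.0, "def_distance": 1.4, "screens": 3.0},
    {"iso_points": 2.0, "def_distance": 1.3, "screens": 2.8},
]
labels = ["A", "A", "B", "B"]

top, bottom = position_profile(players, labels, "A", top_n=1)
print("strongest trait:", top[0])
print("weakest trait:", bottom[-1])
```

With the real 80 features, `top_n=10` would give the ten strongest and ten weakest traits for a position.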

1. Triple-Double

The king of the ISO play, the triple-double threat player does it all on offense. He scores significant points from ISO plays, but also uses this attention to spread the ball to teammates, as he is well above league average in both actual assists and passes which could have become an assist (had the teammate made the shot). This scoring threat also runs the floor on the fast break, earns a large number of free throws and is likely to shoot threes when holding the ball, as opposed to directly from a pass.

However, players in this position don’t move much, nor are they particularly quick on defense. In fact, nearly all of their bottom traits revolve around not doing much on defense (below average rim defense, speed on defense, distance covered on defense, personal fouls). Don’t tell LeBron, but apparently he’s a similar defender to James Harden…

Representative Players

LeBron James, James Harden, Luka Doncic, Russell Westbrook

2. Anywhere

Another ISO scorer, this player manufactures points through a variety of moves, though they are most likely to start from the perimeter. This scorer dominates offensive possessions, receiving significantly more passes from teammates and holds the ball for much of the shot clock. However, they are hard to defend, as they score in a variety of ways, from coming off a screen, to pull up jumpers, to driving to the basket. With all of this attention, they also include their teammates in productive ways more than most other players through assists and potential assists, though the passes to teammates from this scorer are rarely to bigs (most of their assists are to three point shooters, so don’t fall for that fake alley-oop, this scorer isn’t going to give it up down low).

However, this scoring-focused player does not add to their team’s possessions, rarely obtaining offensive rebounds or even being in a position to get an easy rebound (which keeps them from being a triple-double threat). This appears to be a key difference between these first two positions. Both are kings of ISO plays and clearly dominate their teams’ offenses while ‘phoning it in’ on defense. However, this anywhere scorer isn’t quite as valuable to his team as he hasn’t yet realized the importance of grabbing a few extra possessions for his team (in the form of rebounds). Tell Kawhi to grab a few more boards and he can move up to Triple-Double status.

Representative Players

Kawhi Leonard, Jimmy Butler, Damian Lillard, Kemba Walker, Chris Paul, Stephen Curry, Ben Simmons, Kyrie Irving, Jamal Murray, Trae Young

3. Jumper

Instead of driving to the basket, or getting their teammates involved through assists, this player uses their above average ISO plays to shoot jumpers. Whether a pull up or a spot up, from behind the arc or inside, you better get your hand up, then box out, because this guy is likely to try to shoot over you. This heavy minutes player also runs the floor well (scoring significant points from fast breaks and off of turnovers), so you also better get back on defense when they’re on the floor or you’re likely to pay.

Noticeably, this scorer doesn’t like screens/pick-and-rolls, so if you are guarding them you should focus on keeping them in front of you. They also aren’t likely to score on you in the paint or at the elbow (they’ll likely move you to one of their favorite places to shoot a J instead). This scorer doesn’t get too many rebounds on offense, nor are they likely to defend the rim. Lastly, don’t read too much into the patterns of this player: their characteristics all fall within one standard deviation of average, both above and below. Tell KD he better mix it up some more because he’s getting predictable.

Representative Players

Kevin Durant, Buddy Hield, Pascal Siakam, Marcus Morris Sr., Marcus Smart, Andrew Wiggins, Paul George, Klay Thompson, Khris Middleton, Kyle Kuzma

4. Moneyball

Living at the perimeter, this three-point maestro isn’t going to drive on you. Instead, he’s going to either catch a pass and send it towards the rim or spot up on you, almost certainly from behind the arc. This makes him an efficient scorer (in terms of points per touch). But he also plays defense, covering a lot of ground (likely good at getting back at changes of possession) and nabbing long rebounds (perhaps his own?).

Likely because of his penchant for the three-ball, this arc-assassin doesn’t draw many fouls, nor does he score in the paint (not likely to drive on you). He might be active on defense, but he’s not likely to block your shot. Further, on offense, he doesn’t get the ball much (low touches), so when he does, he’s likely to shoot (low passes made). Danny Green shoots threes? What?

Representative Players

Danny Green, J.J. Redick, P.J. Tucker, Jae Crowder, Royce O’Neale, Landry Shamet, Kyle Korver

5. Pass First

A primary ball handler, this player is similar to the traditional point-guard of the past. He leads the offense (high time of possession, passes received, time with the ball) but is a pass-first scorer. He is far more likely to drive on you than spot up and shoot over you and he generates offense for his teammates through his passes.

Not likely to post you up or to receive a pass in the paint (does he ever cut to the basket?), his tendency to avoid the paint also cuts down on his ability to secure rebounds on both offense and defense (limited rebounds). Tell Patty Mills he needs to shoot more because the model says he looks an awful lot like Lonzo Ball…

Representative Players

Ricky Rubio, Patty Mills, Kyle Lowry, Lonzo Ball, Rajon Rondo, Patrick Beverley, Cory Joseph, Shai Gilgeous-Alexander

6. Defender

What sets this player apart is clearly their defense. They cover enormous ground, and do so with high average speed, leading to steals and points off of turnovers.

However, when on offense they are more likely than others to shoot contested threes (maybe not a good idea), and relatedly, rarely pass the ball to their teammates (well below average on passes made, assists, and potential assists). They score often in the paint, but shoot well below average from the field (again, maybe they should shoot fewer threes). Perhaps because of their poor shooting and lack of assists, they also don’t get many touches/passes on offense. Tell Harrison Barnes if he makes a few more passes when someone closes out on him at the arc he might make a few more friends (and get a few more passes/open shots in the future).

Representative Players

Harrison Barnes, Kyle Anderson, Gary Payton II, Thaddeus Young, Thabo Sefolosha, Mikal Bridges, Thanasis Antetokounmpo, Cam Reddish

7. Big Jump

Blending the roles of a traditional center and forward, this player scores his points from pick-and-roll and catch-and-shoot opportunities (as opposed to post-ups or elbow moves). Using his height, he helps his team immensely on defense by defending the rim, contesting two-point shots, and boxing out to get defensive boards.

However, you won’t see him driving to the basket from the perimeter, nor does he hold the ball for long when he gets it. While he contributes on defense through contesting shots and gathering rebounds, he doesn’t run the floor well on fast breaks, nor is he likely to get a hand on passes for a deflection or steal. I’ve seen Kevin Love’s outlet passes; they do the fast breaking for him…

Representative Players

Al Horford, Brook Lopez, Kristaps Porzingis, Kevin Love, LaMarcus Aldridge, Paul Millsap, Draymond Green, Markieff Morris, Marc Gasol

8. Double-Double

One of the most efficient scorers (in terms of made shots), this is another pick-and-roller but instead of shooting a jumper, this player is more likely to score from the elbow or in the paint. While this player doesn’t get many ISO plays, he earns a high number of touches (he’s an offensive threat) all over the frontcourt. His efficiency likely comes from a number of dunks, layups, and cleaning up missed shots (which help him lead the way in double-doubles).

Like the other specialist in rolling from pick-and-rolls, he isn’t likely to take too many threes, nor does he hold the ball for very long when he does get it. Interestingly, he doesn’t move much on offense or defense, nor does he move quickly at either end, staying mostly near the rim unless setting a screen. This doesn’t mean he isn’t a defensive threat; he just likely sticks near the basket on D. Don’t tell Joel Embiid I said he doesn’t move much on D.

Representative Players

Giannis Antetokounmpo, Zion Williamson, Karl-Anthony Towns, Anthony Davis, Joel Embiid, Nikola Jokic, Serge Ibaka, Hassan Whiteside

9. Painter

When this player sets screens, he means business. His screens actually lead to made field goals, though he is also an offensive threat himself (just not off of the screen). He scores his points in the paint, often working from the elbow, and dominates offensive and easy rebounds. You aren’t likely to get any easy shots off of him, either, as he contests twos and defends the rim well.

If you do spot him outside the arc, though, you shouldn’t expect him to take, or make, many shots. He also won’t drive on you or shoot a pull up J when he catches a pass. This man lives in the paint (unless he’s about to blindside you with a solid screen). These are like the offensive linemen of basketball: if you score from one of their screens, you owe them a Rolex or something nice. I better see Steven Adams glistening with some new jewelry next year.

Representative Players

Rudy Gobert, Jarrett Allen, Steven Adams, Ivica Zubac, DeAndre Jordan, JaVale McGee, Clint Capela, Enes Kanter, Mason Plumlee

10. Center

Nearly identical to the Painter, this position differs by being far more likely to record a block or shoot an uncontested two (aka dunk). This player also either doesn’t set many screens, or at least is less effective when doing so (low screen assists). This position appears dead (no one registered as such in the 2019–20 season), and the players previously featured are commonly classified as Painters or Big Jumps today. Perhaps the rising importance of setting screens in offensive sets led to the reclassification? Either way, as of now, this position seems moot.

Representative Players

None in 2019–20. In the past: Michael Beasley, LaMarcus Aldridge, Boban Marjanovic, Zach Randolph, Andre Drummond

Conclusion

Using a player’s style of play to define their position, instead of classic positions, reveals new information useful to both teams and fans.

With twice the number of positions to define players, teams can develop finer grained analyses of rosters. Additional positions make it easier to separate style of play, which makes it easier to evaluate how different positions play together, allowing teams to answer questions like ‘which combinations of positions are more successful than others?’ and ‘which positions contribute the most to success of a team?’

Additional ‘style of play’ positions also allow for fine-tuning defensive strategy. Shot charts are publicly available (similar to a spray chart of hits in baseball), but with these new positions we now also have information about passing, rebounding, movement, etc. This means player tendencies can more easily be expanded beyond just shooting and which direction a player favors when dribbling. For example, defensive strategies can now easily incorporate additional tendencies, such as passing on a drive versus shooting, lobbing an alley-oop versus passing to the arc, or swinging the ball versus shooting a contested shot. While these can certainly be developed at the individual level with available statistics, remembering the tendencies of a position is far easier than of all possible players one might guard.

Finally, with more emphasis on style of play we can evaluate whether/how players change their style over time. Did Kawhi play differently with the Spurs versus the Clippers? Did Kyrie Irving change across his time with teams? Is there a pipeline to Triple-Double status? Understanding how a player may adapt to either a new lineup/system or advancing through their career can be important for determining whether to roster a potential free agent or draft pick.

Next Steps

In the future look for posts where I analyze these positions by lineup (what combinations add the most value/least value?) and answer the question of whether/how player styles change over time (can you ‘grow into’ some of these positions?). In the meantime, you can play with the output from the model and use it to predict player positions yourself! Find the relevant data over on my GitHub here.


Data Scientist | M Quantitative Methods | PhD Candidate