Finding the Right Player: K-Means Cluster Analysis on NBA Point Guards

Daniel Adams
INST414: Data Science Techniques
May 1, 2024

Introduction

Basketball is one of the most fluid games played in the United States. Players are extremely fast and agile while also being extremely tall. Combined with the fact that the league has only 30 teams, each limited to 15 standard roster spots, these attributes make the opportunity to become an NBA-caliber player extremely rare for the average person. By the same token, the pool of talent available for coaches and general managers to scout is limited as well. To fill the five-man starting lineup, coaches typically select a point guard, a shooting guard, a small forward, a power forward, and a center. Point guards primarily function as the offense’s facilitators, initiating plays and assisting on baskets. A shooting guard’s main skill is hitting three-point and long-range two-point field goal attempts. The center is usually one of the tallest players on the court, and his main responsibilities are rebounding the ball and finishing close to the basket. Power forwards are slightly more agile than centers while still being among the tallest players on the court, and they are also expected to shoot well inside the three-point arc. Small forwards are similar to power forwards but, as the name suggests, are physically smaller and possess better shooting abilities.

Questions

As mentioned, being selected for a team’s roster is extremely competitive, which puts significant pressure on coaches and general managers to choose their players well. To acquire new players, teams use the NBA draft, in which the order of the earliest picks is set by a lottery, to select college players. Beyond the draft, teams can also trade players among themselves and sign free agents. Many of the great basketball teams of the past decade have been built through trades rather than by drafting players directly from college. This raises two questions: how do coaches and general managers find the right players to acquire, and how do they decide which players to give up so that they lose the least talent?

Stakeholder and Approach

The stakeholders for this analysis are coaches and general managers (GMs) who are seeking to initiate a trade for players. Specifically, this analysis focuses on players whose role is the point guard position, so the exact stakeholder is any coach or GM seeking to acquire a high-quality point guard. While this may sound straightforward, there are many different play styles within the position. There is the traditional facilitator, whose main role is to read the game and keep the offense flowing. There is the more modern scoring point guard, who focuses on scoring from beyond the three-point arc. Because the three-point shot has become so critical in the modern NBA, a third play style combining the modern and traditional skill sets has emerged as well. Beyond offensive play styles, coaches care a great deal about a player’s ability and will to defend, so defensive statistics are analyzed as well.

Data Cleaning

The dataset used for this analysis was sourced from Kaggle and was relatively clean to begin with. The only cleaning necessary was to isolate the point guards from the full player dataset. Additionally, I felt that the stakeholder would only want an analysis of starting-caliber point guards, so I filtered the point guard subset down to players who play 25 or more minutes a game, which is roughly what a starting player averages. After cleaning, the only rows left were players who play the point guard position and log enough minutes to be considered a team’s starter.
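The two filtering steps described above can be sketched in pandas. This is only a minimal illustration: the sample rows are invented, and the column names "Player", "Pos", and "MP" are assumptions about how the Kaggle file is laid out rather than something stated in this article.

```python
import pandas as pd

# Hypothetical sample rows. The column names "Player", "Pos", and "MP"
# (minutes per game) are assumptions, not confirmed by the article.
players = pd.DataFrame({
    "Player": ["Guard A", "Guard B", "Center C", "Guard D"],
    "Pos": ["PG", "PG", "C", "PG"],
    "MP": [34.1, 12.5, 30.0, 25.0],
})

MIN_MINUTES = 25  # starter threshold described in the text

# Keep only point guards who play at least MIN_MINUTES per game.
is_point_guard = players["Pos"] == "PG"
is_starter = players["MP"] >= MIN_MINUTES
starting_pgs = players[is_point_guard & is_starter].reset_index(drop=True)

print(starting_pgs)
```

Note that the comparison is inclusive, so a player averaging exactly 25 minutes is kept, matching the "25 or more" wording.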

Data

Basketball tracks many statistics for measuring player performance. For the scope of this analysis, the features used are listed below:

  • 3PA (Three-Point Field Goal Attempts): number of three-point field goals attempted by the player
  • 3P% (Three-Point Field Goal Percentage): percentage of three-point field goal attempts made by the player
  • 2PA (Two-Point Field Goal Attempts): number of two-point field goals attempted by the player
  • 2P% (Two-Point Field Goal Percentage): percentage of two-point field goal attempts made by the player
  • TRB (Total Rebounds): total rebounds by the player
  • AST (Assists): number of assists made by the player
  • STL (Steals): number of steals made by the player
  • BLK (Blocks): number of blocks made by the player
  • TOV (Turnovers): number of turnovers committed by the player
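The features above sit on different scales: attempt counts can run into the double digits, while percentages are fractions. One common preparation step before distance-based clustering is to standardize each column so that no single statistic dominates. The article does not say whether the original analysis did this, so the sketch below is an assumption, and the sample rows are invented for illustration.

```python
import pandas as pd

FEATURES = ["3PA", "3P%", "2PA", "2P%", "TRB", "AST", "STL", "BLK", "TOV"]

# Hypothetical rows, invented purely for illustration.
starting_pgs = pd.DataFrame(
    [
        ["Guard A", 8.0, 0.40, 9.0, 0.52, 4.5, 6.0, 1.2, 0.3, 3.0],
        ["Guard B", 3.0, 0.33, 10.0, 0.48, 3.5, 9.5, 1.8, 0.2, 2.5],
        ["Guard C", 6.0, 0.37, 12.0, 0.50, 5.5, 7.0, 1.0, 0.5, 3.5],
    ],
    columns=["Player"] + FEATURES,
)

# Select only the columns that feed the clustering step.
X = starting_pgs[FEATURES]

# Z-score each column: subtract its mean, divide by its standard deviation.
# (Assumed preprocessing; not confirmed as part of the original analysis.)
X_scaled = (X - X.mean()) / X.std(ddof=0)

print(X_scaled.round(2))
```

After this step every feature has mean 0 and standard deviation 1, so a one-unit gap in AST counts the same as a one-unit gap in 3P% when distances are computed.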

Data Analysis

To analyze the different point guards, k-means cluster analysis was used to group the players based on the features above. The algorithm measures similarity between players by Euclidean distance and places similar players in the same cluster. For each cluster, the top statistics are reported, which indicate the play style the cluster represents, along with the list of players that belong to it. This lets the stakeholder identify the style of point guard he wants and gives him a short list of players he can attempt to acquire. The resulting clusters of point guards are described below.
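To make the role of Euclidean distance concrete, here is a minimal NumPy sketch of k-means (Lloyd's algorithm) followed by a "top statistics" readout per cluster. It is not the code from the linked repository; in practice a library implementation such as scikit-learn's KMeans would typically be used. The player names, the three-feature subset, and the standardized values are all hypothetical.

```python
import numpy as np

def assign(X, centroids):
    """Label each row with the index of its nearest centroid (Euclidean)."""
    dists = np.linalg.norm(X[:, None, :] - centroids[None, :, :], axis=2)
    return dists.argmin(axis=1)

def kmeans(X, k, n_iter=100, seed=0):
    """Minimal Lloyd's algorithm. Returns (labels, centroids)."""
    rng = np.random.default_rng(seed)
    # Start from k distinct rows chosen at random.
    centroids = X[rng.choice(len(X), size=k, replace=False)]
    for _ in range(n_iter):
        labels = assign(X, centroids)
        # Move each centroid to the mean of its members (keep it if empty).
        new_centroids = np.array([
            X[labels == j].mean(axis=0) if np.any(labels == j) else centroids[j]
            for j in range(k)
        ])
        if np.allclose(new_centroids, centroids):
            break
        centroids = new_centroids
    return assign(X, centroids), centroids

# Hypothetical standardized values for three of the features.
features = ["3PA", "AST", "TRB"]
names = ["Guard A", "Guard B", "Guard C", "Guard D", "Guard E", "Guard F"]
X = np.array([
    [1.0, -1.0, 0.2],
    [1.2, -0.8, 0.1],
    [0.9, -1.1, -0.3],
    [-1.0, 1.0, 0.4],
    [-1.1, 0.9, -0.2],
    [-0.9, 1.2, -0.1],
])

labels, centroids = kmeans(X, k=2)

for j, center in enumerate(centroids):
    # Rank features by the centroid's value, highest first.
    top = [features[i] for i in np.argsort(center)[::-1][:2]]
    members = [names[i] for i in np.flatnonzero(labels == j)]
    print(f"cluster {j}: top statistics {top}, players {members}")
```

Each centroid is the average stat line of its members, so reading off its largest coordinates is one simple way to name the play style a cluster represents. Because the starting centroids are random, cluster numbers and memberships can vary with the seed.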

Cluster 0 highlights players with a combination of two-point and three-point shooting, which suggests a combo play style. This cluster also shows significant rebounds (TRB) and assists (AST). With that in mind, cluster 0 is a list of combo point guards.

Players in cluster 1 have very high point totals. Free throws attempted is also a significant statistic for this cluster, so one can infer that these point guards drive toward the basket and draw fouls in the process. Combined with their three-point and assist numbers, this makes for a distinctive type of point guard who acts as a jack of all trades.

Cluster 2 is very similar to the first cluster, except that its players do not attempt as many three-point field goals. These point guards can therefore be considered the more traditional facilitator type described earlier.

The fourth and final cluster seems similar to cluster 1. The differentiating statistic is these players’ emphasis on two-point field goals and free throw attempts, which suggests that this style of player tends to drive inside the three-point arc to score and assist. These players do attempt three-point shots, but they clearly prioritize closer, higher-percentage chances.

Limitations

A key limitation of this analysis is that it only covers point guards. It also does not show the stakeholder the value of the players in question, so it cannot say what it would take to trade for any of them. Even so, the analysis answers a key part of the stakeholder’s question, namely which types of point guards are available, and it gives the stakeholder a solid starting point for researching specific types of point guards.

GitHub Code: https://github.com/dadams16/INST414AdamsModules/tree/main/module4
