Using Machine Learning to Predict the 2023 NBA Draft in the age of Positionless Basketball

Rishichandran
Jun 22, 2023


Video alternative: https://youtu.be/bgc7c4joIpU

Full Results: https://drive.google.com/file/d/1TeaattbcAN52-WbHKgRd4_wsMQrnDKHe/view?usp=sharing

With the new NBA CBA looming, finding success with late first-round picks, second-round picks, and undrafted free agents (UDFAs) is becoming increasingly important. Additionally, the number of non-college prospects has increased over the past 5–10 years with the global growth of the game and the emergence of alternate routes such as the G-League Ignite, Overtime Elite, and the NBL Next Stars program.

I wanted to try and approach NBA Draft prospect analysis in a unique way, in order to tackle these issues. Disclaimer: the results of this analysis shouldn’t be the only thing someone uses to evaluate prospects. If anything, this type of research could serve as a starting point for film breakdown, where scouts can identify where predictions may be inaccurate, or uncover the “why” behind certain projections.

When creating a machine learning model to predict NBA Overall Impact, I wanted to make sure the player pool that the model “learns” from reflects the type of players it makes predictions on. To do this, I used player age and the league each player came from to separate players into groups, each of which ultimately gets its own machine learning model.

We’re looking at the average Overall Impact Rating for NBA players, based on the age that they entered the NBA.

We tend to see younger players at the top of draft boards, followed by the best upperclassmen coming out of college. I wanted to define an age cutoff, where we can consider someone either a “young prospect” or an “old prospect”.

This bar graph shows that, on average, players who entered the NBA before age 22 fall above the league-average Overall Impact Score, while players who entered at 22 or older fall below it.

Next, I wanted to establish where NBA prospects are coming from, because the amount and granularity of player data can vary between leagues. Here are the buckets I established for players:

  • Young College Prospects: College players who are under the age of 22 at the time of the draft.
  • Old College Prospects: College players who are 22 or older at the time of the draft.
  • Non-College Prospects: This includes all international, G-League Ignite, and Overtime Elite prospects.

While the level of competition can vary within each player bucket, each group had the same kind of data available, and enough historical examples of players that the model could learn from.
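To make the bucketing concrete, here is a minimal sketch of how the three groups could be assigned in Python. The `Prospect` class, its field names, the `"NCAA"` league label, and the sample players are all assumptions made for illustration; only the age cutoff of 22 and the three bucket names come from the description above.

```python
from dataclasses import dataclass

AGE_CUTOFF = 22  # from the article: under 22 counts as a "young" prospect


@dataclass
class Prospect:
    # Hypothetical fields, chosen only for this illustration.
    name: str
    draft_age: float  # age at the time of the draft
    league: str       # e.g. "NCAA", "G-League Ignite", "Overtime Elite"


def assign_bucket(prospect: Prospect) -> str:
    """Return the group whose model would handle this prospect."""
    if prospect.league != "NCAA":
        # International, G-League Ignite, and Overtime Elite players
        return "Non-College"
    if prospect.draft_age < AGE_CUTOFF:
        return "Young College"
    return "Old College"


# Made-up placeholder prospects, not real players or real data.
prospects = [
    Prospect("Player A", 19.4, "NCAA"),
    Prospect("Player B", 22.0, "NCAA"),
    Prospect("Player C", 18.9, "G-League Ignite"),
]
buckets = {p.name: assign_bucket(p) for p in prospects}
print(buckets)
```

Each bucket would then be used to train and apply its own model, so that a prospect is only compared with historical players from the same group.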

Next, I wanted to account for the roles that players filled before reaching the NBA. We’ve seen high-usage college players like Jimmer Fredette fail to work out in a new NBA role, and players like Devin Booker go from a sixth man in college to a #1 option in the NBA. To account for this, I created metrics that adjust for Usage, rather than using traditional stats.

The x-axis represents player usage, and the y-axis represents points per game.

Here we’re looking at college Points Per Game (PPG), based on each player’s Usage. The trend line on the plot represents the expected PPG, based on that Usage.

Then, I took the amount by which a player was above or below the expected PPG to calculate Points Above Expected (PAE). In other words, PAE = actual PPG - expected PPG.

This approach could be improved by using more than just Usage to predict Expected Points, but it allowed me to see which players had the best blend of volume and efficiency, based on the opportunities they were given.
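The idea can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the exact calculation behind the plot: it assumes the trend line is a straight least-squares line, and the usage and PPG numbers are made up.

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def points_above_expected(usage, ppg):
    """PAE for each player: actual PPG minus the trend-line PPG at that usage."""
    slope, intercept = fit_line(usage, ppg)
    return [y - (slope * x + intercept) for x, y in zip(usage, ppg)]


# Made-up usage rates (%) and points per game for four hypothetical players.
usage = [15.0, 20.0, 25.0, 30.0]
ppg = [8.0, 13.0, 15.0, 22.0]

pae = points_above_expected(usage, ppg)
for u, p, value in zip(usage, ppg, pae):
    print(f"usage={u:.0f}%  ppg={p:.1f}  PAE={value:+.2f}")
```

A positive PAE means the player scored more than the trend line would expect at that usage, and a negative PAE means less. Passing rebounds, assists, steals, or blocks in place of `ppg` would give the equivalent "above expected" value for those stats.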

This was repeated for other metrics, such as rebounds, assists, steals, and blocks, to create the variables for the model.

Next, I wanted to consider the roles/archetypes that players fall into in the modern NBA. While we already have the traditional 5 positions, we are seeing more “positionless” players coming into the league, which is something I wanted to quantify.

I used a machine learning approach called K-Means Clustering to first determine how many different roles there are in the NBA, and to then assign each NBA player to a cluster/role.

Example of K-Means Clusters

Here, we’re looking at an example of how K-Means Clustering determines each cluster/group. Based on the variables used, the model uses Euclidean Distance (basically the straight-line distance between two points) to assign each data point to the group whose center it is closest to.
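For readers who want to see the mechanics, below is a small pure-Python sketch of the K-Means loop: assign every point to its nearest center, move each center to the average of its points, and repeat. The two-dimensional points are made up, and the function names and parameters are illustrative assumptions; a real project would more likely use a library implementation with many more variables per player.

```python
import math
import random


def euclidean(a, b):
    """Straight-line distance between two points of equal dimension."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def kmeans(points, k, iterations=50, seed=0):
    """Return (labels, centroids) after running basic K-Means."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)  # start from k randomly chosen points
    labels = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: each point joins the cluster with the nearest centroid.
        labels = [
            min(range(k), key=lambda c: euclidean(p, centroids[c]))
            for p in points
        ]
        # Update step: each centroid moves to the mean of its members.
        new_centroids = []
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                new_centroids.append(
                    tuple(sum(dim) / len(members) for dim in zip(*members))
                )
            else:
                new_centroids.append(centroids[c])  # keep an empty cluster's centroid
        if new_centroids == centroids:
            break  # nothing moved, so the clustering has converged
        centroids = new_centroids
    return labels, centroids


# Made-up 2D points forming two obvious groups.
points = [(1.0, 1.0), (1.2, 0.8), (0.9, 1.1), (8.0, 8.0), (8.2, 7.9), (7.8, 8.1)]
labels, centroids = kmeans(points, k=2)
print(labels)
```

Note that the number of clusters `k` is an input to this loop. Choosing it is a separate step that is not shown here.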

In our NBA Role/Archetype dataset, the model determined that the best number of roles is 12. Here are the 12 Roles/Archetypes that we used for this project:

Finally, I wanted to use machine learning to predict the following things for each draft prospect:

  • Projected NBA Skillset: This includes five grades for the following: Scoring, 3PT Shooting, Facilitating, Interior, and Defense, scaled on a 1–10 range.
  • Overall Impact Score for each Role/Archetype: An overall 1–10 grade for each prospect, if they were to play a certain role.
  • What Role/Archetype each Prospect Will Become: Represented as the % odds that a prospect will become a certain role.
  • Overall Draft Prospect Score: Used to determine the prospect rankings.

To accomplish this, I used a k-nearest neighbors regression model (KNNRegressor). Like K-Means Clustering, it relies on the distance between statistical profiles: it makes predictions based on the players whose profiles are most similar to the prospect.
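A minimal sketch of the k-nearest neighbors idea, written in plain Python rather than with any particular library: find the `k` historical players closest to the prospect and average their outcomes. The feature values, target scores, and the choice of `k` below are made up for illustration and are not the inputs used for the actual projections.

```python
import math


def knn_predict(train_features, train_targets, query, k=3):
    """Predict a target as the mean target of the k nearest training rows."""
    # Pair each historical player's distance to the query with their target.
    distances = sorted(
        (math.dist(features, query), target)
        for features, target in zip(train_features, train_targets)
    )
    nearest = distances[:k]
    return sum(target for _, target in nearest) / len(nearest)


# Made-up historical players: (points above expected, assists above expected).
train_features = [(2.0, 1.0), (1.8, 1.2), (-1.0, -0.5), (-1.2, -0.3), (0.1, 0.0)]
# Made-up NBA impact scores on a 1-10 scale for those same players.
train_targets = [8.0, 7.0, 3.0, 4.0, 5.0]

# A hypothetical prospect whose profile resembles the first two players.
prospect = (1.9, 1.1)
prediction = knn_predict(train_features, train_targets, prospect, k=2)
print(prediction)
```

Because the prediction depends entirely on distance, variables on very different scales would need to be standardized first so that no single stat dominates the comparison.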

Here’s how the models projected the top 14 of the 2022 NBA Draft Class:

  • Nikola Jovic
  • Walker Kessler
  • Jalen Duren
  • Paolo Banchero
  • Chet Holmgren
  • Ryan Rollins
  • Keegan Murray
  • Julian Champagnie
  • Jake LaRavia
  • Tari Eason
  • Bennedict Mathurin
  • Jabari Smith Jr.
  • Kenneth Lofton Jr.
  • Marcus Bingham

Here’s how to read the draft reports:

Finally, here are the top prospects in the 2023 NBA Draft, according to the models! I have projections for all draft-eligible players in 2023, so if anyone isn’t listed, feel free to reach out and ask about them.
