Predicting NBA Career Longevity

Rohan Kulkarni
3 min read · Sep 27, 2019


Will He Last?

This is one of the many questions NBA front offices ask themselves when evaluating young talent. They recognize that although making it to the NBA is very hard, staying in it is even harder.

A dataset published on data.world contains a sample of NBA players, statistics from their rookie season, and a binary label stating whether they lasted 5 years in the NBA. From this dataset we attempt to predict whether a player will last 5 years in the league based on their rookie season.

To hoop junkies this is familiar territory. For everyone else, this table consists of individual player statistics from their performance on the court. Note that the final column in the table, ‘5YRS’, is our target variable, indicating whether the player lasted 5 years (1=yes, 0=no). Here is a glossary for the other columns you may be unfamiliar with:

The baseline (the benchmark the predictive model is compared against) is 62% accurate. A Random Forest Classifier raises the accuracy to 70.69%, and removing the least important features nudges it up further to 71.16%.
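To make the idea of a baseline concrete, here is a minimal sketch. It assumes the baseline is the majority-class rate (always predicting the most common outcome), which the article does not state explicitly. The labels are toy values chosen to mirror the 62% figure, not the real dataset, and `majority_baseline_accuracy` is a hypothetical helper name.

```python
from collections import Counter


def majority_baseline_accuracy(labels):
    """Accuracy obtained by always predicting the most common label."""
    counts = Counter(labels)
    _, majority_count = counts.most_common(1)[0]
    return majority_count / len(labels)


# Toy labels (NOT the real dataset): 62 players lasted 5 years, 38 did not.
toy_labels = [1] * 62 + [0] * 38

baseline = majority_baseline_accuracy(toy_labels)
print(f"baseline accuracy: {baseline:.2f}")
```

Any model worth keeping has to beat this number, which is why the Random Forest's accuracy is reported against it.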

Feature importance analysis suggests that games played in a player's rookie year is by far the most significant metric when predicting career longevity.
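The sketch below shows how a feature importance ranking like this can be read off a fitted Random Forest in scikit-learn. It is an illustration only: the data is synthetic, the column names are hypothetical, and the target is deliberately constructed to depend mostly on games played. The `max_depth=4` setting is also an assumption, not the article's configuration.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier

rng = np.random.default_rng(0)
n_players = 1000

# Synthetic stand-in for the rookie table; column names are hypothetical.
feature_names = ["games_played", "points_per_game", "noise"]
games_played = rng.integers(10, 83, size=n_players)
points_per_game = rng.normal(7.0, 3.0, size=n_players)
noise = rng.normal(size=n_players)
X = np.column_stack([games_played, points_per_game, noise])

# Synthetic target: lasting 5 years is made to depend mostly on games played.
logit = 0.08 * (games_played - 50) + 0.05 * (points_per_game - 7.0)
y = (rng.random(n_players) < 1.0 / (1.0 + np.exp(-logit))).astype(int)

# Shallow trees keep the forest from spending splits on pure noise.
model = RandomForestClassifier(n_estimators=200, max_depth=4, random_state=0)
model.fit(X, y)

# Impurity-based importances; they sum to 1 across features.
ranked = sorted(
    zip(feature_names, model.feature_importances_),
    key=lambda pair: pair[1],
    reverse=True,
)
for name, importance in ranked:
    print(f"{name:>16}: {importance:.3f}")
```

Features at the bottom of such a ranking are the natural candidates to drop before refitting the model.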

Intuitively this makes sense: of the 15 roster spots on an NBA team, only 5 players are on the court at a time. Ideally, the more playing time a player gets, the more they develop and the more they contribute to their team winning.

A partial dependence plot (PDP) shows the effect of a feature on the model’s predicted outcome. In this case, a PDP of Games Played visually supports the previous claim: the more games a player plays in his rookie season, the higher his chances of staying in the league.
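For readers curious what a PDP actually computes, here is a minimal hand-rolled sketch: force one feature to each value on a grid, leave every other feature untouched, and average the model's predictions. Everything here is hypothetical, including `toy_predict` (a stand-in for a fitted model) and the made-up rookies. It is not the code behind the plot in this article.

```python
from statistics import mean


def partial_dependence_1d(predict, rows, feature_index, grid):
    """Average prediction when one feature is forced to each grid value."""
    curve = []
    for value in grid:
        predictions = []
        for row in rows:
            modified = list(row)
            modified[feature_index] = value  # overwrite only the chosen feature
            predictions.append(predict(modified))
        curve.append(mean(predictions))
    return curve


# Hypothetical stand-in for a fitted model: returns P(lasts 5 years).
def toy_predict(row):
    games_played, points_per_game = row
    score = 0.1 + 0.008 * games_played + 0.01 * points_per_game
    return min(1.0, max(0.0, score))


# Made-up rookies: (games played, points per game).
rookies = [(20, 3.0), (45, 6.5), (60, 9.0), (82, 15.0)]
grid = [10, 30, 50, 70]

curve = partial_dependence_1d(toy_predict, rookies, 0, grid)
for games, avg in zip(grid, curve):
    print(f"{games:>3} games -> average predicted probability {avg:.3f}")
```

Plotting `grid` against `curve` gives the PDP. In practice a library routine such as the ones in scikit-learn's `sklearn.inspection` module would do this for a real model.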

Realistically, many qualitative characteristics that are hard to measure mathematically, such as work ethic, unselfishness, and scheme fit, factor into whether a player will last in the NBA. But as a data scientist I enjoy using numbers to try to make sense of the big picture.

For this model, number of games played seems to be the most important metric when predicting career longevity. Therefore I look forward to seeing which rookie plays the most this upcoming season.
