Using machine learning to predict which college stars are NBA caliber
Read on for how I came up with this data!
I shared my previous post, and someone suggested I try a machine learning model on college stars to predict which of them would become All Stars in the NBA. Well, that was really hard. So I’ve split the difference and tried to predict which college stars are NBA-quality players, and which ones have a shot at becoming All Stars.
I now understand why college scouts exist. Scouting means striking a fine balance between “the eyeball test” and statistics, and you will get burned if you draft players based solely on one or the other. Advanced analytics are fairly new to the NBA and really, really new to the NCAA, so the data is not as rich or as easy to get.
I’ve told people that 80% of machine learning work is about getting good data, and that was absolutely the case here. This was one of the most difficult scrapers I’ve ever written. It turns out there are over 40,000 NCAA basketball players dating back to 1992 (as far back as the data set goes). When possible, I use the “requests” Python library to scrape websites. If that falls short, I typically create a Selenium web driver and scrape the data using lxml. Like my last project, I once again used sklearn (scikit-learn) for the machine learning algorithm and SPORTS-REFERENCE.com for my data.
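To give a feel for the requests-plus-lxml approach, here is a minimal sketch. It is not my actual scraper: the function names, the User-Agent string, and the sample table are made up for illustration, and a real stats page will need its own XPath expressions.

```python
import requests
from lxml import html


def fetch_page(url, timeout=10):
    """Download a page with requests and raise on HTTP errors."""
    response = requests.get(
        url,
        timeout=timeout,
        headers={"User-Agent": "stats-scraper-sketch"},  # hypothetical UA string
    )
    response.raise_for_status()
    return response.text


def parse_stats_table(page_source):
    """Turn the first HTML table in the page into a list of dicts."""
    tree = html.fromstring(page_source)
    tables = tree.xpath("//table")
    if not tables:
        # No table in the raw HTML. The page may be rendered by JavaScript,
        # which is where a Selenium driver's page_source could be passed in.
        return []
    table = tables[0]
    headers = [th.text_content().strip() for th in table.xpath(".//thead//th")]
    rows = []
    for tr in table.xpath(".//tbody/tr"):
        cells = [cell.text_content().strip() for cell in tr.xpath("./th|./td")]
        if len(cells) == len(headers):  # skip spacer or repeated-header rows
            rows.append(dict(zip(headers, cells)))
    return rows


# Made-up HTML standing in for a downloaded page (no network needed).
SAMPLE = """
<html><body><table>
<thead><tr><th>Player</th><th>PTS</th><th>AST</th></tr></thead>
<tbody>
<tr><th>Hypothetical Guard</th><td>18.2</td><td>6.1</td></tr>
<tr><th>Hypothetical Center</th><td>11.4</td><td>1.3</td></tr>
</tbody>
</table></body></html>
"""

rows = parse_stats_table(SAMPLE)
for row in rows:
    print(row)
```

Keeping the download and the parsing in separate functions means the same parser can be fed HTML from either requests or a Selenium driver.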
After 3 full nights of work, I finally have some semblance of a list. But before I share it, I want to explain how I went about getting it. I have a csv file of every NBA player dating back to 1980, with an indicator of whether they were an All Star in a particular season. So right there, we know what qualities a player must have to be a) good enough for the NBA and b) good enough to be an All Star. I also now have a big csv of every college player dating back to 1992 and their playing statistics (it’s not as rich as the NBA list, but it’s serviceable). It turns out most NBA guys spend at least a year in college, so I can make the connection to say things like “Magic Johnson averaged x points and y assists in college and he turned out to be an NBA quality player and an All Star.” So with that logic, I can point my algorithm at the college player list and say “hey, based on this year’s NCAA class, who has the best stats to be considered an NBA quality player and who can be an All Star?”
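Here is a rough sketch of that join-then-predict idea using pandas and sklearn. Everything in it is an assumption for illustration: the players and numbers are made up, the column names are hypothetical, and RandomForestClassifier is just one reasonable sklearn classifier, not necessarily the one behind my list.

```python
import pandas as pd
from sklearn.ensemble import RandomForestClassifier

# Made-up historical college seasons (per-game averages); not real players.
college = pd.DataFrame({
    "player": ["A", "B", "C", "D", "E", "F", "G", "H"],
    "pts": [22.1, 8.3, 17.5, 5.2, 19.8, 7.1, 15.9, 6.4],
    "ast": [6.0, 1.1, 4.2, 0.8, 5.1, 1.5, 3.8, 0.9],
    "reb": [7.2, 2.5, 6.1, 1.9, 8.0, 3.0, 5.5, 2.2],
})

# Made-up NBA list: only players who reached the league appear here.
nba = pd.DataFrame({
    "player": ["A", "C", "E", "G"],
    "all_star": [1, 0, 1, 0],
})

# A left join keeps every college player; "_merge" records who matched an NBA row.
merged = college.merge(nba, on="player", how="left", indicator=True)
merged["made_nba"] = (merged["_merge"] == "both").astype(int)
merged["all_star"] = merged["all_star"].fillna(0).astype(int)

# Train on the historical players: college stats in, "made the NBA" label out.
features = ["pts", "ast", "reb"]
model = RandomForestClassifier(n_estimators=100, random_state=0)
model.fit(merged[features], merged["made_nba"])

# Point the trained model at a (made-up) current college class.
current_class = pd.DataFrame({
    "player": ["Prospect X", "Prospect Y"],
    "pts": [20.5, 6.0],
    "ast": [5.5, 1.0],
    "reb": [7.0, 2.0],
})
current_class["predicted_nba"] = model.predict(current_class[features])
print(current_class[["player", "predicted_nba"]])
```

The same pattern could be repeated with `all_star` as the label to ask the second question, though with so few All Stars in any real data set that target is much harder to learn.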
I looked at a bunch of draft projection boards and put together a list of 100+ players on the radar. Now, I wasn’t really going for accuracy, but the above screenshot gives a list of players the algorithm predicted are NBA quality players. All but 5 of the selections appeared in the top 30 or top 100 of the composite draft projection boards. If they appeared in the top 30, I marked the predicted player as “1st round”, and if they appeared in the top 60–100, I marked the player as “2nd round”.
So, what’s with the 5 players highlighted in yellow? I have no idea. These are guys the computer said “hey, they’re good enough!” about, and I didn’t see their names on any of the draft boards I checked. So either they’re a complete miss by the computer (a “whiff”), or I’m going to make some NBA team very happy by giving them great intel on a guy completely under the radar.
I ran another model on the predicted players, but this time it was a regression model. The closer a player’s score is to 1.0, the more likely they are to be an NBA quality player. I kind of think of this as a confidence factor, but it’s not really one in a true statistical sense. I was just trying to quantify the likelihood of each player’s success.
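To illustrate what such a score can look like, here is a small sketch that fits a plain linear regression on a 0/1 “made the NBA” label and ranks prospects by the output. This is an assumption about the setup, not a record of the exact model I ran: LinearRegression is a stand-in, and the players and numbers are made up.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Made-up training data: [points, assists] per game and a 0/1 "made the NBA" label.
X_train = np.array([
    [22.0, 6.0],
    [18.0, 4.5],
    [16.0, 4.0],
    [8.0, 1.0],
    [6.0, 1.5],
    [5.0, 0.5],
])
y_train = np.array([1, 1, 1, 0, 0, 0])

# Fit a regressor directly on the 0/1 label, so the output is a continuous score.
reg = LinearRegression().fit(X_train, y_train)

# Hypothetical prospects to score.
prospects = {"Prospect X": [20.0, 5.0], "Prospect Y": [7.0, 1.0]}
scores = reg.predict(np.array(list(prospects.values())))

# Rank prospects from highest score to lowest.
ranked = sorted(zip(prospects, scores), key=lambda pair: pair[1], reverse=True)
for name, score in ranked:
    print(f"{name}: {score:.2f}")
```

A regressor like this is not constrained to stay between 0 and 1, which is one reason the score works as a rough ranking rather than a true probability.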
How good a chance does my player have based on their projected draft position? Well, 1–5 is a pretty good chance! But you all probably knew that. The key to a draft is “stealing” a player: picking a great player who is projected to be drafted lower. *cough* Jazz and Rudy Gobert at #27 *cough*
Perhaps this year’s draft class steals are going to be Miles Bridges and Jawun Evans?