Machine Learning in Sports Analytics

Professional sports are a funny thing. Very few of us are involved in them, yet we all seem to believe that WE could be the next General Manager and avoid any egregious mistakes. NBA pundits such as Stephen A. Smith and Skip Bayless spend an absurd amount of time talking about statistics such as PER, Pts/36, PPG, RPG, APG, win shares and VORP. It is worth noting that many of these pundits don’t understand these metrics or what value they bring. For example, many will point to a college player’s scoring as a predictor of his performance in the NBA. Unfortunately, the two are very poorly correlated. There are plenty of examples of players who performed extremely “poorly” one year and showed drastic improvement the next. Statistics are complicated predictors, yet they are used very frequently in sports.

The statistical approach has its roots in baseball, where it is known today as sabermetrics. It was popularized by the movie Moneyball, and it is well known that many teams use statistics to measure players’ success. Less well known are the ideas presented at the MIT Sloan Sports Analytics Conference, which is considered the apex of cutting-edge sports statistics. Recently, many researchers (for the most part sports enthusiasts) have begun applying machine learning. Applications such as player growth modeling, win prediction, and draft evaluation are all extremely common.

While we always hear the talking heads preach about “intangibles”, many of our favorite players have been drafted in part because a computer program values their contributions. Players such as Hassan Whiteside have been given chances to play in the NBA in part due to predictive models valuing their limited output as a predictor of something greater.

Many different methods and machine learning algorithms have been used to analyze many aspects of the game. Neural networks have been used to predict game outcomes about as accurately as experts[1]. Linear regression has been used to identify prospects with a high likelihood of growing into successful players[2]. The NBA has recently provided SportVU tracking data to further research. This has led to innovations, with one paper even training a classifier to recognize the type of screen defense a team is more likely to employ and how successful the team is at that defense[3]. Each of these provides an advantage for the user and allows teams without star players to be competitive.
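To make the linear regression idea a bit more concrete, here is a minimal sketch in Python. It is not the model from [2]. The features (draft age, points and rebounds per 36 minutes), the target (a made-up "future value" score), the weights used to generate the data, and the function names are all assumptions invented for illustration, and the data is synthetic.

```python
import numpy as np


def fit_linear_regression(X, y):
    """Fit ordinary least squares; returns weights with the intercept first."""
    X_design = np.column_stack([np.ones(len(X)), X])
    weights, *_ = np.linalg.lstsq(X_design, y, rcond=None)
    return weights


def predict(weights, X):
    """Apply fitted weights (intercept first) to new feature rows."""
    X_design = np.column_stack([np.ones(len(X)), X])
    return X_design @ weights


# Synthetic prospects: every number below is made up, not real player data.
rng = np.random.default_rng(0)
n_players = 200
age = rng.uniform(19, 23, n_players)      # hypothetical age at draft
pts36 = rng.normal(15, 4, n_players)      # hypothetical points per 36 minutes
reb36 = rng.normal(7, 2, n_players)       # hypothetical rebounds per 36 minutes
X = np.column_stack([age, pts36, reb36])

# Invented relationship used only to generate the toy target:
# younger prospects with more production get a higher "future value".
true_weights = np.array([10.0, -0.4, 0.2, 0.3])
y = true_weights[0] + X @ true_weights[1:] + rng.normal(0, 0.5, n_players)

weights = fit_linear_regression(X, y)

# Score one hypothetical prospect: 19.5 years old, 14 pts/36, 9 reb/36.
prospect = np.array([[19.5, 14.0, 9.0]])
projected = predict(weights, prospect)[0]
print("fitted weights:", weights)
print("projected future value:", projected)
```

In a real setting the hard part is not the fit itself but choosing the target and the features, and checking the model against players it has never seen.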

Even more encouraging is that we are only on the cusp of such discoveries. ESPN’s “Great Analytics Rankings” demonstrate how even the most analytically averse General Managers have been embracing tracking data. However, there is still a lot of variance in how far teams have gone, and as time advances teams will employ increasingly complex algorithms. With larger datasets, it will be interesting to see how research grows in this area. Perhaps machine learning will be used to determine how successful certain players are in certain matchups. Maybe European soccer will finally begin using neural networks to scout players from smaller leagues.