Fantasy Basketball Lineup Generator

Avani Gupta
Published in Analytics Vidhya
Apr 27, 2020

The goal of this analysis is to create a model that predicts the best fantasy lineup for a given game using only the statistics of games previously played in that season as inputs. While the logic for the model was developed by analyzing various season progressions, the model itself can only use player histories from the start of the season that contains the game to be predicted.

Full code here.

Project Parameters

Each lineup consists of 8 players: 2 guards, 2 forwards/centers, 2 utility players (guards, forwards, or centers), and 2 bench players (players with no start position who have played more than 0 minutes in the season to date).

The aim is to select a lineup that maximizes fantasy points. For each run of the model, a random scoring matrix is instantiated that assigns a certain number of points to a selected subset of player statistics. There are 18 potential statistics that can be incorporated into the scoring matrix, such as rebounds (‘reb’), points (‘pts’), blocks (‘blk’), and so on.
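As a minimal sketch of how such a scoring matrix might be applied, assume it is a plain mapping from stat name to point value. The ‘pts’, ‘reb’, and ‘blk’ keys come from the examples above; the point values, the ‘ast’ key, and the `fantasy_points` helper are hypothetical and are not taken from the project code.

```python
# Hypothetical scoring matrix: stat name -> fantasy points per unit of that stat.
# The point values are invented for illustration.
scoring_matrix = {"pts": 1.0, "reb": 1.5, "blk": 3.0}


def fantasy_points(stat_line, scoring):
    """Sum each scored stat multiplied by its point value.

    Stats that are not in the scoring matrix are ignored, and stats missing
    from the player's line count as 0.
    """
    return sum(weight * stat_line.get(stat, 0) for stat, weight in scoring.items())


# One player's (invented) stat line for a single game.
game_line = {"pts": 20, "reb": 8, "blk": 2, "ast": 5}
total = fantasy_points(game_line, scoring_matrix)
print(total)
```

Because only stats present in the matrix contribute, the same stat line can be worth very different totals from one run to the next.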

Given that the system starts with no historical data and learns from game-to-game statistics, it does not seem sensible to create a complex model, whether parametric or non-parametric. If even a simple regression were used, the model would most likely output uninformed results for the first half of the season, given the minimal data input. Additionally, it would involve game-by-game tuning of the parameters, which goes against the intent of drop-in usability: the model should be able to perform at any point in the season with any team.

Two final models are created: a team-specific model (get the best lineup for a single team) and a league-specific model (get the best lineup across all players in the league).

Exploratory Analysis

To initially explore the data, I will be reviewing the 2017 NBA Season.

Position-Specific Stat Contribution

I’m curious to see whether the best-performing statistics differ starkly by position. As seen below, the balance of points across the stats is similar for each position, with the exception of rebounds for centers.

Average Stat Differentiation Between Positions

Seasonality Trends

It seems necessary to look into seasonality trends across stats and overall points for all teams. My goal is to see whether all teams spike at the same points during the season, as that could inform some sort of weighting moving forward. As seen in the two plots below, teams are generally even in the minor stats across the board, while spikes in overall points come at different points in the season for different teams.

Seasonal Progression Fluctuations Per Stat Per Team
Seasonal Progression Fluctuations For Overall Points Per Team

Point Aggregation

My initial assumption was that the simplest way to predict each player’s performance per stat would be to calculate their average per minute in that statistic to that point and multiply it by their average minutes played. While this makes intuitive sense, countless decisions and factors influence a player’s performance, and that performance may also improve or decline as the season progresses.
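To make this baseline concrete, here is a small sketch of the running-average prediction, assuming each past game is a dict holding minutes played under a ‘min’ key alongside the stat of interest. The function name and the data layout are illustrative assumptions, not the project’s actual code.

```python
def predict_stat_running(history, stat):
    """Predict a stat as (average per-minute rate to date) * (average minutes to date).

    history: list of per-game dicts, oldest first, each with a "min" key
    (minutes played) and a key for the stat. Both key names are assumptions.
    """
    played = [game for game in history if game["min"] > 0]
    if not played:
        # No games played yet, so there is nothing to base a prediction on.
        return 0.0
    # Per-minute rate in each game, averaged over all games played so far.
    avg_rate = sum(game[stat] / game["min"] for game in played) / len(played)
    avg_minutes = sum(game["min"] for game in played) / len(played)
    return avg_rate * avg_minutes


# Invented two-game history for one player.
past_games = [{"min": 30, "pts": 15}, {"min": 20, "pts": 20}]
predicted_pts = predict_stat_running(past_games, "pts")
print(predicted_pts)
```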

With this in mind, I performed an analysis to compare a range of rolling window averages to the player’s performance in each subsequent game. The method takes in a rolling window from 1–20, calculates the player’s average using said window (the average of the last x games; if fewer than x games have been played, the average of all previous games), and calculates the mean squared error for all players using that window compared to their true scores. Below, I have tested this out on just those in the guard position.

As seen above, there seems to be a steep drop in error for many stats between window sizes of 2 and 5. Though there are over 70 games played in a season, I wanted to keep the window values low so that they could be compared with a full running average of scores, which is examined later in this analysis.

Given that the error does decrease at specific window sizes, I wanted to find the window with the minimum squared error per stat per position and use it when creating the model. As this is computationally intensive, I found the optimized windows for minutes played and total points only. As seen in a post-run summary below, for all positions and both stats, the optimal windows per team fell around window=5.

MSE of Increasing Rolling Window Average Prediction of Stat Per Min
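The window comparison described in this section can be sketched roughly as follows for a single player’s series of values. The helper names and the sample data are invented for illustration; the analysis itself aggregates the error over all players in a position.

```python
def rolling_predictions(values, window):
    """Predict each game after the first as the mean of the previous `window`
    games, or of all previous games if fewer than `window` have been played."""
    preds = []
    for i in range(1, len(values)):
        past = values[max(0, i - window):i]
        preds.append(sum(past) / len(past))
    return preds


def window_mse(values, window):
    """Mean squared error between rolling predictions and the true values."""
    if len(values) < 2:
        raise ValueError("need at least two games to score a prediction")
    preds = rolling_predictions(values, window)
    actual = values[1:]
    return sum((p - a) ** 2 for p, a in zip(preds, actual)) / len(preds)


# Invented per-game values for one player (e.g. fantasy points per game).
scores = [10, 12, 20, 22, 21, 30]

# Try every window from 1 to 20 and keep the one with the lowest error.
mse_by_window = {w: window_mse(scores, w) for w in range(1, 21)}
best_window = min(mse_by_window, key=mse_by_window.get)
print(best_window, mse_by_window[best_window])
```

On a series this short, windows longer than the history all reduce to the full running average, which is why the comparison is only informative once enough games have been played.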

Model Creation

For the modeling portion, I will be creating two systems: one that uses running overall averages, and one that uses rolling window averages.

Each model consists of the following methods:

1. Data Filtering

  • As the season is already instantiated in the data import section, this portion further cleans the data to filter by team (for the team-specific model)
  • Bench designations are input for those players who have played more than 0 minutes but have no start position

2. History Scoring

If this is not the first game of the season:

  • The dataset is filtered to include all data from before the designated game to predict for
  • Each statistic listed in the scoring matrix is converted to stat per min within the historical data
  • Averages (total or rolling) of each of those statistics, in addition to minutes played, are then calculated and returned per player per position
  • Each per-minute average is then multiplied by the average minutes played to give the predicted value of that stat in the upcoming game
  • Each predicted statistic is then multiplied by the points designated to said statistic within the scoring matrix
  • Per each player and position, total fantasy points as designated by the scoring matrix are returned

If this is the first game of the season:

  • Total fantasy points per player = 0

Code snippet of scoring using rolling window averages.
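The History Scoring steps can be approximated with pandas as in the sketch below. It assumes a frame with one row per player per game, sorted oldest to newest, with a `PLAYER_NAME` column, numeric minutes in `MIN`, and one column per scored stat. The column names, the `score_history` function, and the toy data are assumptions for illustration and may differ from the project’s code.

```python
import pandas as pd


def score_history(history, players, scoring, window=5):
    """Predicted fantasy points per player from a rolling window of past games."""
    # First game of the season: no history, so every player scores 0.
    if history.empty:
        return pd.Series(0.0, index=players)
    stats = list(scoring)
    # Keep only games with minutes played, then each player's last `window` games
    # (fewer are used automatically when fewer have been played).
    played = history[history["MIN"] > 0]
    recent = played.groupby("PLAYER_NAME").tail(window)
    # Convert each scored stat to a per-minute rate, then average per player.
    per_min = recent[stats].div(recent["MIN"], axis=0)
    avg_rate = per_min.groupby(recent["PLAYER_NAME"]).mean()
    avg_min = recent.groupby("PLAYER_NAME")["MIN"].mean()
    # Predicted stat = average per-minute rate * average minutes.
    predicted = avg_rate.mul(avg_min, axis=0)
    # Weight each predicted stat by the scoring matrix and sum per player.
    points = predicted.mul(pd.Series(scoring), axis=1).sum(axis=1)
    # Players with no usable history get 0 points.
    return points.reindex(players, fill_value=0.0)


# Invented history: player A has two past games, B has one, C has none.
history = pd.DataFrame({
    "PLAYER_NAME": ["A", "A", "B"],
    "MIN": [30, 20, 10],
    "pts": [15, 20, 5],
    "reb": [3, 4, 1],
})
scoring = {"pts": 1.0, "reb": 2.0}
predicted_points = score_history(history, ["A", "B", "C"], scoring, window=5)
print(predicted_points)
```

Passing a very large `window` makes this equivalent to the running overall average, so a single function of this shape could serve both models.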

3. Lineup Creation

  • Using the total points from the History Scoring section, the two highest point-getters per position category are added to the lineup
  • A running list of selected player names is kept so that the same player is not added again under a different position
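A minimal sketch of this selection step is shown below, assuming predicted points are already available as (name, position, points) tuples. The position labels (‘G’, ‘F’, ‘C’, ‘bench’), the slot definitions, and the `build_lineup` helper are hypothetical. Filling slots in a fixed order is a simple greedy choice and is not guaranteed to find the best possible assignment.

```python
# Slot name, positions eligible for that slot, and how many players to pick.
SLOTS = [
    ("guard", {"G"}, 2),
    ("forward/center", {"F", "C"}, 2),
    ("utility", {"G", "F", "C"}, 2),
    ("bench", {"bench"}, 2),
]


def build_lineup(predicted, slots=SLOTS):
    """Pick the highest-scoring eligible players for each slot in turn.

    predicted: list of (name, position, points) tuples.
    Returns a list of (slot, name) pairs.
    """
    lineup = []
    taken = set()  # running list of names already placed in the lineup
    ranked = sorted(predicted, key=lambda p: p[2], reverse=True)
    for slot, eligible, count in slots:
        picks = [p for p in ranked if p[1] in eligible and p[0] not in taken][:count]
        for name, _position, _points in picks:
            lineup.append((slot, name))
            taken.add(name)
    return lineup


# Invented predictions for nine players.
predicted = [
    ("g1", "G", 30), ("g2", "G", 25), ("g3", "G", 20),
    ("f1", "F", 28), ("c1", "C", 27), ("f2", "F", 10),
    ("b1", "bench", 8), ("b2", "bench", 5), ("b3", "bench", 2),
]
lineup = build_lineup(predicted)
for slot, name in lineup:
    print(slot, name)
```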

4. Score Prediction

  • Using only the names and positions in the lineup, the data is filtered on the real points scored in the game to predict for
  • Each player’s stat is multiplied by its fantasy point designation, and all points are summed for total fantasy points

5. Real Optimal Lineup

  • Used for model testing
  • Find the true optimal lineup from the game to predict for; return fantasy points

Team-Specific Model Comparison & Evaluation

To compare the two models, predicted score outputs are plotted below for the first 70 games of the 2017 season for 9 different teams. The real optimized lineup score is also included for comparison.

Average, Rolling Average, and True Optimized Lineup Fantasy Points for Selected Teams for Games 1–70 of the 2017 Season

As seen above for the 9 selected teams in the 2017 season, the rolling model generally outperforms the average model. While in some instances the average model does slightly better than the rolling model (around game 40 for Houston), the rolling model does much better overall, especially with certain teams (e.g. Cleveland). Additionally, the rolling model tracks the real optimal lineup closely, often taking the same shape and periodically reaching equal point totals.

We can review the same results on the 2018 season with a different scoring matrix.

Average, Rolling Average, and True Optimized Lineup Fantasy Points for Selected Teams for Games 1–70 of the 2018 Season

Given the results of both models as compared to the true optimized lineup scores, proceeding with the rolling average model seems like the best course.

League-Specific Model Comparison & Evaluation

For full-league lineup optimization, the same logic applies as listed in the Model Creation section. However, the data is no longer filtered at a team level, and a rank column is added per team to show how far into its season each team is. The rank column acts as an equalizer across the league so that the same amount of season progression is used for every player.

games_details['rank'] = games_details.groupby("TEAM_ABBREVIATION")["GAME_ID"].rank(method="dense")
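To see what the dense rank produces, the toy frame below applies the same call to invented game IDs. It assumes `GAME_ID` values increase chronologically within a season, so that rank 1 is each team’s first game; the team abbreviations and IDs are made up.

```python
import pandas as pd

# Invented rows: one row per player per game, as in a box-score table.
games_details = pd.DataFrame({
    "TEAM_ABBREVIATION": ["BOS", "BOS", "BOS", "BOS", "LAL", "LAL"],
    "GAME_ID": [101, 101, 105, 109, 103, 107],
})

# Dense rank within each team: rows from the same game share a rank, and
# ranks increase by exactly 1 from one game to the team's next game.
games_details["rank"] = (
    games_details.groupby("TEAM_ABBREVIATION")["GAME_ID"].rank(method="dense")
)
print(games_details)
```

Filtering on `rank` rather than on `GAME_ID` then selects the n-th game of every team, even though those games have different IDs and dates.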

Based on the performance of the rolling average model over the running average model on a team level, only the rolling average model is used when viewing the league overall. Full code for this method can be found here in the ‘League Specific Model’ section.

Rolling Average and True Optimized Lineup Fantasy Points Across League for Games 1–70 of the 2017 Season

Further Work & Limitations

There is no doubt this analysis would be better served with a greater historical understanding of the game, the coaching and real-time lineup decisions made, and each individual position’s strengths and weaknesses. Given this, there are certainly additional optimizations that could improve the model.

In continuing this analysis, it would be prudent to test all seasons, teams, and positions, and see if there is any change in window range predictions. Additionally, performing much deeper research on other algorithms and less-complex models used in other fantasy sports could be beneficial in allowing a more exact prediction without having to retrain and test a model after each game.
