Predicting Future Fantasy Football Stars by Casey Tabatabai

Casey Tabatabai
INST414: Data Science Techniques
10 min read · May 19, 2022

The problem that I aimed to solve in this project was the difficulty of predicting which college football prospects will become fantasy football stars in the National Football League (NFL). The standard approach is to watch film of each prospect and form an opinion based on how the player performs on the field. In this project, however, I wanted to see whether I could predict future NFL fantasy football success without watching a single minute of any of these prospects play. The stakeholders for this issue are people who play fantasy football. Millions of people around the world play fantasy football and put their hard-earned money on the line for the sake of this hobby. The problem matters to them because fantasy football usually involves a lot of betting and bragging rights, and people are willing to spend money to become more successful at it. I am sure that the 32 NFL teams already do some sort of similar data analysis when preparing for the NFL draft and constructing their draft boards. The NFL is one of the largest businesses in the world, and if I can create a way to generate insights that predict future outcomes in this industry, it could be extremely lucrative.

The first main actionable insight that I intended to extract from my analysis was specific statistics that highlight the difference between college players that become NFL fantasy football stars and college players that do not. Every year there is an NFL draft where hundreds of college football prospects are drafted to NFL teams, but very few end up being successful in the NFL. In this project I wanted to find which statistics the future NFL stars had in college that separated them from the average players. For example, if the very successful fantasy football running backs averaged 60% more yards per season than the average running backs, then this would be a valuable insight as we will know to look specifically at high rushing yards to predict future success.

The second main actionable insight that I intended to extract from my analysis was a list of current college players that have similar statistics to the successful NFL fantasy players’ college statistics. In this project I wished to provide the reader with an idea of which current college players have a good shot at becoming future stars in fantasy football.

The data that I used during this project is from pro-football-reference.com and sports-reference.com. Pro-football-reference.com includes the data I needed for current NFL players, while sports-reference.com includes the college data. I used the Python libraries BeautifulSoup and Requests in Visual Studio Code to pull the data from both websites. The main information that I obtained from pro-football-reference.com was data for the highest performing fantasy quarterbacks, running backs, and wide receivers in the NFL from 2019–2021. The main information that I acquired from sports-reference.com was data for all college quarterbacks, running backs, and wide receivers from 2016–2021. Originally I wanted to include NFL data from 2016–2021, but the 2016–2021 college data covered few of the fantasy stars from 2016–2018, so I trimmed the NFL data to 2019–2021.
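As a rough illustration of the scraping step, here is a minimal sketch that assumes the statistics sit in an ordinary HTML table with a header row. The sample markup, column names, and function names are hypothetical, and the real pages may be structured differently, so this is not the code used in the project.

```python
from bs4 import BeautifulSoup


def fetch_html(url):
    """Download a page and return its HTML text (hypothetical helper)."""
    import requests  # imported here so the parsing code below runs offline

    response = requests.get(url, timeout=30)
    response.raise_for_status()  # fail loudly on HTTP errors
    return response.text


def parse_stat_table(html):
    """Turn the first <table> in the HTML into a list of row dictionaries."""
    soup = BeautifulSoup(html, "html.parser")
    table = soup.find("table")
    headers = [th.get_text(strip=True) for th in table.find("thead").find_all("th")]
    rows = []
    for tr in table.find("tbody").find_all("tr"):
        cells = [cell.get_text(strip=True) for cell in tr.find_all(["th", "td"])]
        if len(cells) == len(headers):  # keep only rows that match the header width
            rows.append(dict(zip(headers, cells)))
    return rows


# Made-up sample markup standing in for a downloaded page.
SAMPLE_HTML = """
<table>
  <thead><tr><th>Player</th><th>RushYds</th><th>RushTD</th></tr></thead>
  <tbody>
    <tr><td>Player A</td><td>1200</td><td>11</td></tr>
    <tr><td>Player B</td><td>850</td><td>6</td></tr>
  </tbody>
</table>
"""

rows = parse_stat_table(SAMPLE_HTML)
print(rows[0])
```

Every parsed value is a string, so numeric columns need to be converted before any arithmetic is done on them.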

After completing the data collection portion for this project, I needed to focus on manipulating the data, creating a new statistic to calculate fantasy points for each player, and creating tiers for NFL players based on how many fantasy points they had. I used the scoring format from my personal fantasy football league to create the fantasy points statistic:

· 0.04 points per passing yard

· 4 points per passing touchdown

· 0.1 points per rushing or receiving yard

· 6 points per rushing or receiving touchdown

· -2 points per fumble or interception
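The scoring rules above can be written as one small function. This is a minimal sketch rather than the project's actual code; the function name, parameter names, and the two sample stat lines are hypothetical.

```python
def fantasy_points(pass_yds=0, pass_td=0, rush_yds=0, rec_yds=0,
                   rush_td=0, rec_td=0, fumbles=0, interceptions=0):
    """Apply the league scoring format listed above to one season of stats."""
    points = (
        0.04 * pass_yds                   # 0.04 points per passing yard
        + 4 * pass_td                     # 4 points per passing touchdown
        + 0.1 * (rush_yds + rec_yds)      # 0.1 points per rushing or receiving yard
        + 6 * (rush_td + rec_td)          # 6 points per rushing or receiving touchdown
        - 2 * (fumbles + interceptions)   # -2 points per fumble or interception
    )
    return round(points, 2)  # rounding hides floating-point noise


# Made-up stat lines, for illustration only.
qb_points = fantasy_points(pass_yds=4000, pass_td=30, rush_yds=300,
                           rush_td=3, interceptions=10, fumbles=2)
rb_points = fantasy_points(rush_yds=1000, rec_yds=500, rush_td=8,
                           rec_td=2, fumbles=1)
print(qb_points, rb_points)
```

For the made-up quarterback line the terms are 160 + 120 + 30 + 18 - 24 = 304 points, which shows how much the rushing yards and touchdowns add on top of the passing numbers.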

After creating the fantasy points statistic, I was able to sort each position by the number of fantasy points each player scored. To judge how successful a player was for fantasy football purposes, I created three tiers based on where a player finished at their position in a season:

· Tier 1: Top 5 fantasy finish at their position

· Tier 2: Top 10 fantasy finish at their position

· Tier 3: Top 15 fantasy finish at their position

Next, I used the tiers to identify the players whose college statistics I needed. To do this, I collected the names in each tier for every position. If a player finished in the top 5 in fantasy points at their position in any season from 2019–2021, their name was added to the tier 1 list. If a player’s best finish over the three seasons was in the top 10, their name was added to the tier 2 list, and so on.
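The tier rule described above (rank players within each season, then keep each player's best finish) can be sketched as follows. The function names and the season data are hypothetical, and ties in fantasy points are not handled specially.

```python
def best_finishes(seasons):
    """Map each player to their best (lowest) positional rank across seasons.

    `seasons` maps a year to {player_name: fantasy_points} for one position.
    """
    best = {}
    for scores in seasons.values():
        ranked = sorted(scores, key=scores.get, reverse=True)  # highest points first
        for rank, name in enumerate(ranked, start=1):
            best[name] = min(rank, best.get(name, rank))
    return best


def tier_for(rank):
    """Translate a best finish into a tier, or None if outside the top 15."""
    if rank <= 5:
        return 1
    if rank <= 10:
        return 2
    if rank <= 15:
        return 3
    return None


# Made-up seasons: 20 players in 2019, and a breakout year for "Player 12" in 2020.
seasons = {
    2019: {f"Player {i}": 300 - 10 * i for i in range(1, 21)},
    2020: {"Player 12": 400, "Player 1": 100},
}
tiers = {name: tier_for(rank) for name, rank in best_finishes(seasons).items()}
print(tiers["Player 12"], tiers["Player 7"], tiers["Player 20"])
```

Because only the best finish counts, a single top-5 season is enough to put a player on the tier 1 list even if their other seasons were much weaker.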

Lastly, after obtaining the lists from each tier for every position, I used the college data that I had collected to look up the college statistics of the successful NFL players. I web scraped the 2016–2021 data from sports-reference.com, the college counterpart to pro-football-reference.com, using many of the same techniques as for the NFL data. I then used the pandas DataFrame .isin method to keep only the college rows for player names included in my tiered lists.
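A minimal sketch of that filtering step with pandas is shown below. The column names, player names, and numbers are made up for illustration, and the whitespace cleanup is an assumption about what real scraped names might need.

```python
import pandas as pd

# Made-up college rows; one name carries stray whitespace, as scraped text often does.
college = pd.DataFrame({
    "Player": ["Star A", "Star A", "Prospect B", "Star C "],
    "Year": [2016, 2017, 2021, 2018],
    "RecYds": [900, 1100, 700, 1000],
})
tier1_names = ["Star A", "Star C"]  # hypothetical tier 1 list

# Normalize names first, because .isin only matches exact strings.
college["Player"] = college["Player"].str.strip()

# Boolean mask: True for rows whose player name appears in the tier list.
tier1_college = college[college["Player"].isin(tier1_names)]
print(tier1_college)
```

Matching on names alone is fragile: two different players who share a name would be merged, and a suffix or punctuation difference between the two sites would silently drop a player, so it is worth checking which tiered names found no match.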

The first main insight that I wanted to extract from my dataset was specific statistics that consistently separate the college players that will become NFL fantasy football stars from the average players. The type of analysis that I used to find this information is exploratory analysis. For each position I looked at which statistics show the biggest difference between past college players that would go on to finish in the top 15 in their position in the NFL, and the average 2021 college players. I then created scatterplots using the statistics found in the previous step to see which current college players performed similarly to the successful NFL players’ college statistics.
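One way to rank statistics by how strongly they separate the two groups is to compare group means as a ratio. The sketch below assumes that approach; the article does not specify the exact calculation used, and the column names and numbers are made up.

```python
import pandas as pd

# Made-up college seasons for players who later had a top 15 NFL fantasy finish.
stars = pd.DataFrame({
    "RushYds": [600, 800],
    "RushTD": [10, 8],
    "PassTD": [30, 36],
})
# Made-up seasons for 2021 college players at the same position.
current = pd.DataFrame({
    "RushYds": [200, 300],
    "RushTD": [3, 5],
    "PassTD": [18, 22],
})

stats = ["RushYds", "RushTD", "PassTD"]
comparison = pd.DataFrame({
    "stars_mean": stars[stats].mean(),
    "current_mean": current[stats].mean(),
})
# A ratio above 1 means the future stars averaged more of that statistic.
comparison["ratio"] = comparison["stars_mean"] / comparison["current_mean"]
comparison = comparison.sort_values("ratio", ascending=False)
print(comparison)
```

A ratio is used instead of a raw difference so that statistics on very different scales, such as yards and touchdowns, can be compared in the same table.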

Exploratory analysis for quarterbacks:

I found that the statistics that display the biggest difference for quarterbacks are rushing yards, rushing touchdowns, and passing touchdowns (I excluded average rush, as it would be very similar to the other two rushing statistics). Quarterbacks that would go on to have a top 15 fantasy finish in the NFL averaged 674 rushing yards, 10.1 rushing touchdowns, and 33.4 passing touchdowns per year in college, while the average 2021 college players averaged 225 rushing yards, 4 rushing touchdowns, and 20 passing touchdowns. This suggests that it is crucial to target quarterbacks that can run the ball in fantasy football.

• Confident of future fantasy football tier 1 finish: Malik Willis, Sam Howell

• Possible future fantasy football tier 1 finish: Sam Hartman, Desmond Ridder

Exploratory analysis for running backs:

I found that the statistics that display the biggest difference for running backs are receiving touchdowns, receiving yards, and scrimmage touchdowns. Running backs that would go on to have a top 15 fantasy finish in the NFL averaged 2.5 receiving touchdowns, 308.8 receiving yards, and 16.8 scrimmage touchdowns per year in college, while the average 2021 college players averaged 0.6 receiving touchdowns, 103.6 receiving yards, and 7.3 scrimmage touchdowns. This suggests that it is very important to target running backs that also have a role in the passing game.

• Confident of future fantasy football tier 1 finish: Deuce Vaughn, Breece Hall, TreVeyon Henderson

• Possible future fantasy football tier 1 finish: Bryant Koback, Bijan Robinson

Exploratory analysis for wide receivers:

I found that the statistics that display the biggest difference for wide receivers are receiving yards, receiving touchdowns, and scrimmage touchdowns. Wide receivers that would go on to have a top 15 fantasy finish in the NFL averaged 1,081 receiving yards, 10 receiving touchdowns, and 10 scrimmage touchdowns per year in college, while the average 2021 college players averaged 571 receiving yards, 4 receiving touchdowns, and 5 scrimmage touchdowns. This suggests that wide receivers are a bit trickier to predict: the players with the highest receiving statistics tend to be the most successful, but that is a rather obvious insight.

• Confident of future fantasy football tier 1 finish: Jordan Addison, Jameson Williams

• Possible future fantasy football tier 1 finish: A.T. Perry, Mitchell Tinsley

The second main actionable insight that I wanted to extract from my analysis was a list of current college players that had similar statistics to the successful NFL fantasy players’ college statistics. The type of analysis that I used to find this information was clustering analysis. For this analysis I combined the DataFrames for 2021 college prospects and tier 1 fantasy football stars’ college statistics, which allowed me to see which clusters the future stars fell into and which current college players share a cluster with one or more of them. I believe that by providing a list of similar players, readers will be able to keep an eye on certain prospects and target them in future fantasy football drafts. The number of clusters for each position was determined using the elbow method.
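The sketch below shows how the elbow method and the cluster lookup might work, assuming k-means from scikit-learn with standardized features. The article does not name the clustering algorithm or the features used, so both are assumptions, and all names and numbers are made up.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.preprocessing import StandardScaler

# Made-up rows: [passing yards, rushing yards] for one star and eight prospects.
names = ["Star A", "Prospect 1", "Prospect 2",
         "Prospect 3", "Prospect 4", "Prospect 5",
         "Prospect 6", "Prospect 7", "Prospect 8"]
X = np.array([
    [4000, 300], [4100, 320], [3900, 280],   # pocket passers
    [2500, 900], [2600, 950], [2400, 870],   # dual threats
    [1500, 100], [1600, 120], [1400, 90],    # low-volume players
], dtype=float)

# Standardize so that yardage columns on different scales count equally.
X_scaled = StandardScaler().fit_transform(X)

# Elbow method: record the within-cluster sum of squares (inertia) for each k.
inertias = {}
for k in range(1, 6):
    model = KMeans(n_clusters=k, n_init=10, random_state=0).fit(X_scaled)
    inertias[k] = model.inertia_

# The k where the inertia curve flattens is read off a plot; 3 is used here
# because the made-up data was built with three groups.
labels = KMeans(n_clusters=3, n_init=10, random_state=0).fit_predict(X_scaled)

# List the prospects that share a cluster with the star.
star_label = labels[names.index("Star A")]
cluster_mates = [n for n, lab in zip(names, labels)
                 if lab == star_label and n != "Star A"]
print(cluster_mates)
```

Sharing a cluster only means the stat lines are close in the chosen features, so the result depends heavily on which statistics are included and how they are scaled.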

Notable clusters for quarterbacks:

Bryce Young and Will Rogers were both found to have college statistics very similar to Patrick Mahomes’ college statistics, which suggests that they may have NFL fantasy football success similar to his.

Deshaun Watson and Lamar Jackson were both put in their own clusters, which suggests that targeting quarterbacks with unique college statistics could lead to success.

Notable clusters for running backs:

Tyler Badie was put in a cluster with Christian McCaffrey, which is encouraging for his future NFL fantasy football outlook.

Three of the six players in this cluster are future NFL fantasy football stars, which suggests that the other three players may follow a similar path.

Similar to the quarterback clusters, many of the future NFL fantasy stars were put into their own clusters. Lew Nichols and Deuce Vaughn were both also put into their own clusters, which suggests that they may have similar success.

Notable clusters for wide receivers:

Deven Thompkins produced similar college statistics to current NFL star Ja’Marr Chase, which suggests that he could be a star as well. However, a closer look at the statistics shows that Thompkins had 18 more receptions with fewer receiving yards and 10 fewer touchdowns, so it is unlikely.

Three of the most promising college prospects identified through the exploratory analysis found themselves in a cluster together. I believe this suggests that some of the best current college players are playing the game a bit differently, so it will be interesting to see if it translates to NFL stardom.

Justin Jefferson is currently one of the best wide receivers in the league, which suggests that these players are promising college wide receiver prospects.

The answers that I have for the stakeholders range from broad takeaways for fantasy football in general to specific players that should be targeted. When drafting in fantasy football, I found that it is critical to target quarterbacks that rush frequently and running backs that are involved in the passing game. I found that Malik Willis, Sam Howell, Bryce Young, and Will Rogers are the most likely college quarterback prospects to be future NFL fantasy football stars. Additionally, I found that Deuce Vaughn, Breece Hall, TreVeyon Henderson, Bijan Robinson, and Tyler Badie are the most likely college running back prospects to be future NFL fantasy football stars. Lastly, I found that Jordan Addison, Jameson Williams, and Jaxon Smith-Njigba are the most likely college wide receiver prospects to be future NFL fantasy football stars.

One of the main limitations of my project is that it draws conclusions from solely quantitative data. Fantasy football is a very fun hobby that sometimes makes people forget that NFL players are actual people with personal situations that can affect how successful they end up being in the NFL. For example, if a very promising player is drafted highly but goes through a run of unfortunate events in their personal life that reduces the time and focus they put into football, their performance will suffer, but the data won’t account for this. Additionally, the team that a player is drafted to often drastically affects how successful they end up being. A player drafted to the Detroit Lions or Washington Commanders is much more likely to underperform than one drafted to the Pittsburgh Steelers or Baltimore Ravens, but the data cannot account for this.

Link to GitHub Repository
