Exploratory Analysis of NFL Fantasy Football Data: Unveiling Trends and Insights

Published in INST414: Data Science Techniques, Feb 8, 2024

Introduction

I have played in fantasy football leagues for as long as I can remember, but over the last few years I have struggled to perform well. I decided that this assignment would be a great opportunity to run a deep exploratory analysis of fantasy football data in the hope that it will help me win a championship next season. For those unfamiliar with fantasy football, it is a game you play with friends and/or family over the course of the National Football League (NFL) season. Everyone drafts players from NFL teams, and each week of the season their fantasy team faces off against another team in the league. Each player earns points each week based on his stat line from that week’s game. The fantasy team with the higher total score gets a win for the week, and the other team gets a loss. At the end of the season, the best teams make the playoffs and compete for the championship.

In fantasy football, the draft held before the season is the most pivotal part of the year. The person who drafts best and finds the most undervalued players is, in most cases, the one who wins the league. Therefore, the question I am asking is:

“How do average draft positions (ADP) correlate with fantasy points scored by players and is there a clear pattern?”

The dataset I am using comes from Kaggle: https://www.kaggle.com/datasets/robertcurrie/nfl-adp-and-fantasy-pts-fantasy-pros-2020-2022

The dataset covers the 2020, 2021 and 2022 NFL seasons. For this first assignment, I am going to focus on the 2022 data, since that is the season in which I had a losing record in all three of my fantasy leagues.

Stakeholder and Decision Context

The stakeholders are fantasy football enthusiasts, analysts and team managers, especially those gearing up for draft season. Basically, the stakeholders are people like me who want to improve their drafting ability. Answering this question will provide insight into draft strategies, player evaluations and roster composition. The answer directly informs decisions about which players to select based on their past season success, their position and their current NFL team. Other factors that should be considered are draft order and NFL team dynamics, which ultimately affect the success or failure of fantasy teams throughout the season. For example, another question to consider is whether players on NFL teams with a winning record generally perform better than players on teams with a losing record. As we analyze the data, stakeholders will want to see that I have considered as many of the factors that can affect player performance as possible.

Data Description and Relevance

As mentioned before, the dataset is from Kaggle and is titled “NFL ADP and Fantasy Pts — Fantasy Pros 2020–2022.” It contains player name, position, team, average draft position (ADP), and fantasy points scored across multiple seasons. This dataset is well suited to uncovering correlations between ADP and fantasy performance because it provides plenty of data for both. In Python, we can view, clean and organize the data, which lets us find patterns and trends that can significantly influence drafting decisions and team outcomes.

Data Collection and Exploration

To begin the analysis, I used the pandas library in Python to import and manipulate the dataset. Below is the code used to load the dataset and perform initial operations:
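A minimal sketch of what this loading step could look like, not the exact code from the repository. The column names (`Player`, `Team`, `POS`, `AVG`, `Notes`, `TTL`) and the placeholder rows are assumptions made for illustration, and in-memory text stands in for the two 2022 CSV files so the example runs on its own.

```python
import io

import pandas as pd

# In-memory stand-ins for the two 2022 CSV files. The column names and
# rows are hypothetical placeholders, not values from the Kaggle dataset.
ADP_CSV = io.StringIO(
    "Player,Team,POS,AVG,Notes\n"
    "Player A,IND,RB1,1.5,\n"
    "Player B,LAR,WR1,3.2,\n"
    "Player C,BUF,QB1,20.4,\n"
)
POINTS_CSV = io.StringIO(
    "Player,Pos,Team,TTL\n"
    "Player A,RB,IND,150.1\n"
    "Player B,WR,LAR,120.7\n"
    "Player C,QB,BUF,395.5\n"
)

# With the real files, pass a file path to pd.read_csv instead.
adp_df = pd.read_csv(ADP_CSV)
points_df = pd.read_csv(POINTS_CSV)

# First look at each table: a few rows plus the missing-value count
# per column, which is where completely empty columns show up.
for name, df in [("ADP", adp_df), ("Weekly points", points_df)]:
    print(f"--- {name} ---")
    print(df.head())
    print(df.isna().sum())
```

Printing `isna().sum()` next to `head()` is a quick way to spot columns that have no values at all before any cleaning is done.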

In the code above, I loaded the two datasets for the 2022 NFL season. Here is the output from the print statements:

As you can see, some data cleaning needed to be done before I could start my analysis.

Data Cleanup and Preprocessing

During the exploration, I encountered a few common data issues. The most significant one was in the ADP data, where three columns had no values at all. These columns are useless for the analysis, so I implemented a data cleaning step in Python that drops them from the data frame. Also, in the ADP data, I added a column that holds the position label for each player without the accompanying rank number. This will help me later during the analysis. These changes do not affect the actual dataset files saved on my machine. Here is the code:

Here is the output of the print statements after the data cleaning:
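A minimal sketch of the two cleaning steps described above, under stated assumptions: the ADP table stores position and rank together in a column called `POS` (for example `RB1`), and the empty column is called `Notes`. Both names and all rows are hypothetical and may not match the real dataset.

```python
import numpy as np
import pandas as pd

# Hypothetical ADP table with one completely empty column.
adp_df = pd.DataFrame(
    {
        "Player": ["Player A", "Player B", "Player C"],
        "POS": ["RB1", "WR12", "TE3"],
        "AVG": [1.5, 30.2, 45.0],
        "Notes": [np.nan, np.nan, np.nan],
    }
)

# Drop every column in which all values are missing. dropna returns a
# new frame, so adp_df (and the file on disk) stays unchanged.
clean_adp = adp_df.dropna(axis="columns", how="all").copy()

# Strip the trailing rank digits to get a plain position label,
# e.g. "RB1" -> "RB" and "WR12" -> "WR".
clean_adp["Position"] = clean_adp["POS"].str.replace(r"\d+$", "", regex=True)

print(clean_adp)
```

Using `how="all"` removes only columns that are entirely empty, so a column with a few missing values would be kept.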

Exploratory Data Analysis and Insights

For the primary analysis, I split the data frames into smaller data frames by position. Here is the code that I created to do so:
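A minimal sketch of one way to do this split, not necessarily the approach used in the repository. It assumes a cleaned ADP frame with a `Position` column and an `AVG` (average draft position) column; those names and the placeholder rows are hypothetical.

```python
import pandas as pd

# Hypothetical cleaned ADP data.
clean_adp = pd.DataFrame(
    {
        "Player": ["Player A", "Player B", "Player C", "Player D"],
        "Position": ["RB", "WR", "RB", "QB"],
        "AVG": [12.4, 3.2, 1.5, 20.4],
    }
)

# One data frame per position, each ordered from earliest to latest pick.
by_position = {
    position: group.sort_values("AVG").reset_index(drop=True)
    for position, group in clean_adp.groupby("Position")
}

for position, frame in by_position.items():
    print(f"--- {position} ---")
    print(frame)
```

Keeping the pieces in a dictionary keyed by position avoids a separate hand-written filter for each position, and `by_position["RB"]` then gives the running backs directly.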

The print statements are pretty long, so if you want to see them, there is a link to the GitHub repository at the bottom of this post. Looking at the data, it became clear that a team’s performance had a lot to do with where its players finished in the rankings. For example, Jonathan Taylor was the first running back by ADP, yet he did not finish in the top five in the weekly dataset. The team he plays on, the Colts, had a bad season. We see the same trend with the number one wide receiver, Cooper Kupp, the number three tight end, Kyle Pitts, the number five quarterback, Kyler Murray, and many more. The data suggests that players on good teams perform better than others. Therefore, stakeholders should decide which teams they think will be good and try to focus their player selections on those teams.
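The gap between where a player was drafted and where he finished can also be put into numbers. The sketch below is only an illustration under assumptions: the merged table and its column names (`ADP`, `Points`) are hypothetical, and the five rows are made-up placeholders for a single position. It ranks players by ADP and by points within each position, then computes a Spearman rank correlation as the Pearson correlation of the two rank columns.

```python
import pandas as pd

# Hypothetical merged table for one position (placeholder values only).
players = pd.DataFrame(
    {
        "Player": ["Player A", "Player B", "Player C", "Player D", "Player E"],
        "Position": ["RB"] * 5,
        "ADP": [1.5, 4.0, 9.3, 15.8, 22.1],
        "Points": [150.0, 310.5, 280.2, 120.4, 200.9],
    }
)

grouped = players.groupby("Position")

# Rank 1 = earliest pick at the position / most points at the position.
players["ADP_Rank"] = grouped["ADP"].rank(method="first")
players["Finish_Rank"] = grouped["Points"].rank(method="first", ascending=False)

# Positive values mean the player finished better than he was drafted.
players["Rank_Change"] = players["ADP_Rank"] - players["Finish_Rank"]

# Spearman correlation is the Pearson correlation of the ranks.
rho = players["ADP_Rank"].corr(players["Finish_Rank"])

print(players.sort_values("ADP_Rank"))
print(f"Spearman correlation between ADP rank and finish rank: {rho:.2f}")
```

A value of `rho` near 1 would mean that draft order predicted the final order well, while a value near 0 would mean it carried little information. In this placeholder data the first player drafted finishes fourth, which mirrors the kind of drop described above.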

Limitations and Considerations

Despite the insights gained about drafting players based on their team, there are a few limitations to acknowledge. First, the dataset spans only three seasons, which limits how far the findings can be generalized over longer time frames. Additionally, in this analysis I used only one of the three seasons provided. Next, external factors such as injuries, team dynamics, and coaching strategies are not captured by the statistics in the data. Finally, while the dataset provides a broad overview, it lacks certain player attributes and game context, which limits the depth of the analysis.

Conclusion

In conclusion, the exploratory analysis of the NFL fantasy football dataset offers valuable insight into the relationship between the fantasy points players score and the strength of their NFL team. The data suggests that stakeholders should try to draft players from successful NFL franchises. With this pattern in mind, fantasy football team managers gain actionable intelligence to improve their drafting strategies, optimize player selections, and ultimately elevate their team’s performance.

GitHub Repository

Here is the link to the GitHub repository: https://github.com/DrossTheBoss/INST414-ModuleAssignment1