An Analysis of NFL Fantasy Points Data using Euclidean Distance with KNN

Eli Dross
INST414: Data Science Techniques
Mar 27, 2024

In data science, measuring similarity between data points is a fundamental technique with applications across many domains. In this post, I use Euclidean distance to assess similarity between football players based on NFL fantasy points data. Rather than computing the distances by hand, I use the k-nearest neighbors (KNN) algorithm with Euclidean distance, since I found it easier to code. Applying the methods from Module 3, I aim to answer a specific question and show how similarity analysis can inform fantasy football decisions.

Identifying Questions and Stakeholders:

The question I am asking about the dataset is: How can we identify the ten most similar players to a given player based on performance patterns in NFL fantasy football? The stakeholders for this question are fantasy football analysts, team managers, and enthusiasts seeking strategic insights for player selection and team optimization. This analysis can help inform player drafting strategies, team composition, trade decisions, and general fantasy football strategy adjustments.

Description of Data:

The data is the same data I used in the Module 1 assignment. The source is Kaggle, specifically the NFL ADP and Fantasy Points dataset page. The dataset covers the 2020, 2021, and 2022 NFL seasons. For this assignment, just like assignment one, I focus on the 2022 data, since that season I had a losing record in all three of my fantasy leagues. This data is relevant because it supports assessing player performance, identifying trends, and making data-driven decisions in fantasy football management.

Data Collection:

I collected the data from Kaggle’s data archive, specifically the “weekly_points_data.csv” file within the PPR, 2022 folder. I used Python libraries such as pandas for data loading and preprocessing.

Similarity Measurement:

The features I used when computing the similarity metric were:

  • Position: The position the player plays
  • Team: The team the player is on
  • Weekly Points Scored: 18 columns, one per week, each holding the fantasy points the player scored that week
  • Average points per week: The mean of the player's weekly scores over the 18-week season
  • Total points: The sum of the player's points across all 18 weeks

The similarity metric I chose was Euclidean distance. However, I did not write the code the same way as in the in-class exercise. Instead, I used the k-nearest neighbors (KNN) algorithm to identify the closest matches for each player based on their performance attributes. I felt this was an easier way to write the code and a more reliable way to find the ten closest players to a given player.
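To make the idea concrete, here is a minimal sketch of a brute-force nearest-neighbor search with Euclidean distance. It is not the code from my repository: the function name `k_nearest`, the player labels, and the three-number feature vectors are all made up for illustration, and it uses only the Python standard library.

```python
import math

def k_nearest(features, query, k=10):
    """Return the k names closest to `query` by Euclidean distance.

    `features` maps a player name to a list of numbers (hypothetical layout).
    Euclidean distance is sqrt(sum((a_i - b_i) ** 2)) over all features.
    """
    query_vec = features[query]
    distances = [
        (math.dist(query_vec, vec), name)
        for name, vec in features.items()
        if name != query  # skip the query itself, which is at distance 0
    ]
    distances.sort()  # smallest distance first
    return [name for _, name in distances[:k]]

# Made-up toy vectors (week 1 points, week 2 points, total), not real data.
players = {
    "Player A": [20.0, 25.0, 45.0],
    "Player B": [19.0, 24.0, 43.0],
    "Player C": [5.0, 7.0, 12.0],
    "Player D": [21.0, 27.0, 48.0],
}

print(k_nearest(players, "Player A", k=2))
```

With these toy numbers, Player B and Player D come back as the two nearest neighbors of Player A, because their vectors differ from Player A's by only a point or two in each feature.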

Top 10 Similar Entities:

The three players I chose as queries are Patrick Mahomes II, Terry McLaurin, and Rachaad White. For each of them, I identified the ten most similar players using the KNN algorithm. Here is the output for each player with their ten most similar players:

  • Patrick Mahomes II
  • Terry McLaurin
  • Rachaad White

Analysis and Insights:

Looking at the output for Patrick Mahomes, it is clear that position carries a lot of weight in the similarity measure: six of his ten most similar players are also quarterbacks. Interestingly, only one other player from his team, KC, is on the list. Because Mahomes scored so many points, the only teammate who scored enough to be similar to him was Travis Kelce. Together, these two observations indicate that, under Euclidean distance, total points scored and position matter more for similarity than the team a player is on.

The data for Terry McLaurin supports very similar conclusions. Only two players on his list are on his team, while six play the same position. In addition, the ten players most similar to McLaurin have total point values that are relatively close to each other.

The data on Rachaad White shows that not every query leads to the same insights. His list of ten similar players includes only three others at his position, RB. However, his results do match the other two queries in that his total points are close to those of his ten nearest players, and no one on his list is from his team. Compared with the Mahomes and McLaurin results, the Rachaad White results suggest that the most important column for similarity is total points.

Findings Summary:

As shown above, Python prints out the ten most similar players for each of the three chosen queries. If you want to look at a different player, you can get the code from GitHub and simply change the name of the player passed into the find_ten_closest function.

The most important takeaway from the Euclidean distance analysis is that the total points column is the most important factor in determining similarity between players.
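One possible explanation for this, which I have not verified against my full dataset, is scale: if the features are not rescaled, a column with large values such as total points contributes far more to the squared distance than small integer codes for team or position. The sketch below shows the idea with two made-up players. The function name `squared_contributions`, the feature names, and all of the numbers are hypothetical.

```python
def squared_contributions(a, b):
    """Per-feature share of the squared Euclidean distance between a and b.

    Assumes a and b differ in at least one feature, so the total is nonzero.
    """
    squared = {key: (a[key] - b[key]) ** 2 for key in a}
    total = sum(squared.values())
    return {key: value / total for key, value in squared.items()}

# Made-up feature values for two hypothetical players (not real data).
player_x = {"position_code": 1, "team_code": 12, "avg_points": 18.0, "total_points": 306.0}
player_y = {"position_code": 3, "team_code": 20, "avg_points": 15.0, "total_points": 255.0}

shares = squared_contributions(player_x, player_y)
for feature, share in shares.items():
    print(f"{feature}: {share:.1%}")
```

In this toy example the total points difference accounts for well over 90% of the squared distance, even though the two players differ in every feature. Rescaling the columns before measuring distance would be one way to test whether the same effect drives the results in this post.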

Data Cleaning:

To make the data usable for Euclidean distance, I replaced every “-” symbol and every “BYE” week entry in the dataset with a 0. I also filled all empty cells with a 0. Finally, I created numerical values for the teams and positions so that they could be used in the similarity comparison.
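The cleaning steps above can be sketched roughly as follows. This is an illustration rather than the code from my repository: it uses plain Python instead of pandas so it runs without any installs, and the function name `clean_rows`, the column names, and the sample rows are all assumptions.

```python
def clean_rows(rows, weekly_columns, categorical_columns):
    """Turn raw string cells into numbers usable in a distance computation."""
    codes = {col: {} for col in categorical_columns}
    cleaned = []
    for row in rows:
        out = {}
        for col in weekly_columns:
            cell = (row.get(col) or "").strip()
            # "-", "BYE", and empty cells all count as 0 points
            out[col] = 0.0 if cell in ("", "-", "BYE") else float(cell)
        for col in categorical_columns:
            # Give each distinct label the next unused integer code
            mapping = codes[col]
            out[col] = mapping.setdefault(row[col], len(mapping))
        cleaned.append(out)
    return cleaned, codes

# Made-up rows with hypothetical column names ("Pos", "Team", "1", "2").
raw = [
    {"Pos": "QB", "Team": "KC", "1": "30.5", "2": "BYE"},
    {"Pos": "WR", "Team": "WAS", "1": "-", "2": "12.0"},
    {"Pos": "QB", "Team": "BUF", "1": "", "2": "22.1"},
]

cleaned, codes = clean_rows(raw, weekly_columns=["1", "2"], categorical_columns=["Pos", "Team"])
print(cleaned)
print(codes)
```

One caveat with integer codes like these: Euclidean distance treats them as ordered numbers, so two teams with codes 0 and 2 look "farther apart" than teams with codes 0 and 1, even though the ordering is arbitrary.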

Limitations and Biases:

Despite the insights on player similarity gained from the weekly points data, there are a few limitations to acknowledge. First, the dataset spans only three seasons, which limits how far the findings generalize over longer time frames, and this analysis used only one of those three seasons. Second, external factors such as injuries, team dynamics, and coaching strategies are not captured by the statistics in the data. Finally, while the dataset provides a broad overview of weekly scoring, it lacks other player attributes and game context, which limits the depth of the analysis.

Conclusion:

In conclusion, players who score a similar total number of fantasy points over the course of the season are considered the most similar to each other. A player's team and position can matter as well, but the data suggests they are less important than one might expect.

GitHub Repository:

Here is the link to my GitHub Repository: https://github.com/DrossTheBoss/INST414-ModuleAssignment3
