NFL Rookie Analysis by Snap Count — 2023

Elliott Bauer
INST414: Data Science Techniques
Mar 11, 2024

For my second module assignment, I decided to analyze data on players in the National Football League (NFL). Football has been one of my passions since I was a little kid, so I thought examining tendencies within the league could prove very interesting. I gathered my data from Lineups.com, a site with an abundance of data on a variety of sports. I extracted three CSV files from the website: offensive skill players (quarterbacks, running backs, wide receivers, and tight ends), defensive players (defensive linemen, linebackers, and defensive backs), and offensive linemen (tackles, guards, and centers). All three files cover only the 2023–2024 NFL season.

For this assignment, we were responsible for using the networkx library in Python to show connections between nodes, each of which represents an entity of some sort. In my case, most nodes represent players. Each player is connected by an edge to a node for his team, so players on the same team are linked through that shared team node.

My analysis can apply to several target audiences. For one, it is useful for casual football viewers who want to learn more about players who are up and coming in the league, as snap count often correlates positively with player performance. In other words, the players who are widely known and post better statistics are usually the ones dominating the snap count. It is important to note that snap count, as used here, is the percentage of a team's plays that a specific player was on the field for during the season. It counts only the plays on the side of the ball the player belongs to: an offensive player's snap count percentage is not reduced when his team is on defense, which is why some players have snap counts as high as 100%. Another group that could benefit from my analysis is NFL scouts.
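To make the snap count definition and the graph layout concrete, here is a minimal sketch, assuming each player has already been reduced to a (player, team, position, snap percentage) record. The function names `snap_percentage` and `build_rookie_graph`, the `kind`, `position`, and `snap_pct` node attributes, and the sample players, placeholder teams (T1, T2), and snap totals are all hypothetical, not taken from the original notebook. Following the definition above, the denominator is only the team's plays on the player's side of the ball.

```python
import networkx as nx


def snap_percentage(player_snaps: int, team_snaps: int) -> float:
    """Percent of the team's plays on the player's side of the ball that he was on the field for."""
    if team_snaps <= 0:
        raise ValueError("team_snaps must be positive")
    return 100.0 * player_snaps / team_snaps


def build_rookie_graph(rookies):
    """Graph with one node per player and one hub node per team.

    `rookies` is an iterable of (player, team, position, snap_pct) tuples.
    """
    G = nx.Graph()
    for player, team, position, snap_pct in rookies:
        G.add_node(team, kind="team")
        G.add_node(player, kind="player", position=position, snap_pct=snap_pct)
        G.add_edge(player, team)  # teammates become linked through the shared team node
    return G


# Made-up players and snap totals, for illustration only.
rookies = [
    ("Player A", "T1", "WR", snap_percentage(950, 1000)),  # 950 of his team's 1,000 offensive plays
    ("Player B", "T1", "OL", snap_percentage(1000, 1000)),
    ("Player C", "T2", "DB", snap_percentage(600, 1000)),
]
G = build_rookie_graph(rookies)
print(G.degree("T1"))                         # 2
print(nx.has_path(G, "Player A", "Player B"))  # True: both connect to T1
print(nx.has_path(G, "Player A", "Player C"))  # False: different teams
```

Routing teammates through a team node, rather than connecting every pair of teammates directly, keeps the edge count linear in the number of players and makes each team's rookie count readable as that node's degree.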
Scouts could use this analysis to see which position groups tend to make the largest impact as rookies, which could inform their drafting strategies. If they notice that one position group plays a much larger share of snaps than another group, that could change how they go about scouting college players. Below, I have a basic visualization with no edges, only nodes, where larger nodes indicate higher snap count percentages. On the right side, a key separates each position by color. Some positions share a color because they are similar to one another. If I were a scout, I would immediately notice the prominence and size of the brown, blue, and pink nodes, which correspond to the offensive line, wide receiver, and defensive back positions. Rookies at these positions appear to play more consistently (as shown by their larger nodes) and may need less time to develop in the league. I would relay this information to the front office employees responsible for drafting, as it could provide insight that had not previously been considered. On the opposite end of the spectrum, there is little orange or purple, which might steer scouts away from prioritizing running backs and linebackers.
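The node-only chart can be approximated with two small helpers, sketched here under stated assumptions: node size scales linearly with snap percentage (the factor of 20 is an arbitrary choice), and the color map covers only the groups named above, assuming orange marks running backs and purple marks linebackers. Every other position falls back to a hypothetical gray. A per-group average turns the visual impression into a number a scout could compare directly. The position codes and sample values are made up.

```python
from collections import defaultdict
from statistics import mean

# Colors for the groups named in the post (orange = RB and purple = LB is assumed).
POSITION_COLORS = {"OL": "brown", "WR": "blue", "DB": "pink", "RB": "orange", "LB": "purple"}
DEFAULT_COLOR = "gray"  # hypothetical color for any position not listed above


def node_style(position: str, snap_pct: float, scale: float = 20.0) -> tuple[str, float]:
    """Return (color, size) for a player node; size grows linearly with snap %."""
    return POSITION_COLORS.get(position, DEFAULT_COLOR), scale * snap_pct


def mean_snap_by_group(players):
    """Average snap % per position group from (position, snap_pct) pairs."""
    by_group = defaultdict(list)
    for position, snap_pct in players:
        by_group[position].append(snap_pct)
    return {group: mean(values) for group, values in by_group.items()}


players = [("WR", 90.0), ("WR", 70.0), ("RB", 40.0)]  # made-up values
print(node_style("OL", 95.0))       # ('brown', 1900.0)
print(mean_snap_by_group(players))  # {'WR': 80.0, 'RB': 40.0}
```

Lists of these colors and sizes, built in node order, could be passed to networkx's drawing functions through their `node_color` and `node_size` arguments.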

Now, above we have a visualization with all of the connections between players. As mentioned above, players on the same team share a central node, labeled with the two- or three-letter abbreviation for their team's location. When judging importance, it is crucial to look at the nodes with the most edges, as they act as bridges between connections. Because every edge links a player to his team, a team node's edge count equals the number of rookies from that team in the data. By this measure, the Los Angeles Rams (central node ‘LAR’) are among the most important nodes, with one of the highest edge counts (10). The Green Bay Packers (GB) and Las Vegas Raiders (LV) also had 10 edges, representing 10 rookies who made an impact for them this NFL season.
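Once the graph exists, ranking the team hubs by edge count takes only a few lines. The sketch below assumes team nodes carry a hypothetical `kind="team"` attribute, and it uses a toy graph with placeholder teams T1 to T3 and made-up rookie counts rather than the real data.

```python
import networkx as nx


def rank_team_hubs(G: nx.Graph, top_n: int = 3):
    """Return the top_n team nodes as (team, degree) pairs, highest degree first."""
    teams = [node for node, kind in G.nodes(data="kind") if kind == "team"]
    ranked = sorted(((team, G.degree(team)) for team in teams), key=lambda pair: pair[1], reverse=True)
    return ranked[:top_n]


# Toy graph: hypothetical team -> number of rookies.
G = nx.Graph()
toy_rosters = {"T1": 3, "T2": 2, "T3": 1}
for team, count in toy_rosters.items():
    G.add_node(team, kind="team")
    for i in range(count):
        player = f"{team}-rookie-{i}"
        G.add_node(player, kind="player")
        G.add_edge(player, team)  # each rookie links only to his team's hub

print(rank_team_hubs(G))  # [('T1', 3), ('T2', 2), ('T3', 1)]
```

Because each player is attached to exactly one team, a team's degree is the same number `value_counts` would report for that team, and `nx.degree_centrality` would produce the same ordering, just divided by the number of other nodes.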

I did most of my data cleaning with the pandas library. My initial data had many columns I did not want to focus on, so I dropped them to narrow my scope. I also used functions such as value_counts (as seen above), sort_values, merge, and rename to make my tables more accessible for myself and for viewers.

There are some important limitations to my data and my analysis. For one, it does not include any special teams players who were drafted: kickers, punters, long snappers, and return specialists. Their snap count percentages would be heavily skewed. Teams typically carry only one player at each of these positions, so for most of these rookies the percentage would be close to 100%. Special teams plays also occur far less often than offensive and defensive plays, which would skew the comparison further. That being said, several rookies had a significant influence on special teams this year, and their impact should not go unnoticed.

Another limitation is injuries. Injuries happen on a regular basis and are hard to quantify via statistics. For example, Texans quarterback C.J. Stroud was injured in a few games this season and missed some time. He was arguably the most impactful rookie this year, winning Offensive Rookie of the Year honors and leading his team to the second round of the playoffs. Yet he played only 85% of offensive snaps. At first glance, he may appear less valuable than players in the 90 to 100% range, even though he surely had a larger impact in the grand scheme of the season. There are several instances of this, but it is nearly impossible to account for them all.
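The cleaning steps can be sketched with pandas as follows. The column names, the special teams position codes, and the tiny in-memory tables standing in for the three CSV files are all hypothetical; the real Lineups.com exports may be structured differently, and with real files each table would come from `pd.read_csv`. This sketch stacks the three tables with `pd.concat` (assuming they share the same columns) rather than reproducing the notebook's `merge` step, which is not shown in the post. The special teams filter is only a safeguard in case such rows appear.

```python
import pandas as pd

# Hypothetical source columns -> short names; the real exports may differ.
KEEP = {"Name": "player", "Team": "team", "Pos": "position", "Snap %": "snap_pct"}
# Hypothetical codes for kickers, punters, long snappers, and returners.
SPECIAL_TEAMS = {"K", "P", "LS", "KR", "PR"}


def clean(frame: pd.DataFrame) -> pd.DataFrame:
    """Keep only the columns of interest, rename them, and drop special teams rows."""
    out = frame[list(KEEP)].rename(columns=KEEP)  # same effect as dropping every other column
    return out[~out["position"].isin(SPECIAL_TEAMS)]


# Tiny stand-ins for the three CSV files (made-up players and numbers).
skill = pd.DataFrame({"Name": ["Player A", "Player B"], "Team": ["T1", "T2"],
                      "Pos": ["WR", "K"], "Snap %": [91.0, 100.0], "Age": [22, 23]})
defense = pd.DataFrame({"Name": ["Player C"], "Team": ["T1"],
                        "Pos": ["DB"], "Snap %": [78.0], "Age": [21]})
oline = pd.DataFrame({"Name": ["Player D"], "Team": ["T2"],
                      "Pos": ["OL"], "Snap %": [99.0], "Age": [22]})

rookies = pd.concat([clean(f) for f in (skill, defense, oline)], ignore_index=True)
rookies = rookies.sort_values("snap_pct", ascending=False, ignore_index=True)
print(rookies)
# Rookies per team, i.e. the edge count of each team's hub node.
print(rookies["team"].value_counts())
```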
I hope you have gained some insight into how much snap counts can reveal about rookies in the National Football League, as visualized with the networkx library in Python.

Below, I have attached a link to my GitHub repository.

https://github.com/elliottbauer99/INST414/blob/main/Module%202%20Assignment.ipynb
