Clustering the NFL’s Receivers

Yards After Catch and what it can tell you about the receiving hierarchy in the league

Published in Web Mining [IS688, Spring 2021]

8 min read · May 3, 2021

There are a ton of statistics for NFL players that go well beyond the typical summary-level numbers such as yards from scrimmage, touchdown totals, and receiving totals. For each position there are advanced metrics that can tell you a lot about a player’s ability that the normal stats simply ignore. If you follow the sport, you probably know that a receiver is named for what they do: they receive, or catch, the pass from the quarterback (QB). Each caught pass is a reception, and the distance from the line of scrimmage (where the ball is at the beginning of the play) to the point where the receiver is brought down counts as yardage for both the receiver and the quarterback. Generally, wide receivers, tight ends, and running backs can all play the receiving role.

Let me give you an example that might help explain how this works. The ball is on the 10-yard line. The play starts, and the quarterback passes to the wide receiver, who catches it at the 20-yard line but is not tackled until the 30-yard line, where the play ends. Regardless of where the wide receiver caught the ball, this is considered a 20-yard pass reception. But a lot can happen in those 10 yards between where the receiver catches the ball and where they are finally brought down.

Introducing Yards After Catch, or YAC. This is a measurement of the distance a player travels after catching the ball. It can tell you a lot about a player, such as their speed, quickness, athletic ability, and even intelligence.

Example of what 10 Yards After Catch (YAC) looks like
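The arithmetic from the example above can be written out as a small sketch. The function names are made up for illustration, and it assumes yard lines simply increase in the direction of play (it ignores the way field markings flip at midfield).

```python
def reception_yards(line_of_scrimmage, tackle_spot):
    """Yards credited for the reception: scrimmage line to where the play ends."""
    return tackle_spot - line_of_scrimmage


def yards_after_catch(catch_spot, tackle_spot):
    """Yards gained between the catch and the end of the play."""
    return tackle_spot - catch_spot


# Ball on the 10, caught at the 20, tackled at the 30.
total = reception_yards(line_of_scrimmage=10, tackle_spot=30)
yac = yards_after_catch(catch_spot=20, tackle_spot=30)
yards_before_catch = total - yac

print(total, yac, yards_before_catch)  # 20 10 10
```

Half of this reception's yardage came after the catch, which is exactly the part the summary stat line hides.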

Not every receiver is built the same, and not every receiver can do what the next one can. So let’s take a look at some of the YAC stats from 2020 and see if we can break down “types” of receivers using data.

The Data

For this post I am using the .csv file I downloaded from Pro-Football-Reference, specifically the advanced stats for the receiving category. I cut out a lot of what’s there and kept the reception total and the Yards After Catch total.

The Tools

To develop the clustering that can tell us more about the breakdown of receivers, we use the following in Google Colab:
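Going by the calls used in the rest of this post (a dataframe, plt.scatter, KMeans, MinMaxScaler), the setup presumably looks something like the sketch below. Only the Rec and YAC columns appear in the post's code; the Player column, the load_receiving helper, and the file name in the comment are hypothetical, and the two rows are placeholders rather than real stats.

```python
import io

import matplotlib.pyplot as plt
import pandas as pd
from sklearn.cluster import KMeans
from sklearn.preprocessing import MinMaxScaler


def load_receiving(source):
    """Read the exported CSV and keep only the columns used in the analysis."""
    raw = pd.read_csv(source)
    return raw[["Player", "Rec", "YAC"]]


# In Colab this would be a path, e.g. load_receiving("receiving_advanced_2020.csv")
# (hypothetical file name). Placeholder rows stand in for the real export here.
sample = io.StringIO("Player,Tm,Rec,YAC\nPlayer A,AAA,90,450\nPlayer B,BBB,25,120\n")
df = load_receiving(sample)
print(df.shape)  # (2, 3)
```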

Setting up the analysis

Once the data is imported into my dataframe, I wanted to limit the number of players to those who saw regular action. I figured more than 30 receptions would be a good cutoff because that would include some low-volume starters at positions other than wide receiver, and weed out some of the players who just don’t play that often or were injured.

# .copy() avoids pandas' SettingWithCopyWarning when the columns are overwritten later
df = df[df['Rec'] > 30].copy()

This left us with 148 players to look at.

To look at the shape of our data, let’s create a scatter plot that maps out receptions on the x-axis and YAC on the y-axis.

plt.scatter(df.Rec, df.YAC)
plt.xlabel('Receptions')
plt.ylabel('Yards After Catch (YAC)')

Looking at the data, it’s a bit difficult to eyeball clear clusters. But at this point, I would guess there are about two clusters that look something like this.

Leveraging scikit-learn, we can set our K to 2 and assign each player to a cluster, at which point I add a cluster column to our dataframe and print it so you can see.

km = KMeans(n_clusters=2)
predicted = km.fit_predict(df[['Rec', 'YAC']])
df['cluster'] = predicted  # the column name 'cluster' is an assumption
print(df.head())

Now that we have our players assigned to clusters, we can replot our data in a way that is visually easy to pick apart.

As you can see, there is an awkward horizontal boundary splitting the two clusters. While technically clustered, I don’t think this tells us much except that there are players with a relatively high YAC and players with a relatively low YAC. So let’s adjust.

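The first thing we can do is scale the data using the MinMaxScaler class. K-means groups points by Euclidean distance, so the feature with the larger numeric range (here YAC) dominates the result, which is why the split ran almost purely along the YAC axis. MinMaxScaler rescales each column into a range of 0 to 1, which should help with our awkward boundary.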

scaler = MinMaxScaler()
scaler.fit(df[['Rec']])
df['Rec'] = scaler.transform(df[['Rec']])
scaler.fit(df[['YAC']])
df['YAC'] = scaler.transform(df[['YAC']])
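For reference, min-max scaling maps each value x in a column to (x - min) / (max - min). The sketch below checks that by hand on made-up numbers (not the receiver data) and shows that both columns can be scaled in a single fit_transform call, since the scaler treats each column independently.

```python
import numpy as np
import pandas as pd
from sklearn.preprocessing import MinMaxScaler

# Made-up values, only to illustrate the transformation.
df = pd.DataFrame({"Rec": [31, 60, 115], "YAC": [100, 400, 700]})
cols = ["Rec", "YAC"]

# One call scales every listed column on its own min and max.
scaled = MinMaxScaler().fit_transform(df[cols])

# Same thing by hand: (x - min) / (max - min), column by column.
manual = (df[cols] - df[cols].min()) / (df[cols].max() - df[cols].min())

print(np.allclose(scaled, manual.to_numpy()))  # True
```

After scaling, a gap of half the observed range counts the same in either column, so neither feature dominates the distance calculation.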

After re-running fit_predict on the scaled data, the resulting plot looks a lot better. In this particular chart, I also included the centroids for each cluster.

# To find your centroids, fit the model first: cluster_centers_ only exists after fitting
km = KMeans(n_clusters=2)
km.fit(df[['Rec', 'YAC']])
km.cluster_centers_
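A chart like the one described, with each cluster in its own color and the centroids marked, could be drawn roughly as follows. This is a sketch on synthetic points standing in for the scaled (Rec, YAC) pairs; the marker style and the random_state and n_init settings are illustrative choices, not taken from the original notebook.

```python
import matplotlib.pyplot as plt
import numpy as np
from sklearn.cluster import KMeans

# Synthetic stand-in for the scaled data: two loose blobs inside [0, 1].
rng = np.random.default_rng(0)
X = np.vstack([
    rng.normal(loc=[0.2, 0.2], scale=0.05, size=(50, 2)),
    rng.normal(loc=[0.7, 0.6], scale=0.05, size=(50, 2)),
])

km = KMeans(n_clusters=2, n_init=10, random_state=0)
labels = km.fit_predict(X)
centroids = km.cluster_centers_  # shape: (n_clusters, n_features)

fig, ax = plt.subplots()
for cluster_id in np.unique(labels):
    members = X[labels == cluster_id]
    ax.scatter(members[:, 0], members[:, 1], label=f"cluster {cluster_id}")

# Centroids drawn on top as large black stars.
ax.scatter(centroids[:, 0], centroids[:, 1], marker="*", s=200, color="black", label="centroid")
ax.set_xlabel("Receptions (scaled)")
ax.set_ylabel("Yards After Catch (scaled)")
ax.legend()
```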

Since I eyeballed our K (the number of clusters), I want to make sure I’m doing this properly. One method to check that I am using an appropriate number of clusters is called the Elbow Method.

To do this in Python, I created a range that I thought our final number of clusters would fall within. I also created an empty Sum of Squared Errors (SSE) list. Again, scikit-learn makes this very easy through the fitted model’s inertia_ attribute, with inertia defined by their documentation as:

Sum of squared distances of samples to their closest cluster center.

r = range(1, 10)
sse = []
for k in r:
    km = KMeans(n_clusters=k)
    km.fit(df[['Rec', 'YAC']])
    # inertia_ is the sum of squared errors; append it to the list created above
    sse.append(km.inertia_)
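To make the quoted definition concrete, the sketch below recomputes that sum by hand and compares it with inertia_. It runs on random synthetic points rather than the receiver data, and the n_init and random_state values are arbitrary choices for reproducibility.

```python
import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(1)
X = rng.random((60, 2))  # synthetic points, not the receiver data

km = KMeans(n_clusters=3, n_init=10, random_state=0).fit(X)

# Look up, for every sample, the centroid of the cluster it was assigned to,
# then sum the squared distances to those centroids.
assigned_centers = km.cluster_centers_[km.labels_]
manual_sse = np.sum((X - assigned_centers) ** 2)

print(np.isclose(manual_sse, km.inertia_))
```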

With the elbow method, we want to track this SSE to the point where adding more clusters no longer reduces it by much.

We can plot the SSE list above to see the “elbow” where our K will be most efficient. In this particular case, the elbow appears around 3, but the plotted points tell me that the returns only really start to diminish around 5. So going forward we will set our K to 5.
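Plotting the curve takes only a few lines. The sketch below uses uniformly random synthetic points, so its elbow will not match the one described for the receiver data; it also prints how much SSE each extra cluster removes, which is a numeric way to judge where the returns flatten out.

```python
import matplotlib.pyplot as plt
import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(2)
X = rng.random((150, 2))  # synthetic stand-in for the scaled data

ks = list(range(1, 10))
sse = [KMeans(n_clusters=k, n_init=10, random_state=0).fit(X).inertia_ for k in ks]

fig, ax = plt.subplots()
ax.plot(ks, sse, marker="o")
ax.set_xlabel("K (number of clusters)")
ax.set_ylabel("Sum of squared errors (inertia)")

# Reduction in SSE from each additional cluster; small values mean diminishing returns.
for k, gain in zip(ks[1:], -np.diff(sse)):
    print(k, round(float(gain), 3))
```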

Now that we have an appropriate number of clusters, we can adjust our dataframes and begin plotting each cluster.
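One way to do that split is sketched below: refit with K set to 5, store the labels in a column, and build one dataframe per cluster. The data here is synthetic, the cluster column name is an assumption, and the pairing of cluster IDs with the colors used in the breakdown that follows is arbitrary, because K-means numbers its clusters in no particular order.

```python
import numpy as np
import pandas as pd
from sklearn.cluster import KMeans

rng = np.random.default_rng(3)
# Synthetic values already in [0, 1], standing in for the scaled receiver data.
df = pd.DataFrame(rng.random((148, 2)), columns=["Rec", "YAC"])

km = KMeans(n_clusters=5, n_init=10, random_state=0)
df["cluster"] = km.fit_predict(df[["Rec", "YAC"]])

# One dataframe per cluster, keyed by the color it will be plotted in.
colors = ["red", "blue", "cyan", "black", "green"]
groups = {color: df[df["cluster"] == cid] for cid, color in enumerate(colors)}

for color, group in groups.items():
    print(color, len(group))
```

Each group can then be drawn with its own call, for example plt.scatter(group.Rec, group.YAC, color=color) inside the loop.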

Now we appear to have a graph that tells us something about the NFL’s group of receivers.

Breaking Down the Clusters

In red, we have our superstars. These are the league’s top performers in volume and/or YAC. No one here would be surprising to anyone who watches the NFL.

In blue, we have our secondary guys. These players are like the superstars, but they are not the primary option on their team. They see a good amount of volume and often turn it into a decent gain with their YAC. These are most likely wide receivers again, but tight ends or running backs can be mixed in.

Cyan is where this gets interesting. These are low-volume players who make the most of the opportunities they get. They can include rookie wide receivers, maybe even slot receivers, but are most likely running backs. Running backs generally have the skill to catch the ball and make tacklers miss, increasing their YAC. Running backs who catch are becoming increasingly valuable in the NFL, and this cluster might just show why. If you are a fantasy football player or a pro football coach, it may make sense to keep an eye on these guys, as they can become stars (if they aren’t already). Keep in mind, though, that some of them may be starters who began the season injured or were injured along the way.

In black, you have guys who get fairly low volume and generally don’t do much after the catch. This is likely where the bulk of your tight ends will fall. They can catch a decent volume of passes, but are typically much slower than your average wide receiver or running back.

In green, you again have low volume and low YAC. This can indicate receivers further down the depth chart, tight ends who are used in passing situations only a handful of times, or running backs who aren’t known for their pass-catching prowess.

To support this, I created a pivot table.
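A pivot table of this kind can be built with pandas by counting players per position within each cluster. The sketch below uses placeholder rows; the Player, Pos, and cluster column names are assumptions, and the counts are illustrative only, not the results from the receiver data.

```python
import pandas as pd

# Placeholder rows; column names are assumed, values are not real results.
df = pd.DataFrame({
    "Player": ["A", "B", "C", "D", "E", "F", "G"],
    "Pos": ["WR", "WR", "TE", "RB", "RB", "TE", "WR"],
    "cluster": [0, 0, 1, 2, 2, 1, 1],
})

# Rows: cluster, columns: position, values: number of players in that cell.
pivot = df.pivot_table(
    index="cluster", columns="Pos", values="Player", aggfunc="count", fill_value=0
)
print(pivot)
```

Reading across a row shows which positions make up a cluster, which is the check the cluster descriptions above rely on.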

Limitations

Visually setting your own K is not always the way to go, as demonstrated above.

Scaling the data was an absolute must. Prior to scaling, my SSE was in the tens of thousands and required a considerable number of clusters to really come down. That obviously would not have worked out very well. Scaling allowed me to properly calculate SSE and leverage the elbow method to find an efficient K.

Injured players can find themselves at almost any point in the chart. Clustering an injured superstar with the low tier players is a risk in this analysis.

Sources

2020 NFL Advanced Receiving | Pro-Football-Reference.com (www.pro-football-reference.com)

sklearn.cluster.KMeans - scikit-learn 0.24.2 documentation (scikit-learn.org)

sklearn.preprocessing.MinMaxScaler - scikit-learn 0.24.2 documentation (scikit-learn.org)