NFL 2020 Preview with Python: Rushing

shotin
Published in Analytics Vidhya
5 min read · Sep 7, 2020

The NFL 2020 season is coming soon. To preview it, I'm going to visualize some rushing data from the 2019 dataset.

Please also see my article about quarterback data visualization.

1. Overview

In this article, I'm going to use the dataset below. Thanks to Mr. Ron Yurko.

It contains play-by-play data for the pre-season, regular season and play-offs. I'm going to use only the regular season and visualize some rusher stats. How many rushing yards did they gain in total and on average? How did they perform in specific situations, such as by quarter, by down, and when trailing?

OK, let's get down to the implementation.

2. Preprocessing

import pandas as pd

# Use the full option name; the short form "max_columns" can be ambiguous in newer pandas
pd.set_option("display.max_columns", 400)

pbp = pd.read_csv("play_by_play_data/regular_season/reg_pbp_2019.csv")
roster = pd.read_csv("roster_data/regular_season/reg_roster_2019.csv")

Look at the DataFrame info of the pbp dataset.

pbp.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 45546 entries, 0 to 45545
Columns: 256 entries, play_id to defensive_extra_point_conv
dtypes: float64(130), int64(21), object(105)
memory usage: 89.0+ MB

With 256 columns, it is too wide for visualizing rushing data, so narrow down the columns. Please note that "yards_gained" doesn't include lateral rushes.

# Keep only the columns needed for rushing stats
pbp_custom = pbp[
    [
        "game_id",
        "game_half",
        "qtr",
        "time",
        "posteam",
        "yardline_100",
        "down",
        "ydstogo",
        "yards_gained",
        "play_type",
        "two_point_attempt",
        "first_down_rush",
        "rush_attempt",
        "rush_touchdown",
        "rusher_player_id",
        "rusher_player_name",
        "score_differential",
    ]
].sort_values(
    ["game_id", "game_half", "qtr", "time"],
    # "time" is sorted in descending order within each quarter
    ascending=[True, True, True, False],
)
pbp_custom
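As an alternative to loading all 256 columns and selecting afterwards, the columns can be restricted at read time. The following is a minimal sketch, assuming a tiny in-memory CSV with made-up rows in place of the real file; the column list `wanted` is a hypothetical subset chosen for illustration.

```python
import io

import pandas as pd

# Tiny stand-in for the play-by-play CSV (made-up rows, not real data)
csv_text = """game_id,qtr,down,yards_gained,rush_attempt,desc
1,1,1,4,1,run left
1,1,2,0,0,pass short
1,2,1,7,1,run right
"""

# Hypothetical subset of columns to keep
wanted = ["game_id", "qtr", "down", "yards_gained", "rush_attempt"]

# usecols tells pandas to parse only the listed columns
pbp_small = pd.read_csv(io.StringIO(csv_text), usecols=wanted)
print(pbp_small.columns.tolist())
```

With the real file, the same `usecols` argument could be passed to the `pd.read_csv` call shown earlier, which may reduce memory usage.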

Aggregate the season rushing totals by player, and also by player, quarter and down.

# Aggregate by player (rushing attempts only, excluding two-point attempts)
rush_stats_season = pbp_custom[
    (pbp_custom.two_point_attempt == 0)
    & (pbp_custom.rush_attempt == 1)
].groupby(
    ["rusher_player_id", "rusher_player_name"],
    as_index=False,
).agg(
    {
        "rush_attempt": "sum",
        "yards_gained": "sum",
        "first_down_rush": "sum",
        "rush_touchdown": "sum",
    }
)

# Average yards per attempt (used by the line plot later)
rush_stats_season["yards_per_attempt"] = round(
    rush_stats_season.yards_gained / rush_stats_season.rush_attempt, 1
)

# Keep only players with 1000 or more yards in the season
rush_stats_season = rush_stats_season[
    rush_stats_season.yards_gained >= 1000
].sort_values(["yards_gained"], ascending=False)
rush_stats_season

# Aggregate by player, quarter and down
rush_stats_details = pbp_custom[
    (pbp_custom.two_point_attempt == 0)
    & (pbp_custom.rush_attempt == 1)
    # Keep only players with 1000 or more yards in the season
    & (pbp_custom.rusher_player_id.isin(rush_stats_season.rusher_player_id))
].groupby(
    ["rusher_player_id", "rusher_player_name", "qtr", "down"],
    as_index=False,
).agg(
    {
        "rush_attempt": "sum",
        "yards_gained": "sum",
        "first_down_rush": "sum",
        "rush_touchdown": "sum",
    }
)
rush_stats_details
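To make the filter-then-aggregate pattern easier to follow, here is a minimal sketch on a toy table. The player names and numbers are made up for illustration and are not taken from the dataset; only the column names follow the ones used in this article.

```python
import pandas as pd

# Made-up plays for two hypothetical players
plays = pd.DataFrame(
    {
        "rusher_player_name": ["A.Back", "A.Back", "B.Runner", "B.Runner", "B.Runner"],
        "rush_attempt": [1, 1, 1, 1, 0],
        "two_point_attempt": [0, 0, 0, 1, 0],
        "yards_gained": [3, 8, 5, 2, 12],
    }
)

# Keep rushing attempts only and drop two-point attempts
rushes = plays[(plays.two_point_attempt == 0) & (plays.rush_attempt == 1)]

# Sum attempts and yards per player
totals = rushes.groupby("rusher_player_name", as_index=False).agg(
    {"rush_attempt": "sum", "yards_gained": "sum"}
)

# Average yards per attempt, rounded to one decimal place
totals["yards_per_attempt"] = (totals.yards_gained / totals.rush_attempt).round(1)
print(totals)
```

Here the two-point rush and the non-rushing play are excluded before summing, so they do not affect either player's totals or average.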

3. Visualization

First, I visualize the ranking of total yards gained using a bar chart. In addition to total yards, I also plot the average yards per rushing attempt as a line.

%matplotlib inline
import matplotlib.pyplot as plt

with plt.rc_context(
    {
        "axes.edgecolor": "white",
        "xtick.color": "white",
        "ytick.color": "white",
        "figure.facecolor": "white",
    }
):
    fig = plt.figure(figsize=(15, 8), facecolor="black")
    ax_hist = fig.add_subplot(111, facecolor="black")
    ax_line = ax_hist.twinx()  # Second y-axis sharing the same x-axis

    # Plot total yards as bars
    ax_hist.bar(
        rush_stats_season.rusher_player_name,
        rush_stats_season.yards_gained,
        color="mediumseagreen",
        width=0.7,
    )
    ax_hist.set_ylim(800, 1600)
    ax_hist.set_ylabel("Total Yards", color="white")

    # Plot yards per attempt as a line over the bars
    ax_line.plot(
        rush_stats_season.rusher_player_name,
        rush_stats_season.yards_per_attempt,
        "chocolate",
        linewidth=3,
    )
    ax_line.set_ylim(4, 7)
    ax_line.set_ylabel("Yards per Attempt", color="white")

The top 5 players are all running backs, but can you see that Lamar Jackson has an outstanding average? Almost all of the players in the ranking average 4–5 yards, but he is at nearly 7 (of course, 4–5 is also outstanding). At that rate, a single run gets him most of the way to a 1st down.

Next, from a different perspective, how about each quarter? I use the heatmap from the seaborn library.

First, we need to create a pivot table which has the player name as index, the quarter as columns and the total yards gained as values.

rush_stats_qtr_pivot = pd.pivot_table(
    data=rush_stats_details[rush_stats_details.qtr <= 4],  # Exclude overtime
    values="yards_gained",
    columns="qtr",
    index="rusher_player_name",
    aggfunc="sum",
)
rush_stats_qtr_pivot
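For readers new to `pd.pivot_table`, the reshaping step can be sketched on a toy table. The players and yardage below are made up for illustration; quarter 5 stands for overtime and is filtered out the same way as above.

```python
import pandas as pd

# Made-up per-quarter rows for two hypothetical players
details = pd.DataFrame(
    {
        "rusher_player_name": ["A.Back", "A.Back", "A.Back", "B.Runner", "B.Runner"],
        "qtr": [1, 1, 2, 1, 5],
        "yards_gained": [10, 5, 7, 4, 9],
    }
)

# One row per player, one column per quarter, summed yards as values
pivot = pd.pivot_table(
    data=details[details.qtr <= 4],
    values="yards_gained",
    index="rusher_player_name",
    columns="qtr",
    aggfunc="sum",
)
print(pivot)
```

A player with no rushes in a given quarter gets NaN in that cell; passing `fill_value=0` to `pd.pivot_table` is one way to show zeros instead.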

I visualize this data as a heatmap. Set the pivot table as the data source; the actual values can be displayed using the "annot" parameter.

import numpy as np
import seaborn as sns

with plt.rc_context(
    {
        "axes.edgecolor": "white",
        "xtick.color": "white",
        "ytick.color": "white",
        "figure.facecolor": "white",
    }
):
    plt.figure(figsize=(15, 8), facecolor="black")
    sns.heatmap(
        rush_stats_qtr_pivot,
        annot=True,  # Display values in each cell
        fmt="g",
        cmap="Blues",
    )
    plt.xlabel("Quarter", color="white")

Derrick Henry and Christian McCaffrey look strong in the 3rd quarter. Henry, Carlos Hyde and Nick Chubb gain more yards in the 4th. Meanwhile, Ezekiel Elliott is strong in the 1st half.

In addition, I also want to look at each down. This time, I'm going to use average yards, not total yards.

# Aggregate by player and down
rush_stats_down = rush_stats_details.groupby(
    ["rusher_player_name", "down"],
    as_index=False,
).agg(
    {
        "rush_attempt": "sum",
        "yards_gained": "sum",
    }
)

# Average yards per attempt on each down
rush_stats_down["yards_per_attempt"] = round(
    rush_stats_down.yards_gained / rush_stats_down.rush_attempt, 1
)

# Show downs as integers (1-4) instead of floats
rush_stats_down = rush_stats_down.astype({"down": int})
rush_stats_down.head(10)

I create a pivot table again.

rush_stats_down_pivot = pd.pivot_table(
    data=rush_stats_down,
    values="yards_per_attempt",
    columns="down",
    index="rusher_player_name",
    aggfunc="sum",  # One row per player and down, so the sum is that single value
)
rush_stats_down_pivot

Visualize it.

with plt.rc_context(
    {
        "axes.edgecolor": "white",
        "xtick.color": "white",
        "ytick.color": "white",
        "figure.facecolor": "white",
    }
):
    plt.figure(figsize=(15, 8), facecolor="black")
    sns.heatmap(
        rush_stats_down_pivot,
        annot=True,  # Display values in each cell
        fmt="g",
        cmap="Blues",
    )
    plt.xlabel("Down", color="white")

We can see many 5+ boxes in the heatmap, but Jackson stands out. He averages over 5 yards on 1st through 3rd down (and almost 5 on 4th), which means that, on average, two rushes by him are enough for the Ravens to get a 1st down. How can we not say he is the best rusher?

Henry and Josh Jacobs also gain almost 5 yards on every down. Can we say these three rushers are the best in the NFL?

Finally, as an extra visualization, I created the same figure using only the rushing plays where the offense was trailing.
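The code for this last figure is not shown, so the following is only a sketch of how the filtering could look. It assumes that "score_differential" is the offense's score minus the defense's score (so a negative value means the offense is behind) and that the figure in question is the per-down heatmap; the rows are made up for illustration.

```python
import pandas as pd

# Made-up rushing plays for two hypothetical players
plays = pd.DataFrame(
    {
        "rusher_player_name": ["A.Back", "A.Back", "B.Runner", "B.Runner"],
        "down": [1, 1, 1, 2],
        "rush_attempt": [1, 1, 1, 1],
        "yards_gained": [6, 2, 4, 9],
        "score_differential": [-7, 3, -3, 0],
    }
)

# Assumption: a negative differential means the offense is trailing
trailing = plays[plays.score_differential < 0]

# Same aggregation as before, restricted to trailing situations
behind = trailing.groupby(["rusher_player_name", "down"], as_index=False).agg(
    {"rush_attempt": "sum", "yards_gained": "sum"}
)
behind["yards_per_attempt"] = (behind.yards_gained / behind.rush_attempt).round(1)

# Player-by-down table that could be passed to sns.heatmap
behind_pivot = behind.pivot(
    index="rusher_player_name", columns="down", values="yards_per_attempt"
)
print(behind_pivot)
```

Tied situations (a differential of 0) are excluded here; whether the original figure included them is not stated.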

Pay attention to Chubb, Henry, Leonard Fournette and Kenyan Drake when they face adversity. They never give up.

Thank you for reading!!
