NFL 2020 Preview with Python: Quarterback

S. I.
Sep 6, 2020 · 7 min read

The 2020 NFL season is coming soon. As a preview, I’m going to visualize quarterback data from the 2019 season.

1. Overview

In this article, I use the dataset below. Thanks to Mr. Ron Yurko.

The dataset contains play-by-play data for the preseason, regular season and playoffs. I’m going to use only the regular season and visualize some quarterback stats. What type is each quarterback, pocket passer or mobile QB? How did they perform? And how did they do in specific situations, such as by quarter, by down, or when trailing?

OK, let’s get down to the implementation.

2. Preprocessing

Filter the player data down to quarterbacks.

qb.head()
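
The filtering step itself is not shown above, so here is a minimal sketch of how it might look. It uses a tiny made-up roster instead of the real file, and the "position" column name is an assumption; "full_player_name", "team" and "gsis_id" are the columns used later in this article.

```python
import pandas as pd

# Toy roster standing in for the real player data.
# The "position" column name is an assumption; the third row is a made-up player.
players = pd.DataFrame({
    "full_player_name": ["Patrick Mahomes", "Dak Prescott", "Example Tight End"],
    "team": ["KC", "DAL", "KC"],
    "position": ["QB", "QB", "TE"],
    "gsis_id": ["00-0033873", "00-0033077", "00-0000001"],
})

# Keep quarterbacks only and reset the index for a clean display
qb = players[players["position"] == "QB"].reset_index(drop=True)
print(qb.head())
```

The resulting qb frame is what the later snippets use through qb.gsis_id.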

Check the DataFrame info of the pbp (play-by-play) dataset.

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 45546 entries, 0 to 45545
Columns: 256 entries, play_id to defensive_extra_point_conv
dtypes: float64(130), int64(21), object(105)
memory usage: 89.0+ MB

With 256 columns, it’s too large to work with comfortably for quarterback data, so narrow it down to the columns we need.
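
The narrowing step is not shown, so the following is only a sketch of one way to build pbp_custom by selecting columns. The toy frame and the short column list are assumptions for illustration; the real list would need every column used in the later aggregations.

```python
import pandas as pd

# Toy stand-in for the 256-column play-by-play frame
pbp = pd.DataFrame({
    "play_id": [1, 2, 3],
    "passer_player_id": ["QB-1", "QB-1", None],  # placeholder IDs
    "qtr": [1, 1, 2],
    "down": [1, 2, 1],
    "yards_gained": [12, 0, 4],
    "complete_pass": [1, 0, 0],
    "unused_note": ["a", "b", "c"],  # made-up column standing in for the unused ones
})

# Keep only the columns needed for the quarterback stats (illustrative subset)
keep_cols = ["passer_player_id", "qtr", "down", "yards_gained", "complete_pass"]
pbp_custom = pbp[keep_cols].copy()
print(pbp_custom.columns.tolist())
```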

Aggregate this data as passing stats.

#Aggregate by player, quarter and down
qb_pass_stats = pbp_custom[
    (pbp_custom.passer_player_id.isin(qb.gsis_id)) #only QB
    & (pbp_custom.two_point_attempt == 0) #exclude two-point conversion
].groupby(
    [
        "passer_player_id"
        ,"qtr"
        ,"down"
    ]
    ,as_index=False
).agg(
    {
        "complete_pass": "sum"
        ,"yards_gained": "sum"
        ,"first_down_pass": "sum"
        ,"pass_touchdown": "sum"
        ,"incomplete_pass": "sum"
        ,"sack": "sum"
        ,"interception": "sum"
    }
)
#Create new columns
qb_pass_stats["pass_attempt"] = qb_pass_stats["complete_pass"] + qb_pass_stats["incomplete_pass"] + qb_pass_stats["interception"]
#Completion rate in percent (multiply before rounding to avoid float noise such as 61.19999...)
qb_pass_stats["complete_rate"] = (
    qb_pass_stats["complete_pass"] / qb_pass_stats["pass_attempt"] * 100
).round(1)
#Aggregate by player
qb_pass_stats_season = qb_pass_stats.groupby(
    ["passer_player_id"]
    ,as_index=False
).agg(
    {
        "pass_attempt": "sum"
        ,"complete_pass": "sum"
        ,"yards_gained": "sum"
        ,"first_down_pass": "sum"
        ,"pass_touchdown": "sum"
        ,"incomplete_pass": "sum"
        ,"sack": "sum"
        ,"interception": "sum"
    }
)
#Create new columns
qb_pass_stats_season["complete_rate"] = (
    qb_pass_stats_season["complete_pass"] / qb_pass_stats_season["pass_attempt"] * 100
).round(1)
#Keep only players with at least 2000 yards
qb_pass_stats_season = qb_pass_stats_season[qb_pass_stats_season.yards_gained >= 2000]
qb_pass_stats[["passer_player_id", "qtr", "down", "pass_attempt", "complete_pass", "yards_gained"]].head()
qb_pass_stats_season[["passer_player_id", "pass_attempt", "complete_pass", "yards_gained"]].sort_values(["yards_gained"], ascending=False).head()

Jameis Winston is at the top with 5109 yards.

Do the same for rushing. Please note that "yards_gained" doesn’t include yards from lateral rushes.
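
The per-quarter, per-down rushing aggregation (qb_rush_stats) is not shown, so here is a sketch that mirrors the passing aggregation above. The toy data and the placeholder IDs are made up; the column names follow the ones used in the rest of this article.

```python
import pandas as pd

# Placeholder IDs, not real gsis_ids
qb = pd.DataFrame({"gsis_id": ["QB-1"]})

# Toy play-by-play rows: three QB rushes and one running back rush
pbp_custom = pd.DataFrame({
    "rusher_player_id": ["QB-1", "QB-1", "RB-1", "QB-1"],
    "qtr": [1, 1, 1, 2],
    "down": [1, 1, 2, 3],
    "rush_attempt": [1, 1, 1, 1],
    "yards_gained": [5, 12, 3, -2],
    "first_down_rush": [0, 1, 0, 0],
    "rush_touchdown": [0, 0, 0, 0],
    "two_point_attempt": [0, 0, 0, 0],
})

# Aggregate QB rushes by player, quarter and down
qb_rush_stats = pbp_custom[
    pbp_custom.rusher_player_id.isin(qb.gsis_id)  # only QB
    & (pbp_custom.two_point_attempt == 0)  # exclude two-point conversion
].groupby(
    ["rusher_player_id", "qtr", "down"], as_index=False
).agg(
    {
        "rush_attempt": "sum",
        "yards_gained": "sum",
        "first_down_rush": "sum",
        "rush_touchdown": "sum",
    }
)
print(qb_rush_stats)
```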

#Aggregate by player
qb_rush_stats_season = qb_rush_stats.groupby(
    [
        "rusher_player_id"
    ]
    ,as_index=False
).agg(
    {
        "rush_attempt": "sum"
        ,"yards_gained": "sum"
        ,"first_down_rush": "sum"
        ,"rush_touchdown": "sum"
    }
)
qb_rush_stats[["rusher_player_id", "qtr", "down", "yards_gained"]].head()
qb_rush_stats_season[["rusher_player_id", "yards_gained"]].sort_values(["yards_gained"], ascending=False).head()

The top is, of course, Lamar Jackson with 1206 yards.

Merge the passing and rushing datasets, then merge the player dataset as well.
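
The first merge (passing with rushing) is not shown for the season totals, so this is a sketch of how it might be done, following the same pattern as the per-quarter merge later in this article. The numbers and IDs are made up. The suffixes are what produce the "yards_gained_passing" and "yards_gained_rushing" columns.

```python
import pandas as pd

# Toy season totals with placeholder IDs and made-up numbers
qb_pass_stats_season = pd.DataFrame({
    "passer_player_id": ["QB-1", "QB-2"],
    "yards_gained": [4000, 3100],
    "pass_touchdown": [26, 36],
})
qb_rush_stats_season = pd.DataFrame({
    "rusher_player_id": ["QB-1", "QB-2"],
    "yards_gained": [200, 1200],
    "rush_touchdown": [2, 7],
})

# Columns that exist on both sides ("yards_gained") get the suffixes
qb_stats_season = pd.merge(
    qb_pass_stats_season,
    qb_rush_stats_season,
    left_on="passer_player_id",
    right_on="rusher_player_id",
    how="inner",
    suffixes=("_passing", "_rushing"),
)
print(qb_stats_season)
```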

import pandas as pd

#Merge stats and players datasets
qb_stats_season = pd.merge(
    qb_stats_season
    ,qb
    ,left_on="passer_player_id"
    ,right_on="gsis_id"
    ,how="inner"
)
qb_stats_season = qb_stats_season.rename(columns={"passer_player_id": "player_id"})
#Create new columns
qb_stats_season["yards_gained"] = qb_stats_season["yards_gained_passing"] + qb_stats_season["yards_gained_rushing"]
qb_stats_season["touchdown"] = qb_stats_season["pass_touchdown"] + qb_stats_season["rush_touchdown"]
qb_stats_season[["player_id", "full_player_name", "team", "yards_gained", "yards_gained_passing", "yards_gained_rushing"]].head()

3. Visualization

Let’s visualize quarterback playing styles by plotting passing yards against rushing yards in a scatter plot.

import matplotlib.pyplot as plt

with plt.rc_context(
    {
        "axes.edgecolor":"white"
        ,"xtick.color":"white"
        ,"ytick.color":"white"
        ,"figure.facecolor":"white"
    }
):
    fig = plt.figure(figsize=(15, 12), facecolor="black")
    ax = fig.add_subplot(111, facecolor="black")
    #Plot scatter, colored by sacks + interceptions
    s = ax.scatter(
        qb_stats_season["yards_gained_passing"]
        ,qb_stats_season["yards_gained_rushing"]
        ,s=200
        ,alpha=0.5
        ,c=(qb_stats_season["sack"] + qb_stats_season["interception"])
        ,cmap="bwr"
        ,marker="D"
    )
    ax.set_xlabel("Pass Yds", color="white")
    ax.set_ylabel("Rush Yds", color="white")
    ax.set_xlim(2400, 5200)
    ax.set_ylim(-100, 1300)
    #Plot player name as text
    for _, qb_data in qb_stats_season.iterrows():
        ax.text(
            qb_data.yards_gained_passing
            ,qb_data.yards_gained_rushing
            ,qb_data.full_player_name
            ,verticalalignment="center"
            ,horizontalalignment="center"
            ,fontsize=13
            ,color="white"
        )
    #Colorbar settings
    cb = plt.colorbar(s)
    cb.set_label("Sack + Interception", color="white", size=20)
    cb.outline.set_edgecolor("white")
    plt.setp(plt.getp(cb.ax.axes, 'yticklabels'), color="white")
    plt.title("QB Type", color="white")

The x-axis is passing yards and the y-axis is rushing yards. The two axes use different scales, which may look strange, but this is for readability.

I also colored each marker by the total number of sacks and interceptions. Red markers, such as Winston and Murray, were sacked and intercepted more often, while blue markers, such as Mahomes and Brees, were sacked and intercepted less often.

What we can see:

  • Winston has the most passing yards but was sacked and intercepted more often.
  • Jackson is clearly a mobile QB and was also sacked and intercepted less often.
  • Mahomes and Brees were sacked and intercepted much less often, but did not have as many passing yards.
  • Are Murray, Watson and Wilson good at both?

Next, how many yards did they gain for each sack or interception?

Calculate the yards gained per sack and interception, and visualize it with a horizontal bar chart.
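
The calculation of "gained_per_sack_and_interception" is not shown, so here is a sketch based on the chart's axis label, "Gained / (Sack + Interception)". The toy numbers are made up, and the rounding to one decimal place is my assumption.

```python
import pandas as pd

# Toy season totals with made-up names and numbers
qb_stats_season = pd.DataFrame({
    "full_player_name": ["QB A", "QB B"],
    "yards_gained": [4200, 5000],
    "sack": [15, 40],
    "interception": [5, 10],
})

# Total yards divided by the number of sacks plus interceptions
qb_stats_season["gained_per_sack_and_interception"] = (
    qb_stats_season["yards_gained"]
    / (qb_stats_season["sack"] + qb_stats_season["interception"])
).round(1)
print(qb_stats_season)
```

A higher value means more yards gained for every sack or interception, which is the sense of "stable" used in this article.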

qb_stats_season = qb_stats_season.sort_values("gained_per_sack_and_interception", ascending=True).reset_index(drop=True)

with plt.rc_context(
    {
        "axes.edgecolor":"white"
        ,"xtick.color":"white"
        ,"ytick.color":"white"
        ,"figure.facecolor":"white"
    }
):
    fig = plt.figure(figsize=(10, 10), facecolor="black")
    ax = fig.add_subplot(111, facecolor="black")
    #Plot horizontal bar chart
    ax.barh(
        qb_stats_season.full_player_name
        ,qb_stats_season.gained_per_sack_and_interception
        ,color="grey"
    )
    #Plot stats as text on each bar
    for index, qb_data in qb_stats_season.iterrows():
        ax.text(
            qb_data.gained_per_sack_and_interception
            ,index
            ,str(qb_data.yards_gained) + " / " + str(int(qb_data.sack) + int(qb_data.interception))
            ,color="white"
            ,ha="right" #"right" is only valid for horizontal alignment
            ,va="center"
        )
    plt.title("Never Fail QB Ranks", color="white")
    ax.set_xlabel("Gained / (Sack + Interception)", color="white")

How stable Mahomes is! Brees, Prescott and Jackson are also outstanding. Meanwhile, Winston and Murray gained many yards, but we can say they are not stable.

By the way, how about each quarter? Aggregate the data again, this time by quarter.
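
The passing aggregation by quarter (qb_pass_stats_qtr) is not shown, so here is a sketch that rolls the per-quarter, per-down stats up to one row per player and quarter. The toy data uses a placeholder ID and made-up numbers.

```python
import pandas as pd

# Toy per-quarter, per-down passing stats for one placeholder QB
qb_pass_stats = pd.DataFrame({
    "passer_player_id": ["QB-1", "QB-1", "QB-1"],
    "qtr": [1, 1, 2],
    "down": [1, 2, 1],
    "pass_attempt": [10, 8, 12],
    "complete_pass": [7, 5, 6],
    "yards_gained": [80, 40, 95],
    "first_down_pass": [3, 2, 4],
    "pass_touchdown": [1, 0, 0],
    "incomplete_pass": [3, 3, 5],
    "sack": [0, 1, 1],
    "interception": [0, 0, 1],
})

# Sum every stat over the downs within each quarter
sum_cols = [
    "pass_attempt", "complete_pass", "yards_gained", "first_down_pass",
    "pass_touchdown", "incomplete_pass", "sack", "interception",
]
qb_pass_stats_qtr = qb_pass_stats.groupby(
    ["passer_player_id", "qtr"], as_index=False
).agg({col: "sum" for col in sum_cols})
print(qb_pass_stats_qtr)
```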

#Completion rate in percent
qb_pass_stats_qtr["complete_rate"] = round(qb_pass_stats_qtr["complete_pass"] / qb_pass_stats_qtr["pass_attempt"], 3) * 100
#Aggregate rushing by player and quarter
qb_rush_stats_qtr = qb_rush_stats.groupby(
    [
        "rusher_player_id"
        ,"qtr"
    ]
    ,as_index=False
).agg(
    {
        "rush_attempt": "sum"
        ,"yards_gained": "sum"
        ,"first_down_rush": "sum"
        ,"rush_touchdown": "sum"
    }
)
#Merge passing and rushing stats
qb_stats_qtr = pd.merge(
    qb_pass_stats_qtr
    ,qb_rush_stats_qtr
    ,left_on=["passer_player_id","qtr"]
    ,right_on=["rusher_player_id","qtr"]
    ,how="inner"
    ,suffixes=["_passing", "_rushing"]
)
#Merge stats and players datasets
qb_stats_qtr = pd.merge(
    qb_stats_qtr
    ,qb
    ,left_on="passer_player_id"
    ,right_on="gsis_id"
    ,how="inner"
)
#Create new columns
qb_stats_qtr["yards_gained"] = qb_stats_qtr["yards_gained_passing"] + qb_stats_qtr["yards_gained_rushing"]
qb_stats_qtr["touchdown"] = qb_stats_qtr["pass_touchdown"] + qb_stats_qtr["rush_touchdown"]
qb_stats_qtr = qb_stats_qtr.rename(columns={"passer_player_id": "player_id"})
qb_stats_qtr[["player_id", "full_player_name", "team", "qtr", "yards_gained", "yards_gained_passing", "yards_gained_rushing"]].head()
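
The next plot uses a frame called qb_stats_4q whose creation is not shown. A minimal sketch, assuming it is simply the 4th-quarter rows of qb_stats_qtr (the toy data is made up, and treating qtr 5 as overtime is an assumption):

```python
import pandas as pd

# Toy per-quarter stats with made-up names and numbers
qb_stats_qtr = pd.DataFrame({
    "full_player_name": ["QB A", "QB A", "QB B", "QB B"],
    "qtr": [1, 4, 4, 5],  # qtr 5 is assumed to mean overtime
    "yards_gained_passing": [900, 1100, 1300, 40],
    "yards_gained_rushing": [50, 80, 20, 0],
})

# Keep only the 4th-quarter rows
qb_stats_4q = qb_stats_qtr[qb_stats_qtr["qtr"] == 4].reset_index(drop=True)
print(qb_stats_4q)
```
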
with plt.rc_context(
    {
        "axes.edgecolor":"white"
        ,"xtick.color":"white"
        ,"ytick.color":"white"
        ,"figure.facecolor":"white"
    }
):
    fig = plt.figure(figsize=(15, 5), facecolor="black")
    ax = fig.add_subplot(111, facecolor="black")
    #Plot scatter for 4th-quarter stats only
    s = ax.scatter(
        qb_stats_4q.yards_gained_passing
        ,qb_stats_4q.yards_gained_rushing
        ,s=200
        ,alpha=0.5
        ,c=(qb_stats_4q.sack + qb_stats_4q.interception)
        ,cmap="bwr"
        ,marker="D"
    )
    ax.set_xlabel("Pass Yds", color="white")
    ax.set_ylabel("Rush Yds", color="white")
    #Plot player name as text
    for _, qb_data in qb_stats_4q.iterrows():
        ax.text(
            qb_data.yards_gained_passing
            ,qb_data.yards_gained_rushing
            ,qb_data.full_player_name
            ,verticalalignment="center"
            ,horizontalalignment="center"
            ,fontsize=13
            ,color="white"
        )
    #Colorbar settings
    cb = plt.colorbar(s)
    cb.set_label("Sack + Interception", color="white", size=20)
    cb.outline.set_edgecolor("white")
    plt.setp(plt.getp(cb.ax.axes, 'yticklabels'), color="white")
    plt.title("QB Type in 4Q", color="white")

Prescott and Mahomes are in contrast, so let’s compare the yards they gained in each quarter. We can also see that most QBs are sacked and intercepted less in the 4Q. (Are Winston and Mayfield gamblers?)
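
The per-player frames used for this comparison (mahomes_stats_qtr and prescott_stats_qtr) are not shown being built. Here is a sketch of one way to pick them out of qb_stats_qtr. The helper name quarter_stats is hypothetical, the toy data uses placeholder IDs and made-up yards, and dropping overtime (assumed to be qtr 5) is my guess based on the 1Q to 4Q labels.

```python
import pandas as pd

# Toy per-quarter stats with placeholder IDs and made-up yards
qb_stats_qtr = pd.DataFrame({
    "player_id": ["QB-1"] * 5 + ["QB-2"] * 4,
    "qtr": [3, 1, 2, 4, 5, 1, 2, 3, 4],
    "yards_gained": [100, 400, 300, 200, 50, 10, 20, 30, 40],
})

def quarter_stats(stats, player_id):
    """Return one player's rows for quarters 1 to 4, ordered by quarter."""
    rows = stats[(stats["player_id"] == player_id) & (stats["qtr"] <= 4)]
    return rows.sort_values("qtr").reset_index(drop=True)

qb1_stats_qtr = quarter_stats(qb_stats_qtr, "QB-1")

# Share of yards per quarter in percent
share = (
    qb1_stats_qtr["yards_gained"] / qb1_stats_qtr["yards_gained"].sum() * 100
).round(1)
print(share.tolist())
```

With the real data, a call such as quarter_stats(qb_stats_qtr, "00-0033873") would give the frame for Mahomes.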

with plt.rc_context(
    {
        "axes.edgecolor":"white"
        ,"xtick.color":"white"
        ,"ytick.color":"white"
        ,"figure.facecolor":"white"
    }
):
    fig = plt.figure(figsize=(10, 5), facecolor="black")
    ax_mahomes = fig.add_subplot(121, facecolor="black")
    ax_prescott = fig.add_subplot(122, facecolor="black")
    #Draw pie chart of Mahomes
    wedges, _, _ = ax_mahomes.pie(
        mahomes_stats_qtr.yards_gained
        ,labels=["1Q","2Q","3Q","4Q"]
        ,textprops={"color": "white"}
        ,wedgeprops={"linewidth": 3}
        ,startangle=90
        ,counterclock=False
        ,autopct="%1.1f%%"
    )
    #Show season total yards in the center
    ax_mahomes.text(
        0, 0
        ,qb_stats_season["yards_gained"][qb_stats_season.player_id == "00-0033873"].values[0]
        ,color="white"
        ,ha="center"
        ,va="center"
        ,fontsize=20
    )
    #Turn the pie into a donut
    plt.setp(wedges, width=0.2)
    #Draw pie chart of Prescott
    wedges, _, _ = ax_prescott.pie(
        prescott_stats_qtr.yards_gained
        ,labels=["1Q","2Q","3Q","4Q"]
        ,textprops={"color": "white"}
        ,wedgeprops={"linewidth": 3}
        ,startangle=90
        ,counterclock=False
        ,autopct="%1.1f%%"
    )
    ax_prescott.text(
        0, 0
        ,qb_stats_season["yards_gained"][qb_stats_season.player_id == "00-0033077"].values[0]
        ,color="white"
        ,ha="center"
        ,va="center"
        ,fontsize=20
    )
    plt.setp(wedges, width=0.2)
    ax_mahomes.set_title("Mahomes", color="white")
    ax_prescott.set_title("Prescott", color="white")

Can we describe Mahomes as a “pre-emptive” QB and Prescott as a “rising” QB?

In addition, how about when the team is in adversity (trailing on the scoreboard)?
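
The code for this step is not shown, so the following is only a sketch of how plays while trailing might be selected before aggregating as before. It assumes a "score_differential" column that is negative when the offense is behind; the toy data and placeholder IDs are made up.

```python
import pandas as pd

# Toy pass plays; score_differential < 0 is assumed to mean the offense is trailing
pbp_custom = pd.DataFrame({
    "passer_player_id": ["QB-1", "QB-1", "QB-1", "QB-2"],
    "score_differential": [-7, 3, -3, 0],
    "yards_gained": [15, 8, 22, 10],
})

# Keep only plays run while behind, then total the yards per passer
pbp_behind = pbp_custom[pbp_custom["score_differential"] < 0]
qb_pass_behind = pbp_behind.groupby(
    "passer_player_id", as_index=False
).agg({"yards_gained": "sum"})
print(qb_pass_behind)
```

The same ranking and plotting steps used for the season totals could then be applied to the filtered result.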


Oh, Mahomes is also outstanding in adversity… and so is Prescott. Stafford is 3rd here while he is 8th overall, and Garoppolo is 7th while he is 16th overall. We can say they are strong in adversity.

I could go on as long as I want, but I’ll leave off here. Will Mahomes be MVP again with his outstanding stability? Will Prescott lead Dallas to the Super Bowl? What will Winston achieve with the Saints alongside Brees? Can Murray and Mayfield improve their stability and become the best QBs in the NFL?

Thank you for reading!!

Analytics Vidhya