Graphing NFL Running Back Production by Age using Python (Part 1)

Michael Wayne Smith
Aug 8, 2021
Photo by Dave Adamson on Unsplash

Since NFL fantasy football drafts are right around the corner, I wanted to explore what age window exists for peak fantasy running back production.

We will be using the Python programming language.

What to expect

By the end of this article you will have a set of functions to scrape and filter data to make your own data visualizations. The question we are asking, “How does age affect fantasy production?”, seems to have an intuitive answer: the older players get, the more their fantasy value diminishes. The goal here is to help the reader feel more empowered to ask questions using Python as a tool. Our final product will be a graph that reflects statistical output over the past ten years.

This article is broken into two parts:

Part 1

Scrape data

We will be scraping data from Pro Football Reference.

Part 2

Scrape Data

To start, we import our packages.
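A minimal sketch of the import cell, assuming the three third-party packages that the snippets in this article rely on: requests for fetching pages, BeautifulSoup (from the bs4 package) for parsing HTML, and pandas for the final dataframe. The pd alias matches the pd.DataFrame call used later; the choice of BeautifulSoup is an assumption based on the findAll calls.

```python
# Assumed imports for the scraping function built in this article.
import requests                 # HTTP requests for the stats pages
from bs4 import BeautifulSoup   # HTML parsing (assumed from the findAll calls)
import pandas as pd             # tabular output
```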

We are interested in the past ten years of running back production for fantasy football. We will need both the rushing and receiving tables to assess both dimensions of a running back's production. Since we need to scrape the same tables over and over, we are going to write a function. We will also use a dictionary to account for formatting differences between the Pro Football Reference tables.

To start, let’s draft our function:
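The draft might look something like the shell below. This is a sketch, not the original code: the function name scrape_season, the parameter names year and position, and the base_url value are assumptions chosen to match the identifiers used in the snippets that follow.

```python
# Hypothetical shell of the scraping function; all names are illustrative.
base_url = "https://www.pro-football-reference.com/years/"  # assumed value


def scrape_season(year, position):
    """Return stats for one season and one position table (not implemented yet)."""
    # Step 1: look up the page for this position and request its HTML.
    # Step 2: pick out the column headers.
    # Step 3: collect the stat rows.
    # Step 4: build and filter a pandas dataframe.
    return None
```

Calling scrape_season("2020", "RB") at this point simply returns None; each step gets filled in as we go.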

Let’s get some html!

We have the shell of our function written out. Let’s start by getting some HTML. We will declare RB as the default value for position and run the function once to test whether HTML is returned to us. One thing to note: we created a dictionary to fill in the last part of the URL in the requests call.

pos = {"RB": ['/rushing.htm']}
html = requests.get(base_url + year + pos[position][0]).text

Some formatting changes from page to page, and our dictionary will help us handle those differences.
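To see what the dictionary lookup contributes, here is a small sketch that separates building the URL from requesting it. The base_url value and the helper names build_url and get_html are assumptions made for illustration; only build_url is exercised here, since get_html makes a live network request.

```python
import requests

base_url = "https://www.pro-football-reference.com/years/"  # assumed value

# Each position key maps to a list; index 0 is the page path.
pos = {"RB": ["/rushing.htm"]}


def build_url(year, position="RB"):
    """Assemble the page URL for one season and one position key."""
    return base_url + str(year) + pos[position][0]


def get_html(year, position="RB"):
    """Fetch the raw HTML text of the page (makes a live network request)."""
    return requests.get(build_url(year, position), timeout=30).text


print(build_url(2020))
# https://www.pro-football-reference.com/years/2020/rushing.htm
```

Wrapping year in str() lets the helper accept either 2020 or "2020", since the URL is built by string concatenation.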

Testing our code

----- Results: -----

Games Rushing

---- End Results ---

Success!

Let’s add another entry to our position dictionary to gather receiving stats. We will create a new key called RR, which stands for Runner-Receiving, and use it to call the receiving web page.
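Because the page path lives in the dictionary, gathering receiving stats should only require a new key. A short sketch, with an assumed base_url value and 2020 picked as an example season:

```python
base_url = "https://www.pro-football-reference.com/years/"  # assumed value

pos = {"RB": ["/rushing.htm"]}
# New entry: "RR" (Runner-Receiving) points at the receiving page.
pos["RR"] = ["/receiving.htm"]

# The same lookup now works for both keys without touching the request code.
urls = {key: base_url + "2020" + value[0] for key, value in pos.items()}
for key, url in urls.items():
    print(key, url)
```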

Testing our code

----- Results: -----

Rk Player Tm Age Pos G GS Tgt Rec Ctch% Yds Y/R TD 1D Lng Y/Tgt R/G Y/G Fmb

---- End Results ---

Column headers

We notice that our two calls of the function yield two different results. The first value in the rushing result is also blank, which is not obviously visible in our output. If we examine the two pages, we notice that the rushing page has two rows of column headers. That means the two tables are structured differently. To get the appropriate column headers, we will add a header-row index to each entry in our dictionary.

pos ={"RB":['/rushing.htm', 1],
"RR":['/receiving.htm', 0]}

We will add the following line of code to gather the column headers:

cols = [i.getText() for i in rows[pos[position][1]].findAll('th')]

We will scrape both rushing and receiving to make sure the column headers we get back are the statistical categories in both cases. The function will now return a list of column headers, cols, instead of HTML text.

Our function now looks like this:
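At this stage the function might look roughly like the sketch below. To keep it runnable offline, it takes an HTML string rather than a year, and the two sample tables are made-up, trimmed stand-ins for the real pages. The name get_columns and the way rows is built (every tr tag on the page) are assumptions.

```python
from bs4 import BeautifulSoup

# Index 0: page path, index 1: which <tr> holds the real column headers.
pos = {"RB": ["/rushing.htm", 1],
       "RR": ["/receiving.htm", 0]}

# Made-up markup: the rushing table has an extra grouping row on top.
rushing_html = """
<table>
  <tr><th></th><th>Games</th><th>Rushing</th></tr>
  <tr><th>Rk</th><th>Player</th><th>Tm</th><th>Att</th><th>Yds</th></tr>
  <tr><th>1</th><td>Derrick Henry</td><td>TEN</td><td>378</td><td>2027</td></tr>
</table>
"""
receiving_html = """
<table>
  <tr><th>Rk</th><th>Player</th><th>Tm</th><th>Tgt</th><th>Rec</th></tr>
  <tr><th>1</th><td>Stefon Diggs</td><td>BUF</td><td>166</td><td>127</td></tr>
</table>
"""


def get_columns(html, position="RB"):
    """Return the column headers from the header row chosen by the dictionary."""
    soup = BeautifulSoup(html, "html.parser")
    rows = soup.find_all("tr")  # find_all is the current spelling of findAll
    return [th.get_text() for th in rows[pos[position][1]].find_all("th")]


rushing_cols = get_columns(rushing_html, "RB")
receiving_cols = get_columns(receiving_html, "RR")
print(rushing_cols)
print(receiving_cols)
```

With index 1 for rushing and index 0 for receiving, both calls skip past any grouping row and land on the row that starts with Rk.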

Testing our function

----- Results: -----

Rk Player Tm Age Pos G GS Att Yds TD 1D Lng Y/A Y/G Fmb
Rk Player Tm Age Pos G GS Tgt Rec Ctch% Yds Y/R TD 1D Lng Y/Tgt R/G Y/G Fmb

---- End Results ---

Success!!

We are now going to turn to gathering some stats. We start by initializing an empty list, stats = []. A new value is added to each entry in the position dictionary to mark the row where the stats begin.

pos ={"RB":['/rushing.htm', 1, 2],
"RR":['/receiving.htm', 0, 1]}
stat_rows = rows[pos[position][2]:]

Then we will iterate through the rows and add the text from each row's td tags using a list comprehension.
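A sketch of that loop, run against a made-up, trimmed stand-in for the rushing page so it works offline. The function name get_stats and the construction of rows are assumptions; the slice uses the new third value in the dictionary.

```python
from bs4 import BeautifulSoup

# Index 2: the first row that holds player stats rather than headers.
pos = {"RB": ["/rushing.htm", 1, 2],
       "RR": ["/receiving.htm", 0, 1]}

# Made-up markup: two header rows followed by two player rows.
rushing_html = """
<table>
  <tr><th></th><th>Games</th><th>Rushing</th></tr>
  <tr><th>Rk</th><th>Player</th><th>Tm</th><th>Att</th><th>Yds</th></tr>
  <tr><th>1</th><td>Derrick Henry</td><td>TEN</td><td>378</td><td>2027</td></tr>
  <tr><th>2</th><td>Dalvin Cook</td><td>MIN</td><td>312</td><td>1557</td></tr>
</table>
"""


def get_stats(html, position="RB"):
    """Collect the text of every td cell, one list per stats row."""
    soup = BeautifulSoup(html, "html.parser")
    rows = soup.find_all("tr")
    stat_rows = rows[pos[position][2]:]
    stats = []
    for row in stat_rows:
        # The rank sits in a th in this markup, so it is not picked up here.
        stats.append([td.get_text() for td in row.find_all("td")])
    return stats


stats = get_stats(rushing_html, "RB")
for player in stats:
    print(player)
```

Every value comes back as a string, so numeric columns would still need converting before any arithmetic.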

Testing our function

----- Results: -----

Derrick Henry *+ TEN 26 RB 16 16 378 2027 17 98 94 5.4 126.7 3
Dalvin Cook* MIN 25 RB 14 14 312 1557 16 91 70 5.0 111.2 5
Josh Jacobs* LVR 22 RB 15 15 273 1065 12 61 28 3.9 71.0 2
David Montgomery CHI 23 RB 15 14 247 1070 8 59 80 4.3 71.3 1
Ezekiel Elliott DAL 25 RB 15 15 244 979 6 62 31 4.0 65.3 6


Stefon Diggs*+ BUF 27 WR 16 15 166 127 76.5% 1535 12.1 8 73 55 9.2 7.9 95.9 0
Davante Adams*+ GNB 28 WR 14 14 149 115 77.2% 1374 11.9 18 73 56 9.2 8.2 98.1 1
DeAndre Hopkins* ARI 28 WR 16 16 160 115 71.9% 1407 12.2 6 75 60 8.8 7.2 87.9 3
Darren Waller* LVR 28 TE 16 15 145 107 73.8% 1196 11.2 9 69 38 8.2 6.7 74.8 2
Travis Kelce*+ KAN 31 TE 15 15 145 105 72.4% 1416 13.5 11 79 45 9.8 7.0 94.4 1

---- End Results ---

The function successfully grabbed the receiving table, but we are only interested in running back receiving stats. We also want to return our results as a pandas dataframe. New values will be added to our dictionary, and we will use them to filter for the rows we want when we create the dataframe.

pos ={"RB":['/rushing.htm',   1, 2, ["RB", "rb",""]],
"RR":['/receiving.htm', 0, 1, ["RB", "rb",""]]}

Note that the Pos column of the rushing table on pro-football-reference.com uses three values that we want to keep: "RB", "rb", and an empty string. Our new lines of code, which create a dataframe and filter it down to those values, look like this:

season_stats = pd.DataFrame(stats, columns=cols[1:])
season_stats = season_stats[season_stats["Pos"].isin(pos[position][3])]

Our function now looks like this:
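Putting the pieces together, the finished function might look roughly like this sketch. It is split in two so the parsing can be checked offline: parse_season works on an HTML string, and scrape_season wraps it with the live request. Both names, the base_url value, and the made-up sample markup are assumptions.

```python
import requests
import pandas as pd
from bs4 import BeautifulSoup

base_url = "https://www.pro-football-reference.com/years/"  # assumed value

# path, header-row index, first stats-row index, Pos values to keep
pos = {"RB": ["/rushing.htm", 1, 2, ["RB", "rb", ""]],
       "RR": ["/receiving.htm", 0, 1, ["RB", "rb", ""]]}


def parse_season(html, position="RB"):
    """Turn one page of HTML into a dataframe filtered by the Pos column."""
    soup = BeautifulSoup(html, "html.parser")
    rows = soup.find_all("tr")
    cols = [th.get_text() for th in rows[pos[position][1]].find_all("th")]
    stat_rows = rows[pos[position][2]:]
    stats = [[td.get_text() for td in row.find_all("td")] for row in stat_rows]
    # cols[1:] drops "Rk", which sits in a th here and so is not in the td lists.
    season_stats = pd.DataFrame(stats, columns=cols[1:])
    return season_stats[season_stats["Pos"].isin(pos[position][3])]


def scrape_season(year, position="RB"):
    """Fetch a season page (live network request) and parse it."""
    html = requests.get(base_url + str(year) + pos[position][0], timeout=30).text
    return parse_season(html, position)


# Offline check with made-up markup standing in for the receiving page.
sample = """
<table>
  <tr><th>Rk</th><th>Player</th><th>Tm</th><th>Pos</th><th>Rec</th></tr>
  <tr><th>1</th><td>Stefon Diggs</td><td>BUF</td><td>WR</td><td>127</td></tr>
  <tr><th>2</th><td>Alvin Kamara</td><td>NOR</td><td>rb</td><td>83</td></tr>
  <tr><th>3</th><td>Mike Davis</td><td>CAR</td><td>RB</td><td>59</td></tr>
</table>
"""
receiving = parse_season(sample, "RR")
print(receiving)
```

The filtered dataframe keeps its original row labels, which is why the receiving results do not start at 0.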

Testing our function

----- Results: -----

Player Tm Age Pos G GS Att Yds TD 1D
0 Derrick Henry *+ TEN 26 RB 16 16 378 2027 17 98
1 Dalvin Cook* MIN 25 RB 14 14 312 1557 16 91
2 Josh Jacobs* LVR 22 RB 15 15 273 1065 12 61
3 David Montgomery CHI 23 RB 15 14 247 1070 8 59
4 Ezekiel Elliott DAL 25 RB 15 15 244 979 6 62


Player Tm Age Pos G GS Tgt Rec Ctch% Yds
18 Alvin Kamara * NOR 25 rb 15 10 107 83 77.6% 756
22 J.D. McKissic WAS 27 rb 16 7 110 80 72.7% 589
41 Nyheim Hines IND 24 rb 16 2 76 63 82.9% 482
51 Mike Davis CAR 27 RB 15 12 70 59 84.3% 373
63 Austin Ekeler LAC 25 rb 10 10 65 54 83.1% 403

---- End Results ---

Success!!

Now that we have built our function, it is time to gather and filter some data and visualize our results.
