Graphing NFL Running Back Production by Age using Python (Part 1)

5 min read · Aug 8, 2021

Since NFL fantasy football drafts are right around the corner, I wanted to explore what age window exists for peak fantasy running back production.

We will be using the Python programming language.

What to expect

By the end of this article, you will have a set of functions for scraping and filtering data that you can use to make your own data visualizations. The question we are asking, “How does age affect fantasy production?”, seems to have an intuitive answer: the older players get, the more their fantasy value diminishes. The goal here is to help the reader feel more empowered to ask questions using Python as a tool. Our final product will reflect statistical output over the past ten years.

This article is broken into two parts:

Part 1

Scrape data

We will be scraping data from Pro Football Reference.

Part 2

Graphing NFL Running Back Production by Age using Python (Part 2)


Scrape Data

To start, import the packages we need.
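The calls used later in this article (requests.get, findAll / getText, and pd.DataFrame) suggest the three imports below. The exact import lines are an assumption, since they are not shown in the text.

```python
# Assumed imports, inferred from the calls used later in the article.
import requests                # HTTP client used to download each stats page
from bs4 import BeautifulSoup  # HTML parser that provides findAll / getText
import pandas as pd            # DataFrame used to hold and filter the stats
```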

We are interested in the past ten years of running back production for fantasy football. We will need both the rushing and receiving tables to assess both dimensions of a running back’s production. Since we need to scrape the same tables over and over, we are going to write functions. We will also use a dictionary to account for formatting differences between the Pro Football Reference tables.

To start let’s draft our function:

Let’s get some html!

We have the shell of our function written out. Let’s start by getting some HTML. We will declare RB as the default value for position and run the function once to test whether HTML is returned to us. One thing to note: we created a dictionary whose entry supplies the last piece of the URL in the requests call.

pos = {"RB": ['/rushing.htm']}
html = requests.get(base_url + year + pos[position][0]).text

Some formatting will change across pages and our dictionary will prove helpful.
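A minimal sketch of this first version of the function might look like the block below. The names BASE_URL, build_url, and scrape_season are hypothetical, and the base URL is an assumption based on how Pro Football Reference lays out its season pages; the article does not show it. The timeout and status check are additions for safety, not part of the original call.

```python
import requests

# Assumption: season pages live under /years/<year>/ on the site.
BASE_URL = "https://www.pro-football-reference.com/years/"

# Position dictionary from the article: the first item is the page suffix.
pos = {"RB": ['/rushing.htm']}


def build_url(year, position="RB"):
    """Join the base URL, the season, and the page suffix for a position."""
    return BASE_URL + str(year) + pos[position][0]


def scrape_season(year, position="RB"):
    """Download one season page and return its raw HTML as text."""
    response = requests.get(build_url(year, position), timeout=30)
    response.raise_for_status()  # fail loudly on a 4xx/5xx response
    return response.text


# No request is made here; this only shows the URL that would be fetched.
print(build_url(2020))
```

Keeping the URL construction in its own small function makes it easy to check the dictionary lookup without touching the network.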

Testing our code

----- Results: -----

  Games Rushing 

---- End Results ---

Success!

Let’s add another value to our position dictionary to gather receiving stats. We will create a new variable called RR to stand for Runner-Receiving and create a new entry in our dictionary to call that web-page.
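As a quick illustration of how the new entry is used, the sketch below builds the page address for each key. The BASE_URL value is an assumption (it is not shown in the article), and no request is made.

```python
# Assumption: season pages live under /years/<year>/ on the site.
BASE_URL = "https://www.pro-football-reference.com/years/"

# "RB" points at the rushing page, "RR" (Runner-Receiving) at the receiving page.
pos = {"RB": ['/rushing.htm'],
       "RR": ['/receiving.htm']}

year = "2020"

# Build one URL per dictionary key using the same lookup as the function.
urls = {key: BASE_URL + year + value[0] for key, value in pos.items()}

for key, url in urls.items():
    print(key, url)
```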

Testing our code

----- Results: -----

 Rk Player Tm Age Pos G GS Tgt Rec Ctch% Yds Y/R TD 1D Lng Y/Tgt R/G Y/G Fmb 

---- End Results ---

Column headers

We notice that our two calls to the function yield two different results. The first column header is also a blank value that is not obviously visible in our results. If we examine the two pages, we notice that the rushing page has two header rows, which means the two tables are structured differently. To get the appropriate header row, we will add its row index to our dictionary.

pos ={"RB":['/rushing.htm', 1],
      "RR":['/receiving.htm', 0]}

We will add the following line of code to gather the column headers:

cols = [i.getText() for i in rows[pos[position][1]].findAll('th')]

We will scrape both rushing and receiving to make sure our column headers cover both statistical categories. The function will now return a list of column headers, cols, instead of the HTML text.

Our function now looks like this:
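A hedged sketch of the header step follows. It is not necessarily the exact function used here: get_column_headers is a hypothetical name, it takes the HTML as an argument so it can be tried without a network request, and sample_rushing is a trimmed-down stand-in for the real rushing page. It uses find_all and get_text, the snake_case spellings of findAll and getText.

```python
from bs4 import BeautifulSoup

# Position dictionary from the article: page suffix, then header-row index.
pos = {"RB": ['/rushing.htm', 1],
       "RR": ['/receiving.htm', 0]}


def get_column_headers(html, position="RB"):
    """Return the header text from the row index stored for this position."""
    soup = BeautifulSoup(html, "html.parser")
    rows = soup.find_all("tr")
    return [th.get_text() for th in rows[pos[position][1]].find_all("th")]


# Hypothetical stand-in for the rushing page: a grouping row
# ("Games", "Rushing") sits above the real header row.
sample_rushing = """
<table>
  <tr><th></th><th>Games</th><th>Rushing</th></tr>
  <tr><th>Rk</th><th>Player</th><th>Att</th><th>Yds</th></tr>
  <tr><th>1</th><td>Derrick Henry *+</td><td>378</td><td>2027</td></tr>
</table>
"""

cols = get_column_headers(sample_rushing, "RB")
print(cols)  # ['Rk', 'Player', 'Att', 'Yds']
```

With index 0 instead of 1, the same lookup would return the grouping row, which matches the blank first value followed by Games and Rushing seen in the first test.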

Testing our function

----- Results: -----

Rk Player Tm Age Pos G GS Att Yds TD 1D Lng Y/A Y/G Fmb
Rk Player Tm Age Pos G GS Tgt Rec Ctch% Yds Y/R TD 1D Lng Y/Tgt R/G Y/G Fmb

---- End Results ---

Success!!

We are now going to turn to gathering some stats. We start by initializing an empty list, stats = []. A new value added to the position dictionary marks the index of the first stats row.

pos ={"RB":['/rushing.htm', 1, 2],
      "RR":['/receiving.htm', 0, 1]}

stat_rows = rows[pos[position][2]:]

Then we will iterate through the rows and add text from the td tags using a list comprehension.
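The loop can be sketched as below. This is an illustration under assumptions: sample_receiving is a small stand-in for the receiving page, built from two players shown in this article, and the rank values in it are placeholders.

```python
from bs4 import BeautifulSoup

# Third item: index of the first row that holds player stats.
pos = {"RB": ['/rushing.htm', 1, 2],
       "RR": ['/receiving.htm', 0, 1]}

# Stand-in for the receiving page (rank values are placeholders).
sample_receiving = """
<table>
  <tr><th>Rk</th><th>Player</th><th>Tm</th><th>Pos</th></tr>
  <tr><th>1</th><td>Stefon Diggs*+</td><td>BUF</td><td>WR</td></tr>
  <tr><th>2</th><td>Davante Adams*+</td><td>GNB</td><td>WR</td></tr>
</table>
"""

position = "RR"
rows = BeautifulSoup(sample_receiving, "html.parser").find_all("tr")
stat_rows = rows[pos[position][2]:]  # skip the header row(s)

stats = []
for row in stat_rows:
    # One list of cell text per player row; only td cells are collected.
    stats.append([td.get_text() for td in row.find_all("td")])

print(stats)
```

In this sample the rank cell is a th rather than a td, so it is not collected. That would be consistent with the later use of cols[1:] to drop the Rk header when the DataFrame is built.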

Testing our function

----- Results: -----

Derrick Henry *+ TEN 26 RB 16 16 378 2027 17 98 94 5.4 126.7 3
Dalvin Cook* MIN 25 RB 14 14 312 1557 16 91 70 5.0 111.2 5
Josh Jacobs* LVR 22 RB 15 15 273 1065 12 61 28 3.9 71.0 2
David Montgomery CHI 23 RB 15 14 247 1070 8 59 80 4.3 71.3 1
Ezekiel Elliott DAL 25 RB 15 15 244 979 6 62 31 4.0 65.3 6


Stefon Diggs*+ BUF 27 WR 16 15 166 127 76.5% 1535 12.1 8 73 55 9.2 7.9 95.9 0
Davante Adams*+ GNB 28 WR 14 14 149 115 77.2% 1374 11.9 18 73 56 9.2 8.2 98.1 1
DeAndre Hopkins* ARI 28 WR 16 16 160 115 71.9% 1407 12.2 6 75 60 8.8 7.2 87.9 3
Darren Waller* LVR 28 TE 16 15 145 107 73.8% 1196 11.2 9 69 38 8.2 6.7 74.8 2
Travis Kelce*+ KAN 31 TE 15 15 145 105 72.4% 1416 13.5 11 79 45 9.8 7.0 94.4 1

---- End Results ---

The function successfully grabbed the receiving table, but we are only interested in running back receiving stats. We also want to return our results as a pandas DataFrame. New values will be added to our position dictionary, and we will use them to filter for the rows we want when we create the DataFrame.

pos ={"RB":['/rushing.htm',   1, 2, ["RB", "rb",""]],
      "RR":['/receiving.htm', 0, 1, ["RB", "rb",""]]}

Note that the filter list holds three values that appear in the Pos column on pro-football-reference.com: "RB", "rb", and an empty string. The new lines of code that create the DataFrame and filter it down to those values look like this:

season_stats = pd.DataFrame(stats, columns=cols[1:])
season_stats = season_stats[season_stats["Pos"].isin(pos[position][3])]
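To see the filter in isolation, here is a small sketch that applies the same two lines to four rows taken from the results shown in this article. The shortened column list and the rb_labels name are assumptions made to keep the example compact.

```python
import pandas as pd

# Shortened header list; "Rk" is dropped with cols[1:] as in the article.
cols = ["Rk", "Player", "Tm", "Age", "Pos"]

# Four rows from the receiving results above (first four value columns only).
stats = [
    ["Stefon Diggs*+", "BUF", "27", "WR"],
    ["Alvin Kamara *", "NOR", "25", "rb"],
    ["Mike Davis", "CAR", "27", "RB"],
    ["Travis Kelce*+", "KAN", "31", "TE"],
]

# Pos labels to keep, matching the fourth item of each dictionary entry.
rb_labels = ["RB", "rb", ""]

season_stats = pd.DataFrame(stats, columns=cols[1:])
season_stats = season_stats[season_stats["Pos"].isin(rb_labels)]

print(season_stats)
```

The filtered frame keeps the original row labels (1 and 2 here), which is why the receiving results above show index values such as 18 and 22 rather than 0 and 1.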

Our function now looks like this:
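A minimal sketch of the finished function, assembled from the pieces shown so far, might look like this. It is an approximation rather than the exact code: BASE_URL, parse_season, and scrape_season are hypothetical names, the base URL is an assumption, and the parsing is split from the download so it can be tried on an inline sample.

```python
import pandas as pd
import requests
from bs4 import BeautifulSoup

# Assumption: season pages live under /years/<year>/ on the site.
BASE_URL = "https://www.pro-football-reference.com/years/"

# Page suffix, header-row index, first stats-row index, Pos labels to keep.
pos = {"RB": ['/rushing.htm',   1, 2, ["RB", "rb", ""]],
       "RR": ['/receiving.htm', 0, 1, ["RB", "rb", ""]]}


def parse_season(html, position="RB"):
    """Turn one page of HTML into a DataFrame of running back rows."""
    rows = BeautifulSoup(html, "html.parser").find_all("tr")
    cols = [th.get_text() for th in rows[pos[position][1]].find_all("th")]
    stats = [[td.get_text() for td in row.find_all("td")]
             for row in rows[pos[position][2]:]]
    season_stats = pd.DataFrame(stats, columns=cols[1:])
    return season_stats[season_stats["Pos"].isin(pos[position][3])]


def scrape_season(year, position="RB"):
    """Download one season page and parse it (makes a network request)."""
    url = BASE_URL + str(year) + pos[position][0]
    response = requests.get(url, timeout=30)
    response.raise_for_status()
    return parse_season(response.text, position)


# Offline check on a tiny stand-in for the receiving page
# (rank values are placeholders).
sample_html = (
    "<table><tr><th>Rk</th><th>Player</th><th>Pos</th></tr>"
    "<tr><th>1</th><td>Stefon Diggs*+</td><td>WR</td></tr>"
    "<tr><th>19</th><td>Alvin Kamara *</td><td>rb</td></tr></table>"
)
demo = parse_season(sample_html, "RR")
print(demo)
```

Calling scrape_season("2020") or scrape_season("2020", "RR") would fetch the live pages, so that step is left out of the offline check.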

Testing our function

----- Results: -----

             Player   Tm Age Pos   G  GS  Att   Yds  TD  1D
0  Derrick Henry *+  TEN  26  RB  16  16  378  2027  17  98
1      Dalvin Cook*  MIN  25  RB  14  14  312  1557  16  91
2      Josh Jacobs*  LVR  22  RB  15  15  273  1065  12  61
3  David Montgomery  CHI  23  RB  15  14  247  1070   8  59
4   Ezekiel Elliott  DAL  25  RB  15  15  244   979   6  62


            Player   Tm Age Pos   G  GS  Tgt Rec  Ctch%  Yds
18  Alvin Kamara *  NOR  25  rb  15  10  107  83  77.6%  756
22   J.D. McKissic  WAS  27  rb  16   7  110  80  72.7%  589
41    Nyheim Hines  IND  24  rb  16   2   76  63  82.9%  482
51     Mike Davis   CAR  27  RB  15  12   70  59  84.3%  373
63   Austin Ekeler  LAC  25  rb  10  10   65  54  83.1%  403

---- End Results ---

Success!!

Now that we have built our function, it is time to gather and filter some data and visualize our results in Part 2.

Graphing NFL Running Back Production by Age using Python (Part 2)


Support from you allows me to write more articles. Thanks for reading.

Written by Michael Wayne Smith