Impact of Age on NFL Player Performance: Data Selection (Part 2)

Caleb Smith
5 min read · Mar 17, 2022

In 1978, two decisions were made that would change professional football forever. The first was the expansion of the season from 14 to 16 games. Because of this change, stats became inflated. Five years earlier, one of the most infamous players in NFL history became the first player to rush for over 2,000 yards in a season. Although OJ Simpson is better known today as an alleged murderer and Twitter personality, he’s also a Hall of Fame running back who had arguably the best single season in NFL history. OJ remains the only player with over 2,000 rushing yards in a 14-game season. Had he kept this pace over 16 games, he would still hold the single-season rushing record with a staggering 2,289 yards. As it stands, his 1973 season is still the 8th best of all time, behind 7 rushers who all played 16-game seasons.
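The 16-game projection above is plain proration, and a minimal sketch of the arithmetic follows. It assumes Simpson's 1973 total of 2,003 rushing yards, a figure not quoted in the paragraph itself, and the helper name `prorate` is purely illustrative.

```python
def prorate(total: float, games_played: int, target_games: int) -> float:
    """Scale a season total to a different schedule length at the same per-game pace."""
    if games_played <= 0:
        raise ValueError("games_played must be positive")
    return total / games_played * target_games


# Assumption: 2,003 rushing yards across the 14-game 1973 schedule.
OJ_1973_YARDS = 2003

per_game = OJ_1973_YARDS / 14
projected = prorate(OJ_1973_YARDS, games_played=14, target_games=16)

print(f"{per_game:.1f} yards per game")         # 143.1 yards per game
print(f"{projected:.0f} yards over 16 games")   # 2289 yards over 16 games
```

Rounding the per-game average to 143.1 before multiplying gives about 2,290 instead, so any small difference in the projected total comes down to rounding.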

Cover of the October 8th, 1990 edition of Sports Illustrated. This was the last SI cover to feature OJ until his notorious murder trial in 1995

The second change to the game was more subtle, but arguably more important. The 1970s were a decade dominated by defense. Teams throughout the ’70s managed fewer yards per game (295) than in any other decade in modern NFL history. Fittingly, no team won more Super Bowls over this 10-year stretch than the Pittsburgh Steelers. The legendary “Steel Curtain” defense boasted five Hall of Famers, including cornerback Mel Blount. Blount was a physical defender who was able to manhandle smaller wide receivers. Before 1978, defenders like Blount were allowed to “chuck or bump” receivers all over the field as long as the ball had not yet been thrown. Because of how stagnant offenses had become, the league decided to change this. The “Mel Blount Rule”, as it would later be known, banned this type of contact more than 5 yards beyond the line of scrimmage. This rule change (along with several others) opened up the passing game and paved the way for the more explosive offenses we see today.

In Part 1, I claimed data could be used to predict the future. But the quality and relevance of that data are paramount. Say you wanted to predict how world leaders would fare against each other in a hypothetical, globally televised round of golf. If you used data collected by the North Korean government, you’d conclude that Kim Jong Il would lap the field. The problem is that this data is not reliable. Anyone even slightly familiar with the game can safely assume he never actually shot 5 holes-in-one. Similarly, although we have excellent data on the price of the S&P 500, using it to predict the Dear Leader’s score wouldn’t make any sense. These are obvious examples, but every data set has its flaws. Properly identifying them is crucial to making accurate predictions.

Hall of Famer Mel Blount (#47) makes a tackle on Clint Sampson of the Denver Broncos (Getty Images)

When it comes to historical NFL data, arguably no source is more comprehensive, accurate, and accessible than Pro Football Reference. The site has stats going back to 1920, the first year of the league’s existence. This is both a blessing and a curse. On closer examination, the only individual stat recorded in 1920 was scoring: specifically, Sid Nichols of the Rock Island Independents ran for a touchdown. That’s it. In fact, scoring was the only individual stat recorded until 1932. That season, the league’s leading passer threw for 639 yards, its leading rusher ran for 576, and its leading receiver caught 21 passes. Players like Bronko Nagurski and Dutch Clark were top 20 in all three categories. This was a game that would hardly be recognizable as modern-day football.

In 1932, Dutch Clark led the Portsmouth Spartans in passing and rushing, and finished second on the team in receiving (image from Colorado College)

Because of this, using data from 1920 or 1932 to predict the impact of age on modern-day players would be akin to using the stock market to predict Kim Jong Il’s golf score. As time went on, stat recording gradually became more reliable and the game continued to evolve. However, there is no single year with perfect statistical records, and there is no season where the game didn’t change. Therefore, there is no perfect time to start analyzing data for this project. This will always be an issue in analytics: perfect data does not exist. The job of the data scientist is to make an educated decision as to which data to use. For this project, there are a number of “good” starting points, such as the first Super Bowl season of 1966, the AFL/NFL merger in 1970, or the NFL’s expansion to 32 teams in 2002. But because of the introduction of both the 16-game season and the “Mel Blount Rule”, I decided to go with 1978 as my first season of analysis.

Python code for this graph can be found here

Ideally, I would like to predict the impact of age on every current NFL player. However, after careful consideration, I determined this would be beyond the scope of this project. Even in 2022, individual statistics for offensive linemen are difficult to come by, let alone analyze. Defensive stats are easier to find, but certain crucial numbers like sacks and tackles weren’t officially kept or recorded until after 1978. Because of this, I decided to stick to analyzing the impact of aging on four groups of players: Quarterbacks, Running Backs, Receivers, and Kickers. All four of these positions had comprehensive statistical profiles by 1978. In Part 3, I will compare and contrast the effects of aging on these four groups, and start diving deeper into the analysis of this project.
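The two selection rules described so far (seasons from 1978 onward, and only four position groups) can be sketched as a simple filter. This is an illustration under stated assumptions, not the code used for this project: the `PlayerSeason` fields, the position codes, and the sample rows are all hypothetical, and whether tight ends should count as Receivers is left open.

```python
from typing import List, NamedTuple, Optional

# First season of analysis: the 16-game schedule and the "Mel Blount Rule" both start here.
FIRST_SEASON = 1978

# Hypothetical position codes mapped to the four groups under study.
POSITION_GROUPS = {
    "QB": "Quarterbacks",
    "RB": "Running Backs",
    "WR": "Receivers",
    "K": "Kickers",
}


class PlayerSeason(NamedTuple):
    player: str
    season: int
    position: str
    age: int


def position_group(row: PlayerSeason) -> Optional[str]:
    """Return the row's position group, or None if the row is out of scope."""
    if row.season < FIRST_SEASON:
        return None
    return POSITION_GROUPS.get(row.position)


def select_rows(rows: List[PlayerSeason]) -> List[PlayerSeason]:
    """Keep only player-seasons from 1978 onward at one of the four positions."""
    return [row for row in rows if position_group(row) is not None]


# Made-up rows purely to exercise the filter; not real data.
sample = [
    PlayerSeason("Player A", 1973, "RB", 26),  # dropped: before 1978
    PlayerSeason("Player B", 1978, "QB", 30),  # kept
    PlayerSeason("Player C", 1995, "OL", 28),  # dropped: position not studied
    PlayerSeason("Player D", 2010, "K", 41),   # kept
]

selected = select_rows(sample)
print([row.player for row in selected])  # ['Player B', 'Player D']
```

Keeping the cutoff and the position map as named constants makes it easy to rerun the analysis with a different starting point, such as 1970 or 2002, and see how much the choice matters.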
