NFL Kickers Analysis

Published in

PRESS BOX

Apr 26, 2022

A brief tour of the NFL data with Python

Welcome to the NFL kickers’ performance analysis! We’ll take a dive into kicking data from 2006 to 2020. Are kickers getting better over the years? How does their performance change in high-pressure situations? At what age do kickers reach their peak? We’ll try to answer some of these questions. The full project can be found in my GitHub repository.

In this project, we’ll try to answer the following questions:

What is the probability of scoring as a function of kick distance? What’s the average accuracy within “field goal range”?
Which NFL teams are the best and worst at kicking? And which players are the best?
Do kickers feel the pressure? Is their performance worse when they play away from home? What about in games between division rivals? And in crucial moments?
Are kickers improving their performance over the years?
At what age do kickers hit their prime?

Ready? Let’s go!

Preparing and Cleaning the Data

The data was built from these Kaggle datasets [1][2]. The original data contains full play-by-play reports of all NFL games. It’s a very large collection of CSV files that can support many exciting analyses for football fans.

After importing the data into our Python notebook, we’ll apply some transformations to create the dataset. The steps are:

Joining the files
Filtering
Dropping columns
Creating columns
Handling null values
Making some other adjustments

In the end, we’ll have 35,867 kicks to analyze, and the dataset (transposed) will look like this:

Sample of the dataset (transposed) — Created by the author

We are now ready to get some answers!

What’s the accuracy as a function of goal distance? What’s the probability of scoring at the “field goal range”?

Accuracy is the number of kicks converted divided by the total number of attempts. Plotting it against the distance, we get the following curve:

Looking at the curve, we can see some things:

As expected, as the kick gets longer, the accuracy decreases.
For shorter kicks, accuracy decreases very smoothly. As kicks get longer, the drop becomes more pronounced, and beyond field goal range it is very fast: at 53 yards the accuracy is 67%, and just 2 yards farther it falls to 56%.
The accuracy at the limit of field goal range (52 yards) is 60%; after that, it falls very quickly. We can also see that the confidence interval starts to widen, since fewer kicks are attempted from this range.
The curve has some odd behavior: at some points, the accuracy is higher than for a slightly shorter kick. That doesn’t make much sense, and it’s probably because the number of attempts at those distances is not high enough. With a bigger dataset, that behavior probably wouldn’t hold.
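To make the accuracy curve concrete, here is a minimal sketch of how accuracy per distance and an error band could be computed with pandas. It assumes hypothetical columns `kick_distance` (yards) and `made` (1 = converted, 0 = missed) and uses toy data. The post doesn’t say which confidence interval its chart uses, so the Wilson score interval here is our own choice.

```python
import numpy as np
import pandas as pd

# Toy data; the real table has one row per kick (hypothetical column names)
kicks = pd.DataFrame({
    "kick_distance": [20, 20, 20, 20, 35, 35, 35, 35, 50, 50, 50, 50],
    "made":          [1, 1, 1, 1, 1, 1, 1, 0, 1, 1, 0, 0],
})

def accuracy_by_distance(df, z=1.96):
    """Accuracy (made / attempts) per distance with a ~95% Wilson score interval."""
    g = df.groupby("kick_distance")["made"].agg(made="sum", attempts="count")
    n = g["attempts"]
    p = g["made"] / n
    # Wilson interval: stays inside [0, 1] and behaves better than p ± z*SE
    # when a distance has few attempts (e.g. very long kicks)
    denom = 1 + z**2 / n
    center = (p + z**2 / (2 * n)) / denom
    half = z * np.sqrt(p * (1 - p) / n + z**2 / (4 * n**2)) / denom
    return g.assign(accuracy=p, ci_low=center - half, ci_high=center + half)

table = accuracy_by_distance(kicks)
print(table)
```

The interval widens as `attempts` shrinks, which is the same widening the chart shows past 52 yards.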

Which NFL teams are the best and worst at kicking the ball? And how about the players?

Defining a metric

Before starting, we need to define something: when comparing two groups, how can we tell which one is better at kicking?

The first idea is to rank them by accuracy, since it indicates the percentage of kicks made. Makes sense, right?

But hold on… having the best accuracy might not mean that they are better at kicking the ball. Let’s consider this scenario with team A and team B:

Team A: 92% accuracy, 30-yard kicks on average.

Team B: 90% accuracy, 45-yard kicks on average.

Even though team B has lower accuracy, I think everyone will agree that they are better, since their kicks are from farther away.

Therefore, we need to account for the difficulty of the kicks. This is especially necessary because the “generating factor” of the kicks may not be the same across teams, and the differences in their average distance are not just due to luck: maybe a team is so good that most of their attempts are extra points. Maybe a team is always behind on the scoreboard and goes for it on fourth down in situations where it would otherwise attempt an easy kick. Not to mention that if a team knows its kicker is terrible, it might not rely on him for longer field goals.

This is how we are going to do it then:

1) For every kick, we calculate the “probability of scoring” based on the accuracy of all kicks from that same distance.

2) Then, for each group we are comparing, we take the kicks it has attempted and run 10,000 simulations using each kick’s probability of scoring. Each simulation produces an accuracy.

3) After the simulations, we have the distribution of accuracies over 10,000 runs. That distribution is very close to a Gaussian curve, and we take its median as the expected accuracy of that group.

4) We compare the group’s real accuracy with the expected value. Subtracting the expected accuracy from the real accuracy gives an index that indicates whether the group exceeded expectations or not. That’s what we are going to use to compare the groups!
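The four steps above could be sketched as follows. This is a minimal illustration under stated assumptions, not the author’s code: the columns `team`, `kick_distance`, `made`, and `prob_scoring` are hypothetical, the data is a toy sample, and `add_scoring_probability`, `simulate_accuracy`, and `real_vs_expected` are hypothetical helpers.

```python
import numpy as np
import pandas as pd

def add_scoring_probability(df):
    """Step 1: a kick's probability of scoring = accuracy of all kicks at that distance."""
    out = df.copy()
    out["prob_scoring"] = out.groupby("kick_distance")["made"].transform("mean")
    return out

def simulate_accuracy(probs, n_sims=10_000, seed=123):
    """Step 2: replay the same kicks n_sims times; returns one accuracy per run."""
    rng = np.random.default_rng(seed)
    probs = np.asarray(probs, dtype=float)
    # A kick is "made" when a uniform draw in [0, 1) falls below its probability
    made = rng.random((n_sims, probs.size)) < probs
    return made.mean(axis=1)

def real_vs_expected(group, n_sims=10_000, seed=123):
    sims = simulate_accuracy(group["prob_scoring"], n_sims, seed)
    real = group["made"].mean()
    expected = float(np.median(sims))            # Step 3: center of the simulated curve
    share_beaten = float((sims < real).mean())   # share of runs the real accuracy beat
    return real, expected, real - expected, share_beaten  # Step 4: the index

# Toy data with hypothetical columns
kicks = add_scoring_probability(pd.DataFrame({
    "team": ["CIN"] * 4 + ["BAL"] * 4,
    "kick_distance": [30, 30, 50, 50, 30, 30, 50, 50],
    "made": [1, 1, 0, 0, 1, 1, 1, 0],
}))
results = {team: real_vs_expected(grp) for team, grp in kicks.groupby("team")}
for team, (real, expected, diff, beaten) in results.items():
    print(f"{team}: real={real:.2%} expected={expected:.2%} diff={diff:+.2%} beat {beaten:.0%} of runs")
```

`share_beaten` corresponds to statements like “it was better than only about 20% of the simulations.” Building the full `n_sims × n_kicks` matrix is fine for one team’s kicks, but for all 35,867 kicks at once it would take hundreds of millions of booleans, so simulating group by group is the safer design.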

We’ll make a quick case study! I’ll pick a team to illustrate the logic. How about the Cincinnati Bengals?

After 10,000 simulations, considering the same kicks that the Bengals have taken, we got the following distribution:

Distribution after 10000 simulations — Created by author using seaborn

>> The real accuracy is: 92.11 %
>> The expected accuracy is: 92.74 %

The difference between these values is -0.63 percentage points. From this value and the histogram, we can see that the real accuracy was below the expected. In fact, it was better than only about 20% of the simulations.

Best Teams:

Using the idea above, let’s plot the real accuracy vs. the expected accuracy for every team.

Teams ranked by accuracy — Created by author using seaborn

Ranked by accuracy, the Ravens, Patriots, and Giants have the best scoring percentage over these years, while the Buccaneers, Washington, and Jaguars have the worst.

But as we mentioned, this ranking can be deceiving: what we actually want to evaluate is the difference between the curves. Let’s create a chart to display the new ranking: the bars indicate the difference between real and expected accuracy, and the numbers show how many positions each team gained or lost from the old ranking to the new one.

Teams ranked by the difference between real and expected accuracy — Created by author using seaborn

Some notes about the charts:

Expected and real accuracy are positively correlated, but the correlation is not as strong as one might think: a ranking based on expected accuracy would look very different from the one we got.
A lot of teams were misjudged when looking only at accuracy. Teams like the Lions, Raiders, and Rams were much better than they seemed, while the Chargers, Giants, and Saints held a much better position than they deserved.
The biggest change in the ranking belongs to the Las Vegas Raiders. They were previously ranked 27th, and with the new metric they ended up 9th, climbing 18 positions.
The Ravens’ performance is outstanding. Not only did they have the highest accuracy, but they also had the biggest lift of real over expected performance. For the opposite reasons, the Tampa Bay Buccaneers are not that great…
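The rank change shown in the second chart can be computed roughly like this. The team names and numbers below are made up, and `positions_gained` is a hypothetical column name: a positive value means the team climbed when switching from plain accuracy to the real-minus-expected difference (the Raiders going from 27th to 9th would be +18).

```python
import pandas as pd

# Made-up per-team summary (accuracies in %)
teams = pd.DataFrame({
    "team": ["A", "B", "C", "D"],
    "real": [90.0, 88.0, 86.0, 84.0],
    "expected": [91.0, 86.0, 87.0, 82.0],
})

teams["diff"] = teams["real"] - teams["expected"]
# Rank 1 = best; method="first" breaks ties by row order
teams["rank_accuracy"] = teams["real"].rank(ascending=False, method="first").astype(int)
teams["rank_diff"] = teams["diff"].rank(ascending=False, method="first").astype(int)
# Positive = positions climbed under the new metric
teams["positions_gained"] = teams["rank_accuracy"] - teams["rank_diff"]
print(teams.sort_values("rank_diff"))
```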

Best Players:

Now we’ll evaluate the best players using the same logic. We’ll keep only players with at least 120 attempts, which is roughly 2 complete seasons, because our method can be sensitive for players with a small volume of kicks.
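A sketch of the 120-attempt filter, assuming hypothetical `kicker` and `made` columns and toy data. `groupby(...).filter(...)` keeps every kick of a qualifying kicker, so the same simulation can then be run player by player.

```python
import pandas as pd

MIN_ATTEMPTS = 120  # roughly two complete seasons, as in the post

def eligible_kicks(df, min_attempts=MIN_ATTEMPTS):
    """Keep every kick from kickers with at least `min_attempts` attempts."""
    return df.groupby("kicker").filter(lambda g: len(g) >= min_attempts)

# Toy data: kicker X has 130 attempts, kicker Y only 50 (all made)
kicks = pd.DataFrame({
    "kicker": ["X"] * 130 + ["Y"] * 50,
    "made": [1] * 120 + [0] * 10 + [1] * 50,
})
eligible = eligible_kicks(kicks)
print(eligible["kicker"].value_counts())
```

Kicker Y’s perfect record on 50 attempts is exactly the kind of small-sample result the threshold is meant to keep out of the ranking.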

Let’s see the charts:

Players ranked by accuracy — Created by author using seaborn

Players ranked by the difference between real and expected accuracy — Created by author using seaborn

We can see that:

The changes between the two rankings are big here: our top 20 looks very different.
Some players gained more than 40 positions!
You thought Justin Tucker was good? Turns out he is absolutely out of this world.
Kudos to Wil Lutz, too.

Do kickers feel the pressure? How does their performance change when playing away from home? And in games against a divisional rival? How about in crucial moments?

Home vs. Away

As all sports fans know, it can be very hard to play away from home. Opposing fans can be very intimidating, the temperature and wind might be different from what you are used to, and travel can be exhausting. Do these factors affect player performance?

First, let’s see the real performance and expected performance between groups:

>> The real accuracy from home is: 92.65 %
>> The expected accuracy from home is: 92.41 %

>> The real accuracy away from home is: 91.68 %
>> The expected accuracy away from home is: 91.95 %

The stats indicate that accuracy on home kicks is higher and surpassed the expected accuracy, while the opposite happens on away kicks.

Still, let’s run a permutation test to check whether this difference is statistically significant.

Quick parenthesis here: first, we are going to create a column “dif_index” (yes, I’m not very good with names) that will be useful in the calculations. For each kick, it is the outcome (1 if made, 0 if missed) minus the kick’s probability of scoring.

This value can be positive or negative and indicates how much better (or worse) the kick outcome was than its probability of scoring. The idea is:

If the kick is very hard, you won’t be penalized much if you miss it, but you’ll get a great reward if you convert it.
If the kick is easy, you won’t get much reward if you convert it, but you’ll be heavily penalized if you miss it.

The mean of this value for a given group gives a great idea of how much that group surpassed expectations.
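Reading the description above as “outcome minus probability of scoring,” the column could be built as below. This is a sketch that assumes hypothetical `made` and `prob_scoring` columns. It also shows why the group mean is a handy summary: the mean of `dif_index` equals the group’s real accuracy minus its average probability of scoring, which is the center of the simulated distribution.

```python
import pandas as pd

def add_dif_index(df):
    """dif_index = outcome (1 made, 0 missed) - probability of scoring."""
    out = df.copy()
    out["dif_index"] = out["made"] - out["prob_scoring"]
    return out

# Toy kicks: one hard (p=0.30) and one easy (p=0.95), each made once and missed once
kicks = add_dif_index(pd.DataFrame({
    "made": [1, 0, 1, 0],
    "prob_scoring": [0.30, 0.30, 0.95, 0.95],
}))
# dif_index: hard make +0.70, hard miss -0.30, easy make +0.05, easy miss -0.95
print(kicks)

# Group mean of dif_index == real accuracy - mean probability of scoring
gap = kicks["dif_index"].mean()
print(round(gap, 3), round(kicks["made"].mean() - kicks["prob_scoring"].mean(), 3))
```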

Now, back to our problem: to evaluate whether home kicks really perform better than away kicks, we run a permutation test on the two groups. Basically, the test checks whether the groups are interchangeable: we shuffle the group labels N times, compute the difference in mean ‘dif_index’ between the groups each time, and see where the real difference falls in the resulting histogram. If the real difference is more extreme than 95% of the shuffled differences (p-value below 0.05), we can say that the groups are not interchangeable. You can get a better intuition of what the test does by checking this awesome interactive website.
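To show what the test does under the hood, here is a from-scratch NumPy sketch of an approximate, two-sided permutation test on the difference of means. `permutation_p_value` is a hypothetical helper, not the function used in the post; since it uses its own random generator (and possibly a slightly different counting convention), its p-values won’t match mlxtend’s exactly.

```python
import numpy as np

def permutation_p_value(x, y, num_rounds=10_000, seed=123):
    """Two-sided approximate permutation test on the difference of means."""
    rng = np.random.default_rng(seed)
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    observed = abs(x.mean() - y.mean())
    pooled = np.concatenate([x, y])
    n_x = x.size
    extreme = 0
    for _ in range(num_rounds):
        # If labels are interchangeable, any reshuffle is as likely as the real split
        rng.shuffle(pooled)
        shuffled_diff = abs(pooled[:n_x].mean() - pooled[n_x:].mean())
        extreme += shuffled_diff >= observed
    # Share of shuffles at least as extreme as the real difference
    return extreme / num_rounds

rng = np.random.default_rng(0)
print(permutation_p_value(rng.normal(0, 1, 200), rng.normal(0, 1, 200)))
print(permutation_p_value(rng.normal(0, 1, 200), rng.normal(0.5, 1, 200)))
```

The first call compares two samples from the same distribution and should usually give a large p-value; the second compares samples with clearly different means and should give a p-value near zero.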

Now to the test:

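from mlxtend.evaluate import permutation_test

home_kicks = df[df['kick_from_home_team'] == 1]['dif_index']
away_kicks = df[df['kick_from_home_team'] == 0]['dif_index']

# Approximate (random-shuffle) test on the difference of the mean dif_index
p_value = permutation_test(away_kicks, home_kicks,
                           method='approximate',
                           num_rounds=10000,
                           seed=123)
print(p_value)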

>> The result of the p value is: 0.0481

Since the p-value is lower than 0.05, we can say that kickers perform significantly better at home than away.

Crucial Moments

First of all, let’s define crucial moments as the kicks in the second half of the game that can change the lead or tie the match. In these situations, we can expect that the pressure on the kicker is much higher. Does that affect their performance a lot?
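One way to encode this definition is sketched below. The columns `qtr`, `score_differential` (kicking team minus opponent, before the kick), and `kick_points` (3 for a field goal, 1 for an extra point) are assumptions, as is treating overtime as part of the second half. A kick counts as crucial if making it would tie the game or give the lead to a team that was not already ahead. How the source data records the score before an extra point should be checked before using this on real play-by-play rows.

```python
import pandas as pd

def flag_crucial(df):
    """crucial_moment = second-half kick that would tie the game or take the lead."""
    second_half = df["qtr"] >= 3  # assumption: overtime (qtr 5) counts too
    # Score differential after a successful kick
    after = df["score_differential"] + df["kick_points"]
    # Team not ahead before the kick, and level or ahead after it
    ties_or_takes_lead = (df["score_differential"] <= 0) & (after >= 0)
    return df.assign(crucial_moment=second_half & ties_or_takes_lead)

kicks = flag_crucial(pd.DataFrame({
    "qtr":                [2, 4, 4, 4, 3],
    "score_differential": [-2, -2, -5, 3, -1],
    "kick_points":        [3, 3, 3, 3, 1],
}))
print(kicks)
```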

>> The real accuracy in crucial moments is: 87.06 %
>> The expected accuracy in crucial moments is: 88.14 %
>> The real accuracy in non-crucial moments is: 92.62 %
>> The expected accuracy in non-crucial moments is: 92.53 %

# crucial_moment is a boolean flag: True marks a crucial kick
crucial = df[df['crucial_moment'] == True]['dif_index']
non_crucial = df[df['crucial_moment'] == False]['dif_index']
p_value = permutation_test(crucial, non_crucial,
                           method='approximate',
                           num_rounds=10000,
                           seed=123)
print(p_value)

>> The result of the p value is: 0.0163

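The p-value is well below 0.05, so we can say that kickers feel the pressure at crucial moments! The effect also looks larger than the home vs. away one: the gap between real and expected accuracy differs by about 1.17 points between crucial and non-crucial kicks, versus about 0.51 points between away and home kicks. The smaller p-value alone wouldn’t show this, since p-values measure the strength of the evidence, not the size of the effect.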

Divisional games

Games between teams in the same division are usually rivalries. Not only that, these games can often be decisive in the race for the top of the division standings. In games like these, do kickers feel the pressure and perform worse?

>> The real accuracy in divisional games is: 92.28 %
>> The expected accuracy in divisional games is: 92.15 %

>> The real accuracy in non-divisional games is: 92.13 %
>> The expected accuracy in non-divisional games is: 92.21 %

That was not really what we expected. Kickers’ performance in games between division rivals seems to be a little higher than in other games. The best explanation I could come up with is that division rivals are usually close geographically. That could give kickers some advantage, since travel time is likely to be shorter, and wind and temperature conditions may be similar to what they are used to.

Now let’s test whether this pattern is statistically significant.

non_division = df[df['division_game']==0]['dif_index']
division = df[df['division_game']==1]['dif_index']
p_value = permutation_test(non_division, division,
                           method='approximate',
                           num_rounds=10000,
                           seed=123)
print(p_value)

>> The p value is 0.4471.

The p-value is higher than 0.05. We don’t have statistical evidence that the performance of kickers is different against a division rival.

Are the kickers improving over the years?

Using the same logic we used to evaluate teams and players, let’s compare real accuracy vs. expected accuracy over the last 15 seasons.

First thing to note: there is a big drop in accuracy starting in 2015, when the NFL moved the extra-point snap from the 2-yard line to the 15-yard line.

Also, the chart shows that in the first half of the period, kickers’ performance was below expectations more often than not. In the second half, real accuracy was worse than expected in just 1 season.

Since we have over 2,000 attempts in each season, this pattern looks pretty solid even without a more formal statistical test, and it strongly suggests that kickers are getting better over the seasons.
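A rough way to reproduce the season-by-season comparison is sketched below. As a shortcut, it uses the average probability of scoring as the expected accuracy, since that is exactly the mean the 10,000-run simulation converges to. The columns `season`, `made`, and `prob_scoring`, the helper `season_gap`, and the toy data are assumptions.

```python
import pandas as pd

def season_gap(df):
    """Per season: attempts, real accuracy, expected accuracy, and their gap."""
    by_season = df.groupby("season").agg(
        attempts=("made", "count"),
        real=("made", "mean"),
        # Mean probability = exact expected value of the simulated accuracy
        expected=("prob_scoring", "mean"),
    )
    by_season["gap"] = by_season["real"] - by_season["expected"]
    return by_season

# Toy data: same kick difficulty in both seasons, better results in the later one
kicks = pd.DataFrame({
    "season": [2006] * 4 + [2020] * 4,
    "made": [1, 0, 0, 1, 1, 1, 1, 1],
    "prob_scoring": [0.9, 0.8, 0.7, 0.6] * 2,
})
gaps = season_gap(kicks)
print(gaps)
print("Seasons below expectation:", (gaps["gap"] < 0).sum())
```

Because the probabilities of scoring are pooled across all seasons, a positive gap means a season beat the 2006 to 2020 average at those distances, which is what makes the early-negative, late-positive pattern read as improvement.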

At what age does a kicker hit his prime?

This question is interesting: since kickers don’t have the same physical workload as other positions, it’s fair to assume that they can play until an older age. Not only that, their job depends a lot on good mental preparation, and more experienced players could have an advantage in that area.

As we don’t have the kicker’s age in the current dataset, we must get this information from the players’ file. This procedure costs us a little more than 800 kicks, since we don’t have birth dates for all kickers. That’s not a big deal; we still have plenty of data for our analysis.
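The join and age calculation could look like this. The `kicker_id`, `game_date`, and `birth_date` columns, the helper `add_age`, and the toy rows are hypothetical; the inner join is what drops kicks whose kicker has no birth date on file.

```python
import pandas as pd

kicks = pd.DataFrame({
    "kicker_id": ["k1", "k1", "k2", "k3"],
    "game_date": pd.to_datetime(["2010-09-12", "2019-12-29", "2015-10-04", "2012-11-18"]),
    "made": [1, 1, 0, 1],
})
players = pd.DataFrame({
    "kicker_id": ["k1", "k2"],  # k3 has no birth date on file
    "birth_date": pd.to_datetime(["1985-06-15", "1990-10-05"]),
})

def add_age(kicks, players):
    """Attach each kicker's age (completed years) on the game date."""
    # Inner join: kicks whose kicker has no birth date are dropped
    df = kicks.merge(players, on="kicker_id", how="inner")
    g, b = df["game_date"].dt, df["birth_date"].dt
    # Has the kicker already had his birthday this calendar year?
    had_birthday = (g.month > b.month) | ((g.month == b.month) & (g.day >= b.day))
    df["age"] = g.year - b.year - (~had_birthday).astype(int)
    return df

aged = add_age(kicks, players)
print(aged[["kicker_id", "game_date", "age"]])
```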

Let’s plot accuracy against the kicker’s age.

Looking at the chart, we can see an upward trend, with three clear groups:

Below 28 years old, accuracy is below the average line
Between 28 and 35, accuracy fluctuates around the average
Above 35, accuracy is mostly above the average
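The three groups above can be summarized with `pd.cut`. This sketch assumes integer ages and hypothetical `age` and `made` columns, and `accuracy_by_age_band` is a hypothetical helper; it only describes the pattern and says nothing about why older kickers look better.

```python
import pandas as pd

def accuracy_by_age_band(df):
    """Accuracy per age band, compared with the overall average."""
    overall = df["made"].mean()
    # Integer ages: (0, 27] -> "<28", (27, 35] -> "28-35", (35, 100] -> ">35"
    bands = pd.cut(df["age"], bins=[0, 27, 35, 100], labels=["<28", "28-35", ">35"])
    out = df.groupby(bands, observed=False)["made"].agg(attempts="count", accuracy="mean")
    out["vs_average"] = out["accuracy"] - overall
    return out

kicks = pd.DataFrame({
    "age":  [24, 25, 30, 31, 37, 38],
    "made": [0, 1, 1, 1, 1, 1],
})
summary = accuracy_by_age_band(kicks)
print(summary)
```

Keeping the `attempts` column next to the accuracy is useful here: a band that looks strong on very few kicks deserves less trust.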

Based on these observations, can we be certain that older kickers are better than younger ones?

No! We probably have a case of survivorship bias.

“The survivorship bias is the logical error of concentrating on the people or things that made it past some selection process and overlooking those that did not, typically because of their lack of visibility”

In our case, what might be happening is:

If older kickers don’t get enough game time, they may decide to retire.
If an older kicker’s contract runs out, there’s a good chance the team will prefer not to renew it and sign a younger kicker instead (unless the older kicker is very good).

That means that older players who are still active tend to be very good, as they still have a contract and enough playtime.

Let’s see if a chart with the number of kicks per age reinforces that idea.

Looking at the count of kicks by age, we can see that this number drops sharply after the age of ~33. This strengthens the idea discussed above.

Therefore, we can’t be sure whether older kickers are better than younger ones. This analysis is not easy and would require data covering longer periods. With an appropriate dataset, we could find, for example, the age at which each retired player hit his prime. With only 15 years of data, that’s not possible.

Thanks for reading!