Rushing Yards Above Expectation

Elijah Cavan
Published in Top Level Sports · 4 min read · Oct 5, 2021

Authors: Elijah Cavan, CNG Analytics

Derrick Henry (Picture from nflmocks.com)

Hi all; today I want to describe a quick project I did: creating a metric to describe how well a particular rushing play turned out. This work was done with my colleagues at CNG Analytics, so please check out some of our work on the CNG Analytics Medium page, or @cnganalytics on Instagram (I’ll also drop a LinkedIn link at the end of the article).

As usual, I’ll try to quickly (and gently) describe the statistical methods I used to come up with this metric. The usual way to solve this problem is to fit some sort of regression line. The regression line gives you an estimate of the number of yards expected on the play, which you then compare to the yards actually gained to see how far above or below expectation that particular play finished.

Well, plain linear regression can perform poorly at times, so instead I used a Relaxed LASSO model. To give you a quick overview, a LASSO model is just like linear regression except that a ‘penalty term’ on the size of the coefficients is added to the quantity you minimize. Linear regression has another name: “Ordinary Least Squares”. Basically, you’re trying to minimize the sum of the squared vertical distances from each real point to the line; each of those distances is an error (or residual, to use a fancy term). This picture says it all, and it’s really not that complicated; anyway, there are plenty of articles about this if you want further background.

Least Squares regression (picture from Towards Data Science)
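If you would rather see least squares in code than in a picture, here is a minimal sketch in plain Python. The five plays are made-up toy numbers (not from the article’s dataset), and the single predictor, yards to go, is only an assumption for illustration; the real model used more variables.

```python
# Hypothetical toy data: yards to go (x) and yards gained (y) on five rushes.
yards_to_go = [1.0, 3.0, 5.0, 8.0, 10.0]
yards_gained = [2.0, 1.0, 4.0, 9.0, 5.0]


def fit_line(x, y):
    """Return (intercept, slope) minimizing the sum of squared residuals."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope


intercept, slope = fit_line(yards_to_go, yards_gained)

# Expected yards on each play according to the fitted line.
expected = [intercept + slope * xi for xi in yards_to_go]

# Residual = actual minus expected; positive means the play beat the line.
residuals = [yi - ei for yi, ei in zip(yards_gained, expected)]

# The quantity that least squares makes as small as possible.
sum_squared_error = sum(r ** 2 for r in residuals)
```

Any other choice of slope or intercept would give a larger `sum_squared_error` on these points, which is all “least squares” means.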

So, in short, LASSO is a variation on OLS (linear regression) that shrinks the parameter estimates (think of the slope in a simple linear regression) toward zero, which often improves how well the model predicts on new data (this has to do with the bias-variance tradeoff; I’ll save you the explanation). Relaxed LASSO goes one step further: it uses LASSO to pick the variables and then eases off the shrinkage on the ones that remain. The math is too much to go through here for a difference that can be realized with 1 line of code.
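For the curious, here is a rough sketch of the idea in Python with NumPy. It is not the R code behind this article: the data is synthetic, the function names are my own, and the relaxed step uses one common formulation (blend the LASSO coefficients with an unpenalized refit on the selected variables), which is an assumption about how the one-liner behaves.

```python
import numpy as np


def soft_threshold(value, threshold):
    """Shrink value toward zero by threshold; return 0 if it would cross zero."""
    return np.sign(value) * max(abs(value) - threshold, 0.0)


def lasso_fit(X, y, lam, n_iter=500):
    """Coordinate descent for (1 / (2n)) * ||y - X b||^2 + lam * ||b||_1.

    Assumes X and y are centered, so no intercept is needed.
    """
    n, p = X.shape
    beta = np.zeros(p)
    col_sq = (X ** 2).sum(axis=0) / n
    for _ in range(n_iter):
        for j in range(p):
            # Residual with feature j's current contribution removed.
            partial = y - X @ beta + X[:, j] * beta[j]
            rho = X[:, j] @ partial / n
            beta[j] = soft_threshold(rho, lam) / col_sq[j]
    return beta


def relaxed_lasso_fit(X, y, lam, gamma):
    """Blend the LASSO fit with an unpenalized refit on the selected features.

    gamma = 1 returns the plain LASSO; gamma = 0 returns OLS on the selected set.
    """
    beta_lasso = lasso_fit(X, y, lam)
    selected = np.flatnonzero(beta_lasso)
    beta_refit = np.zeros_like(beta_lasso)
    if selected.size > 0:
        beta_refit[selected] = np.linalg.lstsq(X[:, selected], y, rcond=None)[0]
    return gamma * beta_lasso + (1.0 - gamma) * beta_refit


# Hypothetical synthetic data: three features, only the first two matter.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 3))
y = 3.0 * X[:, 0] - 2.0 * X[:, 1] + rng.normal(scale=0.5, size=200)
X = X - X.mean(axis=0)
y = y - y.mean()

beta_ols = np.linalg.lstsq(X, y, rcond=None)[0]
beta_lasso = lasso_fit(X, y, lam=0.5)
beta_relaxed = relaxed_lasso_fit(X, y, lam=0.5, gamma=0.0)
```

Printing the three coefficient vectors side by side shows the pattern: OLS keeps a small nonzero weight on the useless third feature, LASSO drops it but also shrinks the two useful coefficients, and the relaxed fit keeps the selection while giving back some of that shrinkage.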

Okay, the hard part is over; let’s see the results of my model. I trained the model using play-by-play data from the 2019 regular season and used it to make predictions on the 2020 dataset. Here are some of the predictions:

Yards Above Average for the 2020 Season (Image by Author)

Here the “YAA” column is the calculated “yards above average” metric, which is found by taking the difference between the actual yards gained and the predicted value. I used more variables in the model, but these were the key ones: the formation type, the rush direction (I had to label encode these for the model), the down and distance to go, etc. It’s cool to see how in the 5th row, the 9-yard gain was 6 yards above what the model would expect given the situation. The YAA_upper and YAA_lower columns are the upper and lower bounds of the confidence interval; they show the variability in the predictions.
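To make the bookkeeping concrete, here is a small sketch in plain Python. The play records, field names, and interval values are hypothetical, and the way the bounds are derived (subtracting the ends of a prediction interval from the actual gain) is my assumption for illustration, not necessarily how the table above was produced. The first record mirrors the 9-yard gain that came out 6 yards above expectation.

```python
# Hypothetical plays; field names and numbers are illustrative only.
plays = [
    {"formation": "SHOTGUN", "direction": "LEFT TACKLE", "actual": 9.0,
     "predicted": 3.0, "pred_lower": 1.0, "pred_upper": 5.0},
    {"formation": "UNDER CENTER", "direction": "CENTER", "actual": 2.0,
     "predicted": 4.0, "pred_lower": 2.5, "pred_upper": 5.5},
    {"formation": "NO HUDDLE", "direction": "RIGHT END", "actual": 4.0,
     "predicted": 4.0, "pred_lower": 2.0, "pred_upper": 6.0},
]


def label_encode(values):
    """Map each distinct category to an integer code, in sorted order."""
    codes = {category: code for code, category in enumerate(sorted(set(values)))}
    return [codes[value] for value in values], codes


def add_yaa(play):
    """Return a copy of the play with YAA and its assumed bounds attached."""
    scored = dict(play)
    scored["YAA"] = play["actual"] - play["predicted"]
    # Assumption: a higher predicted value means fewer yards above average,
    # so the upper prediction bound gives the lower YAA bound and vice versa.
    scored["YAA_lower"] = play["actual"] - play["pred_upper"]
    scored["YAA_upper"] = play["actual"] - play["pred_lower"]
    return scored


formation_codes, formation_map = label_encode([p["formation"] for p in plays])
scored_plays = [add_yaa(p) for p in plays]
```

One caveat worth knowing: label encoding gives the categories an arbitrary numeric order, which a linear model will treat as meaningful, so one-hot encoding is a frequently used alternative.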

Let’s see a couple more fun plots and see what we can learn from them:

Mean Yards Above Average by Formation (image by Author)

This plot tells us that the best formation to run from (based on yards above average) would be the shotgun formation. There are more than 10 times as many plays run out of shotgun and under center as out of no huddle, so it’s hard to conclude that running a no-huddle offence (at least in terms of rushing, not passing) is not effective. What is clear is that running from under center is much less effective than running from the shotgun formation.
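A summary like this is just a grouped mean, and it is worth carrying the play count along so that small groups stand out. Here is a minimal sketch in plain Python; the records and the `mean_yaa_by` helper are hypothetical, and the numbers are invented, not taken from the plot.

```python
from collections import defaultdict

# Hypothetical scored plays; the YAA values are made up for illustration.
scored_plays = [
    {"formation": "SHOTGUN", "YAA": 1.0},
    {"formation": "SHOTGUN", "YAA": 2.0},
    {"formation": "SHOTGUN", "YAA": 0.0},
    {"formation": "UNDER CENTER", "YAA": -1.0},
    {"formation": "UNDER CENTER", "YAA": 0.0},
    {"formation": "NO HUDDLE", "YAA": 3.0},
]


def mean_yaa_by(plays, key):
    """Group plays by `key` and return the mean YAA and play count per group."""
    groups = defaultdict(list)
    for play in plays:
        groups[play[key]].append(play["YAA"])
    return {
        group: {"mean_yaa": sum(values) / len(values), "n_plays": len(values)}
        for group, values in groups.items()
    }


by_formation = mean_yaa_by(scored_plays, "formation")
```

In this toy data the no-huddle group has the highest mean but only one play behind it, which is the same reason to be careful about reading too much into the no-huddle bar. The same helper works for any other grouping field, such as rush direction or team.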

Mean Yards Above Average by Rush Direction (image by Author)

We can also look at how the direction you run in affects your yards above average. Here we see it is a good idea to run behind your left tackle (usually your best offensive lineman) and a bad idea to rush behind your center.

Finally, we can look at which NFL teams were best at gaining yards above average in 2020.

Mean Yards Above Average by Team (image by Author)

As expected, teams like the Titans (with “Beast Mode 2.0” Derrick Henry) do quite well in this metric.

That’s all for now. I may come back and add to or fix these plots (I did all this in R, and I’m really more of a Python person), but I hope you found this interesting. As always, I’d appreciate it if you could support me by telling your friends to visit my Medium page and website, which are linked below. And check out CNG Analytics while you’re there!
