Bayesian Projection of Rookie QBs

Published in Top Level Sports

Apr 26, 2021

The 2021 NFL Draft is coming up this week, so I thought I would look at how last year's rookie quarterbacks did in the 2020 NFL season and project some of their key statistics for the 2021 season. I looked at four players: Joe Burrow, Jalen Hurts, Justin Herbert and Tua Tagovailoa.

I’ve written previous articles on my dive into Bayesian statistics, which can be found here: https://medium.com/codex/bayesian-inference-of-nfl-tds-6d4b1251eeda and https://ecavan.medium.com/predicting-nfl-first-downs-453a683a827d.

Basically, Bayesian inference works by taking your prior beliefs and updating them with observed data. For example, I might think a coin is rigged, so my prior might be (0.8, 0.2): an 80% chance of heads and a 20% chance of tails. If I then flip the coin 10,000 times and see something like (0.55, 0.45), i.e. 55% heads and 45% tails, that data enters the model through the likelihood. My posterior distribution (my updated beliefs) will then shift toward the observed frequencies: with that many flips the data dominate any reasonably diffuse prior, so the coin is probably not rigged the way I thought. If you want to learn more about Bayesian statistics, I’d refer you to: http://varianceexplained.org/r/empirical_bayes_baseball/ and https://github.com/CamDavidsonPilon/Probabilistic-Programming-and-Bayesian-Methods-for-Hackers .
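As a minimal sketch of the coin example, assume the (0.8, 0.2) prior is encoded as a Beta(8, 2) distribution (an illustrative choice; any Beta with mean 0.8 expresses the same belief with more or less confidence) and that 5,500 of the 10,000 flips came up heads. For a Beta prior and binomial data the update is just addition:

```python
# Coin example: Beta prior + binomial data -> Beta posterior (conjugate update).
# Assumption: the (0.8, 0.2) prior is encoded as Beta(8, 2), i.e. it is worth
# about 10 "pseudo-flips"; this is an illustrative choice, not from the article.
prior_heads, prior_tails = 8, 2

flips = 10_000
heads = 5_500                  # 55% heads observed
tails = flips - heads

# Posterior is Beta(prior_heads + heads, prior_tails + tails)
post_heads = prior_heads + heads
post_tails = prior_tails + tails

prior_mean = prior_heads / (prior_heads + prior_tails)
post_mean = post_heads / (post_heads + post_tails)

print(f"prior P(heads)     = {prior_mean:.3f}")
print(f"posterior P(heads) = {post_mean:.4f}")
```

With 10,000 real flips, the 10 pseudo-flips in this prior barely register, which is why the posterior lands almost exactly on the observed 55%.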

In this article, I use the college stats of these four QBs as my prior and update them with their stats from the 2020 NFL season. The resulting posterior distributions give the projections for the 2021 season. College data were taken from: https://www.sports-reference.com/cfb/players/tua-tagovailoa-1.html .

First, we can visualize the priors for some of Joe Burrow’s key statistics. I made random draws from beta distributions (a normal distribution for yards per attempt) centered on the sample means from his college data.

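import numpy as np
# Beta(successes, failures) has mean successes / attempts;
# Burrow threw 945 pass attempts in college.
n_att = 945
td = np.random.beta(78, n_att - 78, size=10000)      # TD per attempt
yds = np.random.normal(9.36, 3, size=10000)          # yards per attempt
intr = np.random.beta(11, n_att - 11, size=10000)    # INT per attempt
cmp = np.random.beta(650, n_att - 650, size=10000)   # completion rate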

A beta distribution is well suited to modelling rates and percentages, since it lives on [0, 1]. The goal here is to project, say, Burrow’s TD%; multiplying that by the number of pass attempts he might throw next year gives a projected TD total for his 2021 season. The priors look like this:

Prior Distributions for Joe Burrow using his College data (Image by Author)
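To make the "rate times attempts" idea concrete, here is a minimal sketch that turns the TD% prior into a TD count. It assumes the prior is parameterized as Beta(TDs, attempts − TDs), so its mean equals Burrow’s college rate of 78/945, and it borrows the 550-attempt workload the article uses later for its projections; the seed and draw count are arbitrary.

```python
import numpy as np

# Assumed prior: Beta(TDs, attempts - TDs), so its mean is the college TD rate.
college_tds, college_att = 78, 945
rng = np.random.default_rng(0)
td_rate = rng.beta(college_tds, college_att - college_tds, size=10_000)

# Workload assumption: 550 pass attempts in a season.
attempts_2021 = 550
td_totals = td_rate * attempts_2021

low, high = np.percentile(td_totals, [5, 95])
print(f"prior mean TD rate: {td_rate.mean():.4f}")
print(f"implied TDs over {attempts_2021} attempts: {td_totals.mean():.1f} "
      f"(90% interval {low:.1f} to {high:.1f})")
```

This is prior-only, so it reflects college production alone; the 2020 NFL data have not been folded in yet.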

I took the observed 2020 data from here: https://www.espn.com/nfl/player//id/3915511/joe-burrow . We can similarly plot the likelihoods by simulating from a binomial distribution (a normal distribution for YPA). A binomial distribution counts the successes in N independent trials that share the same success probability; for example, 13 touchdowns over 400 pass attempts could be simulated as Binomial(400, 13/400). The likelihood plots look very similar, so if you want to see them, I’ll refer you to the Kaggle notebook linked at the end.
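A minimal sketch of that simulation with NumPy, using the 13-in-400 example (the seed and the number of draws are arbitrary choices here):

```python
import numpy as np

# Likelihood sketch: 13 TDs in 400 attempts simulated as Binomial(400, 13/400).
attempts, tds = 400, 13
rng = np.random.default_rng(1)
sim_tds = rng.binomial(attempts, tds / attempts, size=10_000)

# Dividing the simulated counts by attempts turns them into TD rates.
sim_rate = sim_tds / attempts

print(f"mean simulated TDs: {sim_tds.mean():.2f}")
print(f"mean simulated TD rate: {sim_rate.mean():.4f} (observed {tds / attempts:.4f})")
```

The simulated counts scatter around 13 with a standard deviation of roughly 3.5 TDs, which is the spread a likelihood plot built this way would show.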

Now that we have our likelihood and prior, we can get the posterior distribution from PyMC3, a Python package for Bayesian statistics. These simple prior-likelihood pairs have closed-form (conjugate) solutions, but I used PyMC3 for practice. I wrote the model as a function that takes the prior parameters and the observed data and plots the posterior distribution:

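import pymc3 as pm

def run_model(a, b, n, k):
    # Beta(a, b) prior on a per-attempt rate (e.g. TD%), with a binomial
    # likelihood for k observed successes in n attempts (e.g. 2020 TDs).
    with pm.Model():
        rate = pm.Beta('param_of_interest', alpha=a, beta=b)
        pm.Binomial('observed', n=n, p=rate, observed=k)
        trace = pm.sample(1000, return_inferencedata=False)

        # Plot the posterior of the rate parameter
        plot = pm.plot_posterior(trace)

    return plot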

And we’re done. The plot below shows the posterior distributions for (from left to right) yards per attempt, touchdowns per attempt and interceptions per attempt.

Posterior distributions for YPA, TDrate & INTrate (Image by Author)
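Because the Beta prior and binomial likelihood are conjugate, the TD-rate posterior can also be written down directly, which makes a handy cross-check on the sampler. A minimal sketch, assuming a Beta(78, 945 − 78) college prior and 13 TDs on 404 attempts for Burrow’s 2020 season (the 404 attempts are quoted in this article; treat the TD count as an assumption to verify against the ESPN page). A PyMC3 run on the same inputs should land close to this, up to sampling noise.

```python
# Closed-form (conjugate) posterior for a per-attempt rate:
# Beta(a, b) prior + k successes in n binomial trials -> Beta(a + k, b + n - k).

def beta_binomial_update(a, b, k, n):
    """Return the posterior Beta parameters after k successes in n trials."""
    return a + k, b + (n - k)


def beta_mean_sd(a, b):
    """Mean and standard deviation of a Beta(a, b) distribution."""
    mean = a / (a + b)
    var = a * b / ((a + b) ** 2 * (a + b + 1))
    return mean, var ** 0.5


# Assumed prior: Burrow's college TD rate as Beta(78, 945 - 78).
# Assumed data: 13 TDs on 404 attempts in 2020 (verify against the source).
a_post, b_post = beta_binomial_update(78, 945 - 78, k=13, n=404)
mean, sd = beta_mean_sd(a_post, b_post)

print(f"posterior: Beta({a_post}, {b_post})")
print(f"posterior TD rate: {mean:.4f} +/- {sd:.4f}")
```

The posterior mean (about 6.7%) sits between the college rate (about 8.3%) and the 2020 rate (about 3.2%), pulled toward the college number because the prior is built from more attempts.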

The plots so far have only been for Joe Burrow. For a quick-and-dirty projection, I multiplied each posterior sample mean by 550 pass attempts to get the projections for the 2021 season. That seemed like a conservative estimate for a full season’s workload: Burrow had 404 attempts in an injury-shortened 2020, and Herbert had about 600 in 2020. The results for all 4 QBs are shown below:

Because of the priors (the college statistics), the model is low on Herbert despite his star rookie season and very high on Tua despite his disappointing one. This is because Tua outperformed every QB here on a rate basis in college. The model could be improved in many ways, but these projections are not unrealistic. For those interested, the full notebook is here: https://www.kaggle.com/sportsstatseli/bayesian-projection-of-2nd-year-qbs .
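The yards projection chains both steps described earlier: update the YPA prior with 2020 data, then multiply the posterior mean by 550 attempts. Below is a minimal sketch using a normal-normal conjugate update with known variance, a swapped-in closed form rather than the notebook’s PyMC3 model. It assumes Burrow’s 2020 line of 2,688 yards on 404 attempts and a hypothetical per-attempt yardage SD of 9; both are illustrative inputs, not values from the article.

```python
import math

# Normal-normal conjugate update for yards per attempt (YPA), known variance.
# Prior: Normal(9.36, 3), matching the college-based YPA prior in the article.
prior_mean, prior_sd = 9.36, 3.0

# Assumed 2020 data (verify against the source) and a hypothetical
# per-attempt yardage SD of 9, which is not from the article.
yards, attempts = 2688, 404
per_attempt_sd = 9.0
obs_mean = yards / attempts
obs_se = per_attempt_sd / math.sqrt(attempts)  # standard error of the mean YPA

# Posterior mean is a precision-weighted average of prior and observed means.
prior_prec = 1 / prior_sd ** 2
obs_prec = 1 / obs_se ** 2
post_prec = prior_prec + obs_prec
post_mean = (prior_prec * prior_mean + obs_prec * obs_mean) / post_prec
post_sd = math.sqrt(1 / post_prec)

# Quick-and-dirty projection: posterior mean YPA x 550 attempts.
projected_yards = post_mean * 550
print(f"observed YPA {obs_mean:.2f}, posterior YPA {post_mean:.2f} +/- {post_sd:.2f}")
print(f"projected 2021 passing yards: {projected_yards:.0f}")
```

Under these assumptions the NFL data carry most of the weight, because 404 attempts pin down the mean YPA far more tightly than a prior with SD 3; a smaller prior SD would pull the projection back toward the college number.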

Thanks for reading! If you want to hear more from me, you can find me at:

https://elicavan.wixsite.com/site

https://www.linkedin.com/in/elijah-cavan-msc-14b0bab1/

Written by Elijah Cavan