How I Achieved a 56.4% Accuracy and 7.6% ROI Using Machine Learning to Bet on NFL Totals (Over/Under)

Andrew Josselyn
Feb 16, 2024


How I beat the bookmakers using Machine Learning.

Photo by David Vives on Unsplash

Background

I previously had success using weather data and machine learning (ML) to predict the 2023 Kentucky Derby’s winning time and MLB’s 2023 Totals (Over/Under). After my fantasy football teams proved to be pathetic by week 3, I decided to see whether weather data could also predict the Over/Under of NFL games.

Data

Historical data from 1966 through the end of the 2022 season (2/12/2023) was pulled from Kaggle: https://www.kaggle.com/datasets/tobycrabtree/nfl-scores-and-betting-data. I kept only games from 2000 onwards and used them to predict Over (1) or Under (0) from the following predictors:

  • Favorite team’s spread
  • Over/under line
  • Temperature (°F), defaulted to 70 if in a stadium
  • Wind speed (mph), defaulted to 0 if in a stadium
  • Humidity (%), defaulted to 40 if in a stadium
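
To make the preparation above concrete, here is a minimal sketch on a toy data frame. The column names, the in_stadium flag, and the choice to drop pushes (games landing exactly on the line) are my assumptions for illustration; the Kaggle file may name and encode things differently.

```r
# Toy rows standing in for the Kaggle file; all column names are assumed.
games <- data.frame(
  season          = c(1999, 2005, 2012, 2021, 2022),
  spread_favorite = c(-3.0, -7.0, -2.5, -6.5, -1.0),
  over_under_line = c(41.0, 44.5, 47.0, 51.0, 43.0),
  score_home      = c(20, 31, 17, 24, 27),
  score_away      = c(17, 24, 13, 27, 20),
  temperature     = c(55, NA, 38, NA, 61),
  wind_mph        = c(8, NA, 14, NA, 5),
  humidity        = c(60, NA, 72, NA, 58),
  in_stadium      = c(FALSE, TRUE, FALSE, TRUE, FALSE)  # hypothetical flag
)

# Keep seasons from 2000 onwards.
games <- games[games$season >= 2000, ]

# Apply the in-stadium defaults from the list above: 70 F, 0 mph, 40 humidity.
games$temperature[games$in_stadium] <- 70
games$wind_mph[games$in_stadium]    <- 0
games$humidity[games$in_stadium]    <- 40

# Target: 1 if the combined score went over the line, 0 if under.
total <- games$score_home + games$score_away
games <- games[total != games$over_under_line, ]  # assumption: drop pushes
games$ou <- as.integer(games$score_home + games$score_away > games$over_under_line)

games[, c("season", "over_under_line", "temperature", "wind_mph", "humidity", "ou")]
```

In this toy example the 1999 game is filtered out and the 2021 game is dropped as a push, leaving three rows with a 0/1 target.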

Weeks 3 to 14

Training and Testing

The data was split into 67% training and 33% testing. I fit a gradient boosting machine (GBM) with the following settings:

# Create the GBM model
library(gbm)

gbm_mod <- gbm(
  formula = ou ~ .,            # predict ou (Over = 1, Under = 0) from all other columns
  distribution = "bernoulli",  # binary outcome
  data = train_data,
  shrinkage = 0.001,
  interaction.depth = 2,
  n.minobsinnode = 10,         # default
  bag.fraction = 0.5,          # default
  n.trees = 500,               # determined using 5-fold cross-validation
  n.cores = NULL,              # NULL lets gbm guess the core count (used for cross-validation)
  verbose = FALSE
)
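
For readers who want to reproduce the train_data used above, a 67/33 split might look like the sketch below. I am assuming a simple random split here; the stand-in model_data, its column names, and the seed are all hypothetical.

```r
set.seed(42)  # hypothetical seed, only to make the sketch reproducible

# Stand-in for the prepared modelling table (target `ou` plus predictors).
model_data <- data.frame(
  ou              = rbinom(300, 1, 0.5),
  spread_favorite = -runif(300, 1, 14),
  over_under_line = runif(300, 35, 55),
  temperature     = runif(300, 10, 90),
  wind_mph        = runif(300, 0, 25),
  humidity        = runif(300, 20, 100)
)

# Randomly assign 67% of rows to training and the rest to testing.
n_train    <- round(0.67 * nrow(model_data))
train_idx  <- sample(nrow(model_data), n_train)
train_data <- model_data[train_idx, ]
test_data  <- model_data[-train_idx, ]

c(train = nrow(train_data), test = nrow(test_data))
```

A chronological split (older seasons for training, recent ones for testing) would be a reasonable alternative for betting data, since it mimics predicting games that have not happened yet.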

We will use the median predicted probability as the cutoff between an Over and an Under prediction. Historically, Overs and Unders each hit about 50% of the time, which is expected: a line that splits outcomes evenly lets the bookmakers lock in a profit through the vig. The median predicted probability on the test set was 0.510, so any prediction below 0.510 is classified as an Under and any prediction above 0.510 as an Over.
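
The median cutoff described above can be sketched in a few lines. The probabilities here are made up so that the median lands on 0.510; in practice they would come from the fitted model's predictions on the test set.

```r
# Hypothetical predicted probabilities of an Over on the test set.
pred_prob <- c(0.484, 0.492, 0.503, 0.508, 0.512, 0.515, 0.521, 0.530)

# Use the median as the cutoff, so roughly half the games land on each side.
cutoff  <- median(pred_prob)
pred_ou <- ifelse(pred_prob > cutoff, "Over", "Under")

cutoff
table(pred_ou)
```

Using the median rather than 0.5 guarantees a balanced mix of Over and Under calls even when the model's probabilities are all bunched close together.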

We’ll break down the accuracy of the test-set predictions into bands based on the standard deviation of the predicted probability:

If we follow our model’s predictions with probabilities of 0.482 to 0.495 and 0.50 to 0.51 (highlighted in green above) it should lead to an accuracy of 391/721 = 54.2%
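
A breakdown like this one could be computed roughly as follows. The predictions, outcomes, and band edges are hypothetical; I picked edges that include the two quoted bands rather than deriving them from a standard deviation.

```r
# Hypothetical test-set predictions and outcomes (1 = Over hit, 0 = Under hit).
pred_prob <- c(0.470, 0.485, 0.490, 0.493, 0.497, 0.502, 0.506, 0.509, 0.515, 0.525)
actual_ou <- c(1, 0, 0, 1, 1, 0, 0, 1, 1, 0)

cutoff  <- 0.510                        # median cutoff from the article
pred_ou <- as.integer(pred_prob > cutoff)
correct <- pred_ou == actual_ou

# Band edges: hypothetical, chosen to include the two bands quoted in the text.
breaks <- c(0, 0.482, 0.495, 0.50, 0.51, 1)
band   <- cut(pred_prob, breaks = breaks)

# Count bets and wins per band, then divide to get accuracy.
band_summary <- data.frame(
  bets      = as.vector(table(band)),
  wins      = as.vector(tapply(correct, band, sum)),
  row.names = levels(band)
)
band_summary$accuracy <- band_summary$wins / band_summary$bets
band_summary
```

Bands whose accuracy clears the break-even rate are the ones worth betting; the rest are skipped.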

Our test data indicates that betting only the predictions with probabilities of 0.482 to 0.495 and 0.50 to 0.51 should yield an accuracy of 391/721 = 54.2%. Interestingly, both bands fall at or below the 0.510 cutoff, so the model appears to have found an edge in betting the Under in specific weather conditions that the bookmakers may not have fully priced in. To be profitable at sports betting you need a win percentage (accuracy) above 52.4%, the break-even point at standard -110 odds, so following the model as described should be profitable. That is encouraging, because only ~3% of sports bettors are actually profitable.
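
The 52.4% figure falls out of the odds themselves. Here is a small sketch, assuming American odds and no pushes; break_even is a helper name I made up for illustration.

```r
# Win rate needed to break even at a given American price.
break_even <- function(american_odds) {
  # Negative odds: risk |odds| to win 100. Positive odds: risk 100 to win odds.
  ifelse(american_odds < 0,
         -american_odds / (-american_odds + 100),
         100 / (american_odds + 100))
}

round(break_even(c(-110, -105, 100)), 3)
# 0.524 0.512 0.500
```

At -110 you risk 110 to win 100, so you need to win 110 out of every 210 bets, or about 52.4%, just to get your money back.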

So how did we do?

From weeks 3 to 14 we performed slightly better than expected at 54.6% accuracy and 4.2% ROI.

Weeks 3 to 14 went great and showed that weather data and machine learning (ML) can be used to predict NFL Totals. However, one thing really bothered me: three losing weeks in a row starting in week 12. You’ll notice that the last column of the results, “Method”, shows “No Seasonality”. I found it interesting that my model started to perform poorly in week 12, which is Thanksgiving weekend. My theory was that the Over/Under outcome depends on the time of year (month), so from week 15 to the end of the season I included the month as a predictor.
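
Adding the month as a predictor only takes a date parse. A minimal sketch, assuming the game dates arrive as month/day/year strings like the 2/12/2023 date quoted earlier; the dates below are made up.

```r
# Hypothetical game dates in month/day/year format.
game_date <- c("9/24/2023", "11/23/2023", "12/17/2023", "1/7/2024")

# Parse the strings and pull out the month as an integer from 1 to 12.
parsed <- as.Date(game_date, format = "%m/%d/%Y")
month  <- as.integer(format(parsed, "%m"))

month
# 9 11 12 1
```

Whether to feed the month in as a number or as a categorical factor is a modelling choice. A tree-based model can split on either, but a factor avoids implying that January (1) is far from December (12).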

Week 15 Onward

So how did my model change when I included the month?

Following predictions highlighted in the green should lead to an accuracy of 196/328 = 59.8%

Including the month increases our accuracy on the test data from 54.2% to 59.8%. So how did we do in the last few weeks of the 2023 NFL season with the month included?

From week 15 onward we again performed better than expected at 62.2% accuracy and a whopping 18.5% ROI.

Including the month in our predictions improved our live results from 54.6% accuracy and 4.2% ROI (weeks 3 to 14) to 62.2% accuracy and 18.5% ROI (week 15 onward).

Overall Results

Overall I achieved a 56.4% accuracy and a 7.6% ROI.
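
As a rough sanity check on how accuracy translates into ROI, here is a sketch that assumes every bet is one flat unit at -110. That is my simplification; actual prices and stakes vary from bet to bet, so the implied numbers will not match the reported ones exactly.

```r
# Assumption: flat one-unit bets at -110, so a win pays 100/110 and a loss costs 1.
roi_flat <- function(accuracy, payout = 100 / 110) {
  accuracy * payout - (1 - accuracy)
}

reported <- data.frame(
  period       = c("Weeks 3 to 14", "Week 15 onward", "Overall"),
  accuracy     = c(0.546, 0.622, 0.564),
  reported_roi = c(0.042, 0.185, 0.076)
)
reported$implied_roi <- round(roi_flat(reported$accuracy), 3)
reported
```

The implied ROIs come out at roughly 4.2%, 18.7%, and 7.7%, close to the reported 4.2%, 18.5%, and 7.6%.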

Going forward, I will use the model that includes the month to make my predictions, and I hope to achieve a long-term accuracy close to 60%.


Andrew Josselyn

Data Scientist currently working in the transportation and logistics industry. Interested in applying analytics to better my personal everyday life.