Building a Basic, In-Game Win Probability Model for the NFL

In-Game Win Probability for Super Bowl LI (
#Install nflscrapR
devtools::install_github(repo = "maksimhorowitz/nflscrapR")
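#Load libraries
library(nflscrapR)  #season_play_by_play(), season_games()
library(dplyr)      #bind_rows(), full_join(), mutate(), filter(), select(), %>%
library(caTools)    #sample.split()
library(ggplot2)    #ggplot()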
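#Download play-by-play data for one season and save it
pbp1 = season_play_by_play(2016)
saveRDS(pbp1, "pbp_data_2016.rds")
#Combine seasons; pbp2 through pbp8 are assumed to be the other seasons, downloaded the same way
pbp = bind_rows(pbp1,pbp2,pbp3,pbp4,pbp5,pbp6,pbp7,pbp8)
saveRDS(pbp, "pbp_data.rds")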
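#Download game results; games2015 through games2009 are assumed to come from the same call with the matching Season
games2016 = season_games(Season = 2016)
games = bind_rows(games2016, games2015, games2014, games2013, games2012, games2011, games2010, games2009)
saveRDS(games, "games_data.rds")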
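#Reload the combined play-by-play data saved earlier (assumed to be what pbp_raw refers to)
pbp_raw = readRDS("pbp_data.rds")
#Attach game results to every play
pbp_final = full_join(games, pbp_raw, by = "GameID")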
saveRDS(pbp_final, "pbp_final.rds")
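#Label each game's winner (under this rule a tie is credited to the away team)
pbp_final = pbp_final %>% mutate(winner = ifelse(homescore > awayscore, home, away))
#Flag whether the team with possession went on to win
pbp_final = pbp_final %>% mutate(poswins = ifelse(winner == posteam, "Yes","No"))
#Treat quarter, down, and the outcome as categorical variables
pbp_final$qtr = as.factor(pbp_final$qtr)
pbp_final$down = as.factor(pbp_final$down)
pbp_final$poswins = as.factor(pbp_final$poswins)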
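#Drop "No Play" rows, overtime (qtr 5), and rows with a missing down or outcome, then keep only the modeling columns
pbp_reduced = pbp_final %>%
  filter(PlayType != "No Play" & qtr != 5 & down != "NA" & poswins != "NA") %>%
  select(GameID, Date, posteam, HomeTeam, AwayTeam, winner, qtr, down, ydstogo, TimeSecs, yrdline100, ScoreDiff, poswins)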
split = sample.split(pbp_reduced$poswins, SplitRatio = 0.8)
train = pbp_reduced %>% filter(split == TRUE)
test = pbp_reduced %>% filter(split == FALSE)
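The idea behind the 80/20 split can be sketched in base R. The caTools documentation describes sample.split as preserving the relative ratios of the labels in each subset; the loop below imitates that behavior on made-up labels. It is an approximation of the idea, not the package's implementation, and the names labels and in_train are hypothetical.

```r
# Stratified 80/20 split on made-up labels (illustrative data only)
set.seed(123)
labels <- factor(c(rep("Yes", 60), rep("No", 40)))

in_train <- logical(length(labels))  # starts as all FALSE
for (lvl in levels(labels)) {
  idx <- which(labels == lvl)               # rows carrying this label
  n_train <- round(0.8 * length(idx))       # 80% of this label's rows
  chosen <- idx[sample.int(length(idx), n_train)]
  in_train[chosen] <- TRUE                  # mark them as training rows
}

# Both labels keep the same 80/20 proportion
table(labels, in_train)
```

Sampling within each label, rather than across all rows at once, keeps the share of "Yes" and "No" outcomes the same in the training and test sets.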
The logit function
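As a quick illustration of the logit function, here is a minimal base R sketch of the log-odds transform and its inverse. The helper names logit and inv_logit are my own; base R provides the same pair as qlogis() and plogis().

```r
# Logit (log-odds) and its inverse, the logistic function
logit <- function(p) log(p / (1 - p))
inv_logit <- function(x) 1 / (1 + exp(-x))

p <- c(0.1, 0.5, 0.9)
x <- logit(p)           # log-odds; a probability of 0.5 maps to 0
p_back <- inv_logit(x)  # maps the log-odds back to probabilities

# Compare against the built-in equivalents
all.equal(x, qlogis(p))
all.equal(p_back, plogis(x))
```

Logistic regression fits a linear model on the log-odds scale, and the inverse transform turns its output back into a probability between 0 and 1.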
#Fit a logistic regression for the probability that the team with possession wins
model1 = glm(poswins ~ qtr + down + ydstogo + TimeSecs + yrdline100 + ScoreDiff, data = train, family = "binomial")
summary(model1)
Logistic regression model
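To see what a fit like this does without downloading any NFL data, here is a small sketch on simulated plays. Everything in it is an assumption made for illustration: the single predictor score_diff, the true slope of 0.15, and the names toy, toy_model, and toy_pred.

```r
# Toy logistic regression on simulated data (not real NFL plays)
set.seed(42)
n <- 2000
score_diff <- round(rnorm(n, mean = 0, sd = 10))

# Simulated truth: each point of lead adds 0.15 to the log-odds of winning
p_win <- plogis(0.15 * score_diff)
poswins <- factor(ifelse(runif(n) < p_win, "Yes", "No"), levels = c("No", "Yes"))
toy <- data.frame(score_diff, poswins)

# With a factor response, a binomial glm treats the first level ("No")
# as failure, so the fitted probability is that of "Yes"
toy_model <- glm(poswins ~ score_diff, data = toy, family = "binomial")
coef(toy_model)

# Predicted win probability when trailing by 14, tied, and leading by 14
new_plays <- data.frame(score_diff = c(-14, 0, 14))
toy_pred <- predict(toy_model, newdata = new_plays, type = "response")
toy_pred
```

The estimated score_diff coefficient should land near the simulated slope, and the predicted probability rises with the lead, which is the same behavior the full model relies on.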
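#Predicted probability that the team with possession wins
pred1 = predict(model1, train, type = "response")
train = cbind(train,pred1)
#Convert to the home team's win probability
train = mutate(train, pred1h = ifelse(posteam == HomeTeam, pred1, 1-pred1))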
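The conversion from possession-team probability to home-team probability is easiest to see on a few rows. The sketch below uses made-up probabilities, and the data frame plays is hypothetical.

```r
# Three illustrative plays from a game hosted by DEN (probabilities are made up)
plays <- data.frame(
  posteam  = c("DEN", "CAR", "DEN"),
  HomeTeam = c("DEN", "DEN", "DEN"),
  pred1    = c(0.60, 0.35, 0.72),  # P(team with possession wins)
  stringsAsFactors = FALSE
)

# When the home team has the ball, pred1 is already the home win probability;
# when the away team has it, the home team's probability is the complement
plays$pred1h <- ifelse(plays$posteam == plays$HomeTeam, plays$pred1, 1 - plays$pred1)
plays$pred1h
```

Putting every play on the home team's scale gives the plot a single continuous line instead of one that flips with each change of possession.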
#Plot the home team's win probability over the course of one game
ggplot(filter(train, GameID == "2016090800"), aes(x=TimeSecs, y=pred1h)) +
  geom_line(size=2, colour="orange") +
  scale_x_reverse() +
  ylim(c(0,1)) +
  theme_minimal() +
  xlab("Time Remaining (seconds)") +
  ylab("Home Win Probability")
Denver Broncos (versus Carolina Panthers) In-Game Win Probability (September 8th, 2016)




Stephen Hill
