Learnings from a new NFL Prediction Environment

Ross Blanchard
Sep 6, 2018 · 3 min read

I’m happy to say that, as a result of the last series of NFL prediction articles, I’m now working with Lineups.com on their betting prediction algorithms! 🎉 When I started predicting games I was certainly pretty green; it was my first large-ish programming endeavor, and I learned a ton. I started out just trying to predict games, then wanted to give my friends betting recommendations; upon realizing it was kinda viable, I wanted to share. Vegas is clearly the common enemy here. Unfortunately, that goal is infinitely expandable, and I quickly realized that A) Vegas will always have more resources than the bettor and B) I’m only one person. That’s why I’m happy to work on a platform that would’ve taken me another year to flesh out on my own, with a lot of interesting ideas already in place.

This has introduced a new (to me) way of producing predictions. Previously, I was doing a very high-level aggregation of stats for positions that had a high impact on game outcome. This is flawed in a few ways, but the biggest issue is all the potentially impactful features that aren’t being considered. The new prediction-generation scheme actually predicts every single play of a given game. This opens up some interesting possibilities, but in my opinion the most important one is that it allows for a centralized model that can be applied to spread, moneyline, and O/U bets. That way, you avoid overfitting on a particular feature that might be relevant to one bet type and not another; it’s this overfitting, and the use of different models for different bet types, that produces contradictory results. Another nice benefit of this style of prediction is being able to produce live predictions on a play-by-play basis, which at least I think is pretty exciting.
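To sketch why play-level simulation centralizes things, here is a toy Monte Carlo version. Every number in it (the yardage distribution, the drive logic, the play counts) is an invented placeholder rather than the actual model, but it shows the key property: one simulated score distribution can price spread, moneyline, and total from the same source.

```python
import random

def simulate_game(plays_per_team=60, seed=0):
    """Toy play-by-play simulator (all parameters are illustrative
    assumptions): each play gains random yards, and a drive that
    crosses the goal line scores a touchdown."""
    rng = random.Random(seed)
    scores = [0, 0]
    for team in (0, 1):
        yardline = 25
        for _ in range(plays_per_team):
            yardline += rng.gauss(5, 8)   # assumed per-play yardage model
            if yardline >= 100:           # touchdown, reset to a new drive
                scores[team] += 7
                yardline = 25
            elif yardline <= 0:           # pushed back out of territory, reset
                yardline = 25
    return tuple(scores)

def derive_markets(n_sims=2000):
    """One simulated score distribution feeds every bet type."""
    results = [simulate_game(seed=i) for i in range(n_sims)]
    home_wins = sum(h > a for h, a in results) / n_sims   # moneyline
    avg_margin = sum(h - a for h, a in results) / n_sims  # spread
    avg_total = sum(h + a for h, a in results) / n_sims   # over/under
    return home_wins, avg_margin, avg_total
```

Because all three market estimates come from the same score distribution, they can’t contradict each other the way three separately fit models can.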

There are certainly challenges that come with predicting this way, though. When the score of a game is the product of a long chain of play predictions, errors can accumulate over the course of the game. This means that the play prediction process, and the way those predictions are aggregated, needs to be pretty robust. Another consideration is the amount of data you’re able to get on the fly, and the speed at which you can get it. When training a model, it’s all well and good to use every past play as training data, but in a live scenario it’s fairly difficult to get the data from the previous play quickly and accurately enough to predict the next play before it starts. Finding a good API for this information and speeding up the model loading/prediction process is something I continue to work on (I’m currently considering HDFS storage for models).
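The error-accumulation concern can be made concrete with a quick simulation. Assuming independent, zero-mean per-play errors (a simplification; correlated or biased errors behave worse), the error in the final aggregated score grows roughly with the square root of the number of plays:

```python
import math
import random

def final_score_error(n_plays, per_play_sigma, n_trials=5000, seed=1):
    """Empirical std of the summed error over a game's worth of plays,
    assuming independent zero-mean per-play errors (an assumption; in
    practice play-prediction errors can correlate and compound faster)."""
    rng = random.Random(seed)
    totals = [sum(rng.gauss(0, per_play_sigma) for _ in range(n_plays))
              for _ in range(n_trials)]
    mean = sum(totals) / n_trials
    var = sum((t - mean) ** 2 for t in totals) / n_trials
    return math.sqrt(var)

# Independent errors grow ~ sqrt(n_plays): quadrupling the number of
# plays roughly doubles the accumulated error. Any systematic per-play
# bias, by contrast, grows linearly, which is why robustness matters.
```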

Another new challenge I’m running into is integrating a data science workflow with a frontend, which isn’t something I’m used to doing with DS pipelines, or with Django. Mostly, though, that’s just a complaint that there’s no library I’m aware of for facilitating this. I’ll probably follow up with another post on the general structure I’ve decided upon once I get to that point.

I hope this season to continue to write about the prediction implementations that I’ve found effective and the decision process that I’ve gone through to get to that point. It’s a continuous process, so I imagine that this will change over time.

I will say though that recent exploration has just continued to confirm some of the conclusions that I reached last season:

Vegas becomes increasingly accurate as each season progresses, and the effect gets strong around midseason. Prediction models after that point need to be much more accurate to keep up and stay profitable.

O/U is particularly tricky because the odds are typically around -110: you risk $110 to win $100, so predictions need to clear the ~52.4% break-even rate, and realistically closer to 55%, to be reliably profitable. Almost every model I ran predicted heavily toward the under, generating either mildly positive or negative returns. I was able to overcome this for the most part by adding a constant multiplicative factor to the total prediction, but I’m still looking for a better way.
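For reference, the break-even win rate at a given American price is just the amount risked divided by the total returned. A small helper (hypothetical name, not part of any real model) makes the -110 arithmetic explicit:

```python
def breakeven_prob(american_odds):
    """Win rate needed to break even at a given American price.
    At -110 you risk 110 to win 100, so break-even = 110 / (110 + 100)."""
    if american_odds < 0:
        risk, win = -american_odds, 100
    else:
        risk, win = 100, american_odds
    return risk / (risk + win)

# breakeven_prob(-110) -> ~0.5238, i.e. about 52.4%
```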

I still think that the best betting model is parlaying moneyline bets. Last season, the home team won 57% of games. That isn’t profitable if those bets are placed individually, but it could be if they’re parlayed. However, creating a good model for recommending parlays is very difficult, so that’s still on the horizon.
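The parlay arithmetic, under the strong assumption that games are independent and each leg wins 57% of the time (the home-win rate quoted above), looks like this; `parlay_breakeven` is a hypothetical helper for illustration:

```python
def parlay_breakeven(p_leg, n_legs):
    """Hit probability of an n-leg parlay of independent legs that each
    win with probability p_leg, and the combined decimal payout the
    parlay would need to exceed to break even at that hit rate."""
    p_hit = p_leg ** n_legs
    return p_hit, 1.0 / p_hit

# e.g. parlay_breakeven(0.57, 3): three legs all hit ~18.5% of the
# time, so the combined payout must beat ~5.4x the stake to be worth it.
```

Whether real parlays clear that bar depends entirely on the prices of the individual legs, which is exactly why a recommendation model for them is hard.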

Ross Blanchard

Written by

Currently doing software dev on a Lab Information Management System in San Diego. Writing an NFL game prediction platform as an aside. Posting on the latter.
