Hey everyone, we beat ESPN — Part 1. Background and Methodology

Comparison of fantasy football weekly model performance of Fantasy Outliers vs. ESPN during Weeks 6–16 of the 2017 NFL season for Standard and PPR scoring formats

Chris Seal
8 min read · Jul 31, 2018

Last year, all of us spent time away from our families to create weekly fantasy football predictions, podcasts, and articles for you, because we all believe in the idea that machine learning combined with human expertise can produce better results than either by itself.

In 2015, Fantasy Outliers started as a bunch of braggy graphs that I made when I won a league. Since then, we’ve made historical interactive graphs where users can explore our value-based analysis of what actually happened in competitive leagues, yearly predictive models to help with the draft, and weekly predictive models that help determine whom to start/sit and which under-the-radar free agents are worth their salt.

Last year, we beta-tested Fantasy Outliers’ predictive models, affectionately referred to from here on out as MathBox. Anecdotal results were as follows:

  • Though a small sample size, a disproportionate number of us won our leagues last year. Notably, in the league that birthed Fantasy Outliers, someone who had historically finished in the bottom three of our league finally won — suggesting that MathBox ‘leveled the playing field’, so to speak.
  • We found that by predicting performance in the upcoming four games and through the rest of the season, MathBox helped us spot Free Agent pickups a week or two before everyone started talking about them.
  • Finally, nearly every time I set my starting lineups last year, I made good decisions when combining my own fantasy football knowledge with MathBox’s predictions, and bad decisions when relying solely on ESPN’s predictions.

But this is anecdotal and based on a small sample size, so we wanted to put it to the test. Was MathBox actually better than ESPN or not? If so, what was it good or bad at? What are the best and worst ways to use MathBox’s predictions to gain a competitive advantage when making decisions for your team?

For the remainder of the article, we will cover the Executive Summary and Methodology of how our weekly predictions were created and compared to ESPN’s weekly predictions. To jump directly to the detailed results, go to Part 2.

Executive summary

Fantasy Outliers’ weekly models (AKA MathBox) outperformed ESPN’s weekly point projections in Weeks 6–16 of the 2017 season in terms of raw points projections for Quarterbacks. For RBs, WRs, and TEs, results among Top 20 players were typically about even or trending slightly in MathBox’s favor, while ESPN typically won comparisons among players ranked outside the Top 20.

When MathBox’s projections were used as directional indicators relative to ESPN’s predictions — similar to how they were used in practice last year — results were very encouraging with positive results in all position groups.

Individual player and team results suggest the hypothesis that MathBox may be at its best when predicting low-buzz, under-the-radar players (but at this point, we’re not sure).

Methodology — Assessing Fantasy Outliers’ weekly model performance

Since the season ended, we have spent time tweaking our weekly models, which are now on at least their seventh substantial iteration. To test the results, we compared our projections to ESPN’s weekly projections. We did this for quarterback, running back, wide receiver, and tight end (we don’t yet have weekly models for defense and kicker).

We trained our models on data up through Week 5 of 2017 and tested the results on Weeks 6–16 of the 2017 season.

To the best of our knowledge, ESPN’s weekly projections are derived somehow from a player’s rating going into the week and from how players with that same rating have historically performed. Thus, the basis of their calculation involves expert ratings (ESPN, please correct us if we are wrong). MathBox, on the other hand, is calculated completely from the ground up, based on statistics and machine learning. As stated above, our vision at Fantasy Outliers is that machine learning, when tweaked strategically by human expertise, can produce better results than either by itself.

Starting dataset — Poor man’s NFL stat sheet

We used an affordable (“poor man’s”) dataset found at Armchair Analysis. It consists primarily of traditional individual and team NFL statistics, for both offense and defense, going back a couple of decades. More recent NFL datasets may have more advanced features, but we opted for more common (and affordable) stats that have been around a while.

To supplement this dataset, we scraped publicly available data off the web related to coaching history, contract status, and other odds and ends in the public domain.

Features used in models

We sliced and diced this dataset to create features related to individual players and to the team as a whole, for both offense and defense, across different periods of time. Of course, we only used data that was available prior to the start of the game — so while we could use things like a starting QB’s historical rating (since, more often than not, everyone knows that before the game starts), we couldn’t use any in-game stats. When all was said and done, combined across all position groups, our dataset consisted of nearly 7K total features.
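For the curious, here’s a minimal sketch of what this kind of look-back feature engineering can look like in pandas. The game log, stat columns, and window sizes below are made up for illustration; they’re not our actual schema or feature set. The key trick is shifting by one game so every feature only reflects what was knowable before kickoff.

```python
import pandas as pd

# Hypothetical per-player game log: one row per (player, week).
# Column names are illustrative, not Fantasy Outliers' actual schema.
games = pd.DataFrame({
    "player": ["A", "A", "A", "A", "B", "B", "B", "B"],
    "week":   [1, 2, 3, 4, 1, 2, 3, 4],
    "fantasy_points": [12.0, 18.5, 7.2, 21.0, 9.0, 11.5, 14.0, 6.5],
    "targets":        [6, 9, 4, 10, 5, 7, 8, 3],
})

games = games.sort_values(["player", "week"])

# Rolling averages over different look-back windows, shifted by one game
# so each feature uses only information available BEFORE kickoff.
for window in (2, 4):
    for stat in ("fantasy_points", "targets"):
        games[f"{stat}_avg_last{window}"] = (
            games.groupby("player")[stat]
            .transform(lambda s, w=window: s.shift(1).rolling(w, min_periods=1).mean())
        )

print(games)
```

Multiply a handful of stats by several windows, player-level and team-level splits, and offense and defense, and it’s easy to see how the feature count climbs into the thousands.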

Model training — Machine learning, meet fantasy football

This dataset of six-thousand-plus columns and tens of thousands of rows went through an extensive model training process involving pre-processing techniques, feature selection, hyperparameter optimization, and more. The gist is as follows:

  • Weekly models were made for PPR, Standard, and in some cases, Half-PPR scoring formats. For each scoring format, models were made for QB, RB, WR, and TE, respectively. In some cases, separate models were made for different tiers of player (stars vs. starters vs. benchwarmers, for example).
  • For each of these cases, models were made that predict, for a given player in a given week: Points Scored (in a given scoring format), Opportunities (targets, rush attempts, etc.), and Points per Opportunity (in a given scoring format).
  • For each resulting model, a combination of different years, clustering techniques, feature selection techniques, and modeling hyperparameters was tried until the best, most predictive combination with the lowest error rate was found.
  • The error rate was calculated by fitting the model on the dataset up through a given year, testing results on the subsequent year, and aggregating the results — with performance in more recent years weighted more heavily. So, for example, a model with a given set of parameters would train on years 2002–2012 and test on 2013, then train on 2002–2013 and test on 2014, and so on (with the aggregated error for 2014 weighted slightly more heavily than 2013’s). A sketch of this walk-forward scheme appears below.
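Here’s that sketch: a simplified, hypothetical version of the walk-forward validation described in the last bullet. The model choice (gradient boosting), error metric (mean absolute error), and decay factor are stand-ins, not our actual configuration.

```python
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.metrics import mean_absolute_error

def walk_forward_error(X, y, years, first_test_year=2013, decay=0.9):
    """Train on all seasons before each test year, test on that year, and
    average the errors with more weight on recent years. The model, metric,
    and decay factor are illustrative assumptions, not MathBox's setup."""
    test_years = sorted(set(years[years >= first_test_year]))
    errors, weights = [], []
    for i, test_year in enumerate(test_years):
        train, test = years < test_year, years == test_year
        model = GradientBoostingRegressor().fit(X[train], y[train])
        errors.append(mean_absolute_error(y[test], model.predict(X[test])))
        # decay < 1 shrinks the weight given to older test years.
        weights.append(decay ** (len(test_years) - 1 - i))
    return np.average(errors, weights=weights)
```

A grid of feature-selection and hyperparameter combinations can then be scored with a function like this one, keeping the combination with the lowest weighted error.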

Comparison to ESPN’s weekly point projections

In order to compare our model results with ESPN’s, we are faced with the challenge that ESPN’s predictions are hand-adjusted to account for anomalous situational changes (injuries, QB changes, what have you) while Fantasy Outliers’ predictions have no such adjustment (since they were made after the season was over — but only trained on data known beforehand).

Therefore, to mitigate some of the advantage ESPN has going into this comparison, we focused our analysis on wins and losses rather than on differences in point estimates.

While our models may know the number of consecutive games a player has played with a team or the starting QB’s historical rating, they DO NOT know if a player went down in the fourth quarter last week and was listed as questionable on the injury report, nor do they know if an opponent’s star defensive player is injured, or if the star wide receiver was injured but is coming back this week — or any of a host of other situational anomalies that inevitably arise throughout the course of an NFL season.

Also, when we used these predictions live last year, while we might not have believed the actual point projection output by the models, we definitely paid attention when MathBox was high or low on a player compared to the expert consensus.

With those caveats in mind, here are the metrics we used for comparison:

  • Points: Compares MathBox’s raw points projections to ESPN’s point projections
  • Points (opportunities * points per opportunity): Calculates points by multiplying MathBox’s opportunity projection by MathBox’s points-per-opportunity projection, and compares the result to ESPN’s point projections
  • Increment ESPN directionally: If MathBox was higher than ESPN, the resulting ‘prediction’ was just a tad higher than ESPN’s; conversely, if MathBox was lower than ESPN, the resulting prediction was just a tad lower than ESPN’s. For example, if MathBox’s original prediction was 18 and ESPN’s prediction was 19, the resulting prediction would be 18.99. If MathBox’s original prediction was 20 and ESPN’s prediction was 19, the resulting prediction would be 19.01. Reason being: this is more like how we actually used the predictions in practice — i.e., as indicators of direction relative to common knowledge (see the sketch after this list).
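Here are sketches of the second and third metrics in code. The function names and the 0.01 nudge are just for illustration (the 18.99/19.01 examples above imply a step of 0.01, but any small epsilon would preserve the direction):

```python
def points_from_components(opportunities, points_per_opportunity):
    """Metric 2: recompose a points projection from MathBox's opportunity
    and points-per-opportunity projections."""
    return opportunities * points_per_opportunity

def increment_espn_directionally(mathbox_pred, espn_pred, epsilon=0.01):
    """Metric 3: keep ESPN's number, nudged up or down by a small epsilon
    depending on whether MathBox is higher or lower than ESPN."""
    if mathbox_pred > espn_pred:
        return espn_pred + epsilon
    if mathbox_pred < espn_pred:
        return espn_pred - epsilon
    return espn_pred

print(increment_espn_directionally(18.0, 19.0))  # 18.99, per the example above
print(increment_espn_directionally(20.0, 19.0))  # 19.01
```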

For each comparison, we created beta distributions — probability distributions that incorporate wins and losses (ignoring ties) — and calculated the probability that ESPN’s predictions were better than MathBox’s. This is a common technique used in A/B testing of different website landing pages, for example. We calculated probabilities using two methods and chose to report the one that tended to be the more conservative (closer to 50%) of the two.
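To make that concrete, here’s one plausible version of the calculation: treat ESPN’s head-to-head win rate against MathBox as a beta distribution (uniform prior, ties ignored) and ask how likely that rate is to exceed 50%. The win/loss counts below are made up, and our actual two methods may differ from this sketch.

```python
from scipy import stats

def prob_espn_better(espn_wins, mathbox_wins):
    """Posterior over ESPN's head-to-head win rate: Beta(wins + 1, losses + 1)
    under a uniform prior, ignoring ties. Returns P(win rate > 0.5)."""
    posterior = stats.beta(espn_wins + 1, mathbox_wins + 1)
    return posterior.sf(0.5)  # survival function: P(rate > 0.5)

# Hypothetical counts: ESPN wins 95 player-weeks, MathBox wins 120.
print(prob_espn_better(espn_wins=95, mathbox_wins=120))  # ~0.04: ESPN likely worse
```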

[Figure: Beta distributions comparing MathBox’s and ESPN’s weekly points projections for Top 20 QBs who have played at least 2 consecutive games with their team]

Results — Comparing Fantasy Outliers’ vs. ESPN’s weekly predictions

For complete results, visit Part 2 of this two-part series…

Please, join us in our quest to bring data science to the fantasy football world. Follow us on Medium, follow us on Twitter, and together, let’s dominate our fantasy football leagues in 2018!

Ray Harris contributed to this article.

