The Final Boxscore
Live Predictive Advertising with Google Cloud during the NCAA® Final Four®
Anyone can make a prediction about March Madness®. But to make one that's built from real-time data and broadcast in less time than it takes a coach to deliver a halftime pep talk? Look to Google Cloud.
When we began our partnership with the NCAA® to help them understand their data, we knew there was a unique opportunity to demonstrate what the cloud can do. In 2018, most organizations are aware of the importance of maintaining healthy data preparation and storage practices. Many are able to use that data to perform actionable analysis. But jumping from analysis to predictive insights, while rich in potential and value, can seem like a hurdle far out of reach. We knew that March Madness provided a prominent chance to illustrate otherwise, so we created a multichannel ad campaign unlike any that had come before: a set of predictive ads not only leading up to each of the final three games of the NCAA Tournament, but also analyzed, created, and shipped during the games themselves.
Live TV takes a lot of planning, and so do live ads. In addition to the months spent architecting the data pipeline, we had to create a set of possible narratives that we'd be able to use effectively at a moment's notice (or, in our case, as the clock wound down in the first half). Together with our creative partners at Cloneless and Eleven Inc., and with the NCAA, CBS Sports, and Turner Broadcasting, we had to develop a plan to choose the best narrative given the game's trajectory, pull the right data to generate our prediction, and ship a high-quality ad (both statistically and creatively), all before the start of the second half.
Yet all of that planning would have been moot without meaningful predictions. Fortunately, we knew we had the tools to make it worthwhile. Here’s what we built and why we built it that way. Best of all? Here’s how we did.
There were two main considerations around what to predict: one, to make sure the types of predictions were appropriate to the circumstances, and two, to make sure we’d be able to deliver them in an extremely narrow period of time.
Our predictions needed to meet a Goldilocks-esque threshold of probability: not too hot and not too cold. Why not aim as close as possible to 100% certainty? For starters, it wouldn't be very interesting. For example, we could have predicted with ~99.5% certainty that there would be a three-point attempt in the second half of the championship game (there have been 174 games since 2009 without a three-point attempt in the second half!), but that just isn't very surprising. Some questions do require 99.5% accuracy; this wasn't one of them. We designed the probabilities in the ads to fall between 70% and 80%, which we determined to be the sweet spot for our use case of college basketball.
With our threshold set, we needed to pick some themes. Since we knew we'd have an extremely short window to turn around an ad during the NCAA Final Four® and Championship, we built a fixed list of themes for which we'd build predictions. They included:
- Three Point Shots Attempted
- Three Point Shots Made
- Field Goals (Two Points + Three Points) Attempted
- Offensive Rebounds
We also built a workflow for other predictive themes like free throws, turnovers, and bench strength, which we used to support our overall modeling over the course of the entire season.
For each prediction theme, we used regression and classification modeling techniques. Regression lets you estimate a continuous value given a set of features, e.g., "There will be 23 three-point attempts in the second half." Classification lets you identify the category of an object given a set of features, e.g., "There is a 58% probability that both teams will combine for fewer than 19 assists in the second half." We used as few as 20 and as many as 300 features to train our models.
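The post doesn't say which libraries powered these models, so here is a minimal NumPy-only sketch of the two techniques on synthetic data: least-squares regression for a continuous estimate, and a k-nearest-neighbour vote as a simple stand-in classifier. Every feature and target value below is made up.

```python
import numpy as np

rng = np.random.default_rng(42)

# Toy training set: one row per historical game. In the real models the
# 20-300 features were things like pace and shot mix; here they are noise.
n_games, n_features = 200, 20
X = rng.normal(size=(n_games, n_features))
true_w = rng.normal(size=n_features)
y = 20 + X @ true_w + rng.normal(scale=2.0, size=n_games)

# Regression: estimate a continuous value for a new game via least squares.
X1 = np.hstack([X, np.ones((n_games, 1))])          # add an intercept column
w, *_ = np.linalg.lstsq(X1, y, rcond=None)
new_game = rng.normal(size=n_features)
estimate = float(np.append(new_game, 1.0) @ w)
print(f"Estimated second-half three-point attempts: {estimate:.1f}")

# Classification: probability of a category, here a simple
# k-nearest-neighbour vote over the most similar historical games.
floor = 19
labels = (y < floor).astype(int)                    # 1 = stat lands under the floor
nearest = np.argsort(np.linalg.norm(X - new_game, axis=1))[:15]
p_under = float(labels[nearest].mean())
print(f"P(stat lands under {floor}) ≈ {p_under:.0%}")
```

The regression answers "how many?"; the classifier answers "how likely is a category (over/under the floor)?", which is the shape of the probabilities that ran in the ads.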
We trained our models for each theme on each team individually, as well as their combined totals, creating 21 prediction options per game. We then built two additional models to predict at halftime if a game would become close and late OR go down to the wire. (More on “close and late” and “down to the wire” here.) Though we never ran either of these two additional themes because the first-half statistics for the three final games didn’t trigger our model to think the games would be close (and ultimately they weren’t), it brought the number of total prediction options in our list to 23 (a very basketball-y number).
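The counts above can be sketched out explicitly: seven themes (the four ad themes plus the three workflow themes) crossed with three subjects per game, plus the two halftime game-state models. The theme and subject names below are our paraphrases, not production labels.

```python
from itertools import product

# Seven themes: the four ad themes plus the three workflow themes.
themes = [
    "three-point attempts", "three-pointers made",
    "field goals attempted", "offensive rebounds",
    "free throws", "turnovers", "bench strength",
]
subjects = ["team A", "team B", "combined"]         # per-team and combined totals

options = [f"{subject} {theme}" for theme, subject in product(themes, subjects)]
options += ["close and late", "down to the wire"]   # the two halftime game-state models

print(len(options))  # 7 themes x 3 subjects + 2 = 23
```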
Although we had an abundance of predictive options for each game, we also had to be realistic, given the various restrictions and guidelines, such as:
- Limited time and visual real estate. We only had 30 seconds to tell a story. Predictions needed to be easy to digest and accessible to a wide audience, meaning we avoided overly wonky terms like ‘confidence interval’ or ‘error margin,’ and even ‘assist-to-turnover ratio.’
- Predictions needed to fit the game context. Every game takes on its own unique context, so our predictions needed to be applicable and appropriate to it. We conducted a lot of pregame analysis to prepare ourselves for the themes most likely to emerge in the games.
Equipped with our data analysis workflow, models, and guidelines, we had our playbook for each of the final three games. For example, here's how it went down for Villanova-Michigan.
Study: We started with our pregame tear down, finding highlights for these teams and the best analysis angle that would fit our themes. The majority of this data never made it to air, but it guided our picks and produced insight for other publications like our contextual New York Times ad: “Villanova has five student-athletes who contribute 12 to 22 percent of the team’s total made three-pointers per game. And when they connect, they do so from 2.178 inches farther away from the basket than Michigan.”
Probabilities: Equipped with the latest game data, we retrained and ran our models to produce estimates for two intervals: before the game, and at halftime. We looked at linear-regression-based models first, then hunted for outliers with our classification models (e.g., predictions in the 75th percentile). For regression-based estimates, we applied an empirically calculated error for each statistic, which we used to calculate accuracy. We then derived our probabilities from the accuracy across our test set of games, typically 20% of games over the last four seasons. For the ads that ran in the first half, we had about 24 hours to turn around our predictions, but for the halftime ad, we ran through the exact same process while the first half progressed. And though tracking regression models may not sound like the best way for a basketball fan to watch the first half of a game, it certainly added a unique dimension of suspense to the gameplay!
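The post doesn't spell out the math behind the empirically calculated error, but one approach consistent with it: collect residuals on the held-out test games, then turn an estimate and a floor into a probability under a rough normality assumption. The residuals, estimate, and floor below are hypothetical, not the production numbers.

```python
import math

import numpy as np

def prob_at_least(estimate: float, residuals: np.ndarray, floor: float) -> float:
    """P(actual >= floor), assuming roughly normal test-set residuals.

    `residuals` are (actual - predicted) values from the held-out ~20%
    of games; their spread plays the role of the empirical error.
    """
    sigma = float(np.std(residuals))
    z = (floor - estimate) / sigma
    return 0.5 * math.erfc(z / math.sqrt(2))  # normal survival function

# Hypothetical residuals and estimate for illustration only:
residuals = np.array([-4, -2, -1, 0, 1, 2, 3, -3, 2, 1], dtype=float)
p = prob_at_least(estimate=24.0, residuals=residuals, floor=21.0)
print(f"P(at least 21 three-point attempts) ≈ {p:.1%}")
```

The wider the test-set error for a statistic, the closer the floor must sit to the exact estimate to stay inside the 70-80% band.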
Context: Presented with a menu of probabilities and estimates, we picked out the most appropriate theme based on our pre-game study, guidelines, and basketball expertise. For example, coming into the championship game, our models indicated we could run an ad that estimated 18 three-point shots made by both teams combined. (We were 78.3% sure there would be more than 15.) But coming off of Villanova’s 18 for 40 three-point performance versus Kansas, 18 by both teams combined just didn’t seem that impressive. So we pivoted our theme toward possessions instead (which turned out to be a good move on our part: there were only 13 three-pointers made in the end). A hard decision to make, but much easier with the right data.
Ship It: As the clock wound down in the first half, we whittled away the less desirable themes until we had a couple of strong prediction candidates left. From there, we tested them against our creative options. Each ad theme had different ways of telling a story laid out in a toolkit: we'd start with a theme like "rebounds," and fill in the blanks with the predicted numbers and supporting creative. Finally, we'd render the ad and send it to broadcast. Since our models were running continuously through the first half, the time it took to finalize the theme, insert the predictions, verify the storyboards, confirm the numbers, produce the video, get approvals, and ship it was less than ten minutes.
That’s how the ads were made, but how did they actually do? Let’s look at two different aspects of grading this project: accuracy and boldness.
The easiest way to look at the accuracy of our predictions is “were they correct?” In the end, the predictions in all six ads were proven true in the games.
All six ads were effectively classification-style predictions that a stat would reach at least some floor value. But our predictions also included exact estimates. While the likelihood that the result would be exactly that number was low, it represented the most likely single value. For example, in the second half of the championship game we predicted with a 76.4% probability that there would be at least 21 three-point attempts combined. Our exact estimate was 24. The final result? 24.
The list below shows the floor values and probabilities from our ads, alongside our exact estimates and final results.
[Prediction results tables: Loyola-Chicago v Michigan · Kansas v Villanova · Villanova v Michigan]
These predictions were fun to track, especially because they could be interconnected. For example, estimating three-point attempts is affected by possessions, since you can’t shoot a ball you don’t have. From there, three-point attempts can be impacted by time and score, not to mention a team’s offensive strategy (e.g. push the three or work the ball inside).
For each prediction we also built a real-time graph to track the progress. These time series views helped us understand when teams were trending hot and cold. Instead of waiting until the end of the game to measure accuracy, we were able to project how a stat would land over the progression of the game. Note: For certain models we ran Monte Carlo simulations to project how a stat would land at the half or full time. You can see how both teams began to light up from behind the arc, keeping us on pace with our prediction.
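As one hedged illustration of that Monte Carlo step, here is a sketch that projects a halftime three-point-attempt count to full time by simulating many second halves. The Poisson assumption and every number below are ours, not the production model's.

```python
import numpy as np

rng = np.random.default_rng(7)

# Illustrative halftime state and model estimate (not the real values):
first_half_attempts = 13
expected_second_half = 11.0   # the model's conditional second-half estimate

# Simulate many second halves. A Poisson count is a simple stand-in for
# whatever game-level distribution the production model sampled from.
n_sims = 100_000
totals = first_half_attempts + rng.poisson(expected_second_half, size=n_sims)

floor = 21
p_floor = float(np.mean(totals >= floor))
print(f"Projected P(total three-point attempts >= {floor}) ≈ {p_floor:.1%}")
```

Re-running this as the half progresses is what lets you watch a prediction trend hot or cold in real time rather than waiting for the final buzzer.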
You can measure success in many different ways, but it should always relate to the original question you asked of your data. While hitting 100% accuracy on our original questions (i.e., the predictions made in our ads) feels nice, we can actually get more precise on the accuracy of our accuracy, and thereby get a better sense of how good our models actually were. If you look at the percentage delta between our exact estimates and game results, our three-game average was 94.34%. Pretty good!
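The post doesn't give the exact formula behind the 94.34% figure, but a "percentage delta" score consistent with it might look like the following. Only the (24, 24) pair comes from the post; the other estimate/result pairs are made up for illustration.

```python
# Scoring exact estimates against final results as a "percentage delta".
pairs = [(24, 24), (18, 17), (30, 28)]  # (exact estimate, final result); only the first is real

def pct_accuracy(estimate: float, actual: float) -> float:
    # 100% when the estimate matches the result exactly; shrinks with the gap.
    return 1.0 - abs(estimate - actual) / actual

avg = sum(pct_accuracy(e, a) for e, a in pairs) / len(pairs)
print(f"Average estimate accuracy: {avg:.2%}")
```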
In addition to accuracy, it's important to look at risk as one way to gauge boldness. Given the probabilities stated across the ads, we actually didn't have a huge chance of getting all six right; it was more likely that we'd get only four or five correct. We had just a 15.1% chance of getting all six right, and there was even a small chance we'd get all six wrong. But with solid models and a bit of luck, we determined this was an acceptable, if aspirational, balance of risk and reward.
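A sketch of that risk calculation: multiply the per-ad probabilities for the all-six case, and run a small dynamic program (the Poisson binomial distribution) for the chance of each possible number of correct calls. Only the 76.4% and 78.3% values appear in the post; the other four probabilities below are assumptions in the same 70-80% band.

```python
import numpy as np

# Per-ad probabilities; 0.764 and 0.783 are from the post, the rest assumed.
probs = [0.70, 0.71, 0.764, 0.72, 0.783, 0.70]

p_all_six = float(np.prod(probs))  # independence assumed

# Exact distribution of the number of correct predictions (a Poisson
# binomial), built up one ad at a time with a small dynamic program.
dist = np.zeros(len(probs) + 1)
dist[0] = 1.0
for p in probs:
    dist[1:] = dist[1:] * (1 - p) + dist[:-1] * p
    dist[0] *= (1 - p)

print(f"P(all six correct) ≈ {p_all_six:.1%}")
print(f"P(four or five correct) ≈ {dist[4] + dist[5]:.1%}")
print(f"P(all six wrong) ≈ {dist[0]:.3%}")
```

With probabilities in this band, getting four or five right is the most likely outcome, which is exactly the risk profile described above.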
Meanwhile, we could also look at boldness by looking at what the numbers in our predictions actually mean. For instance, if you wanted, you could look at all NCAA games, pick out the 25th percentile for any stat, and say you’re 75% sure the value in this game would be above it.
But that approach won’t give you a very good estimate for a single game since teams can vary wildly in their speed, efficiency, distribution of shot types, and so on. A prediction based only on general stats and percentiles will never be as bold as predictions based on trends and stats specific to these teams in this game. This Villanova team was very different from Michigan, which was very different from Loyola-Chicago, which was very different from Kansas. A more complex model (like the ones we went with) can help you identify unique features of a particular team, which in turn can help your estimates grow in boldness and confidence.
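The naive percentile baseline described above takes only a couple of lines; the league-wide sample values here are invented for illustration.

```python
import numpy as np

# League-wide second-half three-point attempts (invented sample values):
league_attempts = np.array(
    [14, 17, 18, 19, 20, 21, 21, 22, 23, 24, 25, 26, 27, 29, 33]
)

# The naive baseline: the 25th percentile is a floor you can claim the
# stat will clear about 75% of the time, for an average game only.
naive_floor = float(np.percentile(league_attempts, 25))
print(f"Naive floor: at least {naive_floor:.1f} attempts, ~75% of the time")
```

A model conditioned on these specific teams' pace and shot profile can set a higher floor at the same confidence, which is what makes the prediction bolder.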
We’re proud of the work we were able to do with our partners, both in terms of the actual predictions made as well as the types of insights and workflow we could illustrate using March Madness as the canvas. Even beyond proving that it’s possible to create and ship accurate, meaningful predictive TV ads in half of a halftime, we developed a whole set of analytic metrics that no one had been able to attempt before.
True, predictive insights are only as good as the questions you ask. But with good data and good tools, you’re in a much better position to begin to ask those questions, and extract new value from your own resources.
We hope you’ve enjoyed the work we’ve started here, and continue to imagine new ways to play with the public data at your disposal. We have every confidence this is only the beginning of what can be achieved when you know what your data knows.