More Than a Feeling: Predicting Close Games

Elissa Lerner
Analyzing NCAA Basketball with GCP
5 min read · Mar 17, 2018

If you could predict the likelihood of a game playing out in a particular way, what kinds of features would you look at in the gameplay?

Considering how often college basketball games seem to go close and late, or even down to the wire, we thought we’d see if we could predict the likelihood of the occurrence of a close game.

But what is a close game?

A feeling, to be sure. An anxiety-ridden nailbiter for a die-hard fan may be a mere curiosity to a layperson. But we couldn’t predict anything if we didn’t come up with some definitions to test, so we put on our basketball hats and came up with some scenarios that felt like they could pass for “close and late” or “down to the wire,” intentionally ignoring, for the moment, how well we might be able to predict whether games would actually play out that way.

How close is close?

The key to a close and late game started with ‘late’: we wanted to explore scenarios in the final four minutes (after the final TV timeout) and the final five minutes (the last 12.5% of a 40-minute game), with a slight preference for the final five minutes, since the timing of a TV timeout depends on game stoppages. From there, we started thinking about deficit values: 3, 6, and 9 felt appropriate since they correspond directly to the number of possessions needed to tie or take the lead (1, 2, and 3, respectively). We then assigned combinations of time left and point deficit based on a rough estimate of at least two possessions per minute (given a 30-second shot clock). So, if a team trailed by fewer than seven points for at least two minutes, it had at least two opportunities to cut that lead to a one-possession game.
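
To make that reasoning concrete, here is a minimal sketch in Python of how one such scenario could be checked. It is an illustration, not our actual definitions: the function names, the `(seconds_remaining, margin)` event format, and the default thresholds (a margin of six or fewer for at least 120 seconds of the final five minutes) are all assumptions made for the example.

```python
import math


def possessions_needed(deficit):
    """Minimum number of three-point possessions needed to erase a deficit."""
    return math.ceil(deficit / 3)


def seconds_within(events, max_deficit):
    """Seconds during which the margin was at most `max_deficit`.

    `events` is a hypothetical list of (seconds_remaining, margin) pairs,
    ordered from the start of the window (e.g. 300 seconds left) to the
    last score change. Each margin holds until the next event, and the
    last one holds until the final buzzer (0 seconds remaining).
    """
    total = 0
    for i, (seconds_remaining, margin) in enumerate(events):
        next_time = events[i + 1][0] if i + 1 < len(events) else 0
        if abs(margin) <= max_deficit:
            total += seconds_remaining - next_time
    return total


def is_close_and_late(events, max_deficit=6, min_seconds=120):
    """Illustrative check: margin under seven for at least two minutes."""
    return seconds_within(events, max_deficit) >= min_seconds


# Made-up final five minutes: the margin shrinks from 10 to 5.
example = [(300, 10), (240, 8), (200, 6), (90, 4), (30, 7), (5, 5)]
close_seconds = seconds_within(example, 6)
example_is_close = is_close_and_late(example)
```

In this made-up game the margin sits at six or fewer for 175 of the final 300 seconds, so it would count as close and late under the illustrative thresholds. Swapping in a different window, deficit, or duration gives the other scenarios.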

Our definitions for down to the wire games followed a similar logic, just on a shortened time frame.

With that in mind, here’s every scenario we trained and tested:

Model Behavior

We wanted our definitions to pass the 65% confidence threshold, which would give us a satisfactory range for both precision and recall to work with (some background reading, if you need it).

Since we were focusing on the lead or deficit at the five-minute mark and below, we had to write a query that would get the score at five minutes, zero minutes, and every basket in between. The query results then allowed us to determine whether or not each game satisfied a particular definition of close and late. We deployed three different types of classification models (logistic regression (LR), multilayer perceptron (MLP), and gradient boosting (GB)) on each definition with Python’s scikit-learn (sklearn) library, training them on 26,000 games and testing on 5,000. Subsequent model analysis (precision, recall, and class distribution) was then done in R.
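
For readers who want to try the same setup, the sketch below fits the three model types with scikit-learn. It is only a rough outline under stated assumptions: the data is synthetic (generated by `make_classification`, not our game data), the eight features, the sample sizes, and every hyperparameter are placeholders, and the label simply stands in for "this game met the definition."

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPClassifier

# Synthetic stand-in for the game data: 3,100 "games" with 8 made-up features.
X, y = make_classification(n_samples=3100, n_features=8, random_state=0)

# Hold out 500 games for testing (an integer test_size is an absolute count).
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=500, random_state=0
)

# One model per type named in the post; hyperparameters are placeholders.
models = {
    "LR": LogisticRegression(max_iter=1000),
    "MLP": MLPClassifier(hidden_layer_sizes=(16,), max_iter=500, random_state=0),
    "GB": GradientBoostingClassifier(random_state=0),
}

# Predicted probability that each held-out game meets the definition.
close_probability = {}
for name, model in models.items():
    model.fit(X_train, y_train)
    close_probability[name] = model.predict_proba(X_test)[:, 1]
```

Keeping the predicted probabilities, rather than hard yes/no predictions, is what makes it possible to apply a confidence threshold afterward.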

While any of our definitions could pass for a close and late game, Close_1 was the most predictable, giving us the best results for both precision and recall. Across the three models, 67.5–69.3% of games above the 65% confidence threshold were correctly predicted to be close. As for recall, our 65% confidence threshold captured 41.8–42.6% of the games meeting our definition.
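
As a reminder of what those two numbers mean, here is a small Python sketch of precision and recall at a confidence threshold. The function name, the choice to count probabilities at or above the threshold, and the eight sample games are assumptions for illustration; our own analysis was done in R.

```python
def precision_recall_at(probabilities, labels, threshold=0.65):
    """Precision and recall when only predictions at or above `threshold` count.

    `probabilities` are predicted chances that a game is close, and
    `labels` are 1 if the game actually met the definition, else 0.
    """
    flagged = [label for p, label in zip(probabilities, labels) if p >= threshold]
    true_positives = sum(flagged)
    actual_positives = sum(labels)
    # Precision: of the games we flagged as close, how many really were?
    precision = true_positives / len(flagged) if flagged else 0.0
    # Recall: of the games that really were close, how many did we flag?
    recall = true_positives / actual_positives if actual_positives else 0.0
    return precision, recall


# Eight made-up games: four clear the 65% threshold, three of those were close.
sample_probabilities = [0.9, 0.7, 0.66, 0.6, 0.4, 0.2, 0.8, 0.5]
sample_labels = [1, 1, 0, 1, 0, 0, 1, 1]
sample_precision, sample_recall = precision_recall_at(
    sample_probabilities, sample_labels
)
```

In this toy sample, three of the four flagged games were close (75% precision), and those three are out of five close games overall (60% recall). Raising the threshold usually trades recall for precision.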

Down to the wire worked out a bit differently. Logistic regression performed better than the other models, and of those results, Wire_5 was the strongest, with 60.2% precision, and 14.9% recall.

So What?

Just because a definition is the most predictable one doesn’t mean it’s also the truest one, nor does its predictability tell you anything about the nature of the definition. In a way, some of what our experiment showed is what you might have intuited from the start — barring passing out electrode caps and heartbeat monitors in every NCAA stadium to truly measure excitement, the closeness of a game remains subjective; whether you can predict any given model for it is not.

But if you do like using predictive models, you can use them to discover any number of other fun stats. For instance, using our most predictable definitions, we learned that 45.6% of NCAA games historically could be considered close and late, and 38.3% of games have gone down to the wire.

We learned conference games are more often close and late than non-conference — some 53% to 45%. Moreover, 45% of conference games have gone down to the wire, while only 38% of non-conference games have done so. (So if you had carefully read our statistic about the PAC-12 winning the most close and late games, you might have realized that it didn’t include the most common types of close games, but upon further reflection, you might have struggled to find a way to use intraconference performance to predict how teams from different conferences would fare against each other.)

We also looked at how these games track over time. Unsurprisingly, close games increase in prevalence over the course of a season, as do wire games (those sharp dips are December 26, in case you’re curious).

Conveniently, we also learned that Ohio State historically plays the most close and late games on the road (64%) while Gonzaga historically plays the fewest close and late games at home (27%). The two teams tip off imminently in Idaho, which, while technically neutral, is effectively a home game for the Bulldogs. On top of that, both teams this season have equal “clutch” scores of 70%, which is the percentage of times they’ve won in a close game as defined above.

Well, so much for making a clean prediction based on close and late games. That said, we’ll be keeping a close eye on these two right until the end!

For more insights around this matchup, check out the rest of our pregame analysis over here.
