Measuring Team Clutchness (Part 2): …Be Careful Projecting Forward

Alok Pattani
Analyzing NCAA Basketball with GCP
11 min read · Mar 29, 2019

In Part 1 of our two-part post on measuring how clutch NCAA men’s basketball teams are, we developed a “Clutchness” metric using our NCAA play-by-play data in BigQuery and Colab. We saw that LSU and Duke, who both take the court in the Sweet 16 on Friday, were ranked in the top three of our Clutchness ratings based on games through March 26.

To refresh your memory (see Part 1 for much more detail), we look at the situation with five minutes left in regulation — score difference, site, and opponent — in each game, use a model to determine the team’s win probability at that point, and then measure how the team progressed toward the end result (win or loss) from there. When we add up the changes in win probability across all games for the season, we get a team’s total Clutchness: win probability added in the last five minutes of games, adjusted for schedule.
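As a rough illustration of the bookkeeping involved, here is a minimal Python sketch of summing win probability added across a season's games. The function names and example values are ours, not the actual pipeline, and the win probability model itself (which accounts for opponent and site, i.e., the schedule adjustment) is treated as a given:

```python
# Hypothetical sketch of the per-game Clutchness calculation described above:
# win probability added in the final five minutes of each game.
def game_clutchness(wp_at_5min: float, won: bool) -> float:
    """Win probability added from the 5-minute mark to the final result.

    wp_at_5min: modeled win probability with 5:00 left in regulation (0 to 1)
    won: whether the team went on to win
    """
    final_wp = 1.0 if won else 0.0
    return final_wp - wp_at_5min

def season_clutchness(games) -> float:
    """Sum of per-game win probability added across a season.

    games: iterable of (wp_at_5min, won) tuples
    """
    return sum(game_clutchness(wp, won) for wp, won in games)

# A team that keeps winning games it was likely to lose accumulates positive
# Clutchness; blowing late leads accumulates negative Clutchness.
example = [(0.30, True), (0.85, True), (0.60, False)]
print(round(season_clutchness(example), 2))  # 0.70 + 0.15 - 0.60 = 0.25
```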

While we felt pretty good about Clutchness as a retrodictive measure — looking backward to see how clutch a team has been — we left off with an important question: what does it mean going forward? Broadly, is high or low performance in the clutch a consistent trait that we can ascribe to a team, perhaps even use to predict future results? Or is it something more fluky, a product of randomness, as statisticians have often found to be the case when analyzing clutch play across other sports for the last few decades?

We dove back into the data to answer these questions about our specific Clutchness metric, and to help us understand more about how we should interpret and use it in our analysis.

The (In)consistency of Clutchness

One of the first things we looked at is the consistency of this metric within a season for the same team. If Clutchness is a real persistent trait, then it stands to reason that a team that is clutch in one part of the season should generally remain so in other parts.

We can use the team game-level Clutchness value — the win probability added by the team in the final five minutes of each game (described in Part 1) — to calculate separate Clutchness ratings for the first and second halves of each team's season (split by number of games played).
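That split-half calculation can be sketched in pandas with a toy, illustrative schema (the real data lives in BigQuery, and the values below are made up):

```python
import pandas as pd

# Illustrative game-level Clutchness values (win probability added per game)
games = pd.DataFrame({
    "team": ["LSU"] * 4 + ["Marquette"] * 4,
    "game_num": [1, 2, 3, 4] * 2,
    "clutchness": [0.3, -0.1, 0.8, 0.6, 0.5, 0.4, -0.7, -0.6],
})

# Each team's games up to the midpoint of its own schedule count as "first" half
cutoff = games.groupby("team")["game_num"].transform("max") / 2
games["half"] = (games["game_num"] <= cutoff).map({True: "first", False: "second"})

# Sum win probability added within each half to get half-season ratings
ratings = games.pivot_table(
    index="team", columns="half", values="clutchness", aggfunc="sum"
)
print(ratings)
```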

Those values are plotted for all 353 Division I teams on the interactive scatterplot below, where Clutchness in the first half of the season is on the x-axis and Clutchness in the second half is on the y-axis.

The diagonal line shows where points would lie if a team had the same Clutchness in both halves of the season. We can see that many points lie pretty far off the line, including a good number in the top left and bottom right — cases where a team had below average Clutchness in one half of the season and above average Clutchness in the other half.

Two notable examples worth hovering over:

  • LSU (purple point, highest on the plot), who we talked about in detail in Part 1, went from above average Clutchness through mid-January (+0.98, 79th) to amazing close-and-late domination the last couple months (+5.49 Clutchness, 1st).
  • Marquette (furthest right blue dot, just below the x-axis) went the opposite way, winning most of their close games against good competition early on, then losing almost every close game they played from February on.

One simple way to calculate the consistency of Clutchness between the two halves of the season across all teams is to use a correlation coefficient. We calculated both the widely used Pearson correlation (using the actual Clutchness numbers) as well as the Spearman rank correlation (using the ranks, to check more generally for a monotonic relationship) and show them on the top left of the plot above. Both numbers are near 0.3, indicating some positive correlation in Clutchness across halves of the season (both correlation coefficients can range from -1 to 1). This is good, since we don’t want to be measuring something totally noisy, but they do feel pretty low.
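Both coefficients are straightforward to compute with scipy; here is a sketch using made-up half-season values (one pair per team):

```python
import numpy as np
from scipy.stats import pearsonr, spearmanr

# Illustrative first- and second-half Clutchness values, one pair per team
first_half = np.array([2.1, -0.5, 1.0, -1.8, 0.98, 3.2, -2.0, 0.4])
second_half = np.array([0.3, -1.2, 2.5, -0.4, 5.49, 1.1, -0.9, -2.2])

# Pearson: linear correlation of the raw Clutchness numbers
r, _ = pearsonr(first_half, second_half)
# Spearman: correlation of the ranks, testing for any monotonic relationship
rho, _ = spearmanr(first_half, second_half)

print(f"Pearson r = {r:.2f}, Spearman rho = {rho:.2f}")
```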

To understand more, we generated the same plot for a team’s win percentage across the two halves of the season.

The correlations are higher, near 0.5. Keep in mind that teams often play very different schedules in the early part of the season (mostly non-conference games) compared to the second half (nearly all conference games until the postseason), so some of the fluctuation in win percentage is potentially more schedule-related than team performance itself.

As another point of reference, we looked at the same plot for the team’s average point margin at the five-minute mark — our cutoff for when we begin measuring Clutchness — split across the two different parts of the season.

Still a good amount of spread away from the equal average point margin line, but we see correlations above 0.5, indicating more consistency here. If you look at a team’s average win probability at the five-minute mark split across two halves of a season, you see a result similar to above — correlations above 0.5, more consistency than the Clutchness plot for sure.

The lesson here? (Other than it looks like college basketball team performance fluctuates even more than we previously thought?) Clutchness, at least the way we looked at it, is not very consistent from one part of the season to the next. The last plot shows you’re probably better off measuring a team’s average point margin at the five-minute mark to get a sense of its future performance — after all, 35 minutes is much longer than five minutes (we know, we know. Big data science observation right there!).

Are Final Four Teams More Clutch?

Within-season inconsistency seems to indicate Clutchness may not be the greatest predictor of future success, but it's possible there might be something to it in helping determine how far the very best teams go in the NCAA Tournament. Perhaps we need almost a full season of data to know who is really clutch (or not-so-clutch), and then it could tell us something about which teams are best suited for the high-pressure moments of single-elimination March Madness.

To test this hypothesis, we wanted to look at Clutchness for NCAA Tournament teams of seasons past, and specifically at Final Four teams to see if they were among the most clutch. We used our same Clutchness and Score Control calculations, but had to apply them to previous seasons. We have almost complete play-by-play going back five years (the "almost" is why we haven't publicly shared "official" Score Control or Clutchness for past seasons, but it's sufficient for this purpose). To handle past seasons, we modify our BigQuery and Colab framework to run over specified date spans by parametrizing our SQL query by season, start date, and end date, as shown for part of the Clutchness input data queries below.
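A sketch of what that parametrization might look like. The table and column names below are illustrative, not our actual schema; with the BigQuery client library, the @-placeholders would be bound as named query parameters, which we mimic here with plain dicts:

```python
# Illustrative parametrized query: the same SQL can be rerun for any season
# and date span by binding @season, @start_date, and @end_date at run time.
CLUTCHNESS_INPUT_SQL = """
SELECT game_id, team_id, period, elapsed_time, points_scored
FROM `ncaa.play_by_play`  -- illustrative table name, not the real schema
WHERE season = @season
  AND game_date BETWEEN @start_date AND @end_date
"""

def query_params(season: int, start_date: str, end_date: str):
    """Named parameters to pass alongside the SQL, one dict per parameter."""
    return [
        {"name": "season", "type": "INT64", "value": season},
        {"name": "start_date", "type": "DATE", "value": start_date},
        {"name": "end_date", "type": "DATE", "value": end_date},
    ]

# e.g. everything up to Selection Sunday of the 2017-18 season
params = query_params(2018, "2017-11-01", "2018-03-11")
print([p["value"] for p in params])
```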

Part of Input Data Query for Clutchness, with Parameters for Season and Dates Highlighted

Using this framework, we can loop through the past few Selection Sundays — and other days within those seasons as well, though we only use the Selection Sunday ratings for this purpose — and get "pre-tournament" Score Control and Clutchness for all teams. We put these ratings back into BigQuery in a table called "tm_season_bracketiq_metrics_by_date" so we can join them with tournament results. We have a view called "ncaa_tourney_tm_season_results" based on some NCAA Tournament-specific data from our teammates at Kaggle, so it's just a matter of matching up team-seasons, as shown below.

(We’ve mentioned this before, but we can’t stress this enough: we really love views to “modularize” queries we use frequently into distinct virtual tables that can be joined in a number of different places.)

Query to Join Selection Sunday Score Control and Clutchness with Team Tournament Results
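For readers more comfortable in pandas than SQL, the same team-season matching can be sketched as a merge. Column names and rank values here are illustrative stand-ins, not the actual view schemas:

```python
import pandas as pd

# Illustrative stand-ins for the ratings table and tournament-results view
ratings = pd.DataFrame({
    "season": [2018, 2018],
    "team": ["Villanova", "Kansas"],
    "score_control_rank": [1, 9],    # made-up example ranks
    "clutchness_rank": [88, 30],     # made-up example ranks
})
tourney = pd.DataFrame({
    "season": [2018, 2018],
    "team": ["Villanova", "Kansas"],
    "made_final_four": [True, True],
})

# Match up team-seasons: one row per tournament team per season
merged = ratings.merge(tourney, on=["season", "team"], how="inner")
print(len(merged))
```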

This produces one row for each of the 340 NCAA Tournament teams in the past five years. Below are the (cleaned up) results for the 16 teams that made the Final Four the past four seasons.

Final Four Teams from Last 4 Seasons, with Selection Sunday Score Control and Clutchness Ranks

From here, it looks like ranking highly in Clutchness seems good (no Final Four team ranked really low), but ranking highly in Score Control, which looks at score margin throughout every game, seems much more predictive of making the Final Four. Last year’s national champion, Villanova, ranked 1st in Score Control but only 88th in Clutchness as of Selection Sunday. They “solved” that problem by simply not needing to be clutch: they took control of Tournament games from the beginning, and won every game by at least 12 points on the way to the title. (Of note: this year’s Gonzaga team is quite similar, ranking 1st in Score Control and 70th in Clutchness when the brackets were revealed.)

If we compare these 16 teams to the 256 other NCAA Tournament teams that didn’t make the Final Four the last four seasons, we see that Final Four teams ranked about 27 spots better in Clutchness (49th vs 76th) and about 50 spots better in Score Control (16th vs 67th). On the other hand, if we isolate just the top-4 seeds, the 12 that made the Final Four actually had a very similar average Clutchness ranking (37th) to the 52 that didn’t (34th), while still ranking better in Score Control (average of 6th compared to 14th). Score Control clearly seems important, but it’s not clear that Clutchness helps separate the top teams that made it to the third weekend of the tournament from those that fell short.
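The rank comparisons above boil down to a group-and-average; a sketch with made-up numbers:

```python
import pandas as pd

# Illustrative data: Selection Sunday ranks for a few tournament teams
teams = pd.DataFrame({
    "clutchness_rank": [12, 88, 45, 70, 101, 60],
    "score_control_rank": [3, 1, 20, 55, 80, 40],
    "made_final_four": [True, True, True, False, False, False],
})

# Average rank in each metric for Final Four vs. non-Final Four teams
avg_ranks = teams.groupby("made_final_four")[
    ["clutchness_rank", "score_control_rank"]
].mean()
print(avg_ranks)
```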

These are all good data-driven facts, but they can be sensitive to parameter choices and small samples. We’ve shown that Score Control is probably more important, but we haven’t rigorously shown that Clutchness doesn’t matter going forward. To go further, we look more directly at whether Clutchness helps predict NCAA Tournament game results.

Does Clutchness Help Predict Tournament Games?

In total isolation, the answer is yes: the team with a better Clutchness rating on Selection Sunday has won more NCAA Tournament matchups — 57% the past five years, through this year’s Round of 32. But the real question we are asking is if Clutchness provides any added value over things we already know are predictive of tournament outcomes, like seed and maybe a more thorough representation of team performance, like Score Control.

To test this, we decided to create a relatively simple predictive model based on past NCAA Tournament games, using Selection Sunday seeds, Score Control, and Clutchness for each team (which we had pulled in prior). This wasn’t designed to be the end-all, be-all of Tournament prediction models — there are many more of these than we can enumerate, including an annual Kaggle competition among these models with thousands of dollars on the line. We simply wanted to see if Clutchness was a helpful predictor while controlling for a couple known good ones.

Since we had relatively small data already in Colab, we could use some of the more traditional statistical modeling packages in Python, like statsmodels. (For much bigger data of this form that already lives in BigQuery, BigQuery ML is a good option.) We modeled winning, a binary outcome, on various combinations of seed, Score Control, and Clutchness (each from both the team and opponent perspective) using logistic regression. Here are the coefficient estimates from fitting the model using all three factors:

Coefficient Table from Logistic Regression of Wins in NCAA Tournament Games on Team & Opponent Variables

The tm_ and opp_ variables having exactly opposite coefficients is by design. We used one record from each team’s perspective so that our model is not biased toward who we put in as “tm” and who as “opp” in fitting it (not so much a problem in neutral-site NCAA Tournament games, but it can be if you always pick the home or away team’s perspective).
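That perspective-doubling trick can be sketched like this (team names and values below are illustrative, not actual ratings): every game appears twice, with the tm_/opp_ columns swapped and the outcome flipped.

```python
import pandas as pd

# One game, recorded from one team's perspective (illustrative values)
games = pd.DataFrame({
    "tm": ["Duke"], "opp": ["LSU"],
    "tm_clutchness": [4.2], "opp_clutchness": [5.5],
    "tm_win": [1],
})

# Swap the tm_/opp_ columns and flip the outcome to get the mirror record
mirrored = games.rename(columns={
    "tm": "opp", "opp": "tm",
    "tm_clutchness": "opp_clutchness", "opp_clutchness": "tm_clutchness",
})
mirrored["tm_win"] = 1 - mirrored["tm_win"]

# Stack both perspectives: the model can't be biased by who is "tm"
both = pd.concat([games, mirrored], ignore_index=True)
print(both[["tm", "opp", "tm_win"]])
```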

More important to our question, we see that the clutchness variables have tiny, statistically insignificant coefficients! Not only that, but they actually have the “wrong” sign: a team with lower Clutchness would technically have a better chance to win according to this model — an obvious flag that it’s not helping. We tried a couple combinations of other predictors with Clutchness in the model (only Score Control, only seed, or both as above), and Clutchness is similarly insignificant in each case (sometimes having the wrong sign, too).

We tried other variations of this, too: looking to see if Clutchness was more predictive of final results only in close games, or if Clutchness predicted team win probability added in the final five minutes of a game (i.e., did pregame Clutchness predict in-game Clutchness?). Over our sample of NCAA Tournament games from the past five seasons, there just didn’t appear to be any predictive value to Clutchness.

So does this mean there is no such thing as a persistent clutch ability for NCAA men’s basketball teams? Maybe. (This is a skeptical data scientist’s answer to almost every question.) Perhaps with more data or a more intricate game prediction model (using game simulation, neural networks, or something else), we could show that Clutchness has some predictive value under the right circumstances.

It also could be that our particular Clutchness metric, which does a good job of summarizing who was clutch in the past, isn’t well-suited for prediction. Perhaps using a longer period than the last five minutes or looking at efficiency as opposed to just winning in “crunch time” would provide more predictive value. Of course, that would likely come at the cost of explaining the past as well as our current metric does. For example, a team that is pretty efficient in the last 10 minutes of close games overall, but is that way by turning some into blowouts while losing a few by a couple points, would be hard to define as “clutch” from a fan’s perspective. Main lesson: a metric’s ability to explain the past and its ability to predict the future are not the same thing, and can sometimes actually run against each other.

Hopefully we’ve demonstrated that Clutchness is an interesting backward-looking metric that shows how much each team improved its chances of winning in the last five minutes of games. It seems to have limited predictive value, so we wouldn’t go ahead thinking top Clutchness teams like LSU and Duke will be bulletproof in close games the rest of the tournament. Perhaps it’s best to look at Clutchness like we are encouraged to see past stock or mutual fund returns: “past performance is not indicative of future results.”

Enjoy the weekend’s games!
