Measuring Team Clutchness (Part 1): It’s Fun to Look Back…

Alok Pattani
Analyzing NCAA Basketball with GCP
Mar 28, 2019

Clutchness. Seemingly since the beginning of athletic competition, it’s been one of the hottest topics in sports. Who comes through in crunch time? Who comes up short when the game is on the line? Do “clutch” reputations match results?

For example, if you watched the scintillating Duke-UCF finish in the Round of 32 last weekend, you’d be tempted to say you saw clutchness on display. Zion Williamson was “unreal clutch” on his game-saving layup with 14 seconds left, according to some.

Sure, except that he missed the potential tying free throw… But then that set up RJ Barrett for his own super-clutch putback that gave Duke the lead!

But then what about when UCF was almost even-more-clutch-but-not-quite, with not one but two potential game-winners rolling off the rim to extinguish any last hope for an upset?

To a fan, clutchness is clearly something real: you know it when you see it. But trying to quantify a team’s clutchness, as we have been doing as part of the Google Cloud-NCAA partnership, is a different story. Statisticians who like sports have been analyzing this concept for decades, trying to determine whether clutchness exists at all and, if so, whether it’s a repeatable trait for a player or a team. We wanted to come up with a metric that measured a team’s “Clutchness” (capital C) in a fair and descriptive way, while also understanding the limits on how that metric could be interpreted and used.

Missed Attempts

It’s reasonable to say that being clutch in basketball comes down to end-of-game performance and, at least at the team level, actually winning the game. We tried to get at this last year by assessing each team’s record in “close” games, with “close” defined as a small score margin for certain amounts of time in the final five minutes of a game. One limitation of looking at a team’s record under a “score within X points with Y or fewer minutes left”-type criterion is bias: the team that’s leading by X points entering those last few minutes is more likely to win than the trailing team. That edge has more to do with what the teams did to get the lead or fall behind before those last few minutes than with what they did in the final minutes, which makes the record less reflective of end-of-game “clutchness.”

Another approach we tried was to look at a team’s record in games where there was a tie or lead change in the final five minutes. This is better for identifying truly “50/50” games; the thinking is that if the score margin reaches or crosses 0 in the last few minutes, it surely feels like a toss-up from that point forward. But this definition has downsides. For one, it’s pretty restrictive, which leads to smaller sample sizes. It also leaves out games that certainly feel “close,” such as a game where the winning team holds on at the buzzer after leading for the entire final five minutes, like Auburn did against New Mexico State last Thursday. Games like those seem worthy of inclusion in our “Clutchness” evaluation.
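For concreteness, here is a minimal Python sketch of the “tie or lead change in the final five minutes” filter. The input format (a chronological list of elapsed seconds and the score margin after each scoring play, from one team’s perspective) and the function name are hypothetical, not our actual pipeline.

```python
from typing import Sequence, Tuple

CUTOFF_SECONDS = 35 * 60  # five minutes left in a 40-minute regulation game


def has_tie_or_lead_change(
    scoring: Sequence[Tuple[int, int]], cutoff: int = CUTOFF_SECONDS
) -> bool:
    """Return True if the game was tied or the lead changed after the cutoff.

    `scoring` is a hypothetical chronological list of
    (elapsed_seconds, margin_after_play) from one team's perspective.
    """
    prev = 0  # every game starts tied 0-0
    for elapsed, margin in scoring:
        if elapsed > cutoff:
            # Tied going into this play, tied after it, or the leader flipped.
            if prev == 0 or margin == 0 or prev * margin < 0:
                return True
        prev = margin
    return False
```

With this filter, a game where the winner led for all of the final five minutes, like Auburn’s, returns False and drops out of the sample entirely.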

Current Play

So after some more thinking, we decided to leverage our NCAA play-by-play data in BigQuery even further, and rely on some statistical analysis to get a fairer and more complete evaluation of each team’s end-of-game performance. Here’s what we did in each game:

  • Looked at each team’s score margin with five minutes left in regulation
  • Estimated the average team’s chance of winning (note: not the specific team’s chance of winning! We’ll explain.) using a model, fit across all games, that considers score differential with five minutes left, site, and opponent strength
  • Using the team’s final result (win or loss), calculated its “win probability added” in the final five minutes (and overtime, if necessary) relative to an average team in that situation

With that win probability added for each team in each game, we could aggregate over the season to see who added the most wins in the last five minutes, accounting for their position at the five-minute mark of each game and adjusting for schedule strength.
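Written out (the notation here is ours, for illustration only), the per-game and season quantities look roughly like this, where y is 1 for a win and 0 for a loss, m is the score margin with five minutes left, h is the site, r^opp is the opponent’s adjusted rating, p_avg is the model’s win probability for an average team in that situation, and G_t is the set of team t’s games:

```latex
\mathrm{WPA}_{t,g} = y_{t,g} - \hat{p}_{\text{avg}}\!\left(m_{t,g},\; h_{t,g},\; r^{\text{opp}}_{t,g}\right),
\qquad
\mathrm{Clutchness}_{t} = \sum_{g \in G_t} \mathrm{WPA}_{t,g}
```

So a win from a 50/50 spot earns 1 - 0.5 = +0.5, and a loss from the same spot costs 0 - 0.5 = -0.5.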

On the whole, our Clutchness metric rewards things like:

  • Winning games that are 50/50 with five minutes left (with a positive Clutchness value)
  • Holding on to win with a small lead at that point (a smaller positive)
  • Coming back to win when facing a bigger deficit at that point (a bigger positive)
  • Winning these close games against tough opponents and/or on the road (more Clutchness for higher difficulty)

Doing all these things consistently leads to a higher Clutchness rating for a team, and their opposites contribute to a lower rating.

To create these ratings, we started with an extension of the pbp_scoring_log view mentioned in last week’s discussion of Score Control, which gives an ordered log of scoring plays from each team’s perspective. From there, we wrote a query to get each team’s score at a given “cutoff time,” which we set to five minutes left in regulation. (The cutoff is set as a “parameter” at the top of the query, so it could easily be changed for future uses.) We then merged that with the final score and game result.

Snippet of SQL Code to Get Score at Given Time in Game Using Play-by-Play Scoring Log
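The SQL itself isn’t reproduced in this text. As a rough stand-in, here is the same idea in plain Python: take the last scoring play at or before the cutoff for each team in each game, then attach the final margin and result. The row layout and names are hypothetical, and unlike a BigQuery query, this assumes the log is already sorted chronologically within each game.

```python
from typing import Dict, Iterable, Tuple

CUTOFF_SECONDS = 35 * 60  # five minutes left in regulation (a tunable parameter)

# Hypothetical row: (game_id, team, elapsed_seconds, team_score, opp_score),
# one row per scoring play from each team's perspective, in chronological order.
Row = Tuple[str, str, int, int, int]


def margin_at_cutoff_and_result(
    log: Iterable[Row], cutoff: int = CUTOFF_SECONDS
) -> Dict[Tuple[str, str], Tuple[int, bool]]:
    """Map (game_id, team) -> (score margin at the cutoff, won the game)."""
    at_cutoff: Dict[Tuple[str, str], int] = {}
    final: Dict[Tuple[str, str], int] = {}
    for game_id, team, elapsed, team_score, opp_score in log:
        key = (game_id, team)
        margin = team_score - opp_score
        if elapsed <= cutoff:
            at_cutoff[key] = margin  # later plays overwrite earlier ones
        final[key] = margin  # the last play seen gives the final margin
    # If no points were scored before the cutoff, the margin there is 0.
    return {key: (at_cutoff.get(key, 0), final[key] > 0) for key in final}
```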

This gives us the data we need for the five-minute mark of each game, and we could’ve gone from here directly to a team rating. But we wanted an additional piece: a measurement of opponent strength. As you might imagine, it’s harder to come back from a three-point deficit in the last five minutes against Virginia than against Chicago State, and this should be taken into account. We could’ve derived these strength measures directly from scoring changes in the last five minutes, but that’s not ideal: we’d only be considering a small portion of each game (teams play at least 40 minutes, after all), and not necessarily competitive situations (the last five minutes of some games are “garbage time”).

We needed a more robust measurement of how hard a team was to beat in general. To handle this, we created an “adjusted win percentage” for each team based on final results (win or loss in each game). The procedure is similar to the regression-based schedule adjustment we discussed for various metrics in a previous post, except that we used a ridge regression classifier since we were dealing with binary outcomes.
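Here is a minimal sketch of that idea in plain Python, under simplifying assumptions of our own: each game becomes one row with +1 in the team’s column and -1 in the opponent’s, the label is +1 for a win and -1 for a loss (the way scikit-learn’s RidgeClassifier recodes binary labels), and the ridge system is solved directly with no intercept and no home-court term. The function names are hypothetical, and the step that turns these coefficients into an “adjusted win percentage” isn’t shown.

```python
from typing import Dict, List, Tuple


def solve(a: List[List[float]], b: List[float]) -> List[float]:
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        x[i] = (m[i][n] - sum(m[i][j] * x[j] for j in range(i + 1, n))) / m[i][i]
    return x


def ridge_team_ratings(
    games: List[Tuple[str, str, bool]], alpha: float = 1.0
) -> Dict[str, float]:
    """Ridge regression on +1/-1 win labels: solves (X^T X + alpha I) w = X^T y.

    Each game (team, opponent, team_won) is one row with +1 for team, -1 for opponent.
    """
    teams = sorted({name for game in games for name in game[:2]})
    idx = {name: i for i, name in enumerate(teams)}
    n = len(teams)
    xtx = [[0.0] * n for _ in range(n)]
    xty = [0.0] * n
    for team, opp, won in games:
        label = 1.0 if won else -1.0
        row = {idx[team]: 1.0, idx[opp]: -1.0}  # sparse row of X
        for i, xi in row.items():
            xty[i] += xi * label
            for j, xj in row.items():
                xtx[i][j] += xi * xj
    for i in range(n):
        xtx[i][i] += alpha  # the ridge penalty makes the system solvable
    w = solve(xtx, xty)
    return {name: w[idx[name]] for name in teams}
```

For example, if A beats B, B beats C, and A beats C, `ridge_team_ratings([("A", "B", True), ("B", "C", True), ("A", "C", True)])` ranks A above B above C. Because every row sums to zero and the penalty is positive, the fitted ratings also sum to zero, so 0 naturally represents an average team.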

These adjusted team ratings get merged into our game-level data set (for both teams in the game) and are used as features, along with the score differential and home-court advantage, in our model for win probability at the five-minute mark. These team features are important here because they separate the effect of the actual score margin at five minutes from which team is more likely to have that score margin. Better teams will simply be leading at the five-minute mark more often. So when we see a high “raw” win percentage from that point on for teams with five-point leads, it could be attributable to the lead itself (i.e., specific to the game situation) or to the team simply continuing to be better than its opponent (i.e., more related to the teams involved than to the situation). For this particular use of our model, we wanted to separate the two, to some degree.
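One way to lay out a game-level training row is sketched below, under assumptions of our own: site is coded +1 for home, 0 for neutral, and -1 for away; ratings are the schedule-adjusted team ratings centered so that 0 is an average team; and each game contributes two mirrored rows, one from each team’s perspective. The names are hypothetical.

```python
from typing import List, Tuple

# Hypothetical site coding, from the perspective of the team in the row.
SITE_CODE = {"home": 1.0, "neutral": 0.0, "away": -1.0}
MIRROR_SITE = {"home": "away", "away": "home", "neutral": "neutral"}


def feature_row(margin: int, site: str, team_rating: float, opp_rating: float) -> List[float]:
    """[margin with five minutes left, site, team rating, opponent rating]."""
    return [float(margin), SITE_CODE[site], team_rating, opp_rating]


def game_rows(
    margin: int, site: str, team_rating: float, opp_rating: float, team_won: bool
) -> List[Tuple[List[float], int]]:
    """Two (features, label) pairs for one game, one from each team's side."""
    return [
        (feature_row(margin, site, team_rating, opp_rating), int(team_won)),
        (feature_row(-margin, MIRROR_SITE[site], opp_rating, team_rating), int(not team_won)),
    ]
```

Keeping the team’s own rating as a separate feature from the margin is what lets the model attribute part of a leading team’s success to its quality rather than to the lead itself.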

To fit this “five minutes left” win probability model, we use logistic regression from scikit-learn. There’s a cottage industry of sports win probability models that estimate a team’s chances of winning at each play within a game, and those can get much more complicated by trying to account for various other factors, non-linear effects, and so on. For our purposes, we were only looking at win probability at a single point in the game (five minutes left in regulation) and using it as a baseline for the chance to win going forward. The model could certainly be improved, but having some estimate makes our method more robust than simpler methods that don’t account at all for the game state entering the end-game.
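To keep the example dependency-free, the sketch below fits an unregularized logistic regression by plain batch gradient descent instead of calling scikit-learn (whose LogisticRegression applies L2 regularization by default, so its coefficients would differ somewhat). The toy data use only the score margin as a feature and are made up for illustration; with real data, each row would also carry the site and rating features.

```python
import math
from typing import List, Sequence


def sigmoid(z: float) -> float:
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def fit_logistic(
    X: Sequence[Sequence[float]], y: Sequence[int], lr: float = 0.1, epochs: int = 2000
) -> List[float]:
    """Return [intercept, w_1, ..., w_k] fit by batch gradient descent on log-loss."""
    k = len(X[0])
    n = len(X)
    w = [0.0] * (k + 1)
    for _ in range(epochs):
        grad = [0.0] * (k + 1)
        for xi, yi in zip(X, y):
            err = predict_proba(w, xi) - yi  # gradient of log-loss w.r.t. the logit
            grad[0] += err
            for j, xj in enumerate(xi):
                grad[j + 1] += err * xj
        w = [wj - lr * g / n for wj, g in zip(w, grad)]
    return w


def predict_proba(w: Sequence[float], x: Sequence[float]) -> float:
    """P(win) for one feature row."""
    return sigmoid(w[0] + sum(wj * xj for wj, xj in zip(w[1:], x)))


# Made-up toy data: margin with five minutes left -> won (1) or lost (0).
X = [[-6], [-3], [-1], [0], [0], [1], [3], [6], [-2], [2]]
y = [0, 0, 0, 0, 1, 1, 1, 1, 1, 0]
w = fit_logistic(X, y)
```

Bigger leads at the five-minute mark map to higher win probabilities, and because this toy data set is symmetric, a tied game comes out at 50%.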

With the model thus fit, we then needed to use it to produce probabilities. In this case, we wanted to estimate the chance that an average team would win from the given situation. You might be wondering: why use the average team instead of the team itself in calculating that probability? Great question! In order to rate a team’s performance in the clutch, we needed to compare what it did in the last five minutes to some baseline of what a typical team might do given the score, site, and opponent.

For instance, if we put Gonzaga’s strong team rating into the model and found that they have a 90% chance of winning when tied with five minutes left against a good opponent, then, even though this is “right” from the game perspective, we’d only be crediting them with +10% win probability added from a Clutchness rating perspective. Meanwhile, a lesser team accomplishing the same feat against the same opponent might get +50% (or more). It doesn’t make sense to count Gonzaga’s performance less for this evaluation simply because they are Gonzaga, so we use the average team to get a baseline of expected chances at the five-minute mark. We still use the actual opponent rating, since we want that factored in, but this way we can calculate the change from that win probability to the final result (0 for a loss, 1 for a win) to get a more accurate Clutchness score for Gonzaga in that game.

We show the code for switching out the team dummy variables for 0s (representing the average team), calculating the subsequent “average team expected win probability,” and then calculating wins added above that expectation, in the snippet below.

Colab Snippet for Switching from Specific to “Average Team” Perspective, Then Getting Win Probability
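The Colab snippet isn’t reproduced in this text, but the idea can be sketched in a few lines of Python. Everything here is an assumption for illustration: the feature order is [margin, site, team rating, opponent rating], ratings are centered so 0 is an average team, and the coefficients are made-up numbers rather than our fitted model. The sketch swaps the team’s own rating for 0, keeps the opponent’s rating, and subtracts the resulting probability from the actual result.

```python
import math
from typing import List, Sequence, Tuple

# Made-up coefficients for illustration only (not the fitted model).
# Features: [margin with 5:00 left, site (+1 home/0 neutral/-1 away),
#            team rating, opponent rating].
INTERCEPT = 0.0
COEFS = [0.35, 0.15, 2.0, -2.0]
TEAM_RATING_IDX = 2


def win_prob(features: Sequence[float]) -> float:
    """Logistic model: P(win) = 1 / (1 + exp(-(b0 + b . x)))."""
    z = INTERCEPT + sum(c * x for c, x in zip(COEFS, features))
    return 1.0 / (1.0 + math.exp(-z))


def as_average_team(features: Sequence[float]) -> List[float]:
    """Replace the team's own rating with 0 (an average team); keep everything else."""
    avg = list(features)
    avg[TEAM_RATING_IDX] = 0.0
    return avg


def win_prob_added(features: Sequence[float], won: bool) -> Tuple[float, float]:
    """Return (average-team win probability, WPA = result - that probability)."""
    p_avg = win_prob(as_average_team(features))
    return p_avg, float(won) - p_avg


# A strong team (rating 1.0), tied at a neutral site vs. an average opponent, wins.
strong_team_tied = [0.0, 0.0, 1.0, 0.0]
p_own = win_prob(strong_team_tied)  # what the team's own rating implies
p_avg, wpa = win_prob_added(strong_team_tied, won=True)
```

Measured against its own rating (about 0.88 with these made-up coefficients), the win would earn only about +0.12, echoing the Gonzaga example; against the average-team baseline, it earns +0.5.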

Clutchness in Action

Let’s see how this works with a few examples, starting with an NCAA Tournament game from last weekend (we picked a different close one so as not to further alienate UCF fans):

Against Maryland in the Round of 32, LSU controlled most of the game but gave up its entire lead and actually faced a three-point deficit entering our “crunch time” cutoff at 35 minutes. Since Maryland is pretty good and any margin is hard to overcome against a good team, an average team in that spot would have only a 15% chance of winning. But the Tigers rallied like they have frequently this season (more on that later), winning on the strength of Tremont Waters’ twisting game-winning layup with 1.6 seconds left.

So LSU gets +0.85 win probability added for pulling out that win — a hefty total. Here’s another example of an even bigger comeback from earlier this season:

UTSA trailed Old Dominion by 16 at the five-minute mark before rallying for an amazing comeback win. Old Dominion is actually an above-average team, too, but this isn’t an opponent adjustment story: coming back from 16 down (actually 18, as ODU scored another basket with 4:43 remaining) in the last few minutes is impressive against anybody!

For this game, UTSA gets +0.998 win probability added toward its season Clutchness rating, having gone from a nearly guaranteed loss to a win in the last five minutes. That’s the largest single-game value this season. On the flip side, Old Dominion gets essentially one full win docked from its Clutchness: giving up a 16-point lead in the last five minutes is very not-clutch.

So we can see how this approach captures the clutch nature of great comebacks, even extreme ones. One more example, going back to last weekend, illustrates how the win probability-based approach can also help account for seemingly close (but less down-to-the-wire) games:

In their Round of 32 game on Sunday, Liberty hung tough for a while but was down five heading into the final five minutes. The Flames couldn’t overcome the deficit and wound up losing by nine. This would qualify as a close game under some “close-and-late” definitions, so it would count as a full loss toward Liberty’s record in those games.

But considering the situation, how much should we dock Liberty for not being “clutch” enough to win this game? According to our model, an average team down five with five minutes left against Virginia Tech (a very good opponent) has only a 7% chance of coming back to win. The loss is much more attributable to the situation Liberty found itself in after 35 minutes than to what happened in those final five minutes. So for Clutchness rating purposes, Liberty gets a win probability added (WPA) of -0.06 for this game: a debit, but not a large one, since our approach accounts for the situation.

To go from these game-level ratings to the season level, we simply add up each team’s (schedule-adjusted) win probability added in the last five minutes across all its games. We don’t have to decide whether a particular game is “close” or not: every game can count toward a team’s Clutchness rating, based on win probability movement in those final five minutes. The result is a more thorough and less biased measure of how much a team’s play down the stretch contributed to winning over the course of the season.
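The season-level step is just a per-team sum with no filtering. A minimal sketch, with hypothetical team names and made-up per-game WPA values:

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple


def season_clutchness(game_wpa: Iterable[Tuple[str, float]]) -> Dict[str, float]:
    """Sum per-game win probability added for each team.

    Every game counts; there is no 'close game' filter.
    """
    totals: Dict[str, float] = defaultdict(float)
    for team, wpa in game_wpa:
        totals[team] += wpa
    return dict(totals)


def rank(totals: Dict[str, float]) -> List[Tuple[str, float]]:
    """Teams sorted from most to least Clutchness."""
    return sorted(totals.items(), key=lambda kv: kv[1], reverse=True)


# Made-up per-game values for two hypothetical teams.
games = [("Team A", 0.85), ("Team A", -0.06), ("Team B", 0.40),
         ("Team B", -0.70), ("Team A", 0.10)]
standings = rank(season_clutchness(games))
```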

Here are the Top 10 teams in our Clutchness rating based on games through March 26:

LSU, who we mentioned earlier, comes in at #1 with more than six wins added based on performance in the last five minutes. We examined the latest example against Maryland, but their late-game heroics go well beyond that. Call them clutch or lucky, but the Tigers have had six other wins in which they were either tied or trailing 35 minutes in, all against SEC opponents (and five of those were on the road). Some of these were truly memorable: overcoming a 14-point deficit in the final 2:08 at Missouri, winning on two free throws in the final second to cap a rally against Tennessee, not to mention the controversial tip-in at the buzzer to upset Kentucky (a game in which LSU actually led by three with five minutes left). LSU earned at least +0.60 WPA for each of those wins and also did a nice job finishing off other close (but less dramatic) games, earning their top spot.

Not too far behind is Duke, who also moved up based on last weekend’s heroics. The Blue Devils are “only” 5–3 in close games (based on the “tie/lead change” definition), but got nearly a full win of Clutchness credit from their historic comeback at Louisville. They’ve been pretty good at coming away with victories in tight games against very strong competition, with Sunday’s dramatic win against UCF (+0.678 WPA) being the latest example.

UNC Greensboro may seem a bit out of place, since they aren’t a tournament team and don’t have a string of crazy comeback wins. But the Spartans have actually been amazing at winning close games this year: 11–1 in games with a tie or lead change in the final five minutes, which is among the most impressive records in the country. For further confirmation, they sit at the top of KenPom’s “Luck” ratings; again, one person’s “clutchness” is another person’s “luck.”

You can sort, filter, and scrutinize the Clutchness ratings for all 353 Division I teams on the “Bracket IQ Metrics” page of our March Madness dashboard. Unlike with some other metrics, you can expect to see some significant movement in Clutchness in the Sweet 16 and beyond (somewhat by design), so come back and check throughout the rest of the tournament.

Hopefully we’ve made a good case for Clutchness as a retrodictive measure of how good a team has been at winning close games down the stretch: it’s good at looking backward and summarizing teams’ clutch play in the past. But you might have noticed we still haven’t answered the important question of whether clutchness is “real” or not! Should we just assume Duke and LSU are in better shape than everyone else in any close games they play the rest of the tournament? Or is there more we can find in the data?

Stay tuned for Part 2…
