Authors: Anil Timbil, Ayush Upneja, Parker Van Roy, Daniel Zhou
Let’s talk about explosiveness.
You’d think college basketball was positively pyrotechnic lately, given how often we hear about how “explosive” a team or a player is. On the surface, you might equate explosiveness with a team’s or player’s proclivity for dunking. After all, even if you hate Duke, it’s hard to argue that Zion Williamson’s dunks are anything less than dynamite.
But while dunks are fun to watch, how useful are they? Could they be an indicator of explosiveness? For us, that question turned out to be just the tip of the iceberg.
A few of us had started investigating dunks back in January during the Google Cloud & NCAA Hackathon at MIT. We’d wanted to find out if dunks had any demonstrable effect on the energy or momentum of a team, something often assumed to be true but rarely (if ever) backed up with data. Short answer: they do!
We used the historical play-by-play dataset to identify dunks, then looked at the ten shots before and after each dunk to measure the change in score over time. We also built a model comparing how often aggressive plays (offensive rebounds, steals, etc.) occurred in the time around dunks. Together, these models gave us one possible framework for analyzing how different types of plays affect momentum (M) and energy (E), and for describing a game’s acceleration as a function of those two quantities. We found that dunks contributed to about a 15% increase in game acceleration, a much bigger effect than we’d anticipated.
You can see the initial model here:
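Momentum (M):
- α = shots made by Team A after the dunk
- β = shots made by Team B after the dunk
- where α + β ≤ 10
- If Team B calls a timeout during the set, M = 100.

Energy (E):
- r = offensive rebounds (ORB) per minute in the 3 minutes after the dunk
- R = ORB per minute over the game
- s = steals per minute in the 3 minutes after the dunk
- S = steals per minute over the game
- b = blocks per minute in the 3 minutes after the dunk
- B = blocks per minute over the game
- where E ≤ 100

Acceleration = 0.7M + 0.3E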
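The formulas that turn these inputs into M and E appeared as images in the original post and aren’t reproduced here. The sketch below only shows, under stated assumptions, how the listed inputs could be pulled from a play-by-play event list and how the final 0.7/0.3 weighting works. The `Event` type, its `kind` labels (such as "shot_made" and "orb"), and the function names are hypothetical, not the dataset’s actual schema.

```python
from dataclasses import dataclass

@dataclass
class Event:
    t: float    # seconds since tip-off
    team: str   # team credited with the event
    kind: str   # hypothetical labels: "shot_made", "shot_missed", "orb", "steal", "block"

def shots_after_dunk(events, dunk_t, team_a, team_b, n_shots=10):
    """Return (alpha, beta): shots made by each team among the next n_shots attempts."""
    attempts = [e for e in events
                if e.t > dunk_t and e.kind in ("shot_made", "shot_missed")]
    window = sorted(attempts, key=lambda e: e.t)[:n_shots]  # guarantees alpha + beta <= n_shots
    alpha = sum(1 for e in window if e.kind == "shot_made" and e.team == team_a)
    beta = sum(1 for e in window if e.kind == "shot_made" and e.team == team_b)
    return alpha, beta

def rate_per_minute(events, kind, team, start_t, end_t):
    """Events of `kind` by `team` per minute in [start_t, end_t): r, s, b or R, S, B."""
    count = sum(1 for e in events
                if e.kind == kind and e.team == team and start_t <= e.t < end_t)
    return count / ((end_t - start_t) / 60.0)

def acceleration(m, e):
    """Game acceleration as the weighted sum 0.7M + 0.3E, given M and E."""
    return 0.7 * m + 0.3 * e
```

For example, r would be `rate_per_minute(events, "orb", team, dunk_t, dunk_t + 180)`, and R the same call over the whole game.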
Nice as dunks and game acceleration may be, there’s only so much that they can indicate about a team’s performance. We realized what was more interesting was the thing they were indicating: a team’s explosiveness. To find out what else might indicate explosiveness, we’d have to dig deeper into the data.
Find the runs…
For our purposes, we think of explosiveness as moments in a game when a team starts dramatically outscoring their opponent for some period of time, a.k.a., going on a run. We wanted to tie explosiveness to some kind of measurable increase in score differential for a team over a specific period of time.
There were two problems to solve in defining a run. First, we had to figure out which scoring anchors would constitute a run. Second, once we settled on those anchors, we’d need a way to handle scoring stretches that each qualified as a run but overlapped in time.
We started by defining a run as a span of time in which one team scores at least X points and the other team scores at most Y. After looking at the distribution of all runs and how frequently certain runs occur, we set X to 12 and Y to 5 (these popped up as anchors for the rarest 10% of runs, and also happen to track nicely with 4+ possessions for the team on the run and under 2 for its opponent). We’d find runs by looking at slices of the game events. Whenever any game event occurs (e.g., a shot, steal, or rebound), it gets its own timestamped row in our play-by-play dataframe. Between any two event times, we counted how many points the home team and the away team scored, and determined whether that window qualified as a run.
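The window search can be sketched as a brute-force scan over every pair of events in a game. This is a minimal sketch, assuming the play-by-play has been reduced to chronological rows holding the elapsed time and the cumulative home and away scores after each event; `Row` and `find_runs` are hypothetical names. As in the Princeton walkthrough below, points are counted from the score at the window’s starting event. The sample rows are only the checkpoint scores quoted in that walkthrough, not every event in the game.

```python
from typing import NamedTuple

class Row(NamedTuple):
    t: int      # seconds elapsed when the event occurred
    home: int   # cumulative home points after the event
    away: int   # cumulative away points after the event

def find_runs(rows, x=12, y=5):
    """Return every (team, t_start, t_end, home_pts, away_pts) window that is a run.

    A window between two events is a run if one team scores at least x points
    and the other scores at most y. This is O(n^2) in the number of rows,
    which is fine for a single game.
    """
    runs = []
    for i, start in enumerate(rows):
        for end in rows[i + 1:]:
            dh = end.home - start.home
            da = end.away - start.away
            if dh >= x and da <= y:
                runs.append(("home", start.t, end.t, dh, da))
            elif da >= x and dh <= y:
                runs.append(("away", start.t, end.t, dh, da))
    return runs

# Checkpoint scores from the Princeton at Duke walkthrough (Duke is the home team).
rows = [Row(0, 0, 0), Row(334, 5, 13), Row(354, 7, 13), Row(855, 19, 18),
        Row(875, 21, 18), Row(905, 24, 18), Row(946, 27, 18), Row(985, 27, 21)]
for run in find_runs(rows):
    print(run)
```

On these rows it reports Princeton’s 5–13 run from 0 to 334 seconds, Duke’s runs from 334 and from 354 seconds to each of 855, 875, 905, and 946 seconds, and nothing ending at 985 seconds.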
Let’s look at an example: Princeton at Duke, on 2018–12–18.
The score differential is the home team’s total points minus the away team’s total points, so a negative value means the away team is leading. In this game, Princeton led Duke by 8 points 334 seconds in (a differential of −8), but Duke led by 13 at halftime and went on to win by 51. Duke certainly looks like it had strong runs, but let’s quantify them.
Here’s a slice of the play-by-play:
From tip-off to 334 seconds into the game, Princeton scored 13 points while Duke scored only 5. This qualifies as a 5–13 run for Princeton from 0 to 334 seconds, where the first number is the home team’s points and the second is the away team’s. The next window, 0 to 354 seconds, is not a run: the score over that span is 7–13, which puts Duke above our threshold of 5.
Let’s keep slicing the play-by-play to find other runs:
Now we see that from 334 to 855 seconds into the game, Duke scored 14 points (19 minus 5) while Princeton scored only 5 (18 minus 13). This counts as a 14–5 run for Duke. As the game progresses, 334 to 875 seconds counts as a 16–5 run for Duke, 334 to 905 seconds as a 19–5 run, and so on, up to 334 to 946 seconds as a 22–5 run. The streak stops at the window from 334 to 985 seconds because Princeton scores, making the split for that window 22–8, and 8 is above our threshold of 5.
However, we can also observe that from 354 to 855 seconds into the game, Duke had a 12–5 run, but this time interval was already completely covered by the 334 to 855 second run. This problem will be dealt with shortly. We continue finding these runs for the rest of the game.
Here’s a snippet of all the runs we found:
Below is the full list of runs by scoring type and how often each occurred. Our definition yields 149 individual runs in this game. Note: we decided a run cannot carry over from the first half into the second (halftime is too significant a break in action), so runs are split at halftime (1200 seconds). The same logic applies to overtime: there’s too much time between periods, so a run can’t continue from the second half into the first overtime, from the first overtime into the second, and so on.
We can see that the most frequent run type was 17–5, meaning that there were 12 distinct time spans in which Duke scored 17 points and Princeton only scored 5. There’s also only one run for Princeton — that first 5–13 run.
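Two bookkeeping steps from this section can be sketched: splitting a game’s rows at halftime and at each overtime so that no run spans a break, and tallying runs by scoring type. This sketch assumes each play-by-play row carries a period number (a hypothetical `period` field), 20-minute halves and 5-minute overtimes, and run tuples shaped as (team, t_start, t_end, home_pts, away_pts). Each segment would then be scanned for runs on its own.

```python
from collections import Counter
from itertools import groupby
from typing import NamedTuple

class Row(NamedTuple):
    t: int        # seconds elapsed
    period: int   # 1 = first half, 2 = second half, 3+ = overtimes (assumed column)
    home: int     # cumulative home points
    away: int     # cumulative away points

def period_start(p):
    """Elapsed seconds at the start of period p (20-minute halves, 5-minute overtimes)."""
    return 1200 * (p - 1) if p <= 2 else 2400 + 300 * (p - 3)

def split_by_period(rows):
    """Split chronological rows into per-period segments so no run can span a break.

    Each segment starts with a synthetic row at the period's start time carrying
    the score from the end of the previous period, so points scored on a
    period's first event still count toward runs in that period.
    """
    segments, home, away = [], 0, 0
    for p, group in groupby(rows, key=lambda r: r.period):
        group = list(group)
        segments.append([Row(period_start(p), p, home, away)] + group)
        home, away = group[-1].home, group[-1].away
    return segments

def tally_run_types(runs):
    """Count runs by their 'home-away' points label, e.g. '17-5'."""
    return Counter(f"{dh}-{da}" for _team, _t0, _tf, dh, da in runs)
```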
The graph above overlays each team on the score differential chart. Each run is colored in with the team’s color, so we can see Princeton’s run marked in orange at the start of the game, and Duke’s runs marked in blue. But this graph is a bit hard to read, so let’s fix the overlapping runs.
If a team has multiple runs that overlap with each other, meaning one run starts before another ends, they are combined. As a result, we are left with three sequential runs.
Ah, much cleaner! Now that we knew where to find runs, we turned to BigQuery and Colab to see how we could put this information to work.
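Combining overlapping runs is the classic interval-merge problem: sort each team’s runs by start time, and fold any run that starts before the current merged span ends into that span. A minimal sketch, assuming runs are given as (team, t_start, t_end) tuples; `merge_runs` is a hypothetical name. The sample input is the slice of Princeton at Duke runs from the walkthrough, so it merges into two spans rather than the full game’s three.

```python
def merge_runs(runs):
    """Merge each team's overlapping runs into single (team, t_start, t_end) spans.

    Two runs overlap when one starts strictly before the other ends, matching
    the rule in the text. The output is ordered by start time.
    """
    merged = []
    for team, start, end in sorted(runs):  # sorts by team, then start, then end
        last = merged[-1] if merged else None
        if last and last[0] == team and start < last[2]:
            merged[-1] = (team, last[1], max(last[2], end))
        else:
            merged.append((team, start, end))
    return sorted(merged, key=lambda r: r[1])

princeton_duke = [("away", 0, 334),
                  ("home", 334, 855), ("home", 334, 875), ("home", 334, 905), ("home", 334, 946),
                  ("home", 354, 855), ("home", 354, 875), ("home", 354, 905), ("home", 354, 946)]
print(merge_runs(princeton_duke))  # [('away', 0, 334), ('home', 334, 946)]
```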
…Measure the explosiveness
Our main assumption was that the faster the score differential increases in a run, the greater the explosiveness of the team on the run. However, without additional considerations, explosiveness would be skewed toward shots made in immediate succession, or would collapse if the opposing team held the ball indefinitely. So we had to come up with some modifications.
Since we’re interested in explosiveness as a function of scoring in a run, we intentionally focused on shots and offensive rebounds. We started by creating a measure of how fast the score differential changes that also accounts for time: a run with lots of points should beat a short run with a slightly higher rate of points per second, but we still want faster runs to count as more explosive. We started with the following:
- SD_f is the score differential at the end of the run, and SD_0 is the score differential at its start, both measured from the perspective of the team on the run. SD_f is therefore always greater than SD_0 during a run. The larger the difference, the more explosive the run.
- t_f - t_0 is the duration of the run in seconds. The shorter the run, the more explosive it is.
Then we needed to account for the opponent’s attempt to stop the run, which we expressed as follows:
Remember, Δscore_team is the number of points the team on the run has scored, which is at least 12. Δscore_opponent is the number of points the opponent has scored, which is five or fewer. The more points the running team scores and the fewer the opponent scores, the more explosive the run.
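The formulas for these two pieces appeared as images in the original post, so we don’t reproduce them; the sketch below only computes the raw quantities they are built from, for a single run. It assumes the same hypothetical `Row` layout (elapsed time plus cumulative home and away scores) and measures the differential from the running team’s side; `run_inputs` is a hypothetical name.

```python
from typing import NamedTuple

class Row(NamedTuple):
    t: int      # seconds elapsed
    home: int   # cumulative home points
    away: int   # cumulative away points

def run_inputs(start, end, team):
    """Raw inputs to E_1 for a run by `team` ("home" or "away") from `start` to `end`."""
    sign = 1 if team == "home" else -1          # view the differential from the running team's side
    sd_0 = sign * (start.home - start.away)
    sd_f = sign * (end.home - end.away)
    d_home, d_away = end.home - start.home, end.away - start.away
    d_team, d_opp = (d_home, d_away) if team == "home" else (d_away, d_home)
    return {
        "delta_sd": sd_f - sd_0,        # SD_f - SD_0
        "duration": end.t - start.t,    # t_f - t_0, in seconds
        "delta_team": d_team,           # at least 12 for a run
        "delta_opp": d_opp,             # at most 5 for a run
    }

# Duke's 14-5 run from the walkthrough: 5-13 at 334 s to 19-18 at 855 s.
print(run_inputs(Row(334, 5, 13), Row(855, 19, 18), "home"))
# {'delta_sd': 9, 'duration': 521, 'delta_team': 14, 'delta_opp': 5}
```

Note that SD_f - SD_0 always equals Δscore_team - Δscore_opponent, so under the 12/5 definition every run raises the running team’s differential by at least 7 points.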
The product of these two pieces would become one component of our explosiveness equation. Call it E_1:
Finally, we wanted to reward shot accuracy and offensive rebounds as a measure of a team’s offensive shooting efficiency on a run. (We thought about including a defensive rebound weight as a counterpoint, but opted to focus on the offensive end of the run alone.) Let’s call this E_2:
All together, we consider explosiveness to be the weighted product of scoring speed and the opponent’s stopping power (E_1), plus a weighted shot-accuracy term (E_2):
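As with E_1, the E_2 formula and the weights aren’t reproduced here, but its inputs are easy to gather. A minimal sketch, assuming hypothetical event labels ("fg_made", "fg_missed", "orb"), counting only field-goal attempts toward accuracy (how free throws were handled isn’t stated), and using the window (t_0, t_f] so the event that opens the run is excluded, consistent with how run points are counted.

```python
from typing import NamedTuple

class Event(NamedTuple):
    t: int      # seconds elapsed
    team: str   # "home" or "away"
    kind: str   # hypothetical labels: "fg_made", "fg_missed", "orb", ...

def run_shooting(events, team, t_0, t_f):
    """Field-goal accuracy and offensive-rebound count for `team` during (t_0, t_f]."""
    window = [e for e in events if e.team == team and t_0 < e.t <= t_f]
    made = sum(e.kind == "fg_made" for e in window)
    missed = sum(e.kind == "fg_missed" for e in window)
    orb = sum(e.kind == "orb" for e in window)
    accuracy = made / (made + missed) if made + missed else 0.0
    return accuracy, orb
```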
So let’s take a look at some notable runs this season. How explosive were they?
Here’s the explosiveness measured for the Princeton at Duke game we used before. Though three points (one per merged run) are enough to describe each team’s explosiveness, let’s include the scores for each of the individual overlapping runs to see how explosiveness changes over the course of a merged run. This shows that Duke’s runs were clearly more explosive than Princeton’s, with the last run over four times as explosive as Princeton’s first run!
Let’s look at another game, Duke at Louisville, played on 2/12/2019. In this game, Duke made a historic comeback in the last 10 minutes.
We can see that Louisville’s two runs to reach an impressive 23-point lead had max explosive scores of about 25. However, Duke’s single run to win the game in the last 10 minutes had a max explosive score of about 50, almost double Louisville’s. Pretty fun to calculate!
But when sportscasters call a game and declare a team “explosive,” they’re not running the play in BigQuery against the greatest runs and the most explosive games of all time to see how it stacks up. But we can! So let’s look at a high-profile “explosive” game in recent memory to see how our work checks out. Texas A&M’s unbelievable comeback from a 10-point deficit in the final minute against Northern Iowa in 2016 seems like a good candidate. Thanks to this literal last-minute run, with an explosive score of about 27, the game went into overtime, and Texas A&M eventually won. Northern Iowa was unable to avert a collapse that unfolded in a matter of seconds.
It’s also interesting to note that a team can be more explosive than its opponent but still lose the game. Here’s an example: Arkansas St. at Little Rock, played on 2/2/2019.
Little Rock was on its way to a great comeback from a 22-point deficit in the last quarter of the game, and as a result was more explosive than Arkansas St. Nevertheless, it wasn’t enough to win the game.
It’s a sprint, not a marathon
Given what we’ve figured out about how to find and determine the explosiveness of a run, we wanted to find a way to rank all the teams this season based on their explosiveness. Should it be based on whether a team generates an explosive run and wins? On the teams with the greatest number of games with explosive runs? The greatest number of explosive runs? The runs with the highest explosiveness scores?
After some consideration, we decided that summing the explosiveness scores of runs in a game was the most rational way to get at a single explosiveness score for a game. After all, you can be on a run only for so long — and the bigger the runs get, the higher the explosiveness score of the game.
For each game, we identified the runs for each team, and with our metric, we found how “explosive” each run is and summed them. Here are the top 20 “explosive” games of this season:
(A note about reading this table: each row is one game. If is_home is 1, then the row is about the home team, and if is_home is 0, then the row is about the away team. For example, in the very first row, is_home is 1, so South Dakota St. had 3 runs total for this game, and had an explosive sum of 214.)
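The per-game aggregation amounts to a group-by and sum over runs. A sketch, assuming each run already carries a single explosiveness score (the text doesn’t say which score represents a merged run, so that choice is left to the caller) and using hypothetical game and team identifiers; `game_explosiveness` and `top_games` are illustrative names.

```python
from collections import defaultdict

def game_explosiveness(run_scores):
    """Sum run explosiveness per (game_id, team).

    run_scores: iterable of (game_id, team, score), one entry per run.
    Returns {(game_id, team): (number_of_runs, explosive_sum)}.
    """
    counts = defaultdict(int)
    totals = defaultdict(float)
    for game_id, team, score in run_scores:
        counts[(game_id, team)] += 1
        totals[(game_id, team)] += score
    return {key: (counts[key], totals[key]) for key in counts}

def top_games(summary, n=20):
    """Rank (game_id, team) rows by explosive sum, highest first."""
    return sorted(summary.items(), key=lambda kv: kv[1][1], reverse=True)[:n]
```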
We wanted to give equal weight to the two components of our explosiveness formula so that we could assign each team a single, intuitive explosiveness score on a 100-point scale. But E_1 reached values in the 100s while E_2 topped out in the 40s, so we needed to rescale them; otherwise, E_1 would dominate E_2 in the total. We iterated through all games to find each team’s average E_1 and average E_2 per game, and ranked teams on each component separately. The top team in each component was given a score of 50, with all other teams scaled down accordingly. This way, each team has a total explosiveness score out of 100 that still preserves the seasonal ranking.
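A sketch of the rescaling, assuming “scaled down accordingly” means proportional scaling (each team’s per-game average divided by the top team’s, times 50); the post doesn’t spell out the exact scaling, and `team_scores` is a hypothetical name.

```python
from statistics import mean

def team_scores(e1_by_team, e2_by_team, top=50.0):
    """Scale per-game averages of E_1 and E_2 so the best team in each gets `top`.

    e1_by_team / e2_by_team: {team: [per-game values]} with the same teams.
    Returns {team: (scaled_e1, scaled_e2, total)}, totals out of 2 * top.
    """
    avg1 = {t: mean(v) for t, v in e1_by_team.items()}
    avg2 = {t: mean(v) for t, v in e2_by_team.items()}
    max1, max2 = max(avg1.values()), max(avg2.values())
    out = {}
    for t in avg1:
        s1 = top * avg1[t] / max1   # proportional downscaling (assumed)
        s2 = top * avg2[t] / max2
        out[t] = (s1, s2, s1 + s2)
    return out
```

Because each component is divided by its own maximum, E_1’s larger raw range no longer swamps E_2, and the order within each component is unchanged.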
Here are the rankings according to E_1 and E_2 side by side:
We can see that there are a few shake-ups in the rankings between E_1 and E_2, but the order is generally similar, which we’d expect.
Put together, here are the top 20 most explosive teams:
We can see that Gonzaga is the #1 most explosive team, by a considerable margin over #2 Duke.
Here’s the distribution of team explosiveness scores:
In this plot, we can see the distribution of explosiveness scores across teams. It appears to be unimodal and skewed to the right. There are 69 teams with an explosiveness score between 42.5 and 47.49.
For a different perspective, the graph above shows percentiles of the explosiveness distribution. We can see that 25% of the teams had an explosiveness rating below 36.5, 50% were below 43.36, and 75% were below 49.77. We can also see that any team with an explosiveness score above 69.64 can be classified as seriously explosive (for those still following along, that’s the top 11 from our list above: Gonzaga, Duke, Buffalo, Wofford, Michigan St., Belmont, North Carolina, Virginia, Murray St., South Dakota St., and Tennessee).
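The quartiles and the histogram count can be reproduced with Python’s standard library. A sketch assuming `scores` holds one explosiveness score per team; `method="inclusive"` interpolates linearly between order statistics, which is one of several percentile conventions, and the post doesn’t say which it used or how the 69.64 cutoff was chosen, so the cutoff is passed in. Function names are hypothetical.

```python
from statistics import quantiles

def explosiveness_summary(scores, cutoff):
    """Quartiles of team scores plus the scores above a 'seriously explosive' cutoff."""
    q25, q50, q75 = quantiles(scores, n=4, method="inclusive")
    above = sorted((s for s in scores if s > cutoff), reverse=True)
    return {"q25": q25, "median": q50, "q75": q75, "above_cutoff": above}

def count_in_bin(scores, lo, hi):
    """Number of teams with lo <= score < hi, e.g. the 42.5 to 47.49 bar."""
    return sum(lo <= s < hi for s in scores)
```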
Now that the tournament is well underway, we thought we’d take a look at how some of the most dramatic games of the first two rounds shook out in explosiveness. Let’s start with Virginia vs. Gardner-Webb, where at least for a little while, it looked like Gardner-Webb was going to make Virginia relive its painful memories from 2018. What kind of explosiveness did it take for the Cavaliers to get back on track?
We can see pretty clearly where Virginia takes off, going on a grand cumulative run for the entire second half of the game. They ended up with six times as many individual runs as Gardner-Webb, and with explosiveness scores nearly double Gardner-Webb’s.
Meanwhile, on the same day, 12-seed Oregon played 5-seed Wisconsin and pulled off a pretty decisive upset. How did that look by way of explosive runs?
We can see that Oregon was by far the more explosive team throughout the game, with 11 individual runs to Wisconsin’s one, and all of its run scores were higher than Wisconsin’s. And just in case you’re still surprised by this upset, you should know that Oregon ranked #57 in our explosiveness ratings of all 353 D-I men’s teams.
Wisconsin? Barely in the top half, at #142.
Our working definition of explosiveness is just one of many, of course. In creating it, we dug into the nature of runs, their explosiveness, and how that translates to team explosiveness. We also came up with a way to rank teams by explosiveness, and looked at a few interesting games and teams from this perspective. While explosiveness is not the only way to win, it does keep things exciting! And while we’re not saying explosiveness alone predicts game outcomes, we’ll continue to investigate how this metric affects team performance.
See you in the Sweet 16!