Has This Been the “Maddest” March?

Authored by: alokpattani@google.com

Elissa Lerner
Analyzing NCAA Basketball with GCP
Mar 29, 2018


If you think this has been a wild NCAA Tournament so far, you’re not wrong. To recap just a few of the highlights:

So yeah, it’s been a crazy one. But it’s not called “March Madness” for nothing — isn’t the NCAA tournament often this unpredictable?

That’s where the data comes in. Let’s use the NCAA basketball data set, BigQuery and Data Studio to see if we can put some metrics to the Madness.

Average Tournament Wins By Seed

We start by using the NCAA tournament team seeds (1–16) as a way to group teams together and measure how they “should” do in a given tournament. Granted, seeds are not perfect indicators of team strength — there are definitely cases where teams with worse seeds are stronger than (i.e. would be favored against) better-seeded ones. In general though, seeds can be used as a rough proxy for how far a team can be expected to advance in a given tournament.

Since the tournament expanded to 64 teams back in 1985, no. 1 seeds have won an average of 13.3 games per tournament (combined across the four teams on the 1 line), compared to 9.4 wins per tournament for no. 2 seeds, and so on, all the way down to one win in 34 years for no. 16 seeds (we see you, UMBC). The plot below shows average wins by seed per tournament (we exclude “Opening Round”/“First Four” games in this and all subsequent analysis).

You can see the steep decline in average wins from 1-seeds through 5-seeds, a bit flatter trend from there through 12-seeds, and then pretty meager performance in the 13–16 range. This curve serves as a baseline for what an “average tournament” looks like in terms of wins by seed in the 64-team era.
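Here is a minimal Python sketch of how the baseline could be computed. It assumes a hypothetical, simplified game record of `(season, round_size, winning_seed)`, where `round_size` is the number of teams left when the game was played (68 for First Four games, 64 for the first round, down to 2 for the final). The public data set's actual column names and round encoding may differ. The detail that matters is dividing each seed line's total wins by the number of tournaments, so a year in which a seed line won nothing still counts as a zero.

```python
from collections import Counter

# Hypothetical record layout: (season, round_size, winning_seed).
# round_size = teams left when the game was played: 68 for First Four,
# 64 for the first round, ..., 2 for the championship game.
Game = tuple[int, int, int]

FIRST_FOUR = 68  # assumed encoding for Opening Round / First Four games


def average_wins_by_seed(games: list[Game]) -> dict[int, float]:
    """Average wins per tournament for each seed line, all four teams combined."""
    kept = [g for g in games if g[1] != FIRST_FOUR]  # exclude First Four games
    n_tournaments = len({season for season, _, _ in kept})
    wins = Counter(seed for _, _, seed in kept)
    # Divide by the number of tournaments, not by the number of seasons in which
    # the seed won, so a seed line with no wins in a given year contributes zero.
    return {seed: wins[seed] / n_tournaments for seed in range(1, 17)}


# Made-up records for two toy seasons (not real results).
games = [
    (1, 68, 11),  # First Four game: excluded
    (1, 64, 1),
    (1, 64, 16),
    (2, 64, 1),
    (2, 32, 1),
]
avg = average_wins_by_seed(games)
print(avg[1], avg[16], avg[11])  # 1.5 0.5 0.0
```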

2018 vs. 2016

With our curve established, we can compare individual tournaments to the baseline to see which ones look more or less typical. Check out the top section of this interactive Data Studio dashboard to see what we mean. Below is a look at the 2018 tournament (at the time of writing), with the red bars representing wins by teams with the given seed this year, compared to the same average trend curve in blue.

As you might expect, this looks a bit chaotic: the red bars are all over the place with respect to the blue line! As a group, 9-seeds (thanks to Kansas State and Florida State) and 11-seeds (beyond Loyola-Chicago’s Ramblers, Syracuse also made the Sweet 16) vastly outperformed expectations. The top two seed lines underperformed (and this will be true even if Kansas or Villanova wins the title), as did the 4-seeds (Arizona and Wichita State were upset in their first games).

Compare this to the distribution of wins from the 2016 NCAA Tournament, where only two teams seeded worse than seventh made the Sweet 16 and all four 1-seeds made the Elite Eight.

Sure, there is some underperformance in the 3–6 range and some over-achieving by 10- and 11-seeds, but most seeds were within a win of the baseline. In other words, 2016 was a fairly “typical” tournament, by seed anyway.

Putting A Metric to the Madness

From this perspective, it’s fairly clear that 2018 has had more bracket “madness” than 2016 did. We can browse through the other years individually to see how they stack up against the average (and we encourage you to do just that, using the interactive dashboard), but how can we turn our visual test into a single metric to describe just how typical or atypical a given tournament was?

One relatively straightforward way to do this is to calculate, for each seed, the distance between the red bar (actual wins by seed) and the blue line (average wins by seed), then average those absolute distances across all 16 seeds for each year. The result is the average win difference vs. seed expectation for that year: a low value means seed win totals in line with the averages (like in 2016), and a high value means seed win totals further from the baseline (like in 2018).
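The metric is a mean absolute deviation across the 16 seed lines. The minimal Python sketch below assumes one year's wins and the baseline have already been tallied into dictionaries keyed by seed. The function name and inputs are illustrative and do not come from the original analysis.

```python
def avg_win_difference(actual: dict[int, int], baseline: dict[int, float]) -> float:
    """Mean absolute gap between one year's wins by seed and the all-years average."""
    seeds = range(1, 17)
    # A seed line missing from `actual` won no games that year.
    gaps = [abs(actual.get(seed, 0) - baseline[seed]) for seed in seeds]
    return sum(gaps) / len(gaps)


# Made-up numbers: a flat baseline of 2 wins per seed line, and a year that
# matches it except that 1-seeds won 6 games and 2-seeds won none.
baseline = {seed: 2.0 for seed in range(1, 17)}
year = {seed: 2 for seed in range(1, 17)}
year[1], year[2] = 6, 0
print(avg_win_difference(year, baseline))  # (4 + 2) / 16 = 0.375
```

Every seed line gets equal weight here. As a result, a single extra or missing win shifts the metric by at most 1/16, regardless of which seed line it belongs to.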

We can calculate this distance metric for each of our 34 NCAA Tournaments, then plot the value by year as you see below (and in the bottom plot of the interactive).

The tournaments that went more or less to form by seed, like 1996, 2006, 2012, and 2016, have values just below one, meaning the seeds were, on average, within one win of their expected win totals. The “less typical” tournaments have values up near two, and only one year has a value above that mark, which, you guessed it, is this year’s tournament!

With an average win difference vs. seed expectation of 2.1, the 2018 NCAA Tournament ranks above all others by this metric. The number will drop a bit if Villanova or Kansas wins the title, but even so, 2018 will still be the highest among all years, ahead of 2000 (two 8-seeds and a 5-seed in the Final Four) and 1999 (9- and 10-seeds combining for as many wins as 2- and 3-seeds). So yes, college basketball fans, what we are witnessing now is a truly historic tournament!

To be fair, there are definitely other ways to approach this question. You might decide to account for the ordering of the seeds more explicitly by weighting later-round appearances more heavily, and so on. While 2018 will still fall on the “more unlikely” side of most such metrics, they won’t all have this year as the most atypical tournament, as our metric does. But with the historical tournament games data set at your disposal, we encourage you to try out some of your own methods to quantify the madness!
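As one illustration of the round-weighting idea, here is a hypothetical Python sketch. It is not the method used above. Each win is credited by its round, from 1 point for a first-round win up to 6 for a win in the title game, and the weighted totals then go through the same average-absolute-difference comparison. The `(season, round_size, winning_seed)` record layout and the weights are assumptions chosen for illustration.

```python
from collections import Counter

# Hypothetical weights keyed by teams left when the game was played:
# a first-round win counts 1 and a championship-game win counts 6.
# First Four games (68) have no weight and are skipped.
ROUND_WEIGHTS = {64: 1, 32: 2, 16: 3, 8: 4, 4: 5, 2: 6}
SEEDS = range(1, 17)


def weighted_wins_by_seed(games, season):
    """Round-weighted wins for each seed line in one season."""
    totals = Counter()
    for yr, round_size, seed in games:
        if yr == season and round_size in ROUND_WEIGHTS:
            totals[seed] += ROUND_WEIGHTS[round_size]
    return {seed: totals[seed] for seed in SEEDS}


def weighted_madness(games):
    """Average absolute gap from the weighted baseline, for every season."""
    seasons = sorted({yr for yr, round_size, _ in games if round_size in ROUND_WEIGHTS})
    per_year = {yr: weighted_wins_by_seed(games, yr) for yr in seasons}
    # Baseline: mean weighted wins per seed line across all seasons.
    baseline = {s: sum(per_year[yr][s] for yr in seasons) / len(seasons) for s in SEEDS}
    return {
        yr: sum(abs(per_year[yr][s] - baseline[s]) for s in SEEDS) / len(SEEDS)
        for yr in seasons
    }


# Made-up records for two toy seasons (not real results).
games = [(1, 64, 1), (1, 32, 1), (2, 64, 1), (2, 64, 16)]
print(weighted_wins_by_seed(games, 1)[1])  # 3 (first-round win 1 + second-round win 2)
print(weighted_madness(games))  # {1: 0.09375, 2: 0.09375}
```

Weighting by round rewards deep runs. A low seed that reaches the Final Four moves this variant much more than a single first-round upset does, whereas the unweighted metric treats every win alike.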

We’re all mad here.
