The March Madness Selection Committee Is Outdated

(And how we should fix it.)

Connor Groel
Top Level Sports
18 min read · May 9, 2023



I am a bracketologist.

Every Selection Sunday since 2017, I’ve released my final projections for which schools will make the 68-team field for the NCAA Division I Men’s Basketball Tournament and where they will be seeded. In 2023, I began doing the same for the D-I Women’s Basketball Tournament.

The goal of bracketology is to forecast the selection committees’ decisions as accurately as possible. However, if it were up to me, there wouldn’t be selection committees at all.

College basketball’s current process, in which a dozen people (athletic directors and conference commissioners) are thrown in a room and spend several days determining what the March Madness bracket looks like, sounds ridiculous.

That’s because it is. Especially in the year 2023, when we are perfectly capable of having a simpler, more objective process, it’s time to get rid of the selection committees once and for all.

The selection committees are frequently criticized by fans and the media. The most common criticism is likely a supposed bias towards power conference schools, particularly the biggest brand names in the sport.

There are also complaints about the competitive balance between regions of the bracket, as well as conspiracy theories that certain matchups are specifically arranged with TV viewership in mind.

Each year, when the bracket is revealed, there are typically one or two decisions that I strongly disagree with. That being said, I actually think that given the information at their disposal, the committees do a pretty good job.

The NCAA has improved the quality of the data used by the committees by swapping the outdated RPI for the NET ranking as the primary sorting tool.

On the men’s side, they’ve added a variety of advanced metrics (KPI, Strength of Record, KenPom, BPI, Sagarin) to team sheets and developed a quadrant system that incorporates game location into how schools’ schedules are evaluated.

(It must be noted that none of the advanced metrics listed above exist on the women’s side and that the women’s “categories” system uses different cutoffs for wins and does not take into account game location like its quadrant system counterpart in the men’s game. Reforming the categories system and adding advanced metrics such as the one used by Her Hoops Stats are essential short-term improvements to the women’s selection process.)

There is also more transparency about the selection process now. The official procedures used by the committees are publicly available, and midseason bracket reveals provide opportunities to see where things stand before Selection Sunday.

The fact that the average member’s score on Bracketmatrix.com (a website that tracks the accuracy of men’s bracketologists across the internet) has been steadily increasing over time also suggests that the men’s committee, at least, is becoming more consistent and predictable.

These are all positives. However, they do not reflect a good overall system — just a poor one being refined.

The 7 Main Problems With the Current System

1. The Committees’ Job Is Too Complex a Task for Humans

The truth is that creating the March Madness bracket is a very hard thing to do fairly. If humans were able to perfectly ascertain the strength of every college basketball team’s resume, there wouldn’t be an issue with continuing to use the committees.

But that isn’t the case, and, spoiler alert, we have better methods available to us.

Not only is it difficult to balance the various selection criteria (quality wins vs the number of opportunities a team has to earn them, bad losses, overall and non-conference strength of schedule, resume and predictive metrics, performance in road/neutral-site games, etc.), but oftentimes, resumes don’t even look remotely similar to one another.

For an extreme example of this, let’s consider the quadrant records of Charleston and Rutgers in the 2023 Men’s season.

Charleston entered the Selection Show as one of only two teams with at least 30 wins over Division-I opponents, while Rutgers had 11 fewer victories. The way each team reached its record couldn’t have been more different.

How do you compare a school with just three games against teams in Quadrants 1 and 2 with a school that has had 20 such matchups?

Charleston hadn’t beaten anyone ranked in the top 50 of the NET while Rutgers had seven such wins — including a victory over Purdue, who would be given a 1-seed in the NCAA Tournament.

Yet, Charleston had absolutely dominated their schedule while Rutgers was only 12–14 against Quadrants 1–3, including an appalling 2–4 against Quad 3.

In this particular instance, the committee didn’t need to directly compare these teams as Charleston won their conference tournament and earned an automatic March Madness bid.

The Cougars were a 12-seed (indicating they would not have received an at-large bid) while Rutgers was the biggest surprise among teams that missed the NCAA Tournament.

The quadrant and categories systems are only necessary in the first place to help humans make sense of a team’s results.

Placing games into groups for the sake of evaluating and comparing resumes is much easier for us than trying to assess entire schedules at once.

[Table: where a game falls under each system depending on opponent NET ranking.]

Yet, by doing this, we create arbitrary cutoffs that artificially increase or decrease the perceived value of wins and losses. There is very little functional difference between playing the 50th- and 51st-ranked teams in the NET on a neutral floor.

However, in the men’s game, the former is a Quad-1 contest while the latter falls into Quad-2. In the women’s game, these are in the second and third categories, respectively.

I’m sure the committees realize how similar these games are, but the nature of how they’re listed on team sheets has to have some influence on the decision-making process. That’s why the groups exist.
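To make the boundary effect concrete, here’s a minimal sketch of the men’s quadrant assignment in code, using the NCAA’s published NET cutoffs by game location (the function and structure here are mine, purely for illustration):

```python
# Minimal sketch of the men's quadrant system, using the NCAA's
# published NET cutoffs by game location.
CUTOFFS = {
    # location: (Quad 1 max, Quad 2 max, Quad 3 max)
    "home":    (30, 75, 160),
    "neutral": (50, 100, 200),
    "away":    (75, 135, 240),
}

def quadrant(opponent_net: int, location: str) -> int:
    """Return the quadrant (1-4) of a game from the opponent's NET rank."""
    q1_max, q2_max, q3_max = CUTOFFS[location]
    if opponent_net <= q1_max:
        return 1
    if opponent_net <= q2_max:
        return 2
    if opponent_net <= q3_max:
        return 3
    return 4

# Two essentially identical neutral-floor games land in different quadrants:
print(quadrant(50, "neutral"))  # 1 -- counts as a "quality win" opportunity
print(quadrant(51, "neutral"))  # 2
```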

This is additionally an issue considering how games can shift between quadrants or categories based on the results of unrelated matchups that cause movement in the NET.

Particularly in the men’s game (although a similar concept applies in the women’s game), Quad 1 and 2 wins are commonly seen as “quality wins” while Quad 3 and 4 losses are seen as “bad losses.”

Since resumes are often viewed in the context of which teams have more quality wins and fewer bad losses, this mindset has a tendency to reward teams who face more Quad 1 and 2 opponents (more chances for quality wins, fewer potential bad losses) while punishing those who play more games against Quads 3–4 for the opposite reasons.

In particular, it leads us to downplay the strength of Quad 3 wins, which often still come against above-average opponents and occasionally even fringe bubble teams.

A team like Charleston in the previous example would be critiqued for its pair of Q3 losses even though its 12–2 overall record against Quad 3 is actually better than what you’d expect from an average bubble team (more on this later).

When it comes to selecting teams to make the NCAA Tournament, margins are often extremely slim. While the committees do the best they can, it’s easy to overvalue a glaring win or loss when overall bodies of work are hard to choose between.

2. It Turns Scheduling Into a Strategy Game

If earning quality wins is an important part of being selected to play in March Madness, then naturally, schools will want to schedule more games against high-quality opponents.

The issue here is that opportunities to face these opponents are not equal. Obviously, the greatest differences come in conference play, as teams from high-major conferences (ACC/Big 12/Big East/Big Ten/Pac-12/SEC) are significantly stronger on average than those from mid and low-major conferences.

High-major conferences themselves have worked to further their advantage by increasing the length of their conference schedules (the ACC, Big Ten, and Pac-12 have all gone from 18- to 20-game men’s conference schedules within the last five years) and partnering with networks to create scheduling series such as the ACC-Big Ten Challenge and Big East-Big 12 Battle.

These increased matchups between high-major schools, along with those found in the more prestigious early-season tournaments (which typically feature more consistent programs with larger brand names), come at the expense of smaller schools.

Coaches from more competitive mid and low-major programs regularly voice displeasure about their difficulties getting bigger schools to schedule games against them. When these games do occur, the power conference school almost always gets to play at home.

This makes it extremely challenging to accumulate quality wins and often forces these teams to finish with gaudy records to even crack the at-large conversation.

And while they receive significantly less sympathy, there are also bubble teams from power conferences who are left out of the field seemingly because of their poor non-conference scheduling. A non-conference strength of schedule in the 300s is seen as an especially large red flag.

But why is that the case? March Madness selection should not be based on who you played — it should be based on how you performed against your schedule. Any system that doesn’t abide by this idea will tend to favor the haves over the have-nots.

Plus, it’s impossible to predict exactly how difficult a schedule will be in the offseason, anyway. Teams are always better or worse than they’re expected to be. Upsets happen. It’s why we play the games.

There are so many things scheduling should be about. Optimizing for NCAA Tournament inclusion isn’t one of them.

Under the current system, schools that can sustain success in mid and low-major conferences often end up switching to more competitive leagues. To a certain degree, that’s to be expected. But it shouldn’t feel like a necessary step to reach the postseason.

3. The Quadrant/Categories Systems Are Fundamentally Flawed

I’ve already addressed a few problems with these systems — namely, that the categories system does not account for game location and that both formats create arbitrary cutoffs when evaluating resumes.

However, while these are the more obvious problems, they overshadow the biggest flaw in the quadrants and categories.

These methods attempt to place games into groups based on how difficult they are to win. Yet, they fail to even do this correctly. Understanding why requires a brief explanation of the differences between advanced metrics.

There are two general types of advanced metrics for college basketball, each attempting to answer a different fundamental question.

First are “resume” or “results-based” metrics, which look at a team’s win/loss results and evaluate how impressive their record is given their schedule. On men’s team sheets, examples of these are ESPN’s Strength of Record and KPI.

Then, there are “predictive” or “quality” metrics, which look at a team’s efficiency on a per-possession basis to evaluate how good a team is. On men’s team sheets, these are KenPom, ESPN’s BPI, and Sagarin.

Resume metrics are backward-facing. They answer the question, “What have you done?”

Predictive metrics are forward-facing. They answer the question, “What can you be expected to do?”

For an example of how each type of metric works, let’s say Team A and Team B both play the same five opponents.

Team A won all of their games, but each came in a tight contest. On the other hand, Team B dropped a pair of close games but won three games in dominating fashion.

A resume metric would rank Team A ahead of Team B based on their record against an identical schedule. But if you had to bet on a team to win a game moving forward, who would you choose?

Predictive metrics would go with Team B by a fairly strong margin (assuming these games had relatively similar numbers of possessions).
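To make the distinction concrete, here’s a toy sketch with made-up margins matching the Team A/Team B setup. Win percentage stands in for a resume metric and average scoring margin for a predictive one; real systems like KenPom adjust for pace and opponent strength, which this deliberately skips:

```python
# Toy illustration (not any real formula): a resume-style measure rewards
# win/loss results, while a predictive-style measure rewards scoring margin.
# Margins are invented to match the Team A / Team B example.
team_a_margins = [+2, +1, +3, +2, +1]      # five narrow wins
team_b_margins = [-2, -1, +18, +22, +15]   # two close losses, three blowouts

def resume_score(margins):
    """Share of games won -- a crude results-based measure."""
    return sum(m > 0 for m in margins) / len(margins)

def predictive_score(margins):
    """Average scoring margin -- a crude efficiency-based measure."""
    return sum(margins) / len(margins)

print(resume_score(team_a_margins), resume_score(team_b_margins))          # 1.0 vs 0.6
print(predictive_score(team_a_margins), predictive_score(team_b_margins))  # 1.8 vs 10.4
```

Against identical schedules, the resume view prefers Team A (5–0 beats 3–2), while the predictive view prefers Team B and would favor them in a future matchup.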

The quadrant and categories systems should group games based on how difficult they are to win, which would be done by evaluating opponents using a predictive metric.

However, these groupings are instead based on an opponent’s NET ranking. The NET is a hybrid metric with both results-based and predictive components.

Largely due to the incorporation of the results-based portion of the formula, there are often situations where a game’s difficulty does not line up with where it is grouped.

As of early March of this year, 47% of possible Quad 1 games on the men’s side were easier to win than a Quad 2 game at Villanova, while 53% of Quad 2 games were harder than a neutral site Quad 1 game vs Oral Roberts.

The placement of the cutoffs between quadrants likely factors into this problem as well.

The quadrant and categories systems play a major role in the decision-making processes of the selection committees. Because these tools are flawed, the committees’ judgments will be inaccurate even if their reasoning is sound.

4. Metrics Are Double-Counted on Team Sheets

When it comes time for the committees to compare schools’ resumes, they do so primarily by referring to the team sheets, which place many of the selection criteria on a single, easy-to-digest page.

[Image: an example of a men’s team sheet, from the NCAA’s RPI Archive.]
[Image: an example of a women’s team sheet, from the NCAA’s introduction video to the Women’s March Madness selection process.]

Ideally, each metric or piece of information considered by the committees would be represented separately. But if we examine what each conveys, it becomes clear that this is not the case.

Strength of schedule (particularly in non-conference play) is one of the most important parts of a resume.

However, while it may be convenient to have that boiled down into a single ranking number, visualizing a team’s SOS is already accomplished through the quadrant/categories systems. Additionally, both the resume and predictive metrics already include SOS when determining how impressive performances are.

It doesn’t make sense for a team’s overall or non-conference SOS ranking to be singled out when that information should already be reflected in other parts of their resume.

(And this doesn’t even mention a major issue with all rankings listed on team sheets — that the distance between positions in rankings is not uniform.)

What we currently have are team sheets that portray the same data multiple times, just in different packaging. This creates an illusion of complexity that unnecessarily obscures team strength.

5. A Rotating Committee Will Have Differing Values

While I believe the selection committees to be impartial, it cannot be disputed that any human involvement in the selection process always comes with a chance of conscious or unconscious bias.

Part of that is alleviated by the fact that members of the selection committees serve staggered five-year terms. Each season brings new committee members and chairpersons.

The downside of this way of doing things is that the committees will behave slightly differently every year. Every member has their own way of evaluating resumes, valuing certain criteria to various degrees. What proved to be a deciding factor in 2023 may not hold the same weight in 2024.

There is a lack of standardization here; a perfect system would operate the same way each time while also eliminating any potential bias.

6. The Issue With Working on a Deadline

The committees each operate under time constraints. Both have to complete their respective NCAA Tournament brackets before their corresponding Selection Shows, where the brackets are announced to the public.

But there isn’t a lot of time to do this, particularly when the final conference championship games tip off just hours before the Selection Shows.

It’s especially tight on the men’s side, where the 2023 AAC Championship Game began at 3:15 p.m. ET and finished less than an hour before the 6:00 p.m. ET Selection Show.

With so quick a turnaround, the NCAA Tournament bracket needs to be finalized before that game ends, with different contingencies in place depending on the winner.

Does the committee have enough time to effectively do this? The men’s committee has faced what I believe are fair questions about the extent to which games played over the final few days before the bracket is unveiled factor into the decision-making.

At the very least, it’s impossible for the committee to have the finalized advanced metric rankings at their disposal on Selection Sunday because that would require results from the games taking place that day.

Obviously, this isn’t ideal. Every game should have an equal chance to be considered, and the most updated information is what should be used in the decision-making.

This is challenging when said information is changing rapidly during the period when both committees are meeting.

7. It Isn’t Purely Results-Based

And finally, we get to the real crux of the argument. Many of the aforementioned problems center on the committees not being put in a position to succeed, whether that means being given too complex a task, not having the best information, or working under difficult time constraints.

Yet, this all assumes that the committees have the right goal in the first place. It’s time to challenge that idea, too.

There isn’t a clearly defined objective for the committees. Similar to how the NET is a blend of results-based and predictive components, which teams the committees select (and how those teams are seeded) boils down to a vague combination of which schools are the strongest and have the best resumes.

Over time, it’s gotten easier to predict what that ultimately looks like, but the element of ambiguity still aids the committee. Because there is no official hierarchy of which metrics matter more than others or how to compare wildly different resumes, each decision the committee makes can be justified in some way — even if that way is different every time.

By combining the results-based and predictive, and looking at good wins, bad losses, strength of schedule, and game location, the NCAA attempts to create a rough composite of which teams are “most deserving.”

But if they really want the most deserving teams in March Madness, with a bracket that accurately reflects who had the most impressive seasons, the formula would be entirely results-based.

It doesn’t matter how good a team is — they should only receive a spot if they’ve earned it. To illustrate why, here’s a thought experiment.

Let’s say Team X plays 30 games in their regular season, and for whatever reason, all of those games come against Team Y, which is undisputedly the best team in the country according to predictive metrics.

Now, let’s say Team X loses each of those games by two points. Do they deserve a spot in March Madness?

I think most people would agree that they don’t. After all, Team X had a winless season. But purely looking at predictive metrics, a school that’s only two points worse than the best team in the country is undoubtedly a top program in the nation. They might even be a 1-seed.

For Team X, the question then becomes, how many of those 30 games would they need to win to reach the NCAA Tournament?

A .500 team typically wouldn’t be a part of the at-large conversation. In this instance, though, going 15–15 would clearly be good enough for a top tournament seed.

So, would five wins be enough? How about 10? Without any more information, it’s impossible to say for sure. However, the idea stands that at some point, the team becomes deserving, and that only happens by winning games.

I’ll say it again — it doesn’t matter how good a team is. When you step on the court, all that matters is winning. The games have to matter.

Proposing a New System

The ideal March Madness selection/seeding process builds from this idea of figuring out which teams had the best seasons when comparing their record to their strength of schedule.

To do that, I’d like to introduce a metric called Wins Above Bubble, or WAB. The concept behind WAB is pretty simple. It looks at a team and asks, “How many games would an average bubble team win against their schedule?”

It then compares that number to how many games that team actually won. For example, if Team Z won 20 games while an average bubble team would have been expected to win 19 games against that same schedule, Team Z would have a WAB of +1. If Team Z had only won 18 games, their WAB would instead be -1.

You can run the same calculation for every team in the country and then create a ranking of which teams performed the best against their schedules, which would be used to select which teams make March Madness and to seed the bracket.

All WAB requires is having a predictive metric that is used as the basis for determining how strong teams are, and by extension, how difficult games are to win. You also need to define the “average bubble team,” which I’ve generally seen referred to as the strength of the 50th-best team in the country.
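Since there is no official WAB formula, here’s a minimal sketch of the calculation under stated assumptions: every team gets a predictive rating in points above an average team, a logistic curve converts rating gaps into win probabilities, and the bubble baseline is a fixed rating. The scale constant, home edge, and ratings below are all illustrative, not anyone’s actual numbers:

```python
import math

# Minimal WAB sketch. Each team has a predictive rating (points better
# than an average team). The scale constant (7.0), home edge, and bubble
# rating are illustrative assumptions, not an official formula.
BUBBLE_RATING = 14.0  # assumed strength of roughly the 50th-best team
HOME_EDGE = 3.0       # assumed home-court advantage in points

def win_prob(rating_a, rating_b, location="neutral"):
    """Probability team A beats team B, via a logistic on the rating gap."""
    edge = {"home": HOME_EDGE, "neutral": 0.0, "away": -HOME_EDGE}[location]
    return 1 / (1 + math.exp(-(rating_a - rating_b + edge) / 7.0))

def wins_above_bubble(results):
    """results: list of (opponent_rating, location, won) tuples."""
    actual = sum(won for _, _, won in results)
    expected = sum(win_prob(BUBBLE_RATING, opp, loc) for opp, loc, _ in results)
    return actual - expected

# Toy three-game schedule: home win over a strong team, road loss to
# another strong team, neutral-court win over a weak team.
schedule = [(20.0, "home", True), (18.0, "away", False), (2.0, "neutral", True)]
print(round(wins_above_bubble(schedule), 2))  # +0.49 for this toy schedule
```

The output reads exactly like the Team Z example: this toy team won about half a game more than an average bubble team would have against the same three opponents.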

I’m not the first person to suggest using this type of system. Even KenPom believes we should evaluate teams by how well they played against their own schedule rather than by his own ratings or another metric also based on adjusted efficiency.

There are similar alternatives to WAB that could be considered (including Seth Burns’ Parcells), and there are still a few details about the official WAB formula that would need to be nailed down (such as which predictive metric to use as the basis — I’d prefer a combination of multiple metrics), but this is the general framework.

I think there are two major factors working in WAB’s favor for its potential adoption. First, it’s easy to understand and boils a team’s performance down into a single number.

Second, it’s completely transparent. Imagine knowing every day exactly where your team stands. You’d know if they’re in the current field or on the outside looking in, and by how much. You’d also be able to estimate pretty accurately what a win or loss would do to your team’s tournament hopes or what they would need to climb an extra seed line or two.

This is a future where everyone knows the rules and no one can game the system.

To begin putting WAB into practice, let’s return to the case of the 2023 Charleston men, who entered Selection Sunday with a 30–3 record despite only a 2–1 record against Quads 1–2.

Under the current system, Charleston wouldn’t have received an at-large bid had they lost in the CAA Tournament, and a major reason why would have been their two losses in Q3 games.

However, based on Bart Torvik’s ratings, they had a cumulative WAB of +0.49 in their 14 Quad-3 games, indicating a performance better than you’d expect from an average bubble team.

In total, Charleston finished the regular season with a WAB of +1.7, which was 32nd in the nation, or the equivalent of an 8-seed (whereas they received a 12-seed in the actual NCAA Tournament).
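Continuing the sketch above, a per-group figure like Charleston’s +0.49 against Quad 3 is just the same calculation restricted to those games. The ratings below are hypothetical stand-ins, not Bart Torvik’s actual numbers:

```python
# Reuses wins_above_bubble() from the earlier sketch, restricted to a
# subset of games. Opponent ratings are hypothetical Quad-3-caliber values.
q3_games = [(6.0, "away", True), (5.0, "home", True), (7.5, "home", True)]
print(round(wins_above_bubble(q3_games), 2))  # +0.69: sweeping these beats the bubble baseline
```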

Using Torvik, I went back and reseeded the entire 2023 Men’s March Madness field based on WAB.

In terms of selection, there wasn’t much difference. North Texas (a 2-seed in the NIT) was the only new team in the field, jumping to a 10-seed, while play-in team Pittsburgh ended up in the First Four Out.

But when it came to seeding, things changed drastically.

Most obviously, both 8-seed Memphis and 9-seed Florida Atlantic (who played each other in the First Round), climbed to 4-seeds in a WAB-based system, proving just how absurd a Round of 64 meeting for those teams really was.

On the flip side, both Iowa and Arkansas fell from 8-seeds down to 12s, with Iowa, in particular, being the final at-large team in the field.

Overall, nine teams saw their seeds improve by at least two under this new system. Seven of those were teams from non-power conferences. Eleven teams fell by multiple seed lines, all of which came from power conferences.

This is further evidence that the current system benefits the big names at the expense of mid and low-majors.

Even among the teams that still missed the field, schools like Santa Clara and Utah Valley that weren’t anywhere near the at-large discussion during the season (NIT 6 and 7-seeds, respectively) were just one win away from making the WAB-based tournament field.

On a personal level, I do bracketology for a number of reasons. The most obvious one is that I enjoy the competitive aspect of trying to be as accurate as possible in my projections.

But beyond that, tracking the movement of teams across the season forces me to pay attention to more teams than I likely otherwise would and adds extra meaning to games as I understand their NCAA Tournament implications. It also allows me to contribute to a more informed discourse surrounding the sport.

A WAB-based system would serve all of these purposes. I would know which teams could make the NCAA Tournament and how each game would affect their chances, and because that information would be public, we would all be much more knowledgeable in our analysis of the season.

I would happily give up the competitive aspect of bracketology in order to know that the truly most deserving teams get to participate in March Madness.

The NCAA has the ability to make that happen. If and when it happens, the world of college basketball will be better off.
