Application of Logistic Regression to March Madness (w/ Threshold Optimization)

Aidan Gibbons
7 min read · Mar 21, 2024


Last year I wrote a logistic regression model with LASSO regularization to predict March Madness results, linked here. For context on this article, I highly recommend giving it a read (it’s also frankly the more interesting and insightful article of the two…)

Reflections on last year’s model

Last year, I was woefully unsure of how the modelling endeavor would go, considering how difficult it is to predict a system as complex as the NCAA tournament with limited data. Given this complexity, and the tendency for upsets, background knowledge on the league and teams often isn’t very fruitful. A fun example: my old cat, who would (with a bit of help from his owners) pick a combination of teams with high seeds and cat (or other animal) nicknames. Despite this seemingly naive strategy, he would routinely outperform professional sports analysts on ESPN (and, regretfully, myself as well…)

However, the model was a success. Using mainly the 2010 model to guide my decisions, I was able to pick multiple key upsets, landing me 2nd place in my office pool (out of 75).

  • The major Cinderella in the Final Four was Florida Atlantic University, which I had predicted to reach the Elite 8. For a 9-seed, wins over 1-seed Purdue and 4-seed Tennessee were unexpected, but this model had FAU winning outright, without my having to make any judgement calls on the probabilities.
  • In the Sweet 16, San Diego St defeated top-seeded Alabama. My model had SDSU winning the matchup with 41.4% probability, so given how close that was, along with my affinity for the novelty of upsets, SDSU was chosen as the victor.

The model liked these two teams to beat expectations due to a strong combination of win % and strength of schedule, as well as solid performance on some other predictors (FAU had one of the lowest opponent assist percentages in the country).

2024 Analysis

Idea of Threshold Analysis

Overall, I did not make many improvements to the model from last year in terms of adding or engineering more predictors (which I would love to do eventually), but one thing I did want to make a bit more concrete was the practice of choosing an upset when the higher-seeded team’s win probability falls below some threshold above 50%.

For binary classification tasks, predictive models can be quite boring, especially when there is an expected result. For instance, say I am building a model to predict whether someone has a certain disease based on their medical history. If I use a default threshold of a 50% probability to predict whether they have the disease, there will likely be a high false negative rate. In a situation where a false negative is more costly than a false positive (as is the case with disease classification), we can shift the threshold to compensate. In the case of March Madness, many scoring systems award a bonus for picking an upset correctly (and those that don’t, such as ESPN’s, absolutely should). Requiring the favored team’s win probability to clear a threshold higher than 50% makes the model more likely to pick upsets. It also improves the odds of a perfect or near-perfect bracket, considering that a large number of upsets has occurred in nearly every tournament in the past.
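As a minimal sketch of what threshold-shifting looks like in code (the function name, team names, and the 67% threshold here are hypothetical illustrations, not values from the actual model):

```python
def pick_winner(favorite: str, underdog: str,
                favorite_win_prob: float,
                threshold: float = 0.5) -> str:
    """Advance the favorite only if the model's probability clears `threshold`.

    threshold = 0.5 is the ordinary classification rule; raising it above
    0.5 makes the bracket pick more upsets.
    """
    return favorite if favorite_win_prob >= threshold else underdog

# With the default 50% threshold, a 63% favorite advances...
print(pick_winner("Team A", "Team B", 0.63))                  # Team A
# ...but against a stricter (made-up) 67% threshold, the underdog does.
print(pick_winner("Team A", "Team B", 0.63, threshold=0.67))  # Team B
```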

Last year, these thresholds were mainly arbitrary, loosely based on historical seed performance. For instance, if on average one of the top 16 teams loses in the first round, I’d pick the most likely upset among those games (in last year’s case, Kent St over Indiana), despite the fact that Kent St was still only given a 45.4% chance to win per the model. Another example of an arbitrary decision was the SDSU win over Alabama in the Sweet 16. My question was: how could I standardize the way these thresholds are picked?

Threshold Optimization Methodology

My approach was to run the model on past data, calculate all hypothetical matchups, and match the model’s upset probabilities for a given seed matchup to the historical rate at which that upset occurs. For example, take the 1 vs 16 matchup. In each year, only 4 of these matchups actually happen, but there are 16 hypothetically possible pairings (each of the four 1 seeds against each of the four 16 seeds) that we can run the model on. Over 13 historical years, this yields 208 modelled matchups. Since 1985, the 1 seed has won this matchup 98.7% of the time, so the model should only flag an upset in the top 1.3% of its modelled matchups. The 98.7th percentile of the 208 modelled upset probabilities (the 205th value in increasing order) corresponds to a 1-seed win probability of 93.0%. Hence, using this threshold analysis, the model would actually choose a 16 seed over a 1 seed whenever it gives the 1 seed less than a 93.0% chance of winning (say, 92%).
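A minimal sketch of that calibration, with random stand-in probabilities where the fitted model’s outputs would go:

```python
import numpy as np

rng = np.random.default_rng(0)
# Placeholder 16-seed win probabilities for the 208 modelled 1-vs-16 matchups
# (16 pairings per year x 13 years); the real values come from the model.
upset_probs = rng.uniform(0.001, 0.10, size=208)

historical_upset_rate = 1 - 0.987  # 1 seeds have won 98.7% of the time
# Set the threshold at the 98.7th percentile of modelled upset probabilities,
# so the model flags an upset in ~1.3% of matchups, the historical rate.
upset_threshold = np.quantile(upset_probs, 1 - historical_upset_rate)
favorite_threshold = 1 - upset_threshold  # ~93% in the article's case

print(f"pick the 16 seed whenever the 1 seed is given under {favorite_threshold:.3f}")
```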

Below is a table of the thresholds as well as the corresponding probabilities:

8 vs 9 seed excluded because the probability of a win is roughly 50/50

And below is the model output for round 1:

The teams highlighted in red are upsets chosen outright by the model, whereas the teams highlighted in yellow are the ones that now lose due to the threshold optimization. For instance, the model gives Gonzaga a 63.2% chance of winning, but since that is under the threshold for the 5–12 matchup, McNeese advances instead. Some other upsets that now occur are in the narrow 6–11 and 7–10 games (Drake, Colorado, Colorado St).

For the second round, the thresholds were determined differently (an approach which is a bit more wishy-washy): by taking the historical probability that a given upset seed (6 and up) advances to the Sweet 16 and finding the corresponding model probability. For instance, 6 seeds advance to the Sweet 16 28.9% of the time, and among the 208 modelled matchups between 3 and 6 seeds, a 42% chance of the 6 seed winning marks the top 28.9% of upset probabilities, so 42% is chosen as the threshold.
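The same quantile trick works here, just keyed to the advance rate instead of the head-to-head upset rate; a quick sketch, again with placeholder probabilities:

```python
import numpy as np

rng = np.random.default_rng(1)
# Placeholder 6-seed win probabilities for the 208 modelled 3-vs-6 matchups.
six_seed_probs = rng.uniform(0.20, 0.55, size=208)

advance_rate = 0.289  # 6 seeds historically reach the Sweet 16 28.9% of the time
# Advance the 6 seed whenever its modelled probability lands in the top 28.9%.
threshold = np.quantile(six_seed_probs, 1 - advance_rate)

print(f"advance the 6 seed whenever the model gives it at least {threshold:.2f}")
```

A table of values is below.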

And below is the output for round 2:

There is only one upset chosen outright by the model (Texas Tech over Kentucky), but four additional ones come from the threshold analysis. And behold, we have our main sleeper double-digit seeds: Grand Canyon, New Mexico, Oregon, and Colorado St.

Thresholds for future rounds were calculated similarly, though only the few needed for the matchups above were computed.

And the model results are below:

The only upset forced by the threshold was, surprisingly, 11-seeded New Mexico over 2-seeded Arizona. Additionally, the threshold analysis chooses New Mexico over North Carolina to make the Final Four, because out of all 208 modelled 1 vs 11 seed matchups, only 3 have had higher odds of an upset (and in one of those, the 11 seed, UCLA, actually ended up making the Final Four). New Mexico happens to perform very well largely due to an exceptional combination of win % and strength of schedule for an 11 seed. (Note that if I were filling out a bracket under rules that didn’t reward upsets, I would probably not pick New Mexico in the Final Four; instead I’d choose UNC, and then, in an effort not to be boring and pick all 1-seeds, have Iowa St over UConn and Iowa St over UNC, eventually losing to Houston.)

As for applying this threshold analysis to the key upsets in the 2023 bracket: there was only a 50% threshold for Marquette (2-seed) vs FAU (9-seed), and a 55% threshold for Alabama (1-seed) over San Diego State (5-seed). Both favorites had higher model odds than their thresholds, so neither upset would be picked with this current analysis (though FAU still would have been in the Elite 8).

Other Possible Approaches to Threshold Optimization

There are many ways other than the approach above that threshold optimization can be used in this March Madness algorithm; by no means is this quick method close to what I’d like to do with more time. One other option would be to write an optimization function based on the points system of the bracket pool. For instance, say you win points based on the round you’re in and the seed you picked to win in that round (e.g., picking a 12 seed to win in round 1 yields 12 points). Optimizing only for the first round, you would pick any 12 seed with more than a 5/(12+5) = 29.4% chance of winning, since 12p > 5(1 - p) exactly when p > 5/17, making the upset the higher expected-points pick. However, this short-term optimization creates issues for earning potential points in later rounds.
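A quick sketch of that break-even arithmetic, under the hypothetical seed-equals-points scoring rule described above:

```python
def breakeven_prob(underdog_seed: int, favorite_seed: int) -> float:
    """Underdog win probability above which the upset is the higher-EV pick.

    Picking the underdog is worth underdog_seed * p expected points versus
    favorite_seed * (1 - p) for the favorite, so the upset wins out when
    p > favorite_seed / (underdog_seed + favorite_seed).
    """
    return favorite_seed / (underdog_seed + favorite_seed)

# For a 5-vs-12 game: pick the 12 seed once p > 5/17, about 29.4%.
print(f"{breakeven_prob(12, 5):.3f}")  # 0.294
```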
