Article: Applying Machine Learning to March Madness (Adit Deshpande)

UCLA CS Undergrad shares a walk-through of a simple ML model for bracket predictions

Jacob Younan
AI From Scratch
Mar 18, 2017


We’re in Day 2 of March Madness (ignoring the play-ins), and my bracket is, for the moment, promising. Before the tournament, given all the reading I’ve been doing over the past couple of months, I wondered how many brackets were being built with ML models.

I’m familiar with FiveThirtyEight’s bracket of round-by-round probabilities, and I’ve used it before to round out (read: completely inform) my view on a host of mid-major teams. There’s nothing wrong with 538’s model (it’s actually very thorough), but the example I’m sharing here gives you a better look at the logic behind building an ML model.

Adit’s post frames the problem as follows:

  • What’s the purpose of the model?
  • What are the possible data inputs and useful outputs?
  • How do I represent the raw training set data in a usable format (vector)?
  • Which algorithms should I choose to train the model and set the appropriate feature weights?
  • What are the accuracy results, and how can I tweak the model to improve them (e.g., reformat the training data, incorporate new inputs, explore new model structures)?
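To make the "usable format (vector)" and "accuracy results" questions concrete, here is a minimal, dependency-free Python sketch. It assumes a matchup is encoded as the difference between the two teams’ season-stat vectors and labeled 1 when the first team won. The feature names (`srs`, `sos`, `ppg`), the helper functions, and the toy numbers are all hypothetical, and logistic regression is swapped in as the simplest trainable model; it is not the Gradient Boosted Regression Tree mentioned in the post.

```python
import math

# Hypothetical feature names; a real model would use whatever season stats you collect.
FEATURES = ["srs", "sos", "ppg"]


def team_vector(stats):
    """Turn one team's season stats (a dict) into a fixed-order feature vector."""
    return [float(stats[name]) for name in FEATURES]


def game_vector(team_a, team_b):
    """Represent a matchup as team A's vector minus team B's vector."""
    return [a - b for a, b in zip(team_vector(team_a), team_vector(team_b))]


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)


def predict_proba(weights, bias, x):
    """Estimated probability that the first team in the matchup wins."""
    return sigmoid(sum(w * xi for w, xi in zip(weights, x)) + bias)


def train_logistic(xs, ys, lr=0.1, epochs=500):
    """Fit logistic-regression weights and a bias with batch gradient descent."""
    weights = [0.0] * len(xs[0])
    bias = 0.0
    for _ in range(epochs):
        grad_w = [0.0] * len(weights)
        grad_b = 0.0
        for x, y in zip(xs, ys):
            error = predict_proba(weights, bias, x) - y  # d(log loss)/dz
            for j, xj in enumerate(x):
                grad_w[j] += error * xj
            grad_b += error
        weights = [w - lr * g / len(xs) for w, g in zip(weights, grad_w)]
        bias -= lr * grad_b / len(xs)
    return weights, bias


def accuracy(weights, bias, xs, ys):
    """Share of games where the predicted winner (p >= 0.5) matches the label."""
    hits = sum((predict_proba(weights, bias, x) >= 0.5) == (y == 1) for x, y in zip(xs, ys))
    return hits / len(xs)


# Made-up toy stats for three hypothetical teams (not real data).
teams = {
    "A": {"srs": 20.0, "sos": 8.0, "ppg": 80.0},
    "B": {"srs": 10.0, "sos": 5.0, "ppg": 75.0},
    "C": {"srs": 2.0, "sos": 1.0, "ppg": 70.0},
}
# (first team, second team, 1 if the first team won else 0); each game listed both ways.
games = [("A", "B", 1), ("B", "C", 1), ("A", "C", 1),
         ("B", "A", 0), ("C", "B", 0), ("C", "A", 0)]

xs = [game_vector(teams[a], teams[b]) for a, b, _ in games]
ys = [label for _, _, label in games]
weights, bias = train_logistic(xs, ys)
print("training accuracy:", accuracy(weights, bias, xs, ys))
```

Scoring accuracy on the training games, as done here for brevity, overstates real performance; held-out games (for example, a previous season’s tournament) are the honest test. Listing each game in both orders keeps the model from learning anything about which slot a team happens to occupy.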

I particularly enjoyed how he showed the relative contribution of each feature after training a ‘Gradient Boosted Regression Tree’ model.

Source: Adit Deshpande

Some features carry far more importance than others, particularly composite scores like SOS (strength of schedule) and SRS (Simple Rating System), which take into account the level of competition and performance during the regular season. These composite scores render basic game stats like rebounds, threes and points nearly insignificant, presumably because the composites already capture much of what those raw stats measure. The same goes for features like ‘PowerConf’, a binary flag for whether or not the team plays in a conference with strong competitors.
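For readers curious how a chart like this is produced: in scikit-learn, a fitted `GradientBoostingRegressor` exposes per-feature scores through its `feature_importances_` attribute. To stay dependency-free, the sketch below hand-rolls the idea instead: it boosts depth-1 trees (stumps) on squared loss and credits each split’s reduction in squared error to the feature it split on, then normalizes the totals. The column names and numbers are hypothetical toy values, not Adit’s data. The toy set deliberately makes `rebounds` and `power_conf` redundant with `srs`, illustrating one plausible mechanism for near-zero importance: when a tree can make the same split with either feature, only one of them gets the credit.

```python
def fit_stump(xs, residuals):
    """Find the single split (feature, threshold) that most reduces squared error.

    Returns (gain, feature_index, threshold, left_value, right_value), or None
    if no feature has two distinct values.
    """
    n = len(residuals)
    mean_all = sum(residuals) / n
    sse_all = sum((r - mean_all) ** 2 for r in residuals)
    best = None
    for j in range(len(xs[0])):
        values = sorted(set(x[j] for x in xs))
        for lo, hi in zip(values, values[1:]):
            threshold = (lo + hi) / 2
            left = [r for x, r in zip(xs, residuals) if x[j] <= threshold]
            right = [r for x, r in zip(xs, residuals) if x[j] > threshold]
            left_mean = sum(left) / len(left)
            right_mean = sum(right) / len(right)
            sse = (sum((r - left_mean) ** 2 for r in left)
                   + sum((r - right_mean) ** 2 for r in right))
            gain = sse_all - sse
            # Strict ">" keeps the earlier feature when two splits tie exactly.
            if best is None or gain > best[0]:
                best = (gain, j, threshold, left_mean, right_mean)
    return best


def gradient_boost(xs, ys, n_rounds=50, learning_rate=0.1):
    """Boost stumps on squared loss; credit each split's gain to its feature."""
    base = sum(ys) / len(ys)
    preds = [base] * len(ys)
    importance = [0.0] * len(xs[0])
    stumps = []  # (feature, threshold, shrunken left value, shrunken right value)
    for _ in range(n_rounds):
        # For squared loss, the negative gradient is just the residual.
        residuals = [y - p for y, p in zip(ys, preds)]
        stump = fit_stump(xs, residuals)
        if stump is None:
            break
        gain, j, threshold, left_value, right_value = stump
        importance[j] += gain
        stumps.append((j, threshold, learning_rate * left_value, learning_rate * right_value))
        preds = [p + learning_rate * (left_value if x[j] <= threshold else right_value)
                 for x, p in zip(xs, preds)]
    total = sum(importance)
    if total > 0:
        importance = [imp / total for imp in importance]
    return base, stumps, importance


def predict(base, stumps, x):
    """Base value plus every stump's (already shrunken) contribution."""
    return base + sum(lv if x[j] <= t else rv for j, t, lv, rv in stumps)


# Hypothetical toy data: columns are [srs, rebounds, power_conf].
# rebounds and power_conf are deliberately redundant with srs.
FEATURES = ["srs", "rebounds", "power_conf"]
xs = [[-6.0, 30, 0], [-3.0, 31, 0], [-1.0, 33, 0], [0.5, 34, 0],
      [2.0, 36, 1], [4.0, 37, 1], [7.0, 39, 1], [11.0, 42, 1]]
ys = [-12.0, -8.0, -5.0, -1.0, 2.0, 6.0, 9.0, 14.0]  # made-up scoring margins

base, stumps, importance = gradient_boost(xs, ys)
for name, imp in zip(FEATURES, importance):
    print(f"{name:>10}: {imp:.2f}")
```

The exact zeros here come from the perfectly redundant toy columns plus the tie-breaking rule; with real, noisy stats the raw features would usually pick up some small share of the credit.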

If you’re into advanced basketball stats, as FiveThirtyEight is, you’d likely pick different inputs for your training set (assuming the data is available), such as per-100-possession stats or established composite ratings like KenPom or ESPN’s BPI.
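Per-100-possession stats normalize for pace, so a fast-paced team’s inflated raw totals aren’t mistaken for quality. Here is a minimal sketch, assuming the widely used box-score possession estimate (FGA − ORB + TOV + w·FTA). The function names are hypothetical, the free-throw weight `w = 0.475` is the value commonly cited for KenPom-style ratings but should be treated as a tunable assumption (other sources use values like 0.44), and the box score is made up.

```python
def estimate_possessions(fga, orb, tov, fta, fta_weight=0.475):
    """Estimate possessions from box-score totals.

    Possessions ~ FGA - ORB + TOV + fta_weight * FTA. The default weight is an
    assumption; other sources use values such as 0.44.
    """
    return fga - orb + tov + fta_weight * fta


def per_100(stat, possessions):
    """Scale a raw total to a per-100-possessions rate."""
    if possessions <= 0:
        raise ValueError("possessions must be positive")
    return 100.0 * stat / possessions


# Hypothetical single-game box score for one team (not real data).
box = {"pts": 75, "reb": 36, "fga": 60, "orb": 10, "tov": 12, "fta": 20}
poss = estimate_possessions(box["fga"], box["orb"], box["tov"], box["fta"])
print(f"possessions: {poss:.1f}")
print(f"points per 100: {per_100(box['pts'], poss):.1f}")
print(f"rebounds per 100: {per_100(box['reb'], poss):.1f}")
```

Feeding rates like these into the training set, instead of raw per-game totals, is one way to act on the suggestion above.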

Ultimately, I really enjoyed reading through this. I’m a basketball junkie, and it let me walk through building an ML model in a domain I know well. If you like basketball too, I highly recommend reading it.

If you’re curious, here’s the resulting bracket from his quick model:

Credit: Adit’s final bracket as predicted by the quick model he built
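A pairwise model predicts single games, so producing a full bracket takes one more step. One simple approach, assumed here because the post doesn’t spell out how the bracket was filled, is to advance whichever team the model favors in every matchup, round by round. The function names, the 8-team field, the ratings, and the logistic `scale` are all hypothetical.

```python
import math
from typing import Callable, List


def fill_bracket(teams: List[str], win_prob: Callable[[str, str], float]) -> List[List[str]]:
    """Advance the predicted favorite in every matchup until one team is left.

    `teams` is in bracket order: teams[0] plays teams[1], teams[2] plays teams[3], ...
    Returns the survivors after each round; the last entry holds the champion.
    """
    n = len(teams)
    if n == 0 or n & (n - 1):
        raise ValueError("bracket size must be a power of two")
    rounds = []
    current = list(teams)
    while len(current) > 1:
        # win_prob(a, b) is the model's probability that a beats b.
        current = [a if win_prob(a, b) >= 0.5 else b
                   for a, b in zip(current[::2], current[1::2])]
        rounds.append(current)
    return rounds


# Hypothetical ratings for an 8-team toy bracket (not real data).
ratings = {"A": 8.0, "B": 7.0, "C": 6.0, "D": 5.0, "E": 4.0, "F": 3.0, "G": 2.0, "H": 1.0}


def toy_win_prob(a: str, b: str, scale: float = 10.0) -> float:
    """Logistic curve on the rating gap; `scale` is an arbitrary assumption."""
    return 1.0 / (1.0 + math.exp(-(ratings[a] - ratings[b]) / scale))


field = ["A", "H", "D", "E", "B", "G", "C", "F"]
for i, survivors in enumerate(fill_bracket(field, toy_win_prob), start=1):
    print(f"round {i}: {survivors}")
```

Always taking the favorite never produces an upset, which is the trade-off of this greedy choice; sampling outcomes from the predicted probabilities is a common alternative when you want a bracket with some upsets in it.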

Special kudos to Adit for not fudging the data in the Bruins’ favor. A Sweet 16 loss to Kentucky will hurt. Enjoy the madness!
