Man vs. Machine, 2018 Edition

Ryan Dew
6 min read · Mar 15, 2018


Last year, I debuted on Medium explaining how I had used data and algorithms to predict the NCAA tournament. If you’re curious, you can find that article here. This year, I did it again, but with a bigger model, more data, and, more interestingly, a human!

That’s right! This year, I’m testing the power of machines against the power of “cyborgs”: humans and machines working together to predict the results. To help with that, I teamed up with my friend Zach Shen, who actually knows stuff about basketball. I asked him to help interpret what my model is learning, tweak it relative to previous years, and correct the weird predictions it had made. The result is a brand new model, which I’ll describe below, and a hybrid bracket, which I’m calling a “cyborg” (part man, part machine), in which Zach used the computer to make his own predictions.

But, before I get into explaining all of that, I’ll spoil the ending by revealing the brackets:

  • Bracket 1: pure machine learning, no human input, no data about the team’s seed in the tournament [Link]
  • Bracket 2: pure machine learning, no human input, but using information about how the team was seeded in the tournament [Link]
  • The “Cyborg” bracket: using data-driven decision making, fusing computer predictions with human judgment [Link]

The model

Last year, I described the process of building a bracket using machine learning. There are really two key steps: (1) building a model to predict the probability of Team A beating Team B in a tournament game, and (2) constructing a bracket based on that model.

Just like last year, my model this year is based on the statistics provided by Sports Reference about each team’s in-season performance. The computer uses data from that site, together with historical tournament outcomes, to learn which variables seem to matter in tournament play.

Unlike last year, this year I took advantage of what machine learners often call “ensemble learning,” which is the idea of combining many models to form a single prediction. In total, there are more than 50 models going into my predictions, each using different parts of the data and different algorithms to predict how likely it is that Team A beats Team B. These predictions are then averaged together to form a final win probability.
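To make the averaging step concrete, here’s a minimal sketch of the ensemble idea, assuming scikit-learn-style classifiers trained on matchup features. The models, feature matrix, and function names are illustrative stand-ins, not the actual 50-plus submodels behind the brackets.

```python
# A minimal sketch of the ensemble averaging step (illustrative, not the
# actual model): fit a few different classifiers on matchup features and
# average their predicted probabilities that Team A beats Team B.
import numpy as np
from sklearn.ensemble import GradientBoostingClassifier, RandomForestClassifier
from sklearn.linear_model import LogisticRegression

def ensemble_win_probability(X_train, y_train, x_matchup):
    """Average P(Team A wins) across several different models."""
    models = [
        LogisticRegression(max_iter=1000),
        RandomForestClassifier(n_estimators=200, random_state=0),
        GradientBoostingClassifier(random_state=0),
    ]
    probs = []
    for model in models:
        model.fit(X_train, y_train)  # y_train is 1 when Team A won
        probs.append(model.predict_proba([x_matchup])[0, 1])
    return float(np.mean(probs))  # the final, averaged win probability
```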

Recap: How to predict a bracket

In my last article, I spent a lot of time describing part two of this process: how to go from win probabilities to a bracket. If you want to read about it in more depth, I encourage you to check out that post.

The idea is the following: when predicting a bracket, the most important thing is picking the later rounds of the tournament correctly. This means, when generating your predictions, you want to be acutely aware of uncertainty at earlier stages: if there’s only a 51% chance that A beats B, that’s very different than a 91% chance that A beats B, and that difference should be accounted for in generating the final predictions.

To take all of that into account, I use the probabilities described above to simulate the tournament 10,000 times, and then look at which team ended up the winner most often. I lock that team in, then look at which team from the other half of the bracket ended up in the championship game most often, and so on down the bracket until it’s fully filled. In this way, the process centers on predicting the bracket rather than the individual games, and on predictions about the end that account for uncertainty toward the beginning. You might call it a “top-down” approach, rather than the “bottom-up” approach that most people use.
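For the curious, here’s a rough sketch of that simulate-and-count logic, under simplified assumptions: `teams` is listed in bracket order, `win_prob(a, b)` returns the probability that a beats b, and these helper names are hypothetical rather than the code that actually produced the brackets.

```python
# Sketch of the "top-down" fill: simulate the whole tournament many times,
# then pick the team that wins most often. The same counting is repeated for
# each remaining slot (e.g., the other finalist) once the champion is locked in.
import random
from collections import Counter

def simulate_once(teams, win_prob):
    """Play out one single-elimination tournament and return the champion."""
    field = list(teams)
    while len(field) > 1:
        field = [
            a if random.random() < win_prob(a, b) else b
            for a, b in zip(field[::2], field[1::2])
        ]
    return field[0]

def most_frequent_champion(teams, win_prob, n_sims=10_000):
    """Count champions across simulations and return the most common one."""
    counts = Counter(simulate_once(teams, win_prob) for _ in range(n_sims))
    return counts.most_common(1)[0][0]
```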

Note that a top-down approach, although optimally suited for getting points in March Madness, isn’t super intuitive: only in the first round are the predictions in my bracket the direct result of actually simulating the game itself. For example, my computer brackets both have Virginia beating Villanova to take the championship. However, this does not mean that the model thinks that Virginia would actually beat Villanova in a head-to-head. Rather, it means that, across the 10,000 simulations, Virginia was the team that most often ended up as the winner, and Villanova was the team that most often ended up winning the East/Midwest half of the bracket. (In fact, my model does think that Virginia would beat Villanova, but is very uncertain, with a probability of 52%, barely more than chance).

What did the computer learn?

First and foremost, just like last year, there is overwhelming uncertainty. The model only thinks there’s a 15% chance that Virginia actually wins. The runners-up are Villanova at 12%, Kansas at 10%, Duke at 9%, and Purdue at 8%.

As a Penn alum, I’m happy to report that in a single simulation, Penn ended up winning everything (meaning a 0.01% chance of winning the tournament), though the picture isn’t pretty: there’s only a roughly 16% chance that Penn beats Kansas in the first round, a 4% chance it makes it to the Round of 16, a 1% chance for the Round of 8, a 0.4% chance for the Final Four, and a 0.04% chance to make it to the championship game.
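Those round-by-round numbers are just tallies over the 10,000 simulations. As a small illustration, assuming a hypothetical `sim_results` list with one record per simulated tournament mapping round names to the teams that reached them, the advancement odds for a single team could be computed like this:

```python
# Sketch: turn raw simulation records into round-by-round advancement odds
# for one team. The structure of `sim_results` here is an assumption, not
# the actual output of the model.
from collections import Counter

def advancement_odds(sim_results, team):
    """Fraction of simulations in which `team` reached each round."""
    counts = Counter()
    for sim in sim_results:  # sim: {round_name: set_of_teams_reaching_it}
        for round_name, reached in sim.items():
            if team in reached:
                counts[round_name] += 1
    n = len(sim_results)
    return {round_name: count / n for round_name, count in counts.items()}

# e.g. advancement_odds(sim_results, "Penn") might return something like
# {"Round of 32": 0.16, "Sweet 16": 0.04, "Elite Eight": 0.01, ...}
```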

Unlike last year, I also took some time to try to understand what the computer was actually learning, with the help of my friend Zach. This is how data science should always be done: modeling and computation should be followed by evaluation. Do my results make any sense? What am I learning?

In this case, although each of the submodels uses different parts of the data and a different algorithm, we do come away with some key findings. First, a team’s overall in-season performance, captured by Sports Reference’s “simple rating score,” is intuitively a big predictor of success. It basically captures how good the team is, and so it’s naturally going to be useful for predicting tournament wins.

But we also find that things like a team’s win/loss percentage when playing away, how many points opponents tend to score against it per game, its average points-per-game ratio, and its offensive rating are important. Many of these also have intuitive interpretations. For example, as Zach pointed out, away win/loss percentage may be a better predictor of team skill and tournament performance than overall win/loss percentage, because it strips out the comfort of playing at home. The other variables seem to be capturing aspects of performance, like how good a team’s offense or defense is in a generic sense.
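As one hedged illustration of how a single tree-based submodel can be interrogated, scikit-learn exposes a `feature_importances_` attribute after fitting. The column names below are stand-ins for the Sports Reference statistics, not the exact feature set used here.

```python
# Sketch: rank the statistics a tree-based model leans on most heavily.
# Feature names are illustrative placeholders for the Sports Reference data.
import pandas as pd
from sklearn.ensemble import RandomForestClassifier

FEATURES = ["simple_rating", "away_win_pct", "opp_points_per_game",
            "points_per_game", "offensive_rating"]

def top_features(X: pd.DataFrame, y, k=5):
    """Fit a forest and return the k most important features."""
    model = RandomForestClassifier(n_estimators=300, random_state=0)
    model.fit(X[FEATURES], y)
    importances = pd.Series(model.feature_importances_, index=FEATURES)
    return importances.sort_values(ascending=False).head(k)
```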

Cyborg predictions

Although I love talking stats and models, this year it’s the cyborg bracket that I’m most excited about. That’s because this bracket really mirrors what I think is almost always the best way to make decisions: in a data-driven fashion. All of the brackets I have made up until this point have been limited by the fact that I know nothing about basketball: I blindly trust what the algorithms tell me. This year, in generating the cyborg bracket, we were able to do things in a more reasonable way: when the computer was confident in an outcome (greater than a 70% chance), Zach picked that outcome. When it was more uncertain, Zach used his judgment, together with statistics about the teams, to make the pick.
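That division of labor boils down to a simple rule. Here’s a minimal sketch, where `human_pick` stands in for Zach’s judgment and the 70% threshold is the one described above; the function names are hypothetical.

```python
# Sketch of the cyborg rule: defer to the model when it is confident,
# otherwise fall back to the human's judgment.
def cyborg_pick(team_a, team_b, model_prob_a, human_pick, threshold=0.70):
    """Return a pick for the game between team_a and team_b."""
    if model_prob_a >= threshold:
        return team_a            # model is confident team_a wins
    if model_prob_a <= 1 - threshold:
        return team_b            # model is confident team_b wins
    return human_pick(team_a, team_b)  # too close to call: ask the human
```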

One thing that’s interesting is that the model seems to have had a moderating influence on the predictions: in general, the model made “safe” predictions, tending to favor better seeded teams. It is not particularly suited for predicting upsets, which might be driven by situational factors that the data don’t capture. In reflecting on the cyborg bracket, Zach mentioned that it felt weird that he hadn’t picked any upsets in the first round. The most likely explanation for this is that the model was pretty certain about those outcomes.

In fact, this highlights what I think is a really important part of using data to make decisions across domains: sometimes humans want to see something surprising, want to trust a theory or an idea that just isn’t supported by the data. In many cases, computers are better at sorting through the noise to give the most reasonable prediction, and in quantifying how “reasonable” that prediction is. When combining data, models, and humans, we get the best of both worlds: that cold, machine stability, together with nuanced interpretation.

I’m excited to see how the cyborg fares as the tournament gets underway.
