If you’re as excited to kick off March Madness 2019 as we are, then you know it’s not just about watching the games; it’s also about what we can learn from them. And while we’re having fun poking around various types of data analysis and predictions for college basketball, you might be wondering how you can play along at home, especially if you don’t know your SQL from your ABC.
Good news: our teammates at Qwiklabs have built the NCAA® March Madness®: Bracketology with Google Cloud Quest, and it culminates in using machine learning to predict the outcome of a tournament game.
If you’ve never done a Qwiklab Quest before, it’s a series of interactive tutorials to help you learn a core skill on Google Cloud Platform. A Quest gives you hands-on experience with the cloud, and upon completion, awards you a badge that highlights the skills you’ve mastered on GCP.
The Quest starts from scratch and assumes no prior knowledge of Google Cloud Platform or any of its tools. Qwiklabs relies on temporary credentials, which means no GCP account or sign-up (or credit card!) is required. Each lab in the Quest builds on the skills of the one before it, and this collection of labs is designed to guide you from the foundations of BigQuery, Google’s fully managed data analytics warehouse, through its machine learning features. None of them should take more than an hour. (Already game? See the offer code below to get started.)
You’ll start with a lab to help you familiarize yourself with BigQuery in the Cloud Console. From there, the next lab offers an introduction to SQL and the types of questions you might ask of a dataset (you’ll be looking at London bikeshare data for this one). Once you’ve gotten the hang of BigQuery and its SQL editor, you’ll move on to some guided exploratory data analysis of the college basketball public dataset. You’ll be able to find which five games featured the most three-point shots made, and how accurate the shooting was; which five basketball venues have the highest seating capacity; the highest-scoring games since 2010; and more.
Finally, you’ll progress to bracket predictions using BQML, BigQuery’s machine learning tool that lets you begin to create models with as few as four lines of code. This Qwiklab demonstrates the power of machine learning by showing you how to create a simple, naive model that predicts game outcomes using team seeding alone, and then comparing it against a more sophisticated model whose features take into account the strength of schedule of different teams.
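To give a sense of what “four lines” can look like, here is a minimal Python sketch that only assembles a BQML `CREATE MODEL` statement as a string; it does not connect to BigQuery or train anything. The model name, table name, and label column are hypothetical placeholders chosen for illustration, not names taken from the lab.

```python
def build_create_model_sql(model_name: str, training_table: str, label_column: str) -> str:
    """Return a four-line BQML statement that trains a logistic regression model.

    Every name passed in is a placeholder; substitute your own dataset and table.
    """
    return "\n".join([
        f"CREATE OR REPLACE MODEL `{model_name}`",
        f"OPTIONS (model_type = 'logistic_reg', input_label_cols = ['{label_column}']) AS",
        "SELECT *",
        f"FROM `{training_table}`",
    ])


sql = build_create_model_sql(
    model_name="bracketology.seed_only_model",     # hypothetical model name
    training_table="bracketology.training_games",  # hypothetical table name
    label_column="label",                          # hypothetical label column
)
print(sql)
```

Pasting the printed statement into the BigQuery SQL editor would only work once a table with those columns actually exists, so treat it as a template rather than a ready-made query.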
You’ll start with a simple feature set (season, team name and seed, opponent name and seed) drawn from our historical data, and create a win/loss label for every team. Once you’ve set up your labeled machine learning dataset, you’ll train a basic logistic regression model against data from 1985–2017, which will generate a probability for each label value (‘win’ or ‘loss’), and then test it against 2018 tournament data. In addition to predicting outcomes (and seeing their results), you’ll also be able to see how the model weighted the value of each feature.
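The labeling and train/test split described above can be sketched in plain Python. This is an illustration under stated assumptions rather than the lab’s own code (the lab does this in SQL): the three game records are made up, the field names are hypothetical, and each game is turned into two rows, one from each team’s point of view.

```python
from typing import Any, Dict, List, Tuple

Row = Dict[str, Any]

# Made-up records for illustration only; these are not real tournament results.
GAMES: List[Row] = [
    {"season": 2016, "winner": "Team A", "winner_seed": 1, "loser": "Team B", "loser_seed": 16},
    {"season": 2017, "winner": "Team C", "winner_seed": 5, "loser": "Team D", "loser_seed": 12},
    {"season": 2018, "winner": "Team E", "winner_seed": 11, "loser": "Team F", "loser_seed": 6},
]


def label_games(games: List[Row]) -> List[Row]:
    """Emit one 'win' row and one 'loss' row per game."""
    rows: List[Row] = []
    for game in games:
        # The winner's perspective.
        rows.append({
            "season": game["season"],
            "team": game["winner"], "seed": game["winner_seed"],
            "opponent": game["loser"], "opponent_seed": game["loser_seed"],
            "label": "win",
        })
        # The loser's perspective: same game, roles swapped.
        rows.append({
            "season": game["season"],
            "team": game["loser"], "seed": game["loser_seed"],
            "opponent": game["winner"], "opponent_seed": game["winner_seed"],
            "label": "loss",
        })
    return rows


def split_by_season(rows: List[Row], test_season: int) -> Tuple[List[Row], List[Row]]:
    """Train on seasons before test_season and hold out test_season itself."""
    train = [row for row in rows if row["season"] < test_season]
    test = [row for row in rows if row["season"] == test_season]
    return train, test


labeled = label_games(GAMES)
train_rows, test_rows = split_by_season(labeled, test_season=2018)
print(len(train_rows), "training rows;", len(test_rows), "test rows")
```

Holding out the most recent season, rather than sampling rows at random, keeps the test honest: the model never sees any game from the year it is asked to predict.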
After seeing how this naive model performs, you’ll build a second model using more advanced (“skillful,” in ML-lingo) features, such as scoring efficiency and possession time of the basketball. If you’re curious as to whether a better model could have predicted some of the wildness of last year’s tournament, you’ll want to give it a try.
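As a rough illustration of what a scoring-efficiency feature might look like, the sketch below computes points per 100 possessions from box-score totals. It relies on assumptions that do not come from the lab: possessions are estimated with a commonly used box-score approximation, the 0.44 free-throw weight is a conventional default, and the input numbers are made up. The features used in the Quest may be defined differently.

```python
def estimate_possessions(field_goal_attempts: int, offensive_rebounds: int,
                         turnovers: int, free_throw_attempts: int,
                         free_throw_weight: float = 0.44) -> float:
    """Approximate a team's possessions from box-score totals.

    This is a common estimate, assumed here for illustration; it is not
    necessarily the definition behind the lab's features.
    """
    return (field_goal_attempts - offensive_rebounds + turnovers
            + free_throw_weight * free_throw_attempts)


def points_per_100_possessions(points: int, possessions: float) -> float:
    """Scoring efficiency: points scored per 100 possessions."""
    if possessions <= 0:
        raise ValueError("possessions must be positive")
    return 100.0 * points / possessions


# Made-up box score for a single hypothetical game.
possessions = estimate_possessions(
    field_goal_attempts=60, offensive_rebounds=10, turnovers=12, free_throw_attempts=20
)
efficiency = points_per_100_possessions(points=75, possessions=possessions)
print(f"{possessions:.1f} possessions, {efficiency:.1f} points per 100 possessions")
```

A per-possession rate like this separates how well a team scores from how fast it plays, which is one reason such features can carry more signal than seed alone.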
Still, a model is only as good as the data fed into it. If you aren’t satisfied with the outcome of ours, hop on over to Kaggle to get some feature engineering inspiration from the sixth annual March Madness machine learning competition. Now that you’ve got a handle on the basics of BQML, take a look at some other ideas to see how else you might want to train your model, and how you might deliver an optimized matrix of match-up probabilities for the men’s and women’s tournaments.
If you’re curious as to how we built our features, check out the Colab notebook that documents the thinking behind them, or our post explaining what we found. And if this Quest has whetted your appetite for machine learning even more than it has for college basketball, come join us at NEXT in San Francisco this April, where you’ll find more bootcamps on machine learning and feature engineering (and basketball, of course).
Enroll here and use code 1c-marchmadness-936 for one month of no-cost access to Qwiklabs. Think you can finish the Quest in 30 days or less? Offer expires April 15.