With the Super Bowl this weekend, there is so much to look forward to: the excitement of watching Tom Brady make his classic comeback, the inevitably epic commercials, and last but not least, a feast that rivals most holidays other than Thanksgiving.
Earlier this football season, I used Fantasy Insights with Watson, a new feature of ESPN’s fantasy app, to win my league’s regular season and take 3rd place overall. The system relies on customized natural language processing (NLP) and machine learning techniques to process online content about football players (analyst reports, blogs, news, etc.) and predict the chances that each player will boom or bust in their next game.
With the fantasy season long over, I started thinking about how AI could be applied to the Super Bowl. I found a dataset from Delish.com of the most-Googled Super Bowl recipe in each state. Buffalo chicken dip took the top spot in eight states, but some states had less conventional plans this weekend: Nevada’s top hit was vegan cheesy bacon spinach dip, and Massachusetts is apparently obsessed with gluten-free pretzels. I wondered: do states in the same region of the United States tend to like the same Super Bowl foods? And given a person’s favorite Super Bowl food, is it possible to predict what region they are from?
I decided to use IBM Watson Natural Language Classifier, which lets anyone use machine learning to organize text into custom “classes,” or categories. To train the machine learning model, users upload a training dataset, which is just a .csv file with two columns: each row contains (1) a word or phrase, in this case “gluten-free pretzels,” and (2) its class, in this case “New England.”
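As a rough illustration of that two-column layout, here is a minimal Python sketch that builds the training file contents. The helper name `to_training_csv` is made up for this example, and the two rows are the only food-to-region pairings this post actually states; the real training set had one row per state. Whether the service expects a header row is worth checking in its documentation; this sketch assumes none.

```python
import csv
import io

# (text, class) pairs. Both pairings are taken from this post;
# the full training set had 51 rows, one per state plus D.C.
TRAINING_ROWS = [
    ("gluten-free pretzels", "New England"),
    ("nachos", "Far West"),
]


def to_training_csv(rows):
    """Serialize (text, class) pairs as two-column CSV text, no header."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    for text, label in rows:
        writer.writerow([text, label])
    return buffer.getvalue()


training_csv = to_training_csv(TRAINING_ROWS)
print(training_csv)
```

Writing the result to a file (any name ending in .csv) gives something you could upload as training data.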
To test the model, I ran a poll among friends, asking for their favorite Super Bowl food and their home state. I entered each food into the “Test” phase of Natural Language Classifier, where the user feeds new, previously unseen text to the classifier and receives a score for how likely that text is to fall into each class. So in the “loaded nachos” test below, my classifier gives a 35% chance that the person is from the Far West, and a 28% chance they are from the Plains.
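Turning those per-class scores into a single prediction just means taking the highest one. A small Python sketch of that step, assuming the scores have already been copied into a dictionary (the `rank_classes` helper is hypothetical, and only the two scores quoted above are included because the other six were not reported here):

```python
def rank_classes(scores):
    """Return (class, score) pairs sorted from most to least likely."""
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


# Scores for "loaded nachos" as quoted in this post.
# The remaining six regions are omitted, not zero.
nachos_scores = {"Plains": 0.28, "Far West": 0.35}

ranked = rank_classes(nachos_scores)
top_region, top_score = ranked[0]
print(f"Predicted region: {top_region} ({top_score:.0%})")
```

Note how close the top two scores are: a 35% top score is a guess, not a confident call.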
My testing with friends did not go very well; only 6 out of 27 (about 22%) were categorized in the correct region of the country. This leads me to my three takeaways from this exercise:
1. It can be extremely easy and fun to apply AI, especially NLP techniques, to almost anything. There are datasets online about almost any topic you would want, and a Natural Language Classifier model can be set up in just 15 minutes. Try it yourself!
2. Any AI model is only as good as its training data. With only 51 data points and 8 possible classes, each class averaged only about six training examples, and some foods showed up only once and were essentially hard-coded into a region. This was simply not enough data. For example, Alaska was the only state with “nachos” in the training dataset, so all of my friends who preferred nachos were categorized as “Far West.” I can imagine a stronger model if it were trained on hundreds, or even thousands, of data points per region.
3. Keep in mind that datasets usually map to real people or companies and their real lives and decisions, and understand that even the best machine learning model can’t fully reflect (or predict!) reality. This is a simple application of AI, but I tried to keep in mind that no matter what data point, prediction, or certainty level I was looking at, these regions, states, Super Bowl dishes, and friends are all far more nuanced than this model portrays. In any use case, ethical considerations in AI are essential.
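To put the numbers behind these takeaways side by side, here is a short Python sketch using only figures from this post (6 of 27 correct, 51 training rows, 8 classes). The `summarize` helper is made up for illustration, and the chance baseline assumes every region is equally likely, which is a simplification since regions contain different numbers of states.

```python
def summarize(correct, total, training_examples, num_classes):
    """Compute accuracy, a uniform-guess baseline, and data per class."""
    return {
        "accuracy": correct / total,
        # Assumes a guesser picks each class with equal probability.
        "chance_baseline": 1 / num_classes,
        "avg_examples_per_class": training_examples / num_classes,
    }


stats = summarize(correct=6, total=27, training_examples=51, num_classes=8)
for name, value in stats.items():
    print(f"{name}: {value:.3f}")
```

Under that assumption the model beat random guessing, but with only a handful of examples per class and 27 test answers, the margin is too small to read much into.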
Enjoy the Super Bowl, and check out IBM Watson Natural Language Classifier on Monday!