Fetching Better Beer Recommendations with Collie (Part 3)
Using metadata to train better models and (finally) drinking some beers!
Part 1 | Part 2 | Part 3 (you are here)
TL;DR — I talk even more about ShopRunner’s latest open source library, Collie [GitHub, PyPI, Docs], for training and evaluating deep learning recommendation systems. We use metadata to train better models and evaluate them. I then, finally, get to drink beer.
A Small Recap
By this point in the blog series, we’ve seemingly done it all — prepared data, trained a model, trained a better model using adaptive losses, and then trained a better better model using a double optimizer trick. But, one must ask, is this the best we can do?
Incorporating Metadata into the Model
If you went up to a robot bartender powered by the models we’ve built so far and asked for a drink recommendation, it would look at the thousands of user preferences in the data, find items whose learned patterns match the drinks you like, and systematically recommend the highest-ranking beers to you. But if you went up to a human bartender and asked for a drink recommendation, they would simply note shared features in the beers you like (things like beer style, brewer, etc.) and recommend new beers that share those features. As I write this in 2021, bartenders are humans for a reason — this way of recommending beers is surprisingly effective (update: I just looked it up and, of course, there are already robot bartenders. Uh oh).
While a human bartender can’t possibly do the level of computation a robot bartender does to recommend beers, we can have our robot bartender use the information that the human uses. In a recommendations setting, we refer to this as item metadata. In this beer dataset specifically, we not only know the beer ID, but also the style of beer (Stout, Imperial IPA, German Pilsener, etc.). Ideally, you would expect that if a user has had Pale Ales in the past, they might like to drink Pale Ales in the future.
In Collie, we can incorporate this side-data directly into the model and/or the loss function (see the next section). When we use side-data in the model, we create something known as a hybrid model.
For our beer dataset, we’ll encode the beer style data into a one-hot-encoded vector (a vector with a 1 in the position of the beer’s style and 0s everywhere else) and feed that into the model. This approach works better when we have rich feature vectors as side-data (things like image or text embeddings for items work great here), but for this simple example, we’ll stick with the one-hot-encoded beer style as our metadata.
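As a concrete sketch, here is one way to build that one-hot matrix with NumPy (the style labels below are made up for illustration; the real dataset has roughly a hundred styles):

```python
import numpy as np

# Hypothetical style labels for a handful of beers, for illustration only.
beer_styles = ["Stout", "Imperial IPA", "German Pilsener", "Stout"]

# Assign each unique style an integer ID...
style_to_id = {style: idx for idx, style in enumerate(sorted(set(beer_styles)))}
style_ids = np.array([style_to_id[style] for style in beer_styles])

# ...then one-hot encode: row i has a single 1 in the column for beer i's style.
one_hot = np.zeros((len(beer_styles), len(style_to_id)), dtype=np.float32)
one_hot[np.arange(len(beer_styles)), style_ids] = 1.0

print(one_hot)
```

Each row of `one_hot` is then the metadata vector for the corresponding beer.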
Since we’ve already trained a matrix factorization model in a previous blog post, we can use those trained embeddings in this model to save computational time and take advantage of some neat transfer learning tricks.
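To make the idea concrete, here is a minimal PyTorch sketch of a hybrid model in this spirit: warm-start from pretrained ID embeddings, then score with a small feed-forward head over the concatenated user embedding, item embedding, and metadata vector. The class, names, and dimensions are illustrative assumptions, not Collie’s actual hybrid model implementation.

```python
import torch
from torch import nn

class HybridSketch(nn.Module):
    """Toy hybrid model: pretrained ID embeddings + one-hot metadata.

    Illustrative only -- Collie's real hybrid model differs in its details.
    """
    def __init__(self, pretrained_user_emb, pretrained_item_emb, num_styles):
        super().__init__()
        # Warm-start from the matrix factorization model's trained embeddings.
        self.user_emb = nn.Embedding.from_pretrained(pretrained_user_emb, freeze=False)
        self.item_emb = nn.Embedding.from_pretrained(pretrained_item_emb, freeze=False)
        emb_dim = pretrained_item_emb.shape[1]
        # Small head that scores (user, item, metadata) triples.
        self.head = nn.Sequential(
            nn.Linear(emb_dim * 2 + num_styles, 16),
            nn.ReLU(),
            nn.Linear(16, 1),
        )

    def forward(self, user_ids, item_ids, item_metadata):
        x = torch.cat(
            [self.user_emb(user_ids), self.item_emb(item_ids), item_metadata[item_ids]],
            dim=1,
        )
        return self.head(x).squeeze(1)

# Tiny smoke test with random "pretrained" embeddings and 3 beer styles.
num_users, num_items, num_styles = 5, 4, 3
model = HybridSketch(torch.randn(num_users, 8), torch.randn(num_items, 8), num_styles)
metadata = torch.eye(num_styles)[torch.tensor([0, 1, 2, 0])]  # one-hot row per item
scores = model(torch.tensor([0, 1]), torch.tensor([2, 3]), metadata)
print(scores.shape)  # one score per (user, item) pair
```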
Our MAP@10 score is higher than that of a model without metadata (nice!), but not by enough to show a huge benefit from including this data. To me, this indicates that simple, one-hot-encoded metadata is best incorporated not directly into the model (if we had rich feature embeddings, using them in the model would likely give even better results), but into the loss function itself.
Incorporating Metadata into the Loss Function
In Collie, we can also incorporate metadata directly into the loss function. We use this data by penalizing the model less when it ranks a negative item above a positive item that shares the same metadata (i.e. “you recommended the wrong beer, but it’s still a Pale Ale, and this user likes Pale Ales, so we’ll give you partial credit for this prediction”). Every loss in Collie supports adding metadata for these partial credit calculations!
All we’ll need here is a tensor containing the beer style ID for each item (which we create in the code snippet below) and a weight saying how heavily this metadata counts in the loss. In this case, I’ll weight the beer style such that recommending a different beer of the same style counts as an 85% match. You can read more about how to use metadata in loss functions here.
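To illustrate the partial credit idea, here is a toy pairwise hinge loss that shrinks the penalty when the sampled negative item shares the positive item’s beer style. This is a sketch of the concept under stated assumptions, not Collie’s exact loss code, and the style IDs below are made up.

```python
import torch

def partial_credit_hinge_loss(pos_scores, neg_scores, pos_items, neg_items,
                              style_ids, style_weight=0.85):
    """Pairwise hinge loss with metadata partial credit (illustrative sketch).

    If the negative item shares the positive item's style, we treat it as an
    85% match and keep only 15% of the usual penalty.
    """
    # 1.0 where the two items share a beer style, 0.0 otherwise.
    same_style = (style_ids[pos_items] == style_ids[neg_items]).float()
    # Full penalty for unrelated negatives, partial credit for same-style ones.
    credit = 1.0 - style_weight * same_style
    raw_loss = torch.clamp(1.0 - (pos_scores - neg_scores), min=0.0)
    return (credit * raw_loss).mean()

style_ids = torch.tensor([0, 0, 1, 2])   # hypothetical style ID per beer
pos_items = torch.tensor([0, 0])
neg_items = torch.tensor([1, 2])         # first negative shares a style, second doesn't
pos_scores = torch.tensor([0.5, 0.5])
neg_scores = torch.tensor([0.5, 0.5])
loss = partial_credit_hinge_loss(pos_scores, neg_scores, pos_items, neg_items, style_ids)
print(loss)  # the same-style pair contributes far less to the loss
```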
And look, our highest MAP@10 score yet, showing just how important metadata is for making effective recommendations!
With Collie, we can also combine both metadata methods above into a single model, with metadata incorporated into both the model and the loss function, just by adding a few input arguments to the code snippets above. Neat!
Seeing If Our Model Actually Works
While our evaluation metric numbers are certainly useful, seeing is really believing when it comes to recommendations. And since I selfishly did this project to figure out beer recommendations for myself, let’s see the results for me!
All Collie models support retrieving item-item recommendations, which in this case means we can supply a seed beer ID and find beers similar to it. Under the hood, this takes the item embeddings from a trained model and performs row-wise cosine similarity to find the item embeddings closest to our seed beer’s.
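That similarity computation boils down to something like this NumPy sketch (toy embeddings and an illustrative function name, not Collie’s actual method):

```python
import numpy as np

def most_similar_items(item_embeddings, seed_id, top_k=5):
    """Rank items by cosine similarity to a seed item's embedding row."""
    # Normalize every row so the dot product equals cosine similarity.
    norms = np.linalg.norm(item_embeddings, axis=1, keepdims=True)
    unit = item_embeddings / norms
    sims = unit @ unit[seed_id]
    # Exclude the seed itself, then take the top-k most similar items.
    sims[seed_id] = -np.inf
    return np.argsort(-sims)[:top_k]

# Toy embeddings: items 0 and 2 point in roughly the same direction.
emb = np.array([[1.0, 0.0], [0.0, 1.0], [0.9, 0.1], [-1.0, 0.0]])
print(most_similar_items(emb, seed_id=0, top_k=2))
```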
Personally, I love a nice mug of Stella Artois; it’s the #1 beer I order at bars, at restaurants, and in my own home. Of course, I’d love to branch out a bit, so let’s see what our Collie model thinks is similar to it. I’ll be using our best-performing model, the matrix factorization model with metadata in its loss, to make these recommendations.
In order, the top 5 most similar beers to Stella Artois are:
- Old Speckled Hen
- Samuel Smith’s Nut Brown Ale
- Boddingtons Pub Ale
- Heineken Lager Beer
- Samuel Smith’s Old Brewery Pale Ale
Most beers on this list are English Pale Ales, which I tend to love, so this list already passes the gut check.
As a quality check too, I went ahead and Googled if Stella Artois and Heineken (the only recognizable beer on that list to me) were similar. Reddit says yes, so I will count this as a win!
While item-item similarity gets us some recommendations, what if we have a new user, not in our training dataset, who enjoys many different beers rather than just a single one? For this, we can actually fine-tune a model to predict personalized recommendations for that new user.
While this use-case is not explicitly built into Collie, we can manipulate the pieces of the deep learning model to achieve it! In the code below, I’ll input some beers I know I love and create a small Interactions dataset with them. In addition to Stella Artois, I have confirmed that I love Corona Light (with a lime) and Pabst Blue Ribbon (which is surely a legal requirement for living in Chicago).
With this small dataset containing only a single user (me), we can create a new model as normal, but before training, copy over the existing model’s item embeddings and biases and freeze them so they can’t change as we fine-tune. Then, I’ll train the model to optimize only my user embedding, giving us personalized recommendations for my own beer taste.
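A stripped-down PyTorch sketch of this trick, using stand-in embeddings and made-up beer IDs rather than Collie’s actual model classes, might look like:

```python
import torch
from torch import nn

torch.manual_seed(0)
num_items, emb_dim = 6, 4
# Stand-in for the trained model's item factors.
trained_item_emb = torch.randn(num_items, emb_dim)

# Frozen item embeddings copied from the existing model...
item_emb = nn.Embedding.from_pretrained(trained_item_emb, freeze=True)
# ...and a single fresh user embedding we are allowed to optimize.
my_user_emb = nn.Parameter(torch.zeros(emb_dim))

liked_beers = torch.tensor([0, 2, 5])  # hypothetical IDs: Stella, Corona Light, PBR
optimizer = torch.optim.Adam([my_user_emb], lr=0.1)

for _ in range(100):
    optimizer.zero_grad()
    scores = item_emb(liked_beers) @ my_user_emb
    # Push scores for the beers I know I love toward a high target value.
    loss = ((scores - 1.0) ** 2).mean()
    loss.backward()
    optimizer.step()

# Personalized scores over the whole catalog for the new user.
all_scores = item_emb.weight @ my_user_emb
print(all_scores.topk(5).indices)
```

Because only `my_user_emb` is passed to the optimizer and the item embeddings are frozen, the catalog’s learned structure is left untouched while the new user’s taste vector is fit.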
With this model, we can pretty easily get some recommended beers for me:
With this, the top 5 recommended beers based on my favorite beers are:
- Old Rasputin Russian Imperial Stout
- Dead Guy Ale
- Sierra Nevada Pale Ale
- Young’s Double Chocolate Stout
- HopDevil Ale
I haven’t heard of any of these beers, but I will say that they all have pretty cool names!
Mass Appeal Recommendations
Lastly, since we can get this information for free out of our model, let’s also find the beers the model thinks appeal to the widest audience. These aren’t necessarily the most popular beers, but rather the ones people generally try regardless of the style they prefer (i.e. most people, whether they love light or dark beers, and whether they enjoy canned or bottled beers, tend to have had beer X).
We can do this in Collie by selecting the beers with the highest bias terms, as done below:
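In spirit, that selection is just a sort by bias. The bias values below are made up for illustration; in a trained model they would come from the model’s learned item bias terms (one scalar per beer):

```python
# Hypothetical learned bias per beer -- larger bias means broader appeal,
# independent of any individual user's taste.
beer_biases = {
    "Guinness Draught": 0.90,
    "Hypothetical Beer B": 0.10,
    "Hypothetical Beer C": 0.50,
    "Hypothetical Beer D": -0.20,
    "Hypothetical Beer E": 0.70,
}

# The "mass appeal" list is simply the beers with the largest bias terms.
top_beers = sorted(beer_biases, key=beer_biases.get, reverse=True)[:3]
print(top_beers)
```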
Here, we can see our model thinks the following 5 beers have the highest appeal:
- Guinness Draught
- Sierra Nevada Celebration Ale
- Stone Ruination IPA
- Samuel Adams Boston Lager
- Two Hearted Ale
This isn’t too surprising, considering that most people, regardless of whether they normally drink Irish Dry Stouts, have tried a Guinness before.
Evaluation metrics and light Googling can only go so far — I soon realized that if I were really going to prove how well the Collie model performed, I would have to actually try the beers recommended to me. So, I did just that.
I went to my local liquor store, Binny’s, and tried to find at least one beer from each of the three recommendation categories I listed above. To my surprise, I was actually able to find the top beer recommended to me in each of the categories above (shoutout to the Binny’s employee who helped me find everything in under a minute — a true beer expert right there!).
And… I am impressed! Seriously!
The Old Speckled Hen beer, which my model said was similar to a Stella Artois, held up! It was a light beer that was easy to drink and was unsurprisingly very similar to a Stella. The Old Rasputin Russian Imperial Stout, despite being a darker beer, was unbelievably smooth and easy to drink. In fact, the Old Rasputin was actually the best of the bunch (thanks Collie!).
As for the Guinness Draught… nope. Not for me. Please no. I get it, but never again. Sorry to Guinness fans.
I hope this blog was informative, but it is truly just scratching the surface of what Collie is able to do. At ShopRunner, we are actively developing Collie to continue to push the limits of what deep learning recommendation algorithms are able to do at scale.
All code used in this blog post can be found in the GitHub Gist here. If you’d like to learn more or even contribute to the Collie repo, you can find the code on GitHub here.
Lastly, a huge thank you to my manager, Nicole Carlson, for reviewing all these blog posts and helping make them more readable and coherent. Nicole, you rock!!