Fetching Better Beer Recommendations with Collie (Part 3)

Nate Jones
8 min read · May 4, 2021


Using metadata to train better models and (finally) drinking some beers!

Part 1 | Part 2 | Part 3 (you are here)

An image of the Collie dog looking at glasses of beers with a question mark thought.

TL;DR: I talk even more about ShopRunner’s latest open source library, Collie [GitHub, PyPI, Docs], for training and evaluating deep learning recommendation systems. We use metadata to train better models and evaluate them. I then, finally, get to drink beer.

A Small Recap

By this point in the blog series, we’ve seemingly done it all — prepared data, trained a model, trained a better model using adaptive losses, and then trained a better better model using a double optimizer trick. But, one must ask, is this the best we can do?

Incorporating Metadata into the Model

If you went up to a robot bartender powered by the models we built above and asked for a drink recommendation, they would look at the thousands of user preferences in the data, find items that share features with the drinks that you like, and systematically recommend the highest ranking beers to you. But, if you went up to a human bartender and asked for a drink recommendation, they would just note shared features in the beers you like (things like beer style, brewer, etc.), and recommend new beers that also share those features. As I write this in 2021, bartenders are humans for a reason — this way of recommending beers is surprisingly effective (update: I just looked it up and, of course, there are already robot bartenders. Uh oh).

An image of a robot bartender making a drink. The robot itself is mainly just a mechanical arm. It shakes, then pours the drink into a glass.
The future of automation is here already.

While a human bartender can’t possibly do the level of computation a robot bartender does to recommend beers, we can have our robot bartender use the information that the human uses. In a recommendations setting, we refer to this as item metadata. In this beer dataset specifically, we not only know the beer ID, but also the style of beer (Stout, Imperial IPA, German Pilsener, etc.). Ideally, you would expect that if a user has had Pale Ales in the past, they might like to drink Pale Ales in the future.

In Collie, we can incorporate this side-data directly into the model and/or the loss function (see the section below this). When we use side-data in the model, we create something known as a hybrid model.

For our beer dataset, we’ll encode the beer style data into a one-hot-encoded vector (a vector of 0s when a beer style does not apply to this beer and a 1 when it does) and feed that into a model. Ideally, this works better when we have rich feature vectors as side-data (things like image or text embeddings for items work great here), but for this simple example, we’ll just stick with the one-hot-encoded beer style as our metadata.
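To make the one-hot encoding concrete, here is a minimal sketch in plain Python. The beer IDs and styles are made up for illustration (they are not rows from the real dataset), and `one_hot` is a hypothetical helper, not part of Collie.

```python
# Hypothetical beer IDs and styles, made up for illustration.
beer_styles = {
    0: "English Pale Ale",
    1: "Russian Imperial Stout",
    2: "Irish Dry Stout",
    3: "English Pale Ale",
}

# Fix a column order so every beer's vector lines up the same way.
style_columns = sorted(set(beer_styles.values()))


def one_hot(style, columns):
    """Return a vector with a 1 in the column for `style` and 0s elsewhere."""
    return [1 if column == style else 0 for column in columns]


# One row per beer, ordered by beer ID.
item_metadata = [one_hot(beer_styles[beer_id], style_columns) for beer_id in sorted(beer_styles)]

for beer_id, row in enumerate(item_metadata):
    print(beer_id, beer_styles[beer_id], row)
```

Each row has exactly one 1, and beers 0 and 3 end up with identical vectors because they share a style. That shared signal is what the model gets to work with.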

Since we’ve already trained a matrix factorization model in a previous blog post, we can use those trained embeddings in this model to save computational time and take advantage of some neat transfer learning tricks.

The MAP@10 table with an added row for the hybrid matrix factorization model, with MAP@10 score 0.02182.

Our MAP@10 score is higher than that of a model that does not use metadata (nice!), but not so much higher that we see a huge benefit from including this data. To me, this indicates that simple, one-hot-encoded metadata is best incorporated not directly into the model (if we had rich feature embeddings, we could use those in the model and likely get even better results), but into the loss function itself.

Incorporating Metadata into the Loss Function

In Collie, we can also incorporate metadata directly into the loss function. We can use this data by penalizing the model less if it ranks a negative item higher than a positive item given they both share the same metadata (i.e. “you recommended the wrong beer, but it is still a Pale Ale, and this user likes Pale Ales, so we’ll give you partial credit for this prediction”). Every loss supports the addition of metadata for these partial credit calculations!

A GIF from the Simpsons. Students with desks piled on top of one another watch a TV, where a prerecorded lesson plays. A man stands in front of a chalkboard that says, “PEPSI Presents: Addition and Subtraction.” The man says to the class, “Now turn to the next problem. If you have three Pepsis and drink one, how much more refreshed are you? You, the redhead in the Chicago school system?” The camera pans over to the redheaded young girl, who replies, “Pepsi?” The man replies, “Partial credit.”
Our new loss function, essentially.

All we’ll need here is a tensor containing the beer style IDs we want to match on (which we create in the code snippet below) and how much we want to weight this metadata on the loss. In this case, I will weigh the beer style such that it is an 85% match if it’s a different item but the same beer style. You can read more about how to use metadata in loss functions here.
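The partial credit idea can be sketched in plain Python with a hinge-style loss. This is an illustration under my own assumptions, not Collie's actual implementation: the style IDs are made up, and `ideal_difference` and `partial_credit_hinge_loss` are hypothetical names. The margin the model must put between a positive and a negative item shrinks from 1 to 1 - 0.85 = 0.15 when the two beers share a style.

```python
# Hypothetical style ID for each beer, indexed by beer ID (beers 0 and 1 share a style).
beer_style_ids = [0, 0, 1, 2]

STYLE_MATCH_WEIGHT = 0.85  # the 85% partial credit from the text


def ideal_difference(positive_item, negative_item, style_ids, weight):
    """Margin we ask for between the positive and negative scores.

    A full margin of 1 when the styles differ, shrunk by `weight` when they match.
    """
    same_style = style_ids[positive_item] == style_ids[negative_item]
    return 1.0 - weight if same_style else 1.0


def partial_credit_hinge_loss(positive_score, negative_score, positive_item, negative_item,
                              style_ids=beer_style_ids, weight=STYLE_MATCH_WEIGHT):
    """Hinge loss whose margin depends on whether the two beers share a style."""
    margin = ideal_difference(positive_item, negative_item, style_ids, weight)
    return max(0.0, margin - (positive_score - negative_score))


# In both cases the model wrongly scores the negative beer above the positive one.
same_style_loss = partial_credit_hinge_loss(0.2, 0.5, positive_item=0, negative_item=1)
different_style_loss = partial_credit_hinge_loss(0.2, 0.5, positive_item=0, negative_item=2)
print(same_style_loss, different_style_loss)
```

The same ranking mistake costs less when the wrongly promoted beer has the same style as the one the user actually liked, which is the partial credit described above.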

And look, our highest MAP@10 score yet, showing how useful metadata can be for making effective recommendations!

The MAP@10 table with the final row added for the partial credit matrix factorization model, with MAP@10 score 0.02202. This row is bolded since this is the best performing model of the bunch.

With Collie, we can combine both metadata methods we tried above together into a single model — with metadata directly incorporated into the model and the loss function, just with a few added input arguments to the code snippets above this. Neat!

Seeing If Our Model Actually Works

While our evaluation metric numbers are certainly useful, seeing is really believing when it comes to recommendations. And since I selfishly did this project to figure out beer recommendations for myself, let’s see the results for me!

Item-Item Recommendations

All Collie models support retrieving item-item recommendations, which in this case means we can supply a seed beer ID and find similar beers to that one. Under the hood, this is taking the item embeddings from a trained model and performing row-wise cosine similarity to find similar item embeddings to our seed beer ID’s.
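As a rough illustration of that cosine similarity step, here is a sketch in plain Python. The embeddings and beer names are invented for the example, and `most_similar` is a hypothetical helper rather than a Collie method. A real model would have one learned embedding row per beer.

```python
import math

# Hypothetical item embeddings; a trained model would supply these.
item_embeddings = {
    "beer_a": [0.9, 0.1, 0.0],
    "beer_b": [0.8, 0.2, 0.1],
    "beer_c": [0.0, 0.1, 0.9],
    "beer_d": [-0.7, 0.0, 0.2],
}


def cosine_similarity(a, b):
    """Cosine of the angle between two vectors of equal length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def most_similar(seed, embeddings, k=5):
    """Return the k items whose embeddings are closest in angle to the seed's."""
    seed_vector = embeddings[seed]
    scores = {
        name: cosine_similarity(seed_vector, vector)
        for name, vector in embeddings.items()
        if name != seed  # the seed is trivially most similar to itself
    }
    return sorted(scores, key=scores.get, reverse=True)[:k]


similar_to_a = most_similar("beer_a", item_embeddings, k=2)
print(similar_to_a)
```

Because cosine similarity ignores vector length, two beers count as similar when their embeddings point in the same direction, regardless of how large the values are.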

For me, I love a nice mug of Stella Artois, and it’s my #1 beer I order at bars, restaurants, and my own home. Of course, I’d love to branch out a bit, so let’s see what our Collie model thinks is similar to this. I’ll be using our best performing model, the metadata loss matrix factorization one, to make these recommendations.

In order, the top 5 most similar beers to Stella Artois are:

  1. Old Speckled Hen
  2. Samuel Smith’s Nut Brown Ale
  3. Boddingtons Pub Ale
  4. Heineken Lager Beer
  5. Samuel Smith’s Old Brewery Pale Ale

Most beers on this list are English Pale Ales, which I tend to love, so already, I’m thinking this list seems to pass the gut check.

As a quality check too, I went ahead and Googled if Stella Artois and Heineken (the only recognizable beer on that list to me) were similar. Reddit says yes, so I will count this as a win!

A screenshot of a Reddit post. The poster asks for similar beers to Stella Artois. Two replies are shown, the first says that “… Heineken are some that would fall right in line with Stella.” The second reply says that, “A massed produced beer like Heineken is one that would fit with Stella.”

User-Item Recommendations

While item-item recommendations get us some recommendations, what if we have a new user not in our training dataset with many different beers they enjoy, not just a single one? For this, we can actually fine-tune a model to predict personalized recommendations for this new user.

While this use-case is not explicitly built into Collie, we can manipulate the deep learning model pieces such that we can achieve this! In the code below, I’ll input some beers I know I love and create a small Interactions dataset with this. In addition to Stella Artois, I have confirmed that I love Corona Light (with a lime) and Pabst Blue Ribbon (which is surely a legal requirement for living in Chicago).

With this small dataset containing only a single user (me), we can create a new model as normal, but before training, copy over the existing model’s item embeddings and biases, and freeze them such that we can’t change those at all as we fine-tune the model. Then, I’ll train a model to optimize only my user embedding such that we can get personalized recommendations for my own beer taste.
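The freeze-and-fine-tune idea can be sketched without any deep learning library. The version below is a simplified stand-in, not the Collie code from this post: the item embeddings and names are invented, the negatives are a fixed list rather than sampled, and the update is plain gradient ascent on a BPR-style objective, log(sigmoid(positive score - negative score)). Only the user vector is ever updated.

```python
import math

# Hypothetical frozen item embeddings and biases, standing in for a trained model's.
item_embeddings = {
    "liked_1": [1.0, 0.0],
    "liked_2": [0.8, 0.3],
    "other_1": [-0.5, 0.9],
    "other_2": [-1.0, -0.2],
    "unseen": [0.9, 0.1],
}
item_biases = {name: 0.0 for name in item_embeddings}

liked = ["liked_1", "liked_2"]
negatives = ["other_1", "other_2"]  # fixed here; real training would sample these


def score(user_vector, item):
    """Dot product of the user and item embeddings, plus the item bias."""
    dot = sum(u * v for u, v in zip(user_vector, item_embeddings[item]))
    return dot + item_biases[item]


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def fine_tune_user(liked, negatives, dims=2, lr=0.1, epochs=50):
    """Learn a single user vector while leaving every item parameter untouched."""
    user_vector = [0.0] * dims
    for _ in range(epochs):
        for pos in liked:
            for neg in negatives:
                diff = score(user_vector, pos) - score(user_vector, neg)
                # Gradient of log(sigmoid(diff)) with respect to the user vector.
                step = lr * (1.0 - sigmoid(diff))
                pos_vec, neg_vec = item_embeddings[pos], item_embeddings[neg]
                user_vector = [u + step * (p - n) for u, p, n in zip(user_vector, pos_vec, neg_vec)]
    return user_vector


my_vector = fine_tune_user(liked, negatives)

# Rank everything I have not already said I like.
candidates = [name for name in item_embeddings if name not in liked]
recommendations = sorted(candidates, key=lambda name: score(my_vector, name), reverse=True)
print(recommendations)
```

Since the item side is frozen, the new user vector lands in the same embedding space the original model learned, so an item that sits near my liked beers (here, `unseen`) rises to the top.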

With this model, we can pretty easily get some recommended beers for me:

With this, the top 5 recommended beers based on my favorite beers are:

  1. Old Rasputin Russian Imperial Stout
  2. Dead Guy Ale
  3. Sierra Nevada Pale Ale
  4. Young’s Double Chocolate Stout
  5. HopDevil Ale

I haven’t heard of any of these beers, but I will say that they all have pretty cool names!

Mass Appeal Recommendations

Lastly, since we can just get this information for free out of our model, let’s also get the beers that the model thinks have the highest appeal to the widest audience. This doesn’t necessarily always mean the beers are the most popular, but rather ones that people will generally try, regardless of what style of beer they prefer (i.e. most people, regardless of whether they love light or dark beers, and regardless of whether they enjoy canned or bottled beers, tend to have had beer X).

We can do this in Collie by selecting the beers with the highest bias terms, as done below:
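In a matrix factorization model, the item bias is added to a beer's score for every user, whatever their embedding looks like, which is why a large bias reads as broad appeal. Picking the top beers by bias is then just a sort. The bias values and names below are made up, and `highest_bias_items` is a hypothetical helper, not a Collie function.

```python
# Hypothetical learned item bias terms; a trained model would supply these.
item_biases = {
    "beer_a": 0.12,
    "beer_b": 1.45,
    "beer_c": -0.30,
    "beer_d": 0.98,
}


def highest_bias_items(biases, k=5):
    """Return the k items with the largest bias terms, largest first."""
    return sorted(biases, key=biases.get, reverse=True)[:k]


mass_appeal = highest_bias_items(item_biases, k=2)
print(mass_appeal)
```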

Here, we can see our model thinks the following 5 beers have the highest appeal:

  1. Guinness Draught
  2. Sierra Nevada Celebration Ale
  3. Stone Ruination IPA
  4. Samuel Adams Boston Lager
  5. Two Hearted Ale

I think this isn’t too surprising here, considering that most people, regardless of whether they drink Irish Dry Stouts normally, have tried a Guinness before.


Evaluation metrics and light Googling can only go so far — I soon realized that if I were really going to prove how well the Collie model performed, I would have to actually try the beers recommended to me. So, I did just that.

I went to my local liquor store, Binny’s, and tried to find at least one beer from each of the three recommendation categories I listed above. To my surprise, I was actually able to find the top beer recommended to me in each of the categories above (shoutout to the Binny’s employee who helped me find everything in under a minute — a true beer expert right there!).

A picture I took myself of the beers I bought. From left to right, these include Old Speckled Hen, Old Rasputin Russian Imperial Stout, and Guinness Draught.

And… I am impressed! Seriously!

The Old Speckled Hen beer, which my model said was similar to a Stella Artois, held up! It was a light beer that was easy to drink and was unsurprisingly very similar to a Stella. The Old Rasputin Russian Imperial Stout, despite being a darker beer, was unbelievably smooth and easy to drink. In fact, the Old Rasputin was actually the best of the bunch (thanks Collie!).

A GIF of Action Bronson sipping a drink that looks like beer from a wine glass.
To be as happy as Action Bronson looks drinking his drink here.

As for the Guinness Draught… nope. Not for me. Please no. I get it, but never again. Sorry to Guinness fans.

A GIF of a young girl trying her mom’s spaghetti for the first time. She tries to enjoy it while eating, but ends up involuntarily gagging twice while trying to keep it down. She finally swallows the food, then laughs and tells her mom, “I’m okay!”
My partner, Jen, says Guinness tastes like “motor oil.” Ouch.

I hope this blog was informative, but it is truly just scratching the surface of what Collie is able to do. At ShopRunner, we are actively developing Collie to continue to push the limits of what deep learning recommendation algorithms are able to do at scale.

All code used in this blog post can be found in the GitHub Gist here. If you’d like to learn more or even contribute to the Collie repo, you can find the code on GitHub here.

Lastly, a huge thank you to my manager, Nicole Carlson, for reviewing all these blog posts and helping make them more readable and coherent. Nicole, you rock!!