Wine About It: Building Recommendations for Wine

The Background

If Pink Moscato had a motto, it would be “Drink of the teenage girl.” As delicious as it is, as a 21-year-old I thought it was pretty lame that it was my go-to wine. I decided to slowly branch out and try new wines until I could order something a bit more sophisticated, like the seemingly disgusting Pinot Noir. I started with similarly sweet wines, such as Riesling, and then slowly worked my way toward reds, until I could finally tolerate dry red wines.

When I had to create a final project for my very educational class, Beverage Fermentation and Distillation, I thought this would be the perfect time to automate this process.

Using the wine.com API, I built a website that takes any two wines and builds a path of intermediate wines between them, based on how similar the wines are.

How It Works

Data Cleaning

First, I scraped the wine.com API for over 10,000 different wines. The data stored for any given wine looks like this:

"Id": 160512,"Name": "Silver Oak Alexander Valley Cabernet Sauvignon 2012","Url": "http://www.wine.com/v6/Silver-Oak-Alexander-Valley-Cabernet-Sauvignon-2012/wine/160512/Detail.aspx","Appellation": {"Id": 2371,"Name": "Sonoma County","Url": "http://www.wine.com/v6/Sonoma-County/wine/list.aspx?N=7155+101+2371","Region": {"Id": 101,"Name": "California","Url": "http://www.wine.com/v6/California/wine/list.aspx?N=7155+101","Area": null}},"Labels": [{"Id": "160512m","Name": "thumbnail","Url": "http://cache.wine.com/labels/160512m.jpg"}],"Type": "Wine","Varietal": {"Id": 139,"Name": "Cabernet Sauvignon","Url": "http://www.wine.com/v6/Cabernet-Sauvignon/wine/list.aspx?N=7155+124+139","WineType": {"Id": 124,"Name": "Red Wines","Url": "http://www.wine.com/v6/Red-Wines/wine/list.aspx?N=7155+124"}},"Vineyard": {"Id": 999998522,"Name": "Silver Oak Alexander Valley","Url": "http://www.wine.com/v6/Silver-Oak-Alexander-Valley/learnabout.aspx?winery=19143","ImageUrl": "http://cache.wine.com/aboutwine/basics/images/winerypics/19143.jpg","GeoLocation": {"Latitude": -360,"Longitude": -360,"Url": "http://www.wine.com/v6/aboutwine/mapof.aspx?winery=19143"}},"Vintage": "2012","Community": {"Reviews": {"HighestScore": 0,"List": [],"Url": "http://www.wine.com/v6/Silver-Oak-Alexander-Valley-Cabernet-Sauvignon-2012/wine/160512/Detail.aspx?pageType=reviews"},"Url": "http://www.wine.com/v6/Silver-Oak-Alexander-Valley-Cabernet-Sauvignon-2012/wine/160512/Detail.aspx"},"Description": "","GeoLocation": {"Latitude": -360,"Longitude": -360,"Url": "http://www.wine.com/v6/aboutwine/mapof.aspx?winery=19143"},"PriceMax": 79.99,"PriceMin": 69.99,"PriceRetail": 74.99,"ProductAttributes": [{"Id": 613,"Name": "Big & Bold","Url": "http://www.wine.com/v6/Big-andamp-Bold/wine/list.aspx?N=7155+613","ImageUrl": ""},{"Id": 38,"Name": "Green Wines","Url": "http://www.wine.com/v6/Green-Wines/wine/list.aspx?N=7155+38","ImageUrl": "http://cache.wine.com/assets/glo_icon_organic_big.gif"},{"Id": 0,"Name": "Has Large 
Label","Url": "","ImageUrl": ""}],"Ratings": {"HighestScore": 90,"List": []},"Retail": null,"Vintages": {"List": []}},

Each wine comes with a lot of features, but I ignored most of them, including rating and price, and focused on the ones that contribute most to taste. I rolled all of the wines up by wine variety, such as Riesling, and ended up with 55 different varieties. Each variety keeps only the wine type (such as Red, Rosé or White) and the wine attributes (such as sweet, fruity or dry). Using this data, I was able to build a prototype of each variety.
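
The rollup can be sketched roughly as below. This is a minimal Python 3 sketch, not the project's actual code: the function name roll_up and the IGNORED_ATTRIBUTES set are hypothetical, and dropping "Green Wines" and "Has Large Label" is only a guess based on those tags appearing in the raw record but not in the prototype. The field names (Varietal, WineType, ProductAttributes) come from the record shown earlier.

```python
from collections import Counter

# Assumption: tags that do not describe taste are dropped before counting.
IGNORED_ATTRIBUTES = {"Green Wines", "Has Large Label"}


def roll_up(wines):
    """Group raw wine records by varietal and tally attributes and wine types."""
    prototypes = {}
    for wine in wines:
        varietal = wine.get("Varietal") or {}
        name = varietal.get("Name")
        if not name:
            continue  # skip records that have no varietal
        proto = prototypes.setdefault(
            name, {"Count": 0, "Attributes": Counter(), "WineType": Counter()}
        )
        proto["Count"] += 1
        wine_type = (varietal.get("WineType") or {}).get("Name")
        if wine_type:
            proto["WineType"][wine_type] += 1
        for attribute in wine.get("ProductAttributes") or []:
            if attribute["Name"] not in IGNORED_ATTRIBUTES:
                proto["Attributes"][attribute["Name"]] += 1
    return prototypes


# Two trimmed-down records, keeping only the fields the rollup reads.
sample = [
    {"Varietal": {"Name": "Cabernet Sauvignon", "WineType": {"Name": "Red Wines"}},
     "ProductAttributes": [{"Name": "Big & Bold"}, {"Name": "Green Wines"}]},
    {"Varietal": {"Name": "Cabernet Sauvignon", "WineType": {"Name": "Red Wines"}},
     "ProductAttributes": [{"Name": "Smooth & Supple"}]},
]
prototypes = roll_up(sample)
print(prototypes["Cabernet Sauvignon"])
```

Each resulting prototype holds a Count, an Attributes tally, and a WineType tally, which is the same shape as the real prototype shown next.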

"Cabernet Sauvignon","{'Count': 1148, 'Attributes': {u'Big & Bold': 516, u'Smooth & Supple': 267, u'Light & Fruity': 1, u'Earthy & Spicy': 30}, 'WineType': {u'Red Wines': 1148}}"

From there, all I needed to do was create a notion of similarity between wines. The similarity score is a float between 0 and 1. The higher the number, the more similar the wines are.

For each variety, I took the percentage of its individual wines that carry each attribute, and calculated the Euclidean distance between the attribute percentages of two varieties. This number accounted for half of the similarity score. The other half came from the wine type. If two wines were of the same type, they got a 0.5 boost to the similarity. If one of the wines was a Rosé and the other was a White or Red, the pair got a 0.25 boost.
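
One way the score could be put together is sketched below in Python 3. Several parts are assumptions rather than the project's real code: the post does not say how the Euclidean distance is turned into a score, so this sketch scales it by its largest possible value and maps it onto the range 0 to 0.5. The type names "Rosé Wine" and "White Wines" are also guesses (only "Red Wines" appears in the data above), and all function names are hypothetical.

```python
import math

RED = "Red Wines"      # appears in the sample data
WHITE = "White Wines"  # assumed name
ROSE = "Rosé Wine"     # assumed name


def attribute_shares(proto):
    """Fraction of a variety's wines that carry each attribute."""
    return {name: n / proto["Count"] for name, n in proto["Attributes"].items()}


def attribute_score(a, b):
    """Half of the similarity: 0.5 for identical attribute mixes, 0 for opposite ones."""
    shares_a, shares_b = attribute_shares(a), attribute_shares(b)
    names = set(shares_a) | set(shares_b)
    if not names:
        return 0.5  # nothing to tell the two varieties apart
    distance = math.sqrt(
        sum((shares_a.get(n, 0.0) - shares_b.get(n, 0.0)) ** 2 for n in names)
    )
    # Assumption: each share lies in [0, 1], so sqrt(len(names)) is the largest distance.
    return 0.5 * (1.0 - distance / math.sqrt(len(names)))


def main_type(proto):
    """Most common wine type within a variety."""
    return max(proto["WineType"], key=proto["WineType"].get)


def type_score(a, b):
    """Other half: 0.5 for the same type, 0.25 for Rosé against Red or White."""
    type_a, type_b = main_type(a), main_type(b)
    if type_a == type_b:
        return 0.5
    if ROSE in (type_a, type_b) and (type_a in (RED, WHITE) or type_b in (RED, WHITE)):
        return 0.25
    return 0.0


def similarity(a, b):
    return attribute_score(a, b) + type_score(a, b)


cabernet = {
    "Count": 1148,
    "Attributes": {"Big & Bold": 516, "Smooth & Supple": 267,
                   "Light & Fruity": 1, "Earthy & Spicy": 30},
    "WineType": {"Red Wines": 1148},
}
print(similarity(cabernet, cabernet))
```

Any other monotone mapping from distance to score would fit the description just as well; the scaling here is only chosen so the result stays between 0 and 1.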

I calculated the similarity between every pair of the 55 wines and, for each wine, kept its 7 most similar wines, provided they also had a similarity score of at least 0.5.
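
The selection step might look something like this Python 3 sketch. The function name top_neighbors, its parameters, and the toy scores are all made up for illustration. It also assumes the cut is applied per wine and that pairs are not de-duplicated, which the post does not spell out.

```python
import heapq


def top_neighbors(names, score, k=7, min_score=0.5):
    """For each wine, keep up to k most similar wines scoring at least min_score."""
    rows = []
    for a in names:
        candidates = [(score(a, b), b) for b in names if b != a]
        candidates = [c for c in candidates if c[0] >= min_score]
        # nlargest returns the best candidates in descending order of score.
        for s, b in heapq.nlargest(k, candidates):
            rows.append((a, b, s))
    return rows


# Toy symmetric scores for three placeholder wines A, B and C.
toy_scores = {("A", "B"): 0.9, ("A", "C"): 0.6, ("B", "C"): 0.4}


def toy_score(a, b):
    return toy_scores.get((a, b), toy_scores.get((b, a)))


rows = top_neighbors(["A", "B", "C"], toy_score)
for row in rows:
    print(row)
```

The B and C pair is dropped because its score falls below 0.5, even though neither wine has used up its 7 slots.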

I then had 349 connections between 55 different wines. I used these connections to build a weighted graph, where the weight of an edge was 1 minus the similarity score.

"Riesling,Sauvignon Blanc,0.664208616333"
,"Riesling,Chardonnay,0.656579635361"
,"Riesling,Muscat,0.639046721312"
,"Riesling,Semillon,0.627138184672"
,"Riesling,Viognier,0.595831298126"
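
Turning rows like these into a weighted graph could be sketched as follows in Python 3. It assumes the third field is the similarity score (the values are all at least 0.5 and listed in descending order, which suggests so) and that edges are undirected. The function name build_graph is hypothetical, and the surrounding quotes and leading commas are assumed to have been stripped already.

```python
from collections import defaultdict


def build_graph(rows):
    """Build an adjacency map where each edge weighs 1 minus the similarity."""
    graph = defaultdict(dict)
    for row in rows:
        a, b, sim = row.split(",")
        weight = 1.0 - float(sim)
        # Assumption: a connection can be walked in both directions.
        graph[a][b] = weight
        graph[b][a] = weight
    return dict(graph)


rows = [
    "Riesling,Sauvignon Blanc,0.664208616333",
    "Riesling,Chardonnay,0.656579635361",
    "Riesling,Muscat,0.639046721312",
    "Riesling,Semillon,0.627138184672",
    "Riesling,Viognier,0.595831298126",
]
graph = build_graph(rows)
print(graph["Riesling"])
```

Because the weight is 1 minus the similarity, the most similar wines end up closest together, which is what a shortest-path search needs.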

To find a path between two wines, all one needs to do is run Dijkstra’s algorithm between the two nodes. The graph is not connected, so some pairs of wines have no path between them, but a path does exist between most wines. The service works fairly well, although more specific data would have yielded more accurate results.
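
For reference, here is a small Python 3 sketch of Dijkstra’s algorithm over an adjacency map of edge weights. It is a generic textbook version and not the project's code. The function name shortest_path is hypothetical, and the edge weights in the toy graph are invented for illustration. It returns None when no path exists, which covers the disconnected case.

```python
import heapq


def shortest_path(graph, start, goal):
    """Return the lowest-weight path from start to goal as a list, or None."""
    best = {start: 0.0}   # best known distance to each node
    previous = {}         # node -> node it was reached from
    visited = set()
    heap = [(0.0, start)]
    while heap:
        dist, node = heapq.heappop(heap)
        if node in visited:
            continue  # stale heap entry
        visited.add(node)
        if node == goal:
            break
        for neighbor, weight in graph.get(node, {}).items():
            candidate = dist + weight
            if candidate < best.get(neighbor, float("inf")):
                best[neighbor] = candidate
                previous[neighbor] = node
                heapq.heappush(heap, (candidate, neighbor))
    if goal not in visited:
        return None  # the two wines sit in different parts of the graph
    path = [goal]
    while path[-1] != start:
        path.append(previous[path[-1]])
    return path[::-1]


# Toy graph; the weights are illustrative values, not real project data.
toy_graph = {
    "Muscat": {"Riesling": 0.36},
    "Riesling": {"Muscat": 0.36, "Chardonnay": 0.34},
    "Chardonnay": {"Riesling": 0.34, "Pinot Noir": 0.45},
    "Pinot Noir": {"Chardonnay": 0.45},
    "Port": {},
}
print(shortest_path(toy_graph, "Muscat", "Pinot Noir"))
print(shortest_path(toy_graph, "Muscat", "Port"))
```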

The End Product

Check out the project here, and the repo here.