A text analysis on wine flavors of different origins

Hannah Yan Han
2 min read · Jul 22, 2017


What are the differences in flavors between wines from Oregon vs California, Bordeaux vs Burgundy, Northern Spain vs Argentina? Using 150K reviews from WineEnthusiast, I extracted key words describing flavors of the wines.

After an initial inspection, I noticed that adjectives describe not only the wine itself but also other attributes:

  • A lemma like ‘dry’ can refer to the flavor of the wine (dry aroma), the ingredients (dried cranberry), or the irrigation method (dry valley). To remove the latter two, I filtered out cases where the token is ‘dried’ and cases where the next word is ‘valley’.
  • Words describing colors typically refer to the ingredients rather than the wine itself, such as ‘black’ in ‘black currant’ and ‘yellow’ in ‘yellow peaches’. I filtered those out as well.
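
The two filters above can be sketched in plain Python. This is only an illustration, not the code from the post: the word lists `COLOR_WORDS` and `FLAVOR_ADJECTIVES` and the function `flavor_words` are hypothetical, and a real pipeline would more likely get tokens and lemmas from an NLP library than from a regular expression.

```python
import re

# Hypothetical word lists for illustration; the post does not list the actual ones.
COLOR_WORDS = {"black", "yellow", "red", "white", "green", "purple", "golden"}
FLAVOR_ADJECTIVES = {"dry", "sweet", "rich", "tart", "fresh",
                     "spicy", "herbal", "earthy", "crisp"}


def flavor_words(review):
    """Return the flavor adjectives in one review after applying both filters."""
    # Crude tokenizer (assumption): lowercase alphabetic runs only.
    tokens = re.findall(r"[a-z]+", review.lower())
    kept = []
    for i, token in enumerate(tokens):
        next_token = tokens[i + 1] if i + 1 < len(tokens) else ""
        if token == "dried":
            continue  # 'dried cranberry' describes an ingredient
        if token == "dry" and next_token == "valley":
            continue  # 'dry valley' does not describe the wine
        if token in COLOR_WORDS:
            continue  # 'black currant', 'yellow peaches'
        if token in FLAVOR_ADJECTIVES:
            kept.append(token)
    return kept


sample = "Dry aromas with dried cranberry and black currant from a dry valley, tart finish"
print(flavor_words(sample))
```

On the sample sentence, only the ‘dry’ in ‘dry aromas’ and the ‘tart’ in ‘tart finish’ survive the filters.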

After filtering out the color words, I plotted the most prominent flavors used to describe the wines from each origin.
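
As a rough sketch of the counting behind such a plot, the snippet below tallies flavor words per origin. It assumes that "most prominent" means raw frequency, which the post does not confirm; the function `top_flavors` and the sample rows are made up for illustration and are not taken from the dataset.

```python
from collections import Counter, defaultdict


def top_flavors(rows, n=3):
    """Count flavor words per origin and return the n most frequent for each.

    rows is an iterable of (origin, list_of_flavor_words) pairs.
    """
    counts = defaultdict(Counter)
    for origin, words in rows:
        counts[origin].update(words)
    return {origin: counter.most_common(n) for origin, counter in counts.items()}


# Made-up rows, only to show the input shape.
rows = [
    ("Oregon", ["tart", "fresh"]),
    ("Oregon", ["tart", "spicy"]),
    ("California", ["dry"]),
]
print(top_flavors(rows, n=1))
```

The resulting per-origin lists could then be passed to any plotting library to draw one bar chart per origin.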

We can observe:

  • Wines from Washington and Mendoza are noted for their sweetness
  • Wines from Burgundy and Bordeaux are noted for their rich flavors
  • ‘Dry’ is still a key attribute of California wines
  • Wines from Northern Spain and from Mendoza, Argentina both have ‘herbal’ flavors, while the Spanish wines are more ‘earthy’
  • People describe Oregon wines as ‘tart’, ‘fresh’ and sometimes ‘spicy’

Possible next steps:

  • Some adjectives are strongly associated with particular wine types, like ‘crisp’ with white wines. We could potentially use these words to classify wines.
  • Using the similarity of the words, we could cluster wine origins that are alike. Arguably, origins that are geographically close and share a similar cultural heritage would produce similar flavors, but we have yet to see numerical evidence.
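
One possible way to put a number on the similarity between origins is the cosine similarity of their flavor-word counts. The sketch below is an assumption about how that next step might look, not something done in the post; `cosine_similarity` is a hypothetical helper and the counts are invented.

```python
import math
from collections import Counter


def cosine_similarity(a, b):
    """Cosine similarity between two word-count mappings (0.0 if either is empty)."""
    shared = set(a) & set(b)
    dot = sum(a[word] * b[word] for word in shared)
    norm_a = math.sqrt(sum(value * value for value in a.values()))
    norm_b = math.sqrt(sum(value * value for value in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


# Invented counts, only to show the call.
oregon = Counter({"tart": 5, "fresh": 3, "spicy": 1})
burgundy = Counter({"rich": 4, "fresh": 2})
print(cosine_similarity(oregon, burgundy))
```

A matrix of these pairwise scores could then be fed to a clustering method to check whether nearby regions really do group together.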

This is #day42 of my #100dayprojects on data science and visual storytelling. Full code on my GitHub. Thanks for reading. Suggestions for new topics and feedback are always welcome.
