winequant
Feb 11, 2018

You shall know a word by the company it keeps — J.R. Firth

To understand wine we need language. We cannot communicate taste without using words. Yet words are never quite enough. There is always a gap between what we sense and what we describe. The gap between taste and language is a source of mystery, wonder and sometimes frustration.

Established critics and formal tasting systems provide standards for language use. But how useful are these norms really? “Green apple, citrus peel, medium acidity.” How much information can you squeeze into a system of descriptors?

Our aim is to re-connect words with substance by thinking about word-maps rather than individual words or notes.

WQTaster is a tool that implements and adapts standard Natural Language Processing (NLP) techniques for wine tasting/writing. The essential point is that we understand wine words in context: what company does a word keep?

For example, are the words oyster and sea used in similar contexts to describe breezy whites? WQTaster finds that the following words are connected to breezy in wine descriptions. (The score indicates how close the relationship is.)

Breezy

  • ‘sea’, 0.87
  • ‘oyster’, 0.76
  • ‘resin’, 0.75
  • ‘shell’, 0.74
  • ‘rosemary’, 0.73
  • ‘iodine’, 0.69
  • ‘salt’, 0.68
  • ‘musk’, 0.65
  • ‘clove’, 0.64
  • ‘quinine’, 0.64
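The post does not say how these scores are computed. A common choice for comparing word vectors is cosine similarity, so here is a minimal sketch under that assumption. The three-dimensional vectors and the `nearest` helper are invented for illustration; they are not WQTaster's data or API.

```python
import math

# Toy word vectors, invented for illustration. Real embeddings are learned
# from tasting notes and usually have many more dimensions.
VECTORS = {
    "breezy": [0.9, 0.1, 0.2],
    "sea":    [0.8, 0.2, 0.1],
    "oyster": [0.7, 0.1, 0.4],
    "plum":   [0.1, 0.9, 0.3],
}

def cosine(u, v):
    """Cosine of the angle between u and v (1.0 means same direction)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

def nearest(word, vectors, top_n=3):
    """Rank every other word by cosine similarity to `word`, highest first."""
    scores = [
        (other, cosine(vectors[word], vec))
        for other, vec in vectors.items()
        if other != word
    ]
    return sorted(scores, key=lambda pair: pair[1], reverse=True)[:top_n]

for other, score in nearest("breezy", VECTORS):
    print(f"{other}: {score:.2f}")
```

With these toy vectors, "sea" and "oyster" rank well above "plum", which mirrors the shape of the list above without reproducing its numbers.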

Models of Taste

We have intuitive notions about how wine descriptors relate. For example, WSET has a guide to wine-word usage based on a few broad categories called the “Systematic Approach to Tasting”. There’s the classic “Aroma Wheel” originally due to UC Davis Professor Ann C. Noble. A beautiful new attempt to categorize and visualize wine descriptors is due to WineFolly, see “Wine Descriptors & what they mean”.

If you like the aroma wheel, “blackcurrant” and “cassis” may frequently occur together in your notes to describe intense, fruity red wines.

In the WSET system, cassis isn’t a “standard word.” So if you’re following their template, blackcurrant and black cherry would be more likely pairs.

If WineFolly’s poster is on your wall, I expect a different style of note entirely: a Syrah might be “fleshy”, “flamboyant” and “plummy”.

Tasting models like these are important. A model acts like a lens that flattens something complicated into digestible conceptual chunks.

A map is not the territory it represents, but, if correct, it has a similar structure to the territory, which accounts for its usefulness.

Alfred Korzybski, Science and Sanity (1933, p. 58)

From Models to Maps

Rather than starting with a model, we want to turn things around and start with what the critics are actually writing. WQTaster recovers associations and models from wine-note data.

There are many ways of building a tasting system from a collection of wine notes. The questions you have to answer are: how do I summarize the main features of the text? And what does “similar” mean? Is blackberry similar to raspberry if the two words consistently occur in the same sentence? Or are they similar if they consistently have the same neighbouring words?
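To make the two definitions of “similar” concrete, here is a small sketch on four invented tasting-note sentences. The function names, the window size and the corpus are all assumptions made for illustration, not a description of how WQTaster works.

```python
from collections import Counter
import math

# Four invented tasting-note sentences, already lower-cased and tokenised.
NOTES = [
    "ripe blackberry and sweet spice".split(),
    "ripe raspberry and sweet spice".split(),
    "fresh lemon with crisp acidity".split(),
    "fresh lime with crisp acidity".split(),
]

def same_sentence_count(a, b, notes):
    """Definition 1: number of sentences that contain both words."""
    return sum(1 for note in notes if a in note and b in note)

def context_counts(word, notes, window=2):
    """Count the words found within `window` positions of `word`."""
    counts = Counter()
    for note in notes:
        for i, token in enumerate(note):
            if token == word:
                lo, hi = max(0, i - window), i + window + 1
                counts.update(
                    t for j, t in enumerate(note[lo:hi], start=lo) if j != i
                )
    return counts

def context_similarity(a, b, notes, window=2):
    """Definition 2: cosine similarity of the two words' neighbour counts."""
    ca = context_counts(a, notes, window)
    cb = context_counts(b, notes, window)
    dot = sum(ca[w] * cb[w] for w in ca)
    norm = math.sqrt(sum(v * v for v in ca.values())) * math.sqrt(
        sum(v * v for v in cb.values())
    )
    return dot / norm if norm else 0.0

print(same_sentence_count("blackberry", "raspberry", NOTES))
print(round(context_similarity("blackberry", "raspberry", NOTES), 2))
print(round(context_similarity("blackberry", "lemon", NOTES), 2))
```

In this toy corpus blackberry and raspberry never share a sentence, so the first definition scores them at zero, yet they keep exactly the same company, so the second definition rates them as maximally similar. Which answer you want depends on the question you are asking of the notes.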

Wine contains a vast amount of information, which tasters boil down to tasting notes. WQTaster turns this raw text into structured, high-dimensional vectors. It then focuses this high-dimensional summary of tasting notes down to three dimensions using an algorithm we will refer to as a lens.

We can re-run WQTaster with different lenses to highlight different parts of the landscape. Depending on the looking glass we use, we’ll get a different perspective. Some may be more useful or intuitive than others.
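As a rough sketch of what a lens might look like in code, the function below takes high-dimensional word vectors and returns three coordinates per word. It uses PCA, the lens named in the next section, computed with NumPy's singular value decomposition. The name `pca_lens` and the random stand-in vectors are assumptions for illustration, not WQTaster's actual implementation.

```python
import numpy as np

def pca_lens(vectors, n_components=3):
    """Project row vectors onto their top principal components."""
    X = np.asarray(vectors, dtype=float)
    X = X - X.mean(axis=0)  # centre each dimension on zero
    # Rows of Vt are the principal directions, ordered by decreasing variance.
    _, _, Vt = np.linalg.svd(X, full_matrices=False)
    return X @ Vt[:n_components].T

# Stand-in data: 20 hypothetical words, each a 50-dimensional vector.
rng = np.random.default_rng(0)
word_vectors = rng.normal(size=(20, 50))

coords = pca_lens(word_vectors)
print(coords.shape)  # (20, 3)
```

Swapping in a different lens would mean replacing `pca_lens` with another function that has the same input and output shapes, which is what makes re-running with different lenses cheap.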

Juicy Berries

Let’s start by looking at how different kinds of berries feature in our wine note collection as seen through the lens called PCA (Principal Component Analysis).

The distance between words is represented by relative positions on two axes and a colour scale. The size of the circles indicates how common a word is.

Common berry descriptors clustered using classical neural net algorithms and viewed through 3-D PCA. X-axis: PCA component 1. Y-axis: PCA component 2. Colour (RdPu): PCA component 3. Size: frequency of the word in the corpus.

The horizontal axis is the most significant: the first PCA component captures more of the variation between words than any other component. The further apart two words are, the more dissimilar their usage. In particular, tomato (yes, a tomato is technically a berry) and gooseberry sit apart from the other berries. You’d expect this, right?

Note that even though tomato and gooseberry are close together in the picture, their colours are quite different. In a true 3-D plot, they’d be far apart on the third axis.
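The effect is easy to check with a little arithmetic. The coordinates below are made up for illustration (they are not read off the plot); they only show how two points can be neighbours on the page yet distant once the colour axis is counted.

```python
import math

# Hypothetical coordinates: (component 1, component 2, component 3).
coords = {
    "tomato":     (4.0, 1.0, -3.0),
    "gooseberry": (4.2, 1.1, 3.0),
}

a, b = coords["tomato"], coords["gooseberry"]

flat = math.dist(a[:2], b[:2])  # distance using only the x/y position
full = math.dist(a, b)          # distance including the colour component

print(f"2-D distance: {flat:.2f}")
print(f"3-D distance: {full:.2f}")
```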

“Berry” and “cherry” stand out. These are frequently used, generic words. In particular, cherry is usually divided up into “black” or “red”. So effectively we are combining two quite distinctive descriptors into one here, which explains why it stands out so vibrantly.

Cranberry, strawberry and redcurrant are all red, fresh and reminiscent of young Pinot. But why is raspberry not in this cluster, as I would have expected? According to our lens, raspberry is closer to blackcurrant and cassis than to strawberry.

This is why a map is so useful: you can scan it at a glance and start asking questions. Indeed, feel free to use this picture to inform your use of “berry” in tasting notes.

Suppose you’re writing a note and you’ve used the words “blueberry” and “blackcurrant,” which came to mind first. If you’re aiming for precision, you might like to test whether “cherry” or “mulberry” describes the taste you’re getting. Using one or the other would place you on the map more firmly.

If you want to open things up and widen the circle, think about “cranberry” as a possible descriptor.

Once you’ve covered enough of the “berry” space you might like to move on to a map of secondary descriptors, vanilla perhaps…

References

You may enjoy “Making Peace in the Language Wars” (Bryan A. Garner) and “Tense Present: Democracy, English, and the Wars over Usage” (David Foster Wallace). That’s really what this is all about.

…writing is a learned activity, no different in that regard from hitting a golf ball or playing the piano. Yes, some people naturally do it better than others. But apart from a few atypical autodidacts (who exist in all disciplines), there’s no practical way to learn to write, hit a golf ball, or play the piano without guidance on many points, large and small. And everyone, even the autodidact, requires considerable effort and practice in learning the norms. The norms are important even to those who ultimately break them to good effect.

Bryan A. Garner, Garner’s Modern American Usage (2009, p. 104)

