Vintage, AVA and Quality: A Study of Napa Valley Wines

Rachel Woods
The Wine Nerd
Jul 28, 2019

Let’s talk about terroir, arguably the most influential factor in distinguishing bad, good, and great wines. Terroir commonly refers to the climate and place where a wine is made, and the belief is that it heavily influences a wine’s characteristics. To add a more thorough definition:

Although it’s not a common extension of the term terroir, vintage is also an important component of wine quality. This is particularly true in regions like Northern California (including Napa) that have dramatically different climates year to year. Vintage variation is a big reason that wine critics and magazines (think Robert Parker, Wine Enthusiast, etc.) put together “vintage guides” where they discuss how each vintage fared and its influence on both the quality and ageability of the wines. Below is an example of the Wine Advocate vintage guide:

So how much does vintage really impact scores?

Based on the vintage charts above, we see that 2011 was deemed an “Average” vintage for North Coast cabernet sauvignon and an “Above Average” vintage for North Coast chardonnay. We can take our own look at this by analyzing 3,000+ expert reviews of Napa cabernet sauvignon and chardonnay. We collected wines from the 2004 to 2016 vintages (2016 being the most recently released cabernet vintage), recording the vintage, the critic score, the price, and the AVA for each.

First, we wanted to understand how much vintage affects score on average. To do this, we compared the distribution of scores given to cabernet versus chardonnay by year. The bars represent the average score, while the black lines represent the variation in scores. As the chart below shows, cabernet from Napa generally scores better, but it also appears more volatile in bad vintages. Further, the top vintage for cabernet (2013) differs from the top vintage for chardonnay (2015), as measured by average score.
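The per-vintage comparison boils down to grouping reviews by grape and vintage and summarizing each group. Below is a minimal standard-library sketch of that step. The `reviews` rows are made-up numbers for illustration (not our dataset), `score_summary` and `top_vintage` are hypothetical helper names, and using the standard deviation as the "variation" is an assumption about what the black lines measure.

```python
from collections import defaultdict
from statistics import mean, stdev

# Hypothetical rows: (grape, vintage, critic_score). Made-up values, not our dataset.
reviews = [
    ("cabernet", 2011, 88), ("cabernet", 2011, 90), ("cabernet", 2011, 86),
    ("cabernet", 2013, 95), ("cabernet", 2013, 93), ("cabernet", 2013, 96),
    ("chardonnay", 2011, 89), ("chardonnay", 2011, 90),
    ("chardonnay", 2015, 92), ("chardonnay", 2015, 93),
]

def score_summary(rows):
    """Return {(grape, vintage): (mean_score, std_dev, n)} for each group."""
    groups = defaultdict(list)
    for grape, vintage, score in rows:
        groups[(grape, vintage)].append(score)
    summary = {}
    for key, scores in groups.items():
        # stdev needs at least two points; report 0.0 for single-review groups
        spread = stdev(scores) if len(scores) > 1 else 0.0
        summary[key] = (mean(scores), spread, len(scores))
    return summary

def top_vintage(summary, grape):
    """Vintage with the highest mean score for the given grape."""
    candidates = {v: stats[0] for (g, v), stats in summary.items() if g == grape}
    return max(candidates, key=candidates.get)

summary = score_summary(reviews)
for (grape, vintage), (avg, sd, n) in sorted(summary.items()):
    print(f"{grape:<11} {vintage}  mean={avg:.1f}  sd={sd:.1f}  n={n}")
print(top_vintage(summary, "cabernet"), top_vintage(summary, "chardonnay"))
```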

To take a different direction: how much does AVA (another component of terroir) affect scores and prices?

AVAs (American Viticultural Areas) are specific subregions within Napa that are thought to have distinct terroir and therefore distinct wines. With an AVA come certain expectations about quality and, as a result, price.

From the comparison below, we can first see the AVAs with the highest average scores (green lines): Oakville, St. Helena, Stag’s Leap District and Diamond Mountain District. Adding price to the comparison (red lines) and lining up each area’s average score with its average price (grey dotted line), we can start to identify AVAs that look over- or underpriced relative to their scores. For example, prices for Coombsville, St. Helena, Howell Mountain, and Calistoga appear to be roughly in line with their scores, and those scores are still quite high.
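One way to make the score-versus-price comparison concrete is to put both averages on the same scale. The sketch below is an assumption on our part, not the exact method behind the chart: it standardizes each AVA’s average score and average price into z-scores across AVAs and takes the difference, so a positive gap hints at an AVA priced higher than its scores alone would suggest. The AVA names and numbers are placeholders.

```python
from collections import defaultdict
from statistics import mean, pstdev

# Placeholder rows: (ava, critic_score, price_usd). Made-up values, not our dataset.
wines = [
    ("AVA A", 94, 150), ("AVA A", 92, 130),
    ("AVA B", 93, 60), ("AVA B", 91, 55),
    ("AVA C", 89, 90), ("AVA C", 88, 80),
]

def ava_averages(rows):
    """Return {ava: (mean_score, mean_price)}."""
    by_ava = defaultdict(list)
    for ava, score, price in rows:
        by_ava[ava].append((score, price))
    return {
        ava: (mean(s for s, _ in pairs), mean(p for _, p in pairs))
        for ava, pairs in by_ava.items()
    }

def zscores(values):
    """Standardize to mean 0 and population standard deviation 1.
    Assumes the values are not all identical."""
    mu, sigma = mean(values), pstdev(values)
    return [(v - mu) / sigma for v in values]

def price_vs_score_gap(averages):
    """z(price) - z(score) per AVA: positive means priced above what its score
    suggests relative to the other AVAs, negative means priced below."""
    avas = sorted(averages)
    z_score = zscores([averages[a][0] for a in avas])
    z_price = zscores([averages[a][1] for a in avas])
    return {a: zp - zs for a, zs, zp in zip(avas, z_score, z_price)}

gaps = price_vs_score_gap(ava_averages(wines))
for ava, gap in sorted(gaps.items(), key=lambda kv: kv[1]):
    print(f"{ava}: {gap:+.2f}")
```

Because the gap is relative to the other AVAs in the sample, it says which areas look cheap or expensive for their scores within Napa, not whether any price is "fair" in absolute terms.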

Predicting price based on AVA, Score and Vintage

So now we move to the last and possibly most interesting piece of this analysis: can we predict price from a few basic attributes (AVA, grape, score, and vintage)? Generally, quality should be the major predictor of price, and score is supposed to be a good proxy for quality (although it has some issues). We also know from above that vintage affects quality, but those variations should already be captured in score. Instead, vintage may capture more general changes in wine prices over time. Lastly, AVA and grape are clearly important factors affecting price, so let’s include those.
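Before fitting any model, the categorical attributes (AVA and grape) need a numeric encoding. A common choice is one-hot encoding, with score and vintage kept as plain numbers so the model can pick up price trends over time. This is a stdlib-only sketch with hypothetical field names and made-up rows; in practice a library encoder such as scikit-learn’s `OneHotEncoder` does the same job.

```python
# Hypothetical raw rows; field names and values are made up for illustration.
rows = [
    {"ava": "AVA A", "grape": "cabernet", "score": 95, "vintage": 2013, "price": 175.0},
    {"ava": "AVA B", "grape": "chardonnay", "score": 90, "vintage": 2015, "price": 45.0},
    {"ava": "AVA A", "grape": "chardonnay", "score": 92, "vintage": 2014, "price": 70.0},
]

CATEGORICAL = ("ava", "grape")
NUMERIC = ("score", "vintage")

def build_vocab(rows):
    """Sorted list of the levels observed for each categorical column."""
    return {col: sorted({r[col] for r in rows}) for col in CATEGORICAL}

def encode(row, vocab):
    """Numeric columns as floats, then one 0/1 indicator per categorical level.
    A level missing from the vocabulary encodes as all zeros."""
    features = [float(row[col]) for col in NUMERIC]
    for col in CATEGORICAL:
        features.extend(1.0 if row[col] == level else 0.0 for level in vocab[col])
    return features

vocab = build_vocab(rows)
X = [encode(r, vocab) for r in rows]  # feature matrix
y = [r["price"] for r in rows]        # target: price
```

Tree-based models like the random forest used here split on thresholds, so the numeric columns do not need rescaling and can be passed through as-is.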

The general steps we took to make this model included:

  • Testing multiple types of regression models: linear, SVM, decision trees, and ensemble methods. A Random Forest Regressor was the best-performing baseline model, with R² = 0.39.
  • Using random search to tune the model’s hyperparameters, which raised R² to 0.51. The biggest improvements came from constraining the tree depth and the number of samples per leaf.
  • Checking feature importance in the final model: score is the most important feature, followed by vintage and grape type.
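Two pieces of the steps above are easy to show in isolation: the R² metric used to compare models, and the random search loop used for tuning. R² is 1 − SS_res/SS_tot, the share of price variance the predictions explain, so moving from 0.39 to 0.51 means the tuned model explains roughly 12 more percentage points of that variance. The sketch below is a hedged, library-free illustration. The hyperparameter names mirror scikit-learn’s `RandomForestRegressor` (`max_depth`, `min_samples_leaf`), but the candidate values and the `toy_evaluate` objective are assumptions standing in for training a forest and scoring it on validation data. scikit-learn’s `RandomizedSearchCV` wraps the same idea with cross-validation.

```python
import random

def r_squared(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_y = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_y) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot

def random_search(param_space, evaluate, n_iter=20, seed=0):
    """Draw n_iter random configurations from param_space (one value per
    hyperparameter, sampled independently) and keep the best-scoring one."""
    rng = random.Random(seed)
    best_params, best_score = None, float("-inf")
    for _ in range(n_iter):
        params = {name: rng.choice(values) for name, values in param_space.items()}
        score = evaluate(params)
        if score > best_score:
            best_params, best_score = params, score
    return best_params, best_score

# Names mirror scikit-learn's RandomForestRegressor; the candidate values are assumptions.
space = {
    "max_depth": [3, 5, 8, 12, None],  # None = no depth limit
    "min_samples_leaf": [1, 2, 5, 10, 20],
}

def toy_evaluate(params):
    """Hypothetical stand-in objective. A real run would fit a forest with
    these params and return its validation R^2."""
    depth = params["max_depth"] if params["max_depth"] is not None else 30
    return -abs(depth - 8) - abs(params["min_samples_leaf"] - 5)

best, best_value = random_search(space, toy_evaluate, n_iter=50)
print(best, best_value)
```

Random search samples configurations instead of trying every combination, which is why it tends to scale better than an exhaustive grid as the number of hyperparameters grows.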

So what did we learn from this exploration?

Vintage, AVA, score and price are closely related when it comes to understanding wines, and the relationship between some factors (like vintage and score) differs by grape. We can also predict prices fairly well from these simple attributes.

Future ideas and next explorations:

  • Model interpretability for outlier predictions — why did our model predict such a different price than reality for some of these wines?
  • Are there improvements we can make to this model by either removing outliers or encoding some additional context about outlier wines? For example, there are some “cult classic” wines in this dataset priced at $450+, and it’s probably better that our model doesn’t treat those as rational, “predictable” prices.
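For the outlier ideas, two simple starting points are ranking wines by how far the prediction missed, and flagging extreme prices before training. The sketch below uses Tukey’s IQR fences as the flagging rule; that is a swapped-in, general-purpose technique rather than the $450+ cut-off mentioned above, and `largest_misses`, `iqr_fences`, and the prices are hypothetical placeholders.

```python
from statistics import quantiles

def largest_misses(names, actual, predicted, k=3):
    """The k wines whose predicted price is furthest from the actual price,
    as (name, actual, predicted), biggest absolute error first."""
    misses = sorted(
        zip(names, actual, predicted),
        key=lambda row: abs(row[1] - row[2]),
        reverse=True,
    )
    return misses[:k]

def iqr_fences(values, factor=1.5):
    """Tukey fences: values outside [Q1 - factor*IQR, Q3 + factor*IQR] are outliers."""
    q1, _, q3 = quantiles(values, n=4)  # default 'exclusive' quartile method
    iqr = q3 - q1
    return q1 - factor * iqr, q3 + factor * iqr

# Placeholder prices in USD, made up for illustration.
prices = [40, 55, 60, 75, 80, 95, 120, 500]
low, high = iqr_fences(prices)
kept = [p for p in prices if low <= p <= high]
flagged = [p for p in prices if not low <= p <= high]
print(f"fences: {low:.2f} to {high:.2f}; flagged: {flagged}")
```

The residual ranking answers "which wines did we get most wrong" and is a natural input for interpretability work; the fences answer "which prices would we drop or flag before training", and an explicit indicator feature for known cult producers would be the alternative to dropping them.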
