A first attempt at data-driven wine tasting: Lodi

Rachel Woods
The Wine Nerd
Apr 17, 2019 · 5 min read

I have decided to spend the next few months scoping the technical viability of some ideas to solve pain points in the wine industry. One of them is selecting which wineries to visit when you travel to a new wine region.

So what did I do? The goal of this analysis project was to have a quick proof-of-concept of how we could collect data about wineries and return a ranking of best wineries to visit.

Methodology

  • Scraped wineries in the Lodi region from https://californiawineryadvisor.com/winery-near-me/, ending up with 33 wineries. Cleaned up the dataset and removed wineries that had closed.
  • Feature engineering tasks such as: distance from downtown (an accessibility measure), varietal diversity (how evenly the red/white wine lineup is split), a desirable traits score (how many “desirable” attributes, such as a picnic area, were listed on the winery’s page), cleaned price range and fee data, and collected Wine Enthusiast wine ratings.
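The varietal diversity feature can be sketched as a small function. This is a minimal illustration, not the original code; the counts and function name are assumptions. Diversity here is the absolute distance of the red-wine share from an even 50/50 split, so 0.0 means perfectly balanced and 0.5 means all one color.

```python
def varietal_diversity(red_count: int, white_count: int) -> float:
    """Return |red / total - 0.5|; lower means a more balanced lineup."""
    total = red_count + white_count
    if total == 0:
        return 0.5  # no listed wines: treat as maximally unbalanced
    return abs(red_count / total - 0.5)

print(varietal_diversity(5, 5))  # balanced lineup -> 0.0
print(varietal_diversity(9, 1))  # red-heavy lineup -> 0.4
```

Because lower values are better here, this criterion enters the ranking with a negative sign, as listed below.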

Decision Criteria: ranked, then weighted using the rank sum weight method, which translates ranks into weights of decreasing importance.

  1. Distance from downtown (negative)
  2. Wine ratings recent average (positive)
  3. Wine ratings recent count (positive)
  4. Max price in price range (negative)
  5. Varietal diversity: absolute value of (red count / total count − 0.5) (negative)
  6. Desirable traits score (positive)
  7. Established (negative)
  8. Fees (negative)
  9. Wine ratings all average (positive)
  10. Wine ratings all count (positive)
Demonstration of weights applied, using rank sum method
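The rank sum method above can be sketched in a few lines. In this scheme, the criterion ranked r out of n gets weight (n − r + 1) divided by the sum of all such terms, so weights decrease linearly with rank and sum to 1. This is a generic sketch of the method, not the author's original code.

```python
def rank_sum_weights(n: int) -> list[float]:
    """Rank sum weights: rank r of n gets (n - r + 1) / (n * (n + 1) / 2)."""
    total = n * (n + 1) / 2
    return [(n - r + 1) / total for r in range(1, n + 1)]

weights = rank_sum_weights(10)
print([round(w, 3) for w in weights])
# The top-ranked criterion (distance from downtown) gets 10/55 ≈ 0.182;
# the last (all-time rating count) gets 1/55 ≈ 0.018.
```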

Analysis Results

After normalizing each criterion onto a 0-to-1 scale, we get the distribution of criteria per winery shown below. We can also look at where specific wineries fell on each criterion.

Distribution of all criteria
Highlighting data points for each of top ranked wineries, amongst criteria
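The 0-to-1 scaling described above can be sketched as min-max normalization, with the sign flipped for "negative" criteria so that higher always means better. The function name and sample values are illustrative assumptions, not the original code.

```python
def normalize(values: list[float], negative: bool = False) -> list[float]:
    """Min-max scale values to [0, 1]; invert when lower raw values are better."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0] * len(values)  # constant criterion carries no signal
    scaled = [(v - lo) / (hi - lo) for v in values]
    return [1 - s for s in scaled] if negative else scaled

# e.g. miles from downtown, a negative criterion: closer is better
print(normalize([1.0, 3.0, 5.0], negative=True))  # [1.0, 0.5, 0.0]
```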

The final ranking was determined by calculating a weighted average of all the criteria, using the weights from the methodology above. The top five were:

  • Michael David Winery — 69 points
  • Riaza Wines — 68 points
  • Klinker Brick — 63 points
  • Jessie’s Grove — 60 points
  • LangeTwins — 59 points
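The scoring step above, a weighted average of normalized criteria, can be sketched as a dot product. The winery values and three-criterion weights below are made up for illustration; the real analysis used ten criteria with rank sum weights.

```python
def weighted_score(norm_values: list[float], weights: list[float]) -> float:
    """Weighted average: dot product of normalized criterion values and weights."""
    assert len(norm_values) == len(weights)
    return sum(v * w for v, w in zip(norm_values, weights))

# Two hypothetical wineries scored on three criteria weighted 3/6, 2/6, 1/6
weights = [3 / 6, 2 / 6, 1 / 6]
print(round(weighted_score([0.9, 0.8, 0.2], weights), 3))  # 0.75
print(round(weighted_score([0.4, 1.0, 1.0], weights), 3))  # 0.7
```

Multiplying the result by 100 gives point totals on the same scale as the rankings above.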

Testing out the data in real life

Following this analysis (or rather, the reason this analysis was done in the first place) was a visit to Lodi to see how the data-backed rankings matched up with reality. My reviews of the top 5 wineries were as follows:

Michael David Winery (2/5): Michael David demonstrated how the current methodology and ranking system is biased towards large production. Michael David has a “Napa-scale” tasting room, probably built for tour bus groups, and so the service we got was not great. The wine was alright, but I feel like they didn’t have their best wines in their tasting lineup, likely because most visitors are mainly there for the drinking.

Riaza Wines (4/5): Riaza was a phenomenal experience and find — something I think we wouldn’t have found or gone to without the ranking methodology. Their tasting room was in a warehouse-type building close to downtown, their wines were phenomenal, and their winemaker was great to connect with. Riaza had the type of experience that is hard to quantify: authenticity. The winemaker/owner has a personal philosophy behind each decision about his business, and is readily available and happy to share that with you.

Klinker Brick (3/5): Klinker Brick was a medium scale production winery, but kept their tasting experience casual and personal. Again, hard to quantify which tasting experiences are good and bad (maybe Yelp/Google reviews can help us get there in the next version). Their wines were quite good and their property was gorgeous with a large picnic area out back for people to enjoy wine and any food they brought.

Jessie’s Grove (5/5): Jessie’s Grove was the highlight winery of the weekend. The entire experience was intimate, driven by passion, and purposeful. Each piece of the property has a history and a story, typically dating back to the 1860s, and the staff is more than happy to share it with you. The wines were both diverse and high quality, yet not overly expensive. They also put on events and concerts, which leaves you dying to come back.

LangeTwins (3/5): Again, a larger scale production winery than we expected. However, they seem to have maintained that “family owned” mindset, which was clear by the service you got in the tasting room. The staff was quite happy to answer questions, and was definitely passionate about their work. LangeTwins also had the best “varietal diversity” of the weekend — balance of red and white wines on their tasting menu. When wine tasting with people with different preferences, this is important.

Other wineries we visited that weren’t in the Top 5: We also visited Twisted Barrel, Dancing Fox, and Wine Social (which was pouring for 6 Hands). These wineries were good, but did not outshine the wine from the Top 5. We did hear great things about Jeremy Wine Co throughout the weekend (which was not in our list), but the timing didn’t work out to visit.

Key learnings

Data quality is important. Anyone who works with data agrees with this point, but without subject matter expertise it can be hard to know whether your data is accurate and clean. For example, the data source I used listed 33 wineries in Lodi; in reality there are 80+. The varietal diversity measures and ratings were also not very accurate data points.

It’s easy to over-index on one attribute. Wine ratings are important, but because my source for ratings was limited (only Wine Enthusiast) and many Lodi wineries don’t have those ratings, the top-ranked wineries were simply those that had ratings at all. Wineries with no ratings effectively scored zero on 4 of the 10 criteria (all the rating-related ones).

There are important criteria that are harder to measure. As the gaps between data and reality show, there are nuances of experience that are hard to quantify for the purpose of ranking. Michael David is the most explicit example: the very markers of its commercial scale are what pushed it to the top of the ranking.
