How To Build The World’s Largest Wine Database

Just add wine


Over the past 4 years, Vivino has grown into the largest and most comprehensive wine database in the world.

50 million photos of more than 3.5 million wines, 5 million users bringing more than 2 million wine reviews and 10 million ratings, all collected neatly in 1 database, available for all to browse. This is the story of how we did it.

Just as Foursquare mapped the world’s physical locations and put them in one database for all to explore, Vivino has digitized the physical world of wine. It started in 2010 with a three-page Word doc.

Go ahead, build a database of all the world’s wines. How hard can it be, right? Had we known just how hard, we might have chosen another line of work. As it happens, we had no idea, and went ahead... Four years later, we’re still at work, adding hundreds of wines every day to an increasingly complex and deep database. And it’s a moving target: every year, the wine producers of the world add around 300,000 new wines to the mix. Most of them are “just” new vintages of existing wines, but each one is still a new wine to catalog.

How to eat an elephant

When we started Vivino, we had little money in our pockets and no funding. Fortunately, the task of building a wine database is well suited to outsourced, low-cost work. This is how we got started:

  1. Googled our way to the websites of 1,000 wineries. Put these URLs in a Word doc.
  2. Built a simple backend for a not-yet-existing datateam to work with. Created a simplified structure of Country > Region > Subregion > Winery > Wine > Vintages, and the interface to add data into that structure.
  3. Hired a guy from India (Mahendra, who to this day is the backbone of our large datateam and is doing an amazing job) to visit the websites one by one and enter their wine data into our system. No scraping, just manual input.
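
To make the structure from step 2 concrete, here is a minimal sketch of how such a hierarchy could be modeled. It is an illustration only, not our actual backend: the class names, the `path` and `add_vintage` helpers, and the sample winery are all made up for this example.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class Wine:
    name: str
    vintages: List[int] = field(default_factory=list)  # years, kept sorted


@dataclass
class Winery:
    # Hypothetical model: the location levels are stored on the winery itself.
    name: str
    country: str
    region: str
    subregion: Optional[str] = None
    wines: Dict[str, Wine] = field(default_factory=dict)

    def path(self) -> str:
        """Country > Region > Subregion > Winery, skipping a missing subregion."""
        parts = [self.country, self.region, self.subregion, self.name]
        return " > ".join(p for p in parts if p)

    def add_vintage(self, wine_name: str, year: int) -> Wine:
        """Create the wine if needed, then record the vintage once."""
        wine = self.wines.setdefault(wine_name, Wine(wine_name))
        if year not in wine.vintages:
            wine.vintages.append(year)
            wine.vintages.sort()
        return wine


# Made-up sample data, entered by hand just like in step 3.
winery = Winery("Example Estate", "France", "Bordeaux", "Pauillac")
winery.add_vintage("Grand Vin", 2009)
winery.add_vintage("Grand Vin", 2005)
print(winery.path(), winery.wines["Grand Vin"].vintages)
```

Treating each vintage as an entry under an existing wine is what keeps a new year of a known wine from turning into a duplicate record.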

This gave us the starting point. Once the first 1,000 wineries were covered, we just kept going, and going and going. That was all well and good, but how do you make the jump from a plain, factual wine database to a photo-recognizing app with loads of user-contributed content?

Wine Selfie

The original concept for Vivino was hatched in 2009 and had nothing to do with an app. We wanted to create the “IMDb for Wine”, the one place to go to find out what other people thought of a wine, not just the fancy critics and experts. As the database grew, we realized that this would be rather perfect for an image-recognizing app. The challenge was that we had ZERO photos of any of these wines, a prerequisite for doing any kind of image recognition.

Image Recognition 101: Today, Vivino uses OCR (a.k.a. the automated “reading” of text on labels) to refine the search result. But the basis of recognition still comes from pure pattern recognition, i.e. comparing the incoming photo with the millions we have in our database and finding the closest match.
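
The "closest match" idea can be sketched in a few lines. This is a toy illustration and not Vivino's actual recognition pipeline: it assumes every photo has already been turned into a list of numbers (a feature vector), and the `closest_match` function, the cosine-similarity measure, the `threshold` value, and the wine IDs are all assumptions made for the example.

```python
import math


def cosine_similarity(a, b):
    """Similarity of two equal-length feature vectors (1.0 means same direction)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def closest_match(query, known_labels, threshold=0.9):
    """Return (wine_id, score) for the best match, or (None, score) if too weak.

    known_labels maps a wine ID to the feature vector of a stored label photo.
    A result of None stands for "no automatic match, hand it to a human".
    """
    best_id, best_score = None, -1.0
    for wine_id, features in known_labels.items():
        score = cosine_similarity(query, features)
        if score > best_score:
            best_id, best_score = wine_id, score
    if best_score < threshold:
        return None, best_score
    return best_id, best_score


# Made-up vectors standing in for real image features.
known = {"wine_a": [1.0, 0.0, 0.0], "wine_b": [0.0, 1.0, 0.0]}
print(closest_match([0.9, 0.1, 0.0], known))  # close to wine_a
print(closest_match([1.0, 1.0, 0.0], known))  # ambiguous, falls below the threshold
```

A linear scan like this would be far too slow against millions of photos, but it shows why more matched photos in the database lead directly to more automatic matches.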

One of the good things about wine is that, although there’s an amazing number of wines out there, there’s also a “short tail” of relatively few wines which are very popular. We knew that if we could capture the top 20% of wines out there, we’d also be able to make a product that would automatically match at least 60% of the wines that users would scan.
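
The 20%/60% reasoning above is a coverage calculation: rank wines by how often they get scanned, take the top slice, and see what share of all scans it accounts for. Here is a small sketch of that calculation. The `scan_coverage` function and the scan counts are invented for illustration and are not real Vivino data.

```python
def scan_coverage(scan_counts, top_fraction=0.2):
    """Share of all scans that land on the most-scanned `top_fraction` of wines."""
    if not scan_counts:
        return 0.0
    ranked = sorted(scan_counts, reverse=True)
    top_n = max(1, round(len(ranked) * top_fraction))
    total = sum(ranked)
    return sum(ranked[:top_n]) / total if total else 0.0


# Made-up distribution: 10 wines, two of them very popular.
counts = [30, 30, 5, 5, 5, 5, 5, 5, 5, 5]
print(scan_coverage(counts))  # 0.6, i.e. the top 20% of wines cover 60% of scans
```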

So, we built an app for that. Not the Vivino app that you know today, but an extremely simple competition app.

Vivino Label Competition 2010

We invested in 2 Laguiole corkscrews, and offered them as a prize in the simplest competition ever: Use our bespoke app to send in photos of wine labels and win this crazy corkscrew that you’d never invest in yourself.

It was a surprising success, and within a month we had 50,000 labels to get us started. We also built a new interface for our datateam to work with, one that would enable them to match an incoming photo to a wine in our database, creating the backbone of our image recognition as it is today.

An ugly duckling gets shot in the App Store

We launched the first Vivino app in late 2010, with a big BETA sign plastered over it. Even so, it sucked: we knew it and our users knew it. When scanning a wine, our early adopters had around a 20% chance of getting an automatic match (compare that to 92% today), and even when they got a match, we couldn’t really tell them anything interesting about the wine, apart from the stuff they could already read on the back label…

There’s an MO in the app world today that apps must be polished and perfect to stand a chance. Ours was slow, ugly and worst of all — it didn’t quite work. So why even launch?

We were faced with a classic Catch-22. We knew we wouldn’t be able to recognize wines without having photos of them — and we couldn’t get photos without users, so we had to put out a half-baked product if we wanted to get anywhere. With the photos we did have from the label competition, we expected a match-rate of around 20%, which also turned out to be true. It’s all a matter of managing user expectations, but we failed in that and the ruling in the App Store and Google Play was harsh: We had a 1.5 star average rating shortly after launch. But there was a silver lining.

First of all, we knew that the problem we were facing would slowly be alleviated: We could see the automatic match rate climb slowly but steadily. Secondly, all the bad reviews had one sentiment in common: “Great idea, too bad it doesn’t work”. As the optimist will tell you, that’s a confirmation that you’re on to something and need to keep working! Today, Vivino has a healthy 4.5 stars average rating in both Google Play and Apple’s App Store.

Welcome, User Dave

As 2011 went by, our automatic match rate kept climbing. As more and more photos were matched with wines in our database, the app was able to match 20%, then 30, 40, 50, 60% of the incoming labels automatically and instantly.

That also meant that Vivino increased in popularity. During the past three years, we’ve seen more and more users coming in, slowly adding the last magic ingredient to our database: user content. Ratings, reviews, prices, places selling wine: all of this was submitted to the database by our users, lifting it from the mundane level of factual data to a vibrant, user-driven wine-brain.

Suddenly, Vivino was genuinely useful. You could scan a wine, get an instant match and see what other people thought of the wine. This was the edge that we’d been looking for all along, and the aspect that now gives Vivino a decisive advantage: the Catch-22 has been reversed into a positive spiral, with more users creating more content for even more users to enjoy.

We’ll never get done

Today, we have 200,000 wine labels coming in every day from our users. Even with an automatic match rate above 90%, this still leaves you with around 20,000 images to manually match every day. Enter the datateam, which started out with Mahendra and a Word doc in 2009, and is now more than 50 people working around the clock to match labels with wines. With every match they make, they also increase the precision and breadth of our detection technology.
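
The arithmetic behind that manual backlog is simple enough to write down. The `manual_queue` function below is just a name made up for this sketch; the inputs are the figures quoted in this post.

```python
def manual_queue(labels_per_day, auto_match_rate):
    """Labels left for humans: everything the automatic matcher did not catch."""
    return round(labels_per_day * (1 - auto_match_rate))


print(manual_queue(200_000, 0.90))  # 20000 images a day at a 90% match rate
print(manual_queue(200_000, 0.92))  # 16000 images a day at a 92% match rate
```

Every extra percentage point of automatic matching takes about 2,000 images a day off the datateam's plate at this volume.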

Foursquare also started out as a gimmick: Why would I “check in” to places, just to get a badge? Now it’s evolved into giving you more content and value than you provide. The same dynamic is happening with Vivino.

We’re still working on the final piece of the puzzle: Enabling our users to buy the wine. Fortunately, our business model is also our most requested feature: When we recently asked our users what they’d like us to add to Vivino, the most requested feature was the option to buy wines directly from the app. We like the prospect of that challenge :-)
