What is a Michelin Star?
Every year since 1936, Michelin has updated its guidebooks, awarding up to three stars to each restaurant it inspects and removing them when standards slip. The inspection process is shrouded in secrecy to avoid conflicts of interest, and stars are awarded sparingly. The stars have become one of the most recognized and respected awards a restaurant can achieve, with thousands of chefs from all corners of the globe devoting their lives to the pursuit of just a single star.
In fact, chefs take the stars so seriously that renowned figures such as Gordon Ramsay have wept over the loss of a star, and some have even taken their own lives.
And it’s no wonder. A single star can have customers flocking to a restaurant, bringing both immense fame and profit. The late, great Joël Robuchon, who held an astounding record of 32 Michelin stars, once said that “with one Michelin star, you get about 20 percent more business. Two stars, you do about 40 percent more business, and with three stars, you’ll do about 100 percent more business.”
Customers and investors alike are therefore keen to predict which restaurants will eventually earn a Michelin star, before the competition gets fierce.
But how rare are they exactly? Shouldn’t they be easy to pick out?
Not exactly. Out of almost 1,600 popular restaurants in New York City, 47 hold one Michelin star, 11 hold two, and only 5 hold three, making even the prediction that a restaurant has a single star a technical challenge.
So how are Michelin-starred restaurants selected? Is there a noticeable pattern among them?
I began by scraping TripAdvisor for New York City-based restaurants and performed some exploratory data analysis, hoping for clues about which variables might contribute to earning a Michelin star.
None of the variables above is strongly correlated with either Michelin.Stars, which records whether a restaurant has 0, 1, 2, or 3 Michelin stars, or Have.Star, which indicates whether it has at least one. This should come as no surprise, since the awarding of a star is so rare.
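As a sketch of that check (the column names follow the variable list later in the post; the exact call is an assumption, not the original code):

```r
# Correlation of the numeric variables with Michelin.Stars and Have.Star.
num_vars <- restaurants[, c("Overall.Rating", "Review.count", "Value.rating",
                            "Service.rating", "Atmosphere.rating", "Food.rating",
                            "sentiment", "years.with.tripadvisor.award",
                            "Michelin.Stars", "Have.Star")]
round(cor(num_vars, use = "pairwise.complete.obs"), 2)
```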
What’s more, a boxplot of Overall.Rating against the number of Michelin stars shows no significant pattern suggesting a relationship between the two variables. Furthermore, the small number of restaurants with two or more Michelin stars makes any conclusive statement infeasible.
The median overall TripAdvisor rating appears to be the same in both boxplots, whether or not a restaurant has a Michelin star, and nothing about the observations is particularly outlandish. Also note that the boxplot for restaurants with at least one Michelin star has a wider spread, since that group is so small.
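The two boxplots can be reproduced with base R graphics along these lines (a sketch, assuming the data frame is called restaurants):

```r
# Ratings by exact star count (0-3) and by the binary Have.Star indicator.
boxplot(Overall.Rating ~ Michelin.Stars, data = restaurants,
        xlab = "Michelin stars", ylab = "Overall TripAdvisor rating")
boxplot(Overall.Rating ~ Have.Star, data = restaurants,
        names = c("No star", "At least one star"),
        ylab = "Overall TripAdvisor rating")
```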
Sentiment analysis shows a positive trend between Michelin stars and review sentiment, with the average zero-star restaurant receiving lukewarm reviews and the average one-star restaurant receiving generally favorable ones.
However, the correlation matrix displayed earlier shows that there’s no strong correlation between sentiment and Michelin stars, so the above barplot, while interesting, is likely just wishful thinking. The word clouds below demonstrate the relative similarity of TripAdvisor wording between the average restaurant in New York City and restaurants with Michelin stars.
Most notable, however, are two major features that separate a Michelin-starred restaurant from a regular NYC restaurant. For starters, while American cuisine dominates restaurants without any stars, it ranks only fifth among restaurants with at least one star.
In addition, restaurants without Michelin stars are overwhelmingly casual compared to those with at least one star.
Again, while these observations are nothing beyond common knowledge, they suggest patterns that the eventual predictive models could pick up on when identifying Michelin-starred restaurants.
Using Machine Learning to Predict Which Restaurants Will Receive a Michelin Star
Data Aggregation and Wrangling
I aggregated all the data by scraping TripAdvisor's New York City listings and the Michelin Guide NYC with Chrome's Web Scraper extension, then wrangled it in R.
The following variables were scraped entirely from TripAdvisor: Cuisine, Overall.Rating, Dining.Type, sentiment, Review.count, Value.rating, Service.rating, Atmosphere.rating, years.with.tripadvisor.award, and Food.rating.
I scraped Michelin.Stars directly from the Michelin Guide and created the Have.Star variable with an if-else statement that outputs 1 if the restaurant has at least one Michelin star and 0 otherwise. This gives a binary response variable that is easier for the logistic regression and random forest models to handle.
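A minimal sketch of that step, assuming the scraped data sits in a data frame called restaurants:

```r
# restaurants$Michelin.Stars is the 0-3 star count scraped from the Michelin Guide.
restaurants$Have.Star <- ifelse(restaurants$Michelin.Stars >= 1, 1, 0)
```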
About the variables
I split Dining.Type into two categories: fine and casual dining. If a restaurant listed $$$ or above on TripAdvisor, I categorized it as fine dining; anything $$ or below counted as casual dining.
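Something along these lines, assuming a hypothetical Price.Range column holding TripAdvisor's price symbols:

```r
# Price.Range is a hypothetical column of price symbols, e.g. "$", "$$", "$$$", "$$$$".
restaurants$Dining.Type <- ifelse(nchar(restaurants$Price.Range) >= 3,
                                  "Fine Dining", "Casual Dining")
```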
To obtain the sentiment variable, I ran a sentiment analysis in R on 30 randomly selected TripAdvisor reviews from every restaurant with at least one review. This not only simplifies the dataset but should also improve the models' predictions, given that Michelin doesn't normally review unpopular restaurants.
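The post doesn't name the sentiment package used; a sketch with syuzhet and dplyr might look like this (the reviews data frame and its column names are assumptions):

```r
library(dplyr)
library(syuzhet)

# `reviews` is a hypothetical data frame with one row per scraped review:
# columns `restaurant` and `review_text`.
set.seed(42)
sentiment_scores <- reviews %>%
  group_by(restaurant) %>%
  slice_sample(n = 30) %>%   # up to 30 random reviews per restaurant
  summarise(sentiment = mean(get_sentiment(review_text, method = "syuzhet")))
```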
The years.with.tripadvisor.award variable is continuous, reflecting how many years a restaurant has been recognized by TripAdvisor with an award (at least one year).
Since TripAdvisor’s overall rating is not an entirely accurate representation of people’s ratings, I created a new weighted-average approximation, Overall.Rating, using the counts of reviews at each star level.
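As a sketch of that weighted average, assuming hypothetical columns n5 through n1 holding the number of reviews at each star level:

```r
# n5..n1 are hypothetical columns: counts of 5-star down to 1-star reviews.
restaurants$Overall.Rating <- with(restaurants,
  (5 * n5 + 4 * n4 + 3 * n3 + 2 * n2 + 1 * n1) / (n5 + n4 + n3 + n2 + n1))
```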
Preparing the dataset
I built a couple of models to predict whether a restaurant will receive, or deserves, at least one Michelin star. I began by randomly selecting 75% of the original data as a training set; the remaining 25% would later serve as the holdout sample.
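A standard way to do this split in base R (the seed is arbitrary):

```r
set.seed(42)  # arbitrary seed for reproducibility
train_idx <- sample(seq_len(nrow(restaurants)), size = floor(0.75 * nrow(restaurants)))
train <- restaurants[train_idx, ]
test  <- restaurants[-train_idx, ]
```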
Random Forest Model
I constructed a random forest model on the training data before applying it to the holdout sample. For both the random forest and the logistic regression model, I set Have.Star as the response variable, since I wanted the models to identify which restaurants have at LEAST one star, and a binary response is easier to fit.
By tuning the model, I found the optimal number of variables randomly sampled at each split to be around 50. The output shows nothing too unusual, and the out-of-bag error is thankfully not too high, meaning the model is quite solid.
The random forest model produces an AUC of 0.84, making it very good at discerning restaurants that have Michelin stars from those that don't.
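A sketch of the fit and its evaluation using the randomForest and pROC packages (the formula and settings are assumptions, not the original code; the tuned value of roughly 50 reported above suggests categorical predictors such as Cuisine were expanded into dummy columns):

```r
library(randomForest)
library(pROC)

rf_fit <- randomForest(as.factor(Have.Star) ~ . - Michelin.Stars,
                       data = train, importance = TRUE)
print(rf_fit)         # includes the out-of-bag error estimate
varImpPlot(rf_fit)    # variable importance plot

rf_probs <- predict(rf_fit, newdata = test, type = "prob")[, "1"]
auc(roc(test$Have.Star, rf_probs))   # the post reports roughly 0.84
```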
Furthermore, the variable importance plot reveals that Cuisine is far and away the most important variable in the dataset, followed by Overall.Rating and Dining.Type.
Logistic Regression Model
Next, I built a logistic regression model, again with Have.Star set as the response variable. I used LASSO with 10-fold cross-validation to determine the desired sparsity of the model.
Using the regularization output's lambda minimum, I kept four statistically significant variables in the predictive model: Cuisine, Dining.Type, Value.rating, and Review.count. The lowest AIC obtained was 237.56, implying that this model lost the least information of all the candidates.
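A sketch of that LASSO fit with glmnet (the model.matrix encoding is an assumption):

```r
library(glmnet)

# One-hot encode the predictors; drop the intercept column.
x_train <- model.matrix(Have.Star ~ . - Michelin.Stars, data = train)[, -1]
y_train <- train$Have.Star

cv_fit <- cv.glmnet(x_train, y_train, family = "binomial", alpha = 1, nfolds = 10)
coef(cv_fit, s = "lambda.min")   # non-zero coefficients identify the retained variables
```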
The AUC of 0.98 on the training data indicates a model that fits the training set extremely well.
On the other hand, the AUC of 0.74 on the testing data reveals a much weaker model than anticipated: still good, but not excellent like on the training data.
The training data’s confusion matrix is built on a conservative threshold of 0.65, which results in a precision of 0.91.
Applying the conservative threshold to the testing dataset reveals high precision but extremely poor recall. This is not necessarily a bad thing: a high-precision model produces fewer false positives, which, in this case, would benefit risk-averse investors.
To build a model with high recall, I set the threshold to 0.17, which, in cost-weighted terms, treats a false negative as five times more costly than a false positive (the break-even threshold being 1 / (1 + 5) ≈ 0.17). A high-recall model best serves those searching for the next restaurant to earn a Michelin star.
The recall-based model does okay on the testing dataset, with a recall of around 0.455. Also note that the misclassification rate, while often relevant, is not extremely useful here since correctly classifying Michelin-starred restaurants is more important.
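A sketch of how those two thresholds could be applied to the holdout predictions (caret's confusionMatrix is an assumption; the post doesn't name its tooling):

```r
library(caret)

x_test <- model.matrix(Have.Star ~ . - Michelin.Stars, data = test)[, -1]
probs  <- predict(cv_fit, newx = x_test, s = "lambda.min", type = "response")[, 1]

# Precision-focused (0.65) vs. recall-focused (0.17) cutoffs from the post.
for (thr in c(0.65, 0.17)) {
  preds <- factor(ifelse(probs >= thr, 1, 0), levels = c(0, 1))
  print(confusionMatrix(preds, factor(test$Have.Star, levels = c(0, 1)), positive = "1"))
}
```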
Overall, based on the AUCs, it's quite apparent that the random forest produced a better model than the logistic regression, so it's worth placing more weight on the former's predictions when a lower misclassification rate matters.
So which restaurants should we be on the lookout for? Which restaurants are on the verge of gaining a Michelin star?
Precision-focused Logistic Regression Model Result:
According to the logistic model built around precision optimization, the following restaurants are extremely deserving of at least one Michelin star:
This type of prediction would be useful for risk-averse investors looking for a safe expected return on investment when looking for the next Michelin-starred restaurant. This particular model strongly suggests that Gran Tivoli, Perrine, and Sushi Ishikawa each deserves at least one Michelin star despite not having one already.
As expected, the three aforementioned restaurants all receive highly positive reviews and are categorized under fine dining. I would not be surprised if any of these restaurants are awarded a Michelin star in the coming years.
Recall-focused Logistic Regression Model Result:
The recall-focused model captures most of the restaurants with at least one Michelin star and can be interpreted as the model that best represents Michelin's standards. In other words, the following restaurants don't have Michelin stars but are the most similar to those that have one or more.
Since the main focus of this project is on predicting which restaurants will potentially earn a Michelin star, I’m more interested in the model’s false positives. As such, the table below is NOT representative of the model’s predictive abilities as all the true positives have been filtered out. The same approach applies to the random forest model.
Note how Gran Tivoli, Perrine, and Sushi Ishikawa show up here as well. This is expected, since relaxing the threshold captures the restaurants from the conservative model and more. The list above is best suited to foodies looking for a Michelin-star experience without having to deal with as much demand.
Random Forest Model Result
The random forest model can be read as a more accurate counterpart to the recall-focused logistic regression. It flags fewer false positives, giving a stronger sense of which restaurants offer a Michelin-starred experience without actually holding a star.
This time, only Gran Tivoli, which appears in both the precision- and recall-focused models, is predicted to earn a star, not Perrine or Sushi Ishikawa. That makes Gran Tivoli the only restaurant to appear in all three models' predictions, and a strong contender to eventually obtain a Michelin star.
Closing thoughts
It would be an extrapolation to apply these models to cities dissimilar to New York, since they were trained on parameters and potential confounding variables unique to the city. For example, New York City's Japanese restaurants might be much better than those in other places, so if the models were applied to a city such as Detroit, they would likely not fare as well given their heavy dependence on the Cuisine patterns they picked up in NYC.
Nevertheless, the models can still likely be applied to cities with a restaurant ecosystem similar to that of New York.