Python-generated word-cloud illustrating most common cuisines in Toronto

Segmenting the Toronto Restaurant Market: Lending an Analytics Hand to a Distressed Sector

Abtine Monavvari · Published in The Startup · 9 min read · Sep 9, 2020

Motive

The Toronto restaurant landscape is a bustling set of culinary traditions that is as rich as it is diverse. No surprise for a city touted as the most multicultural in the world. It isn’t hard to romanticize either; these are the places where Torontonians celebrate occasions, catch up with old friends, close important deals, sustain themselves on the go, and simply enjoy some of the best and most unique food offerings anywhere.

You could probably imagine then that the Toronto restaurant community is very near and dear to my heart. So, when the COVID-19 pandemic sent the sector teetering on the brink of catastrophe, I was compelled to lend whatever hand I could to help preserve the source of so many of my fondest memories. While the reality is that many doors have already been forced shut (a trend I anticipate will continue as the respite afforded by patio season draws to its end), I decided to put my skills to work and share some useful insights.

The following data science project is the result. It aims to guide restaurants' recovery strategies through a broad overview of the market. More importantly, it hopes to inspire the data science community to lend their skills to this cause and build on it further. For any of my data science friends looking for their next side project, consider this an open invitation to collaborate. As you'll see, there are many worthy data science problems that remain to be explored within the scope of this project; check out my GitHub repository for more details.

With that said, I want to keep this article as approachable as possible, so I've made every effort to annotate the technical nitty-gritty and keep things a little more intuitive. Let's get started!

Prompt

As alluded to in the title of this article, I set out to segment the Toronto restaurant market into distinct groups of more or less similar restaurants. The purpose is to help restaurants gain deeper insight into their respective competitive niches, which they can use to separate relevant trends from noise. In other words, to find out how similar restaurants have responded to the pandemic by ruling out dissimilar ones. Along the way, I also wanted to explore some of the more salient features of the restaurant sector, with particular attention given to neighbourhoods and the kinds of restaurants they play host to. To wrap things up, I consolidated my findings into an intuitive and interactive map: this will be the "deliverable".

Data Requirements

You can’t really apply fancy analytics without data so let’s tend to that now. From a market segmentation perspective, we want to collect the most relevant features on as many restaurants as possible within Toronto. Relevant features might include:

  1. Price
  2. Average Review
  3. Number of Reviews
  4. Cuisines
  5. Location

A structured and readily available data set containing all of this information will likely be hard to come by (which I can now confirm was the case), so we’ll probably need to leverage sources of unstructured data such as online review forums and catalogs.

Solution

For the most part, this data can be scraped from the web by building a crawler, which will essentially extract information from a set of web pages. APIs offered by Google and Foursquare are another option, but I chose scraping, the freer of the two. When scraping, we'll want to keep the scope to a single website for the sake of consistency. So, we'll look for a single platform (website) with an extensive and rich library of Toronto restaurants.
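To give a sense of what that crawler might look like, here's a minimal sketch using requests and BeautifulSoup. The URL and CSS selectors are placeholders (the actual platform isn't named here), so they would need to be adapted to the real site's HTML:

```python
import requests
from bs4 import BeautifulSoup

# Placeholder listing URL; the real platform's URL pattern would go here.
BASE_URL = "https://example.com/toronto-restaurants?page={}"

def scrape_page(page_num):
    """Fetch one listing page and extract basic restaurant fields."""
    resp = requests.get(BASE_URL.format(page_num), timeout=10)
    resp.raise_for_status()
    soup = BeautifulSoup(resp.text, "html.parser")

    restaurants = []
    for card in soup.select("div.restaurant-card"):  # hypothetical selector
        restaurants.append({
            "name": card.select_one("h3.name").get_text(strip=True),
            "cuisines": card.select_one("span.cuisines").get_text(strip=True),
            "rating": card.select_one("span.rating").get_text(strip=True),
            "cost_for_2": card.select_one("span.cost").get_text(strip=True),
        })
    return restaurants

# Crawl listing pages until we've covered the catalog (~5,000 restaurants).
all_restaurants = []
for page in range(1, 251):
    all_restaurants.extend(scrape_page(page))
```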

Lay of the land

After a couple of hours of scraping, we now have data on approximately 5,000 restaurants in the Greater Toronto Area. The first thing I want to do is get familiar with it. To that end, there is no good substitute for good old-fashioned exploratory data analysis. I go into greater depth in my Jupyter Notebooks, but for now, I'll describe the data and visualize some of its more interesting features.

Data:

After considerable wrangling, my data looks a little something like this:

Sample of restaurants data

Some Interesting Stats:

The first thing I wanted to find out was how the various types of cuisine are distributed. Sticking with the food theme, I made a waffle chart to get a sense of the 10 most common types of cuisine in the city and how they stack up against the rest.

Waffle-chart of Toronto restaurant distribution by cuisines

As we can see, Asian food is by far the most common kind of cuisine, with 15.56% of all restaurants cooking up Asian dishes. This is followed by Bar Food (8.03%), Italian (6.84%), and Cafes (6.09%). It's important to note that a restaurant can serve several cuisines concurrently, and cuisine labels aren't necessarily mutually exclusive (e.g., Japanese and Sushi). For instance, a restaurant might be labelled Italian, Pizza, and Desserts. That said, it's interesting to see that 49.32% of all restaurants in our sample do not belong to the group of top 10 most common cuisines, signalling a rather diverse landscape.
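For the curious, a chart like this can be put together with the pywaffle library. This is a sketch, not the exact code: the dataframe df and its "cuisines" column are assumed names standing in for the wrangled data.

```python
import matplotlib.pyplot as plt
from pywaffle import Waffle

# Count cuisine labels; a restaurant can carry several comma-separated ones.
counts = df["cuisines"].str.split(",").explode().str.strip().value_counts()

# Express the top 10 (plus an "Other" bucket) as percentages of a 10x10 grid.
total = counts.sum()
values = {k: round(v / total * 100) for k, v in counts.head(10).items()}
values["Other"] = 100 - sum(values.values())

fig = plt.figure(
    FigureClass=Waffle,
    rows=10,
    values=values,
    legend={"loc": "upper left", "bbox_to_anchor": (1, 1)},
    title={"label": "Toronto restaurants by cuisine (top 10 vs. the rest)"},
)
plt.show()
```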

Next, I was curious to see if any of my quantitative variables were correlated with one another. For example, do more expensive restaurants tend to get better ratings?

From the correlation matrix on the left, it looks like (on average) there aren’t really any meaningful relationships between the number of reviews, average reviews, and the cost for 2. But of course, this is only looking at the aggregate of our data and there may be more pronounced relationships for certain groups of restaurants. For example, it may turn out that more expensive fine dining restaurants garner better reviews than less expensive fine dining restaurants. This could be an interesting thing to look at in the future.
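For reference, the correlation matrix itself takes only a few lines with pandas and seaborn; the numeric column names below are assumptions:

```python
import seaborn as sns
import matplotlib.pyplot as plt

# Numeric columns assumed from the wrangled dataframe.
numeric_cols = ["number_of_reviews", "average_review", "cost_for_2"]
corr = df[numeric_cols].corr()

# Annotated heatmap of pairwise Pearson correlations.
sns.heatmap(corr, annot=True, cmap="coolwarm", vmin=-1, vmax=1)
plt.title("Correlation between numeric features")
plt.show()
```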

Segmenting the Market (clustering)

Now for the main course: segmenting the Toronto restaurant market.

Disclaimer: This section gets a little technical. I break things down into plain English in the last paragraph so feel free to scroll there for the punchline.

Market segmentation is, of course, an unsupervised learning problem (i.e., our learning algorithm trains on unlabeled data). Clustering algorithms such as k-means and k-modes are tools we might apply to group our data into clusters of more or less similar restaurants. Each has its limitations, and the appropriate choice will ultimately depend on the data.

Our data has the particular property of being both numeric (average review, review counts, average cost, etc.) and categorical (type of cuisine, occasion, etc.). The first step we'll want to take is to one-hot encode our categorical variables to obtain dummies that we can pass to our clustering algorithm.
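In pandas, that step might look like the following sketch (column names assumed). Note that a multi-label column like cuisines calls for str.get_dummies rather than plain pd.get_dummies:

```python
import pandas as pd

# Multi-label column: a restaurant can list several cuisines, so expand
# it into one 0/1 indicator per label.
cuisine_dummies = df["cuisines"].str.get_dummies(sep=", ")

# Single-label categoricals go through pd.get_dummies directly.
occasion_dummies = pd.get_dummies(df["occasion"], prefix="occasion", dtype=int)

# Recombine the numeric columns with the new dummy columns.
df_encoded = pd.concat(
    [df.drop(columns=["cuisines", "occasion"]), cuisine_dummies, occasion_dummies],
    axis=1,
)
```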

Now that we have dummies, let's take inventory. Most of our data is categorical and presently coded as dummies (each of which follows a Bernoulli distribution).

Restaurant data with one-hot encoded categorical variables

In fact, we can see that 232 out of our 235 features are categorical.

Since k-means assigns points by minimizing the Euclidean distance between data points and their cluster centroids (formally, the within-cluster sum of squares), it generally performs poorly on categorical data whose values belong to the discrete set {0, 1}. The k-modes algorithm is better equipped for categorical data but, conversely, does not handle continuous data very well.
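For reference, the k-means objective can be written as the within-cluster sum of squared Euclidean distances,

$$\min_{\mu_1, \dots, \mu_k} \; \sum_{i=1}^{n} \min_{1 \le j \le k} \lVert x_i - \mu_j \rVert^2,$$

where the $x_i$ are data points and the $\mu_j$ are cluster centroids. When every coordinate of $x_i$ is 0 or 1, the squared distance between two points just counts their mismatched dummies, and the centroids end up as fractional vectors sitting between categories, which blunts the algorithm's notion of closeness.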

PCA:

To work around these constraints, we can reduce our data into its principal components through Principal Component Analysis (PCA). This should yield approximately continuous data across the board (though it does sacrifice some information). Given the sparsity of many of our dummies (some are "hot" in less than 5% of instances), we should intuitively be able to reduce our data quite a bit without sacrificing too much explained variance. We will then be able to pass the result directly to our k-means algorithm in the form of a feature set.

Running PCA and plotting the cumulative explained variance against the number of components, we see that roughly 100 components explain 95% of the overall variance in our data. So, we'll reduce the dimensionality of our data into a feature set of its 100 principal components. In the end, our data goes from looking like this:

Sample dataframe before PCA

To something like this:

Sample dataframe after PCA
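In scikit-learn, that reduction might look like this sketch (df_encoded carries over from the encoding step above):

```python
import numpy as np
import matplotlib.pyplot as plt
from sklearn.decomposition import PCA

# Keep only numeric/dummy columns (drops text fields like restaurant names).
X = df_encoded.select_dtypes("number")

# Fit PCA with all components first, to inspect cumulative explained variance.
cum_var = np.cumsum(PCA().fit(X).explained_variance_ratio_)
plt.plot(range(1, len(cum_var) + 1), cum_var)
plt.axhline(0.95, linestyle="--")
plt.xlabel("Number of components")
plt.ylabel("Cumulative explained variance")
plt.show()

# Roughly 100 components explain ~95% of the variance, so reduce to that.
features = PCA(n_components=100).fit_transform(X)
```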

Setting parameters (n_clusters):

We now have a feature set that is ready to be passed to our clustering algorithm (k-means). However, before we can do that, k-means requires us to predetermine the number of clusters we want to segment our restaurants into. Since we don't have a priori knowledge of how many major segments make up the Toronto restaurant market, we'll need another way of informing our choice. The proposed solution, though admittedly a little more art than science, calls for iteratively running k-means for incremental numbers of clusters and plotting the corresponding sum of squared errors (SSE). Since the SSE generally decreases monotonically with every increment in cluster number, we can't simply minimize it without grossly overfitting our model. Instead, we'll look for an "elbow": the point where the rate of change in the SSE begins to really taper off (the elbow method).

From the plot on the left, it looks like the rate of change starts to peter out at around 15 clusters, so we'll set the n_clusters parameter equal to that and run our clustering algorithm on the previously defined feature set.
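A sketch of that procedure with scikit-learn (features carries over from the PCA step; the exact range of k values tried is a judgment call):

```python
import matplotlib.pyplot as plt
from sklearn.cluster import KMeans

# Run k-means for a range of cluster counts and record the SSE,
# which scikit-learn exposes as inertia_.
k_values = range(1, 31)
sse = [
    KMeans(n_clusters=k, n_init=10, random_state=42).fit(features).inertia_
    for k in k_values
]

plt.plot(k_values, sse, marker="o")
plt.xlabel("Number of clusters (k)")
plt.ylabel("SSE (inertia)")
plt.show()

# The curve flattens around k = 15, so fit the final model there
# and attach each restaurant's cluster label to the dataframe.
df["cluster"] = KMeans(n_clusters=15, n_init=10, random_state=42).fit_predict(features)
```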

In Plain English:

As mentioned earlier, our objective was to segment the Toronto restaurant market into groups of more or less similar restaurants. We did this by applying a machine learning algorithm (k-means) which essentially combs through our restaurants, looks for patterns, and assigns each restaurant to a group of other restaurants by determining which it most resembles. In our case, we determined that there were roughly 15 major segments in the Toronto restaurant market so each restaurant was assigned to one out of 15 possible groups (clusters).

Finally, I created a table highlighting each group’s main features:
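One simple way to build such a table is to aggregate by cluster label. A sketch, again with assumed column names carried over from the earlier steps:

```python
# Average the numeric features within each cluster...
profile = df.groupby("cluster").agg(
    n_restaurants=("name", "count"),
    avg_cost_for_2=("cost_for_2", "mean"),
    avg_rating=("average_review", "mean"),
)

# ...and surface each cluster's three most common cuisine dummies.
profile["top_cuisines"] = (
    df_encoded.join(df["cluster"])
    .groupby("cluster")[list(cuisine_dummies.columns)]
    .mean()
    .apply(lambda row: ", ".join(row.nlargest(3).index), axis=1)
)
print(profile)
```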

Looks like our model did a decent job of clustering restaurants into more or less consistent groups, though it would certainly improve with better data.

Interactive Map

Now that I have my clusters, I put everything into an interactive map.

How it works:

The map contains several layers, each displaying different insights, which can be accessed by hovering over the layer control icon (in the top-right corner) and then clicking to select a mode. Some of the modes contain additional interactive features that let you access deeper insights by hovering over or clicking objects on the map.

Layers:

  1. OpenStreetMap: The base map layer, centred on Toronto.
  2. # Restaurants: Restaurant count (from data sample) by neighbourhood
  3. Average Price (for 2): Average restaurant price for 2 people by neighbourhood
  4. Average Rating: Average restaurant rating (out of 5) by neighbourhood
  5. HeatMap: HeatMap illustrating restaurant density
  6. Granular: Individually plotted restaurants (clickable)
  7. Granular (colour-coded by similarity): Individually plotted restaurants colour-coded by cluster (clickable). Each cluster (colour) represents a group of more or less similar restaurants.
Interactive map of Toronto restaurants. Click on layer-controller (top-right) to access features
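For those wondering how a map like this is assembled, here's a minimal folium sketch covering two of the layers. Coordinates and column names are assumptions, and the neighbourhood-level layers (counts, price, rating) would additionally need neighbourhood boundary GeoJSON via folium.Choropleth:

```python
import folium
from folium.plugins import HeatMap

# Base map centred on downtown Toronto.
m = folium.Map(location=[43.6532, -79.3832], zoom_start=12, tiles="OpenStreetMap")

# HeatMap layer illustrating restaurant density.
HeatMap(df[["lat", "lon"]].values.tolist(), name="HeatMap").add_to(m)

# Granular layer: one clickable marker per restaurant, colour-coded by cluster.
palette = ["red", "blue", "green", "purple", "orange", "darkred", "cadetblue", "pink"]
granular = folium.FeatureGroup(name="Granular (colour-coded by similarity)")
for _, row in df.iterrows():
    folium.CircleMarker(
        location=[row["lat"], row["lon"]],
        radius=3,
        color=palette[int(row["cluster"]) % len(palette)],
        popup=row["name"],
    ).add_to(granular)
granular.add_to(m)

# The layer-selector icon in the top-right corner.
folium.LayerControl().add_to(m)
m.save("toronto_restaurants_map.html")
```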

And there we have it! An interactive map of Toronto restaurants.

Future Direction

There are countless other interesting areas I would have liked to explore, but ultimately forwent in the interest of keeping the scope of this project manageable. These include determining the factors driving reviews (regression), determining which restaurants face a heightened risk of closure (classification, though this would require more data), determining optimal street closures to accommodate increased outdoor seating (optimization), and many more. I leave these to the data science community at large.
