TDS Archive

An archive of data science, data analytics, data engineering, machine learning, and artificial intelligence writing from the former Towards Data Science Medium publication.

What cluster might we find this restaurant in? Let’s find out! (Photo by JP Holecka on Unsplash)

Clustering Restaurant Areas in Vancouver, British Columbia

Through topic modeling and clustering algorithms in Python, I display the geographic patterns in cuisines found in Vancouver’s various neighborhoods.

8 min read · Mar 26, 2021


As a proud Vancouverite, I often mention where I am from to the people I’ve met while living in the US. Outside of mixing up which coast Vancouver lies on, unfamiliar folks commonly tell me that they have only visited the city to depart on a cruise to Alaska. Vancouver, however, is much more than Canada’s western port. Travel guides proclaim Vancouver as a place known for its natural beauty, memorable outdoor activities, and diverse communities. However, no description of Vancouver would be complete without mentioning its vibrant food scene:

Don’t tell the rest of the country but Vancouver is Canada’s dine-out capital. Abandon your diet and dive right into North America’s best Asian cuisine, from chatty Chinese restaurants to Vietnamese banh mi sandwich joints, or unleash your appetite on a rich smorgasbord of fresh-caught local seafood, including seasonal spot prawns and succulent wild salmon. — Lonely Planet

But without travel guides, how else can we find ways to describe Vancouver as a culinary destination? The answer, of course, lies with data.

Motivation:

I was interested in exploring Vancouver through the lens of its many ‘culinary neighborhoods’. After all, when people say they live in Vancouver, they really mean they spend a good chunk of their time in the areas where they dine out. As with most cities, Vancouver’s restaurants tend to cluster in areas of high foot traffic (an economic explanation of this can be found here).

Street view of Robson St. showcasing four restaurants of different cuisines
A Google Maps street view of Vancouver’s Robson St. We have at least four restaurants in one block alone! (Image screen-captured by Author)

But given Vancouver’s breadth of cuisines to dine at, I became more interested in where specific types of cuisines tend to be located, and whether there are any patterns in the ways restaurants tend to group up together. Through the applications of topic modeling and clustering algorithms, this blog aims to answer two questions:

  1. Are certain types of cuisines more represented in some areas than in others?
  2. Can we group areas in Vancouver based on their most popular cuisines?

This analysis can be used by travelers to identify areas of interest when choosing a specific cuisine to dine out at, but could also mark areas of entry for a restaurateur of a specific cuisine who is interested in avoiding areas that might already be saturated with that cuisine.

Data Collection and Preparation:

To answer these questions, I used data produced by Yelp, identifying around 4,000 businesses under the restaurant category that were located within Vancouver’s local area boundaries. I analyzed three primary features from my Yelp restaurant data:

  • Geographic location
  • Cuisine type from Yelp-defined restaurant tags
  • A Bayesian estimate of the mean star rating. This is a modified version of Yelp’s 5-star rating that reduces the noise created by businesses with a low number of reviews. The formula I used to calculate this measure can be found on this website.
Table of Yelp Data for Dinesty Restaurant in Vancouver, BC, showing latitude, longitude, restaurant tags, and star rating
A sample of my data, and a pretty good place for juicy pork dumplings, if I say so myself… (Image by Author)
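To make the idea behind the Bayesian estimate concrete, here is a minimal sketch of one common form of a Bayesian average. It is an assumption for illustration, not necessarily the formula on the linked page: the function name, the `prior_mean` and `prior_weight` parameters, and the numbers are all hypothetical.

```python
def bayesian_mean_rating(stars, n_reviews, prior_mean, prior_weight):
    """Shrink a raw average star rating toward a prior mean.

    stars        -- the restaurant's raw average star rating
    n_reviews    -- number of reviews behind that average
    prior_mean   -- rating assumed before seeing any reviews (hypothetical value)
    prior_weight -- how many "pseudo-reviews" the prior counts for (hypothetical value)
    """
    return (prior_weight * prior_mean + n_reviews * stars) / (prior_weight + n_reviews)


# Two restaurants with the same 5-star average but very different review counts.
few_reviews = bayesian_mean_rating(5.0, 2, prior_mean=3.5, prior_weight=10)
many_reviews = bayesian_mean_rating(5.0, 500, prior_mean=3.5, prior_weight=10)
```

With only two reviews the estimate stays close to the prior, while with 500 reviews it stays close to the raw average, which is the noise-reducing behavior described above.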

The first step of our analysis is to identify labels for our restaurant data that group restaurants into areas that people would visit and easily walk around in. I dub these ‘culinary neighborhoods.’ I start by looking at where restaurants are located within Vancouver’s 22 neighborhoods as defined by its city government:

(Image by Author)

There are visible areas where restaurants tend to cluster, but the neighborhood boundaries provided by the city raise two issues with identifying our culinary neighborhoods:

  • We see that restaurant clusters occurring at the boundaries of the city’s neighborhoods would be split up arbitrarily.
(Image by Author)
  • We also see that larger neighborhoods, like Fairview, clump a large number of restaurants together and do not capture smaller, more walkable areas within the neighborhood.
(Image by Author)

I avoid using the city’s labels to identify my culinary neighborhoods and instead generate labels by running a KMeans clustering algorithm on the restaurants’ locations. This algorithm labels restaurants based on their location once I specify the number of labels (the K in KMeans) that I expect to have in my data. Using scikit-learn’s silhouette score as the metric to optimize the quality of my labels, I split my data up into 123 culinary neighborhoods. This produces the following updated plot of our restaurants, with colors signifying the different labels for our culinary neighborhoods:

(Image by Author)
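The K-selection step described above can be sketched roughly as follows. This is a toy illustration rather than the original pipeline: the coordinates are synthetic, the helper `best_k_by_silhouette` is hypothetical, and treating raw latitude/longitude as Euclidean points is a simplifying assumption.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.metrics import silhouette_score

# Synthetic stand-in for restaurant coordinates: three tight blobs of (lat, lon).
rng = np.random.default_rng(0)
centers = np.array([[49.28, -123.12], [49.26, -123.10], [49.23, -123.07]])
coords = np.vstack([c + rng.normal(scale=0.001, size=(50, 2)) for c in centers])


def best_k_by_silhouette(points, candidate_ks, seed=0):
    """Fit KMeans for each candidate K and keep the K with the highest silhouette score."""
    scores = {}
    for k in candidate_ks:
        labels = KMeans(n_clusters=k, n_init=10, random_state=seed).fit_predict(points)
        scores[k] = silhouette_score(points, labels)
    best_k = max(scores, key=scores.get)
    return best_k, scores


best_k, scores = best_k_by_silhouette(coords, range(2, 7))
```

On real data the candidate range would need to be much wider than 2 to 6, since the analysis above settles on 123 culinary neighborhoods.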

Data Modeling

Now that we’ve identified our culinary neighborhoods, we can start to identify patterns for the cuisine types found in these neighborhoods. I outline my data modeling process as follows.

  • I start by representing each neighborhood as a vector of the counts of its different cuisine types. This gives us an initial measure with which to quantify neighborhoods and identify differences among them.
  • To incorporate information on the quality of restaurants, I further adjust each count of a cuisine type based on the neighborhood’s average rating for that cuisine type relative to the city’s average.
  • My data show that there are 332 different cuisine types, making it difficult to determine what patterns may exist in our data. I therefore apply a non-negative matrix factorization (NMF) algorithm to describe each neighborhood as a collection of cuisine groups instead of many individual cuisine types, a process referred to as topic modeling (more on this below). Briefly, the NMF algorithm uses information on which cuisine types commonly occur together within neighborhoods to create cuisine groups (the topics), each a weighted mix of individual cuisine types; it then represents each neighborhood by its weight in each cuisine group.
(Image by Author)
  • Having transformed each neighborhood into a collection of cuisine groups, I apply another KMeans clustering algorithm to cluster neighborhoods based on their weights for each cuisine group. This will determine what patterns we might see in the cuisine groups represented by each neighborhood.
(Image by Author)
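The first two steps of the process above (cuisine counts per neighborhood, then a rating adjustment) might look something like the sketch below. The column names and toy rows are hypothetical, and the exact adjustment is not spelled out above, so scaling each count by the ratio of the neighborhood's mean rating to the city-wide mean rating is an assumption.

```python
import pandas as pd

# Toy data: one row per (restaurant, cuisine tag). Column names are hypothetical.
rows = pd.DataFrame({
    "neighborhood": [0, 0, 0, 1, 1, 1],
    "cuisine": ["Japanese", "Japanese", "Bars", "Bars", "Bars", "Japanese"],
    "rating": [4.5, 4.0, 3.0, 4.0, 4.0, 3.0],
})

# Step 1: a count vector of cuisine types for each neighborhood.
counts = pd.crosstab(rows["neighborhood"], rows["cuisine"])

# Step 2: mean rating per (neighborhood, cuisine) relative to the city-wide mean
# rating for that cuisine.
hood_mean = rows.pivot_table(index="neighborhood", columns="cuisine",
                             values="rating", aggfunc="mean")
city_mean = rows.groupby("cuisine")["rating"].mean()
relative_quality = hood_mean.div(city_mean, axis="columns")

# Assumed adjustment: scale each count by its relative quality.
# Cuisines absent from a neighborhood have no rating, so they are set to 0.
adjusted = (counts * relative_quality).fillna(0.0)
```

Under this assumption, a neighborhood whose Japanese restaurants rate above the city average ends up with a Japanese weight slightly above its raw count, and one that rates below average ends up slightly below it.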

Topic Modeling Vancouver’s Cuisines

After trying different numbers of cuisine groups as the input to the NMF algorithm, I decided on modeling four cuisine groups. Displaying the top 5 cuisine types by their weighting in each cuisine group reveals some pretty distinct themes.

Mean Category Count also shows a general correlation between the occurrence of each cuisine type in the city with its weight in each cuisine group (Image by Author)
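For readers who want to see how top cuisine types per group can be read off an NMF model, here is a small sketch using scikit-learn's `NMF`. The cuisine names, the toy matrix, the choice of three components, and the `top_cuisines` helper are all made up for illustration; the analysis above uses 332 cuisine types and four groups.

```python
import numpy as np
from sklearn.decomposition import NMF

cuisines = ["Bars", "Sandwiches", "Chinese", "Dim Sum", "Japanese", "Korean"]

# Rows are neighborhoods, columns are cuisine types (toy, non-negative values).
X = np.array([
    [8, 5, 0, 0, 0, 0],
    [6, 4, 0, 0, 1, 0],
    [0, 0, 9, 4, 0, 0],
    [0, 1, 7, 3, 0, 0],
    [0, 0, 0, 0, 8, 3],
    [1, 0, 0, 0, 6, 2],
], dtype=float)

model = NMF(n_components=3, init="nndsvd", random_state=0, max_iter=1000)
W = model.fit_transform(X)  # neighborhood weights for each cuisine group
H = model.components_       # cuisine-type weights for each cuisine group


def top_cuisines(components, names, n_top=2):
    """Return the n_top highest-weighted cuisine names for each cuisine group."""
    return [[names[i] for i in np.argsort(row)[::-1][:n_top]] for row in components]


groups = top_cuisines(H, cuisines)
```

Each row of `H` plays the role of a cuisine group, and each row of `W` is the per-neighborhood representation that the second KMeans step clusters.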

Group 1 (Bars/Western) is characterized heavily by the occurrence of bars, along with Western cuisine types like sandwiches, seafood, and cuisines from Canada and the US.

For Group 2 (Chinese/Vietnamese), we see a heavy weighting for the Chinese cuisine and other associated tags like Dim Sum and Seafood. This is unsurprising to see as a cuisine group in Vancouver, given the city’s large Chinese population. However, we also see a sizable weight for the Vietnamese cuisine, implying that these cuisines occur together in similar locations.

Group 3 (Japanese/Korean) shows a similar trend, being heavily weighted by the Japanese cuisine while also having a sizable weight for the Korean cuisine.

Lastly, we have Group 4 (Casual/Cafe), which has coffee & tea, cafes, brunch, pizza, and sandwiches characterizing this cuisine group.

Clustering Vancouver’s Neighborhoods by Cuisine Group

We now turn to how our second KMeans clustering algorithm performed at clustering our neighborhoods based on cuisine group weights. Here I chose K = 5 based on identifying the biggest reduction in a loss metric, inertia (see section 2.3.2 of scikit-learn’s clustering documentation).
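One way to turn an inertia-based choice of K into code is the "elbow" heuristic sketched below. The exact selection rule is not given above, so this is an assumption: it picks the K after which the drop in inertia shrinks the most. The data are synthetic (four well-separated blobs), so the answer here is 4 rather than 5.

```python
import numpy as np
from sklearn.cluster import KMeans

# Synthetic stand-in for neighborhood weights: four blobs in a 4-dimensional space.
rng = np.random.default_rng(1)
centers = np.eye(4) * 5.0
weights = np.vstack([c + rng.normal(scale=0.2, size=(20, 4)) for c in centers])

ks = list(range(1, 8))
inertias = [
    KMeans(n_clusters=k, n_init=10, random_state=0).fit(weights).inertia_
    for k in ks
]

# drops[i] is the reduction in inertia when going from ks[i] to ks[i + 1] clusters.
drops = -np.diff(inertias)
# curvature[i] is how much smaller the next drop is once we are at ks[i + 1] clusters.
curvature = drops[:-1] - drops[1:]
elbow_k = ks[int(np.argmax(curvature)) + 1]
```

In practice it is worth plotting `inertias` against `ks` as well, since the elbow on real data is often less sharp than in this toy example.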

Below, I provide boxplots showing the range of weights that the neighborhoods in each neighborhood grouping have. With the exception of Neighborhood Label #5, each of these clusters of neighborhoods appears to show a strong weight towards one of the four cuisine groups I modeled, as shown by high weights from each neighborhood for a particular cuisine group.

(Image by Author)

Results

After all that modeling, we can finally plot our resulting neighborhood labels on a map of Vancouver where we can find areas weighted towards a particular cuisine group. The legend for each neighborhood grouping is as follows:

  1. Green: Weighted towards Bars/Western
  2. Yellow: Weighted towards Chinese/Vietnamese
  3. Purple: Weighted towards Japanese/Korean
  4. Red: Weighted towards Casual/Cafe
  5. Blue: Not weighted towards any cuisine group
(Image by Author)

In general, many areas of Vancouver aren’t dominated by a particular cuisine group until you start moving into the residential areas or into Vancouver’s downtown. Here are some highlights I take away from this mapping:

  • In the orange circle, we see that Vancouver’s Downtown is dominated by the Bars/Western cuisine group with the green coloring, understandably so because it is associated with the nightlife in Vancouver. However, we also see a number of pockets that are labeled for the other cuisine groups, showing not only how diverse Vancouver’s dining options are, but also that there are areas of intensity for the Japanese and Chinese cuisine groups. These pockets of saturation can locate meaningful areas like Vancouver’s Chinatown, signified by the yellow cluster on the east side of Downtown.
  • The golden circle represents a discovery I made while reviewing the data. Prior to my analysis, the only areas in Vancouver known to me that were saturated with Japanese or Korean restaurants were all located in Downtown. However, investigating the label on this culinary neighborhood revealed an area dominated by Japanese restaurants that I wasn’t previously familiar with. I took a look at the top three Japanese restaurants by rating and found that they also have some pretty good reviews! I’ll link them here for reference: Saku, Uma Sushi, and Marulilu Cafe.

Concluding Thoughts

My project revealed that Vancouver’s restaurant scene could be modeled as four broad cuisine groups, three of which skew heavily towards the Bars, Chinese, and Japanese cuisine types. Plotting the different cuisine groups on a map showed that there are areas where certain ethnic cuisines dominate, identifying multiple areas of the city that one could seek out for a particular cuisine.

I plan on looking into ways I could combine other geographic and demographic features within Vancouver to see if I could create a recommendation system for restaurants based on the areas a user might be drawn to.

My final labeling of the neighborhoods seems to line up with my knowledge of the city. Does it line up with yours? Let me know in the comments if you agree or disagree!


Published in TDS Archive


Written by Warren Lee

An excitable nerd with a passion for all things economics, data, and interactive storytelling - 4.5 years in econ consulting; 1.5 years in data and analytics
