Location Optimization for Establishing a New Chinese Restaurant in Vancouver

Yan Houng
Published in The Startup
10 min read, Nov 28, 2020
Photo by Mike Benna on Unsplash

1) Introduction :

The City of Vancouver is one of the largest cities in Canada, located on the west coast. The Vancouver CSD (Census Subdivision), i.e. the City of Vancouver, consists of 22 neighborhoods. Vancouver is also a multicultural city, home to people of different races, religions, ethnicities, and cultures.

2) Business Problem :

There are 167,180 Chinese people living in Vancouver city, which makes establishing a Chinese restaurant a good investment choice. For an investor, it is always important to find an optimal place to establish a Chinese restaurant. In this case, the neighborhoods of Vancouver CSD need to be scanned to identify the areas where a new Chinese restaurant would have higher business opportunities and less competition. This project is targeted at investors who would like to establish a new Chinese restaurant in Vancouver CSD.

3) Data Sources :

In order to find a location for a Chinese restaurant in Vancouver CSD, we will be scraping and collecting data from the following sources :

A) City of Vancouver census local area profiles 2016: https://opendata.vancouver.ca/explore/dataset/census-local-area-profiles-2016/information/

Example of data: A1) There are 22 neighborhoods in Vancouver CSD.

A2) There are 3045 Chinese people living in the Arbutus-Ridge neighborhood.

B) The latitude and longitude of each neighborhood in Vancouver city can be obtained using the Python library Geocoder. Refer to the following link for more information :

https://geocoder.readthedocs.io/

Example of data: The latitude and longitude of the Arbutus-Ridge neighborhood are 49.246305, -123.159636.

C) The Foursquare API is used to explore popular venues, especially Chinese restaurants, in the neighborhoods of Vancouver CSD. A developer account must be registered at the following URL in order to access the Foursquare API :

https://developer.foursquare.com/

Example of data: the venues explored around the Arbutus-Ridge neighborhood, from which the number of Chinese restaurants can be identified.

4) Data Pre-Processing :

4.1) Data Wrangling / Data Cleaning :

City of Vancouver census local area profiles 2016 data

The downloaded City of Vancouver census local area profiles 2016 data covers all the local areas (a.k.a. neighborhoods) in Vancouver CSD and includes many variables, such as population by age group, household income, and visible minority populations like the Chinese population. Therefore, the data of interest needs to be filtered out. The data to be observed are the following :

  • The 22 neighborhoods in Vancouver CSD
  • The total population in each neighborhood
  • The Chinese visible minority in each neighborhood
  • The household income for each neighborhood (mean value and median value)

Thus, the ID numbers of the rows containing the above information are identified in the downloaded data. The data can then be extracted and put into the data frame shown below :

The data frame showing all the data of interest (only 10 of the neighborhoods are shown).
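The filtering step above can be sketched in plain Python. This is a minimal sketch, not the article's code: it assumes the downloaded table has one row per census variable, keyed by an ID, with one column per neighborhood. The `CENSUS_IDS` mapping, the helper name `extract_neighborhood_table`, the area names, and the sample numbers are hypothetical placeholders, not the real IDs or figures in the 2016 profile. The sketch also derives the percentage of Chinese residents, which is used later as a clustering feature.

```python
# Hypothetical mapping from census variable ID to the column name we want.
# The real IDs must be looked up in the downloaded 2016 profile.
CENSUS_IDS = {
    1: "Total Population",
    2: "Chinese",
    3: "Median Household Income",
}


def extract_neighborhood_table(census_rows, neighborhoods):
    """Pivot census rows (one per variable) into one record per neighborhood.

    census_rows: list of dicts, each with an "ID" key plus one key per neighborhood.
    neighborhoods: the neighborhood names to keep.
    """
    table = {name: {"Neighborhood": name} for name in neighborhoods}
    for row in census_rows:
        column = CENSUS_IDS.get(row["ID"])
        if column is None:
            continue  # not one of the variables we care about
        for name in neighborhoods:
            table[name][column] = row[name]
    # Derive the share of Chinese residents as a percentage of the total population.
    for record in table.values():
        record["Chinese %"] = 100 * record["Chinese"] / record["Total Population"]
    return [table[name] for name in neighborhoods]


# Illustrative numbers only, not real census figures.
sample_rows = [
    {"ID": 1, "Area A": 1000, "Area B": 4000},
    {"ID": 2, "Area A": 250, "Area B": 400},
    {"ID": 3, "Area A": 90000, "Area B": 60000},
    {"ID": 99, "Area A": 5, "Area B": 7},  # an unrelated variable, skipped
]
records = extract_neighborhood_table(sample_rows, ["Area A", "Area B"])
print(records[0]["Chinese %"], records[1]["Chinese %"])  # 25.0 10.0
```

With pandas, the same idea would be to select the rows by ID and transpose the result with `DataFrame.T` so that each neighborhood becomes a row.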

A side note on the household income data: the median household income is used instead of the mean, because the mean household income is higher than the median household income. This indicates that the household income data is right-skewed, i.e. a relatively small number of high-income households pull the mean upward. The median is less affected by these values, so it is the more appropriate measure here.
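A small illustration of why a right-skewed distribution has a mean above its median. The incomes below are made up for the example and are not taken from the census data.

```python
from statistics import mean, median

# Illustrative household incomes (not census data): most households earn a
# moderate amount, while one very high earner stretches the right tail.
incomes = [40_000, 45_000, 50_000, 55_000, 60_000, 65_000, 400_000]

print(mean(incomes))    # pulled far upward by the single high earner
print(median(incomes))  # 55000, the middle value, unaffected by the tail
```

Removing the 400,000 household would shift the mean a lot but the median only slightly, which is why the median is the more robust summary for skewed income data.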

To obtain the latitude and longitude of each neighborhood, the geopy.geocoders.Nominatim geocoder is used. The data frame is then updated with the latitude and longitude information.

The data frame after adding the latitude and longitude (only 10 of the neighborhoods are shown).
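The geocoding step can be sketched as a loop that queries a geocoder for each neighborhood name. This is a hedged sketch: the helper `add_coordinates`, the query format `"<neighborhood>, Vancouver, BC"`, and the offline `fake_geocode` stand-in are assumptions for illustration. The geocoder is passed in as a function, so the real geopy `Nominatim(...).geocode` (which returns an object with `.latitude` and `.longitude`, or `None`) can be swapped in without changing the loop.

```python
from types import SimpleNamespace


def add_coordinates(records, geocode, city="Vancouver, BC"):
    """Attach latitude/longitude to each neighborhood record.

    geocode: any callable that takes an address string and returns an object
    with .latitude and .longitude attributes, or None when nothing is found.
    """
    for record in records:
        location = geocode(f"{record['Neighborhood']}, {city}")
        if location is None:
            record["Latitude"] = record["Longitude"] = None  # flag for manual lookup
        else:
            record["Latitude"] = location.latitude
            record["Longitude"] = location.longitude
    return records


def fake_geocode(query):
    # Offline stand-in that returns the Arbutus-Ridge coordinates quoted above.
    if query.startswith("Arbutus-Ridge"):
        return SimpleNamespace(latitude=49.246305, longitude=-123.159636)
    return None


demo = add_coordinates([{"Neighborhood": "Arbutus-Ridge"}, {"Neighborhood": "Nowhere"}], fake_geocode)
print(demo)

# Real usage (requires network access and the geopy package):
#   from geopy.geocoders import Nominatim
#   from geopy.extra.rate_limiter import RateLimiter
#   geolocator = Nominatim(user_agent="vancouver_chinese_restaurant")  # hypothetical agent name
#   geocode = RateLimiter(geolocator.geocode, min_delay_seconds=1)     # respect Nominatim's rate limit
#   add_coordinates(records, geocode)
```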

4.2) Using the Foursquare application programming interface (API) :

Another important piece of data to gather is the number of Chinese restaurants in each neighborhood of Vancouver CSD. To get this data, the explore function of the Foursquare API is used.

Using the explore function of the Foursquare API, the popular venues around the latitude and longitude of each neighborhood in Vancouver CSD are retrieved. The search radius is set to 1.5 km and the maximum number of venues returned is 100. The 2 tables below show 1) the first 5 popular venues explored around the Arbutus-Ridge neighborhood and 2) the total number of popular venues explored around each neighborhood in Vancouver CSD.

The first 5 popular venues explored around the Arbutus-Ridge neighborhood.
The total number of popular venues explored around the neighborhoods in Vancouver city.
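The explore request described above can be sketched with the standard library. This assumes the older Foursquare v2 `venues/explore` endpoint and its JSON layout (`response` → `groups` → `items` → `venue`), which is what capstone projects of this period typically used; the current Foursquare Places API has a different interface, so treat the URL, parameters, and response shape as assumptions. The credentials, the version date, and the helper names are placeholders.

```python
from urllib.parse import urlencode

EXPLORE_URL = "https://api.foursquare.com/v2/venues/explore"  # legacy v2 endpoint (assumed)


def build_explore_url(client_id, client_secret, lat, lng, radius=1500, limit=100, version="20201128"):
    """Return an explore request URL for venues around (lat, lng)."""
    params = {
        "client_id": client_id,
        "client_secret": client_secret,
        "v": version,            # hypothetical API version date (YYYYMMDD)
        "ll": f"{lat},{lng}",
        "radius": radius,        # metres; 1.5 km as in the article
        "limit": limit,          # maximum number of venues returned
    }
    return f"{EXPLORE_URL}?{urlencode(params)}"


def parse_venues(neighborhood, payload):
    """Flatten an explore JSON response into (neighborhood, venue, category) rows."""
    rows = []
    for item in payload["response"]["groups"][0]["items"]:
        venue = item["venue"]
        category = venue["categories"][0]["name"] if venue["categories"] else None
        rows.append((neighborhood, venue["name"], category))
    return rows


# Hand-written payload in the assumed response shape, for an offline demo.
sample_payload = {"response": {"groups": [{"items": [
    {"venue": {"name": "Venue 1", "categories": [{"name": "Chinese Restaurant"}]}},
    {"venue": {"name": "Venue 2", "categories": []}},
]}]}}
rows = parse_venues("Arbutus-Ridge", sample_payload)
url = build_explore_url("CLIENT_ID", "CLIENT_SECRET", 49.246305, -123.159636)
print(rows)

# Fetching would look like this (network access and real credentials required):
#   import json, urllib.request
#   with urllib.request.urlopen(url) as resp:
#       rows = parse_venues("Arbutus-Ridge", json.load(resp))
```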

With the data above, we can identify all the types of restaurants located in Vancouver. The first 10 restaurant types, in alphabetical order, are shown below :

Types of restaurants in Vancouver CSD

After going through the 43 types of restaurants in Vancouver, Cantonese Restaurant, Chinese Restaurant, and Dim Sum Restaurant can reasonably be grouped together as Chinese Restaurant. The mean frequency of Chinese restaurants in each neighborhood is then tabulated in the table below.

Mean frequency of Chinese restaurants
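Listing the restaurant types and computing the mean frequency can be sketched in plain Python. We assume here that "mean frequency" is the share of a neighborhood's explored venues that fall into the merged Chinese group, which is what taking the per-neighborhood mean of one-hot encoded venue categories produces; the article does not spell this out. The helper names and sample rows are hypothetical.

```python
from collections import Counter

# Categories merged into a single "Chinese Restaurant" group, as described above.
CHINESE_CATEGORIES = {"Cantonese Restaurant", "Chinese Restaurant", "Dim Sum Restaurant"}


def restaurant_types(rows):
    """Alphabetically sorted distinct categories whose name contains 'Restaurant'."""
    return sorted({category for _, _, category in rows if category and "Restaurant" in category})


def chinese_mean_frequency(rows):
    """Share of explored venues in each neighborhood that fall in the Chinese group."""
    totals = Counter()
    chinese = Counter()
    for neighborhood, _, category in rows:
        totals[neighborhood] += 1
        if category in CHINESE_CATEGORIES:
            chinese[neighborhood] += 1
    return {n: chinese[n] / totals[n] for n in totals}


# Hypothetical (neighborhood, venue, category) rows.
venue_rows = [
    ("A", "v1", "Dim Sum Restaurant"), ("A", "v2", "Coffee Shop"),
    ("A", "v3", "Cantonese Restaurant"), ("A", "v4", "Park"),
    ("B", "v5", "Sushi Restaurant"), ("B", "v6", "Chinese Restaurant"),
    ("B", "v7", "Park"), ("B", "v8", "Coffee Shop"),
]
print(restaurant_types(venue_rows))
freq = chinese_mean_frequency(venue_rows)
print(freq)  # {'A': 0.5, 'B': 0.25}
```

Because a mean of indicator columns is a proportion, summing the frequencies of the three merged categories gives the same result as counting them together, as done here.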

5) Data Visualization :

Data visualization is the graphical representation of information and data. By using charts and graphs, data visualization enables the user to easily understand the trends and patterns in the data and the information behind them.

5.1) Bar charts :

Let us visualize the independent variables.

The top 5 neighborhoods which have the highest mean frequency of Chinese Restaurants are :

1) Sunset; 2) Renfrew-Collingwood; 3) Victoria-Fraserview; 4) Kerrisdale; 5) Kensington-Cedar Cottage

These neighborhoods have more competition among Chinese restaurants. Thus, they will have lower priority when we decide on the neighborhood in which to establish a new Chinese restaurant.

The top 5 neighborhoods with the highest population are :

1) Downtown; 2) Renfrew-Collingwood; 3) Kensington-Cedar Cottage; 4) West End; 5) Kitsilano

These neighborhoods have large populations, which means they can bring in more business for a restaurant.

The top 5 neighborhoods with the highest percentage of Chinese people are :

1) Oakridge; 2) Victoria-Fraserview; 3) Kerrisdale; 4) Arbutus-Ridge; 5) Marpole

These neighborhoods have a high percentage of Chinese residents. This means the investor might get more business there, since Chinese people usually prefer Chinese food.

The top 5 neighborhoods with the highest median household income are :

1) Shaughnessy; 2) Dunbar-Southlands; 3) West Point Grey; 4) Riley Park; 5) South Cambie

The people living in the above neighborhoods have higher household incomes and therefore higher spending power.

5.2) Map visualization using Folium package :

Using the Folium package, a map of Vancouver can be generated centered on the latitude and longitude of Vancouver city (49.26038, -123.11336). The map below shows all 22 neighborhoods in Vancouver CSD.

6) Clustering Vancouver Neighborhoods :

6.1) Feature Scaling for independent variables :

First of all, the independent variables (features) used in the clustering model need to be defined: 1) total population of each neighborhood; 2) percentage of Chinese people in each neighborhood; 3) median household income of each neighborhood; 4) mean frequency of Chinese restaurants in each neighborhood. Next, these independent variables undergo feature scaling (a.k.a. standardization), which rescales each feature to a mean of 0 and a standard deviation of 1. This step is crucial for the K-means algorithm used later: K-means groups points by Euclidean distance, so without scaling, a feature measured in large units (such as household income in dollars) would dominate features measured in small units (such as a percentage or a frequency). With feature scaling, the K-means algorithm is not biased toward any one feature.

All features after feature scaling
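Standardization (z-scoring) of each feature column can be sketched with the standard library. This uses the population standard deviation, which is the same convention as scikit-learn's `StandardScaler`; the helper name and the sample feature values are hypothetical, and the article may have used a library scaler rather than code like this.

```python
from statistics import fmean, pstdev


def standardize(columns):
    """Z-score each feature column: subtract its mean, divide by its population std.

    A constant column (std of 0) is mapped to zeros to avoid dividing by zero.
    """
    scaled = {}
    for name, values in columns.items():
        mu = fmean(values)
        sigma = pstdev(values)
        scaled[name] = [0.0 if sigma == 0 else (v - mu) / sigma for v in values]
    return scaled


# Hypothetical values for three neighborhoods: the raw scales differ by ~20x,
# but after standardization both features vary on the same scale.
features = {
    "Total Population": [1000, 2000, 3000],
    "Median Household Income": [50000, 70000, 90000],
}
scaled = standardize(features)
print(scaled)
```

After scaling, a one-standard-deviation difference in income counts exactly as much in the K-means distance as a one-standard-deviation difference in population.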

6.2) K-means clustering algorithm :

In this project, the K-means algorithm is selected as the unsupervised machine learning algorithm because it is relatively simple to implement. One disadvantage of K-means is that the number of clusters, K, must be chosen in advance, and a poor choice of K gives poor results.

To find a good value of K, the elbow method is applied. K-means is run on the standardized features from part 6.1 for several values of K, and the within-cluster sum of squares (WCSS), i.e. the total squared distance from each neighborhood to the center of its cluster, is recorded each time. WCSS generally keeps decreasing as K grows (it reaches zero when every neighborhood is its own cluster), so the goal is not simply to minimize it; instead, we look for the smallest K beyond which adding more clusters no longer reduces WCSS substantially. A graph of WCSS against K shows where this "elbow" occurs.

The elbow method to find the best number of clusters
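The elbow procedure can be sketched in plain Python: a minimal Lloyd's-algorithm K-means with random restarts that reports the WCSS, run for several values of K. This is an illustrative re-implementation, not the article's code; in practice, scikit-learn's `KMeans` does the same job, and its `inertia_` attribute is the WCSS. The helper names and the toy points are hypothetical.

```python
import random


def squared_distance(a, b):
    """Squared Euclidean distance between two equal-length feature vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def assign(points, centroids):
    """Assignment step: index of the nearest centroid for every point."""
    return [min(range(len(centroids)), key=lambda c: squared_distance(p, centroids[c]))
            for p in points]


def update(points, labels, centroids):
    """Update step: move each centroid to the mean of the points assigned to it."""
    new_centroids = []
    for c, old in enumerate(centroids):
        members = [p for p, label in zip(points, labels) if label == c]
        if members:
            new_centroids.append([sum(dim) / len(members) for dim in zip(*members)])
        else:
            new_centroids.append(old)  # leave an empty cluster's centroid where it was
    return new_centroids


def kmeans(points, k, n_init=10, max_iter=100, seed=0):
    """Lloyd's K-means with random restarts; returns (labels, centroids, wcss)."""
    rng = random.Random(seed)
    best = None
    for _ in range(n_init):
        centroids = [list(p) for p in rng.sample(points, k)]  # k random starting points
        for _ in range(max_iter):
            labels = assign(points, centroids)
            new_centroids = update(points, labels, centroids)
            if new_centroids == centroids:
                break  # converged: no centroid moved
            centroids = new_centroids
        labels = assign(points, centroids)
        wcss = sum(squared_distance(p, centroids[lab]) for p, lab in zip(points, labels))
        if best is None or wcss < best[2]:
            best = (labels, centroids, wcss)  # keep the restart with the lowest WCSS
    return best


def elbow(points, k_values, n_init=10):
    """WCSS for each K; plot these values against K and look for the bend."""
    return {k: kmeans(points, k, n_init=n_init)[2] for k in k_values}


# Hypothetical toy data: three well-separated groups of (already scaled) points.
demo_points = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10), (20, 0), (20, 1), (21, 0)]
wcss_by_k = elbow(demo_points, range(1, 6), n_init=50)
for k, wcss in wcss_by_k.items():
    print(k, round(wcss, 2))
```

On this toy data the WCSS falls steeply up to K = 3 (the number of groups) and only slightly afterwards, which is the bend the elbow method looks for.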

From the graph, the best number of clusters is K = 5: the WCSS drops sharply from K = 1 to K = 5, and the decrease becomes much smaller after K = 5. Therefore, K = 5 is chosen as the number of clusters for the K-means algorithm.

7) Result :

K-means with K = 5 is applied to the standardized features, separating the 22 neighborhoods of Vancouver CSD into 5 cluster groups. These cluster labels are then inserted into the data frame shown below.

The data frame after including the cluster label for each neighborhood.

Using Folium, a map of Vancouver city can be generated with each cluster indicated by a different color.

Folium-generated map showing the cluster to which each neighborhood belongs.
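Coloring the map markers by cluster can be sketched with Folium as follows. The palette, marker radius, zoom level, helper names, and record keys (`Neighborhood`, `Latitude`, `Longitude`, `Cluster`) are assumptions for illustration; only the map center (49.26038, -123.11336) comes from the article. `folium` is imported inside the function so that the color helper works even without Folium installed.

```python
# Hypothetical palette: one color per cluster label 0..4.
CLUSTER_COLORS = ["red", "blue", "green", "purple", "orange"]


def cluster_color(label):
    """Pick a color for a cluster label, cycling if there are more clusters than colors."""
    return CLUSTER_COLORS[label % len(CLUSTER_COLORS)]


def build_cluster_map(records, center=(49.26038, -123.11336), zoom_start=11):
    """Return a folium.Map with one circle marker per neighborhood, colored by cluster."""
    import folium  # imported lazily so cluster_color() works without folium installed

    cluster_map = folium.Map(location=list(center), zoom_start=zoom_start)
    for record in records:
        color = cluster_color(record["Cluster"])
        folium.CircleMarker(
            location=[record["Latitude"], record["Longitude"]],
            radius=6,
            popup=f"{record['Neighborhood']} (cluster {record['Cluster']})",
            color=color,
            fill=True,
            fill_color=color,
            fill_opacity=0.7,
        ).add_to(cluster_map)
    return cluster_map


# Usage (requires folium): build_cluster_map(records).save("vancouver_clusters.html")
```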

7.1) Clusters analysis :

Each cluster is analyzed in terms of the 4 features: total population, percentage of Chinese population, median household income, and mean frequency of Chinese restaurants. The median household income can be interpreted as the spending power of a household, while the mean frequency of Chinese restaurants represents the market competition that the investor is going to face.

Cluster 0: Low Chinese population, high household income, and low mean frequency of Chinese restaurants.

Cluster 1: Medium Chinese population, medium household income, and medium mean frequency of Chinese restaurants.

Cluster 2: Low Chinese population, medium household income, and low mean frequency of Chinese restaurants.

Cluster 3: Low Chinese population, low household income, and low mean frequency of Chinese restaurants.

Cluster 4: High Chinese population, medium household income, and high mean frequency of Chinese restaurants.
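The cluster descriptions above start from the average of each feature within each cluster; the low/medium/high labels are then read off by comparing those averages across clusters. A minimal sketch, assuming each neighborhood is a dictionary holding its cluster label and its unscaled feature values; the feature names, helper name, and sample records are hypothetical.

```python
from statistics import fmean

FEATURES = ["Total Population", "Chinese %", "Median Household Income", "Chinese Restaurant Freq"]


def cluster_profiles(records, features=FEATURES):
    """Average each feature within each cluster, keyed by cluster label."""
    groups = {}
    for record in records:
        groups.setdefault(record["Cluster"], []).append(record)
    return {
        label: {f: fmean(r[f] for r in members) for f in features}
        for label, members in sorted(groups.items())
    }


# Hypothetical records for three neighborhoods in two clusters.
sample_records = [
    {"Neighborhood": "A", "Cluster": 0, "Total Population": 1000, "Chinese %": 10.0,
     "Median Household Income": 90000, "Chinese Restaurant Freq": 0.0},
    {"Neighborhood": "B", "Cluster": 0, "Total Population": 3000, "Chinese %": 20.0,
     "Median Household Income": 110000, "Chinese Restaurant Freq": 0.1},
    {"Neighborhood": "C", "Cluster": 1, "Total Population": 2000, "Chinese %": 40.0,
     "Median Household Income": 60000, "Chinese Restaurant Freq": 0.2},
]
profiles = cluster_profiles(sample_records)
for label, profile in profiles.items():
    print(label, profile)
```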

7.2) Results’ Summary :

The aim of the whole project is to help investors find the optimum neighborhood for establishing a new Chinese restaurant in Vancouver city. Using an unsupervised machine learning algorithm, the K-means model, the neighborhoods of Vancouver CSD are grouped into 5 cluster groups. The analysis above is summarized in the table below.

The cluster analysis summary table

8) Discussion :

It is assumed that Chinese people prefer Chinese food and will therefore go to Chinese restaurants more often. This means that the higher the Chinese population in a neighborhood, the more business there will be for a Chinese restaurant. In addition, household income determines the spending power of a household. Lastly, the mean frequency of Chinese restaurants represents the market competition.

From the summary table above, neighborhoods with a low Chinese population can be ignored, since a Chinese restaurant there would likely attract less business. Next, neighborhoods with high and medium spending power should be prioritized. Lastly, the investor should focus on neighborhoods with low market competition. However, the investor should also consider that less market competition might be due to lower demand.
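The three rules above can be written as a small ranking function over the qualitative cluster summary from section 7.1. This is one possible reading of the rules, not the article's code: "low Chinese population" and "low spending power" are treated as hard filters, and the remaining clusters are ordered by increasing competition. The helper name `prioritize` and the dictionary keys are hypothetical.

```python
# Qualitative cluster summary, as listed in section 7.1.
SUMMARY = {
    0: {"chinese_pop": "low", "income": "high", "competition": "low"},
    1: {"chinese_pop": "medium", "income": "medium", "competition": "medium"},
    2: {"chinese_pop": "low", "income": "medium", "competition": "low"},
    3: {"chinese_pop": "low", "income": "low", "competition": "low"},
    4: {"chinese_pop": "high", "income": "medium", "competition": "high"},
}

LEVEL = {"low": 0, "medium": 1, "high": 2}


def prioritize(summary):
    """Rank clusters by the discussion's rules (one reading of them)."""
    # Rule 1: drop clusters with a low Chinese population.
    candidates = [c for c, s in summary.items() if s["chinese_pop"] != "low"]
    # Rule 2: keep clusters with medium or high spending power.
    candidates = [c for c in candidates if summary[c]["income"] != "low"]
    # Rule 3: among those, prefer lower market competition.
    return sorted(candidates, key=lambda c: LEVEL[summary[c]["competition"]])


print(prioritize(SUMMARY))  # [1, 4]
```

Under this reading, the ranking places cluster 1 ahead of cluster 4. It cannot capture the caveat that low competition may reflect low demand, which still calls for the investor's own judgement.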

9) Conclusion :

As a result, neighborhoods in cluster group 1 should be prioritized by an investor who is going to establish a new Chinese restaurant in Vancouver CSD. These neighborhoods are Arbutus-Ridge, Hastings-Sunrise, Kerrisdale, Killarney, Marpole, Oakridge, and Victoria-Fraserview. Establishing a new Chinese restaurant in these neighborhoods is expected to bring in more income.

Besides cluster group 1, the investor might also consider the neighborhoods in cluster group 4 for opening a new Chinese restaurant. These neighborhoods are Kensington-Cedar Cottage, Renfrew-Collingwood, and Sunset. Although there are more Chinese restaurants in these neighborhoods, their Chinese population is also higher, which increases the demand for Chinese cuisine. Thus, establishing a new Chinese restaurant in a cluster group 4 neighborhood is also worth considering.

This information is compiled into the table below. The priority indicates the order in which the investor should consider the neighborhoods when establishing a new Chinese restaurant in Vancouver CSD.

10) References :

I) My GitHub repository for this project: https://github.com/houng87/coursera-capstone-project

II) Census local area profiles 2016 of Vancouver CSD: https://opendata.vancouver.ca/explore/dataset/census-local-area-profiles-2016/information/

III) Foursquare API: https://developer.foursquare.com/

Data Science aspirant who started in this field in 2020.