Exploring the Neighborhoods in Toronto, Canada.

Siddhraj Maramwar
Published in Analytics Vidhya
5 min read, Jun 19, 2020

Introduction

As part of the IBM Data Science professional program Capstone Project, I worked with real datasets to experience what a data scientist goes through in practice. The main objectives of this project were to define a business problem, find relevant data on the web, and use Foursquare location data to compare the neighborhoods of Toronto and determine which of them are suitable for starting a new restaurant business. The project follows a step-by-step approach to reach the results.

Problem Description

Consider a person who wants to open a new Indian restaurant. He is Indo-Canadian and lives in Toronto, the most populous city in Canada. He is unsure whether opening a restaurant is a good idea and, if it is, in which neighborhood the restaurant is most likely to be profitable.

Benefits

Several groups of people can benefit from this project:

• Business people who want to open a new restaurant in a neighborhood.

• Indian people who want to move to a neighborhood with ample Indian restaurants and culture.

• Data analysts and data scientists who analyse neighborhoods using statistical and exploratory data analysis.

Data acquisition

I collected the data from several sources, each for a different purpose.

1. List of postal codes:

· I fetched the postal codes of the Toronto neighborhoods (the codes beginning with M) from Wikipedia.

· Link: https://en.wikipedia.org/wiki/List_of_postal_codes_of_Canada:_M

2. Geographical coordinates:

· I used a CSV file containing the latitude and longitude of the neighborhoods.

· The geocoder package can serve the same purpose, but it does not always return a result, so I chose the CSV file instead.

· Link for CSV: http://cocl.us/Geospatial_data

3. Venue details:

· I used the Foursquare API to fetch the details and location of each venue.

· I used venue ratings as a threshold and visualized the results with Folium.

From the Foursquare API (https://developer.foursquare.com/docs), I retrieved the following for each venue:

a) Name: The name of the venue.

b) Category: The category type as defined by the API.

c) Latitude: The latitude value of the venue.

d) Longitude: The longitude value of the venue.

e) Likes: The number of users who liked the venue.

f) Rating: Rating of the venue.

g) Tips: Tips given by the users.

Data Cleaning

Cleaning the Postal Code data

The data frame consists of three columns: Postal Code, Borough, and Neighborhood.

Only cells with an assigned borough are processed; cells whose borough is Not assigned are ignored.

More than one neighborhood can exist in one postal code area. For example, in the table on the Wikipedia page, you will notice that M5A is listed twice and has two neighborhoods: Harbourfront and Regent Park. These two rows will be combined into one row with the neighborhoods separated with a comma as shown in row 11 in the above table.

If a cell has a borough but a Not assigned neighborhood, then the neighborhood will be the same as the borough.

Data Frame of Postal Codes in Canada.
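
The cleaning rules above can be sketched with pandas as follows. This is a minimal illustration rather than the exact code used in the project: the sample rows are hypothetical stand-ins for the Wikipedia table, and the function name clean_postal_codes is my own.

```python
import pandas as pd

# Hypothetical sample rows in the shape of the Wikipedia table
# (one row per postal code / neighborhood pair).
raw = pd.DataFrame(
    {
        "Postal Code": ["M1A", "M5A", "M5A", "M7A", "M9A"],
        "Borough": ["Not assigned", "Downtown Toronto", "Downtown Toronto",
                    "Queen's Park", "Etobicoke"],
        "Neighborhood": ["Not assigned", "Harbourfront", "Regent Park",
                         "Not assigned", "Islington Avenue"],
    }
)


def clean_postal_codes(df: pd.DataFrame) -> pd.DataFrame:
    # 1. Keep only rows with an assigned borough.
    df = df[df["Borough"] != "Not assigned"].copy()
    # 2. A "Not assigned" neighborhood falls back to the borough name.
    unassigned = df["Neighborhood"] == "Not assigned"
    df.loc[unassigned, "Neighborhood"] = df.loc[unassigned, "Borough"]
    # 3. Combine rows sharing a postal code into one comma-separated row.
    return (
        df.groupby(["Postal Code", "Borough"])["Neighborhood"]
        .agg(", ".join)
        .reset_index()
    )


toronto_codes = clean_postal_codes(raw)
print(toronto_codes)
```

With the sample rows, M1A is dropped, the two M5A rows collapse into "Harbourfront, Regent Park", and M7A takes its borough name as its neighborhood.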

Adding Geographical Co-ordinates

For this step I use a CSV file containing the latitude and longitude of the neighborhoods.

Link for CSV: http://cocl.us/Geospatial_data

Latitudes and Longitude of the Neighborhoods
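
Attaching the coordinates is a join on the postal code. The sketch below assumes the CSV has columns named Postal Code, Latitude, and Longitude; the small in-memory table stands in for the downloaded file, and its coordinate values are illustrative only.

```python
import pandas as pd

# Hypothetical sample of the cleaned postal-code table.
codes = pd.DataFrame(
    {
        "Postal Code": ["M5A", "M7A"],
        "Borough": ["Downtown Toronto", "Queen's Park"],
        "Neighborhood": ["Harbourfront, Regent Park", "Queen's Park"],
    }
)

# Stand-in for pd.read_csv("http://cocl.us/Geospatial_data").
# Column names and values are assumptions for illustration.
coords = pd.DataFrame(
    {
        "Postal Code": ["M7A", "M5A", "M1B"],
        "Latitude": [43.6623, 43.6543, 43.8067],
        "Longitude": [-79.3895, -79.3606, -79.1944],
    }
)

# A left join keeps every cleaned postal code and attaches its coordinates.
located = codes.merge(coords, on="Postal Code", how="left")
print(located)
```

A left join is used so that a postal code missing from the coordinates file would show up with empty coordinates instead of silently disappearing.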

Now we will work only with boroughs whose name contains Toronto.

Data frame only having boroughs containing Toronto
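
One way to express this filter is a substring match on the Borough column. The borough values below are a small hypothetical sample.

```python
import pandas as pd

# Hypothetical sample of the postal-code table with a mix of boroughs.
located = pd.DataFrame(
    {
        "Postal Code": ["M5A", "M9A", "M4E", "M1B"],
        "Borough": ["Downtown Toronto", "Etobicoke", "East Toronto", "Scarborough"],
    }
)

# Keep only rows whose borough name contains the word "Toronto".
toronto_only = located[located["Borough"].str.contains("Toronto")].reset_index(drop=True)
print(toronto_only)
```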

Indian Restaurants in Toronto

Now fetch all the Indian restaurants in Toronto.

Code Snippet
Indian Restaurants in Toronto
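
A rough sketch of this step is shown below. It assumes the Foursquare v2 venues/explore endpoint and its usual response layout (response, groups, items, venue); the function names, the radius, the limit, and the version date are my own choices, and the sample payload is made up so the parsing can be tried without credentials.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

# Assumed endpoint: Foursquare v2 "explore" (needs a client id and secret).
EXPLORE_URL = "https://api.foursquare.com/v2/venues/explore"


def build_explore_url(lat, lng, client_id, client_secret,
                      version="20200601", radius=1000, limit=100):
    """Build the request URL for venues around one neighborhood centre."""
    params = {
        "client_id": client_id,
        "client_secret": client_secret,
        "v": version,
        "ll": f"{lat},{lng}",
        "radius": radius,
        "limit": limit,
    }
    return f"{EXPLORE_URL}?{urlencode(params)}"


def extract_indian_restaurants(payload):
    """Return one dict per venue whose first category is 'Indian Restaurant'."""
    rows = []
    for group in payload.get("response", {}).get("groups", []):
        for item in group.get("items", []):
            venue = item["venue"]
            categories = venue.get("categories") or [{}]
            category = categories[0].get("name", "")
            if category == "Indian Restaurant":
                rows.append({
                    "ID": venue["id"],
                    "Name": venue["name"],
                    "Category": category,
                    "Latitude": venue["location"]["lat"],
                    "Longitude": venue["location"]["lng"],
                })
    return rows


def fetch_indian_restaurants(lat, lng, client_id, client_secret):
    """Network call; requires valid credentials, so it is not run here."""
    with urlopen(build_explore_url(lat, lng, client_id, client_secret)) as resp:
        return extract_indian_restaurants(json.load(resp))


# Made-up payload in the assumed response shape, for an offline check.
sample_payload = {"response": {"groups": [{"items": [
    {"venue": {"id": "a1", "name": "Sample Curry House",
               "categories": [{"name": "Indian Restaurant"}],
               "location": {"lat": 43.65, "lng": -79.38}}},
    {"venue": {"id": "b2", "name": "Sample Cafe",
               "categories": [{"name": "Café"}],
               "location": {"lat": 43.66, "lng": -79.39}}},
]}]}}
print(extract_indian_restaurants(sample_payload))
```

Calling fetch_indian_restaurants once per neighborhood and concatenating the rows would give a table like the one pictured above.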

Exploratory Analysis

Let’s analyse how many Indian restaurants are present in each Borough.

Code Snippet
fig ~ Number of Indian Restaurants in each borough in Toronto.

Let’s also analyse how many Indian restaurants are present in each Neighborhood.

Code Snippet
fig ~ Number of Indian Restaurants in each neighborhood in Toronto
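
Both counts come from the same group-and-count pattern. The sketch below uses a hypothetical restaurant table with made-up venue names; only the counting is shown.

```python
import pandas as pd

# Hypothetical table of Indian restaurants, one row per venue.
restaurants = pd.DataFrame(
    {
        "Borough": ["Downtown Toronto", "Downtown Toronto", "Downtown Toronto",
                    "West Toronto", "Central Toronto"],
        "Neighborhood": ["Church and Wellesley", "Church and Wellesley",
                         "St. James Town", "High Park", "The Annex"],
        "Name": ["Venue A", "Venue B", "Venue C", "Venue D", "Venue E"],
    }
)

# Number of restaurants per borough and per neighborhood, largest first.
per_borough = (
    restaurants.groupby("Borough")["Name"].count().sort_values(ascending=False)
)
per_neighborhood = (
    restaurants.groupby("Neighborhood")["Name"].count().sort_values(ascending=False)
)

print(per_borough)
print(per_neighborhood)
```

If matplotlib is installed, calling per_borough.plot(kind="bar") on either result draws a bar chart like the figures above.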

Get Ratings, Likes, Tips of the restaurants using Foursquare API

Next, fetch the ratings, likes, and tips of each restaurant using the Foursquare API.

Code Snippet
Data Frame ~ Ratings, Tips, and Likes of the restaurants
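
The sketch below shows how these three values could be pulled out of a venue-details response. It assumes the Foursquare v2 venues/{VENUE_ID} endpoint, with likes and tips reported as counts and the rating possibly absent for some venues; the function names and the sample payload are hypothetical.

```python
from urllib.parse import urlencode


def build_details_url(venue_id, client_id, client_secret, version="20200601"):
    """Build the (assumed) v2 venue-details URL for a single venue."""
    params = {"client_id": client_id, "client_secret": client_secret, "v": version}
    return f"https://api.foursquare.com/v2/venues/{venue_id}?{urlencode(params)}"


def extract_stats(payload):
    """Pull likes, rating, and tips out of a venue-details response."""
    venue = payload["response"]["venue"]
    return {
        "ID": venue["id"],
        "Name": venue["name"],
        "Likes": venue.get("likes", {}).get("count", 0),
        # Assumption: the rating may be missing, so fall back to None.
        "Rating": venue.get("rating"),
        "Tips": venue.get("tips", {}).get("count", 0),
    }


# Made-up payloads in the assumed response shape.
rated = {"response": {"venue": {
    "id": "a1", "name": "Sample Curry House",
    "likes": {"count": 42}, "rating": 8.4, "tips": {"count": 7},
}}}
unrated = {"response": {"venue": {"id": "c3", "name": "Sample Tandoor"}}}

print(extract_stats(rated))
print(extract_stats(unrated))
```

Venues without a rating can then be dropped or handled explicitly before any averaging.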

The Average rating

Next, compute the average rating of the restaurants in each neighborhood.

Data Frame Average ratings of restaurants
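
The averaging and the rating threshold can be sketched as below. The ratings and the cut-off value MIN_RATING are hypothetical; the article does not state the threshold that was actually used.

```python
import pandas as pd

# Hypothetical ratings, one row per restaurant.
stats = pd.DataFrame(
    {
        "Neighborhood": ["The Annex", "The Annex", "Christie", "Davisville"],
        "Rating": [8.0, 9.0, 9.0, 6.0],
    }
)

# Mean rating per neighborhood, highest first.
avg = (
    stats.groupby("Neighborhood", as_index=False)["Rating"]
    .mean()
    .rename(columns={"Rating": "Average Rating"})
    .sort_values("Average Rating", ascending=False)
    .reset_index(drop=True)
)

MIN_RATING = 8.0  # hypothetical threshold, chosen only for this example
best = avg[avg["Average Rating"] >= MIN_RATING]
print(best)
```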

Now we have the list of top-performing restaurants.

Conclusion

Below are the best neighborhoods in which to open an Indian restaurant:

Christie, High Park, The Junction South, The Annex, North Midtown, Yorkville, Church and Wellesley, Queen’s Park, Ontario Provincial Government, St. James Town, Cabbagetown.

Limitations

1. The results are highly dependent on the restaurant ratings.

2. The accuracy of the ratings depends on the Foursquare API.

My GitHub Repository :-


Siddhraj Maramwar
Analytics Vidhya

Student of Computer Science | Data Science | Machine Learning | Always up for a good challenge.