Exploring the Neighborhoods in Toronto, Canada.
Introduction
As part of the IBM Data Science professional program Capstone Project, I worked with real datasets to experience what a data scientist goes through in practice. The main objectives of this project were to define a business problem, find data on the web, and use Foursquare location data to compare neighborhoods of Toronto and determine which one is suitable for starting a new restaurant business. In this project we will follow a step-by-step method to get the results.
Problem Description
Consider a person who wants to open a new Indian restaurant. He is Indo-Canadian and lives in Toronto, the most populous city in Canada. He is unsure whether opening a restaurant is a good idea and, if it is, in which neighborhood he should open it so that it is profitable.
Benefits
Several groups of people can benefit from this project:
• Business people who want to open a new restaurant in a neighborhood.
• Indian people who want to move to neighborhoods that have ample Indian restaurants and culture.
• Data analysts and data scientists who analyse neighborhoods using statistical and exploratory data analysis.
Data acquisition
I collected the data from several sources, each serving a different purpose.
1. List of postal codes for Canada:
· I fetched the postal codes of the neighborhoods in Canada from Wikipedia.
· Link: https://en.wikipedia.org/wiki/List_of_postal_codes_of_Canada:_M
2. Geographical coordinates:
· I used a CSV file which contains the latitude and longitude of the neighborhoods in Canada.
· We can use geocoder for the same purpose, but it is sometimes unreliable, so I chose the CSV file instead of geocoder.
· Link for CSV: http://cocl.us/Geospatial_data
3. Fetching details of the venues:
· I used the Foursquare API to fetch the details and location of the venues.
· I used venue ratings as a threshold, and finally visualized the results using Folium.
From the Foursquare API (https://developer.foursquare.com/docs), I retrieved the following for each venue:
a) Name: The name of the venue.
b) Category: The category type as defined by the API.
c) Latitude: The latitude value of the venue.
d) Longitude: The longitude value of the venue.
e) Likes: The number of users who liked the venue.
f) Rating: Rating of the venue.
g) Tips: Tips given by the users.
Data Cleaning
Cleaning the Postal Code data
The data frame will consist of three columns: Postal Code, Borough, and Neighborhood.
Only process the cells that have an assigned borough. Ignore cells whose borough is Not assigned.
More than one neighborhood can exist in one postal code area. For example, in the table on the Wikipedia page, you will notice that M5A is listed twice and has two neighborhoods: Harbourfront and Regent Park. These two rows will be combined into one row with the neighborhoods separated with a comma as shown in row 11 in the above table.
If a cell has a borough but a Not assigned neighborhood, then the neighborhood will be the same as the borough.
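The cleaning rules above can be sketched in pandas as follows. This is a minimal illustration and not necessarily the exact code used in the project: the sample rows are made up to mimic the layout of the Wikipedia table, and the helper name `clean_postal_codes` is hypothetical.

```python
import pandas as pd

# Made-up sample rows that mimic the layout of the Wikipedia table.
raw = pd.DataFrame(
    {
        "Postal Code": ["M1A", "M5A", "M5A", "M7A", "M9A"],
        "Borough": ["Not assigned", "Downtown Toronto", "Downtown Toronto",
                    "Queen's Park", "Etobicoke"],
        "Neighborhood": ["Not assigned", "Harbourfront", "Regent Park",
                         "Not assigned", "Islington Avenue"],
    }
)


def clean_postal_codes(df):
    """Apply the three cleaning rules (hypothetical helper)."""
    # Rule 1: keep only rows that have an assigned borough.
    df = df[df["Borough"] != "Not assigned"].copy()
    # Rule 2: a missing neighborhood falls back to the borough name.
    missing = df["Neighborhood"] == "Not assigned"
    df.loc[missing, "Neighborhood"] = df.loc[missing, "Borough"]
    # Rule 3: neighborhoods sharing a postal code are joined with commas.
    return (
        df.groupby(["Postal Code", "Borough"], as_index=False)["Neighborhood"]
        .agg(", ".join)
    )


cleaned = clean_postal_codes(raw)
print(cleaned)
```

In this sample, the two M5A rows collapse into a single row, and M7A takes its borough name as its neighborhood.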
Adding Geographical Co-ordinates
For this I will use a CSV file which contains the latitude and longitude of the neighborhoods in Canada.
Link for CSV: http://cocl.us/Geospatial_data
Now we will work only with boroughs whose names contain Toronto.
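A sketch of this step, assuming the coordinates file has the columns Postal Code, Latitude, and Longitude (an assumption about the CSV layout). The small frames below are illustrative stand-ins for the cleaned postal code table and the CSV contents, and the coordinate values are only examples.

```python
import pandas as pd

# Stand-in for the cleaned postal code table.
postal = pd.DataFrame(
    {
        "Postal Code": ["M5A", "M4E", "M9A"],
        "Borough": ["Downtown Toronto", "East Toronto", "Etobicoke"],
        "Neighborhood": ["Harbourfront, Regent Park", "The Beaches",
                         "Islington Avenue"],
    }
)

# Stand-in for the CSV; in practice something like
# pd.read_csv("http://cocl.us/Geospatial_data") would load it.
coords = pd.DataFrame(
    {
        "Postal Code": ["M4E", "M5A", "M9A"],
        "Latitude": [43.676, 43.654, 43.667],    # illustrative values
        "Longitude": [-79.293, -79.361, -79.532],  # illustrative values
    }
)

# Attach coordinates to each postal code, keeping every postal row.
merged = postal.merge(coords, on="Postal Code", how="left")

# Keep only boroughs whose name contains "Toronto".
toronto = merged[merged["Borough"].str.contains("Toronto")].reset_index(drop=True)
print(toronto)
```

A left merge is used here so that a postal code with no matching coordinates would show up with missing values instead of silently disappearing.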
Indian Restaurants in Toronto
Now we fetch all the Indian restaurants in Toronto.
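As a rough sketch of how this fetch might look, the code below builds the query parameters for one neighborhood and filters a response for Indian restaurants. The endpoint URL, parameter names, and response shape follow my understanding of the Foursquare v2 `venues/explore` call and should be treated as assumptions; the function names and the sample payload are hypothetical.

```python
# Assumed Foursquare v2 endpoint; check the current API docs before use.
EXPLORE_URL = "https://api.foursquare.com/v2/venues/explore"


def build_explore_params(lat, lng, client_id, client_secret,
                         version="20180605", radius=1000, limit=100):
    """Query parameters for one request centred on a neighborhood."""
    return {
        "client_id": client_id,
        "client_secret": client_secret,
        "v": version,
        "ll": f"{lat},{lng}",
        "radius": radius,
        "limit": limit,
    }


def extract_indian_restaurants(payload):
    """Keep venues whose first category is 'Indian Restaurant'."""
    groups = payload.get("response", {}).get("groups", [])
    items = groups[0].get("items", []) if groups else []
    rows = []
    for item in items:
        venue = item["venue"]
        categories = venue.get("categories", [])
        category = categories[0]["name"] if categories else None
        if category == "Indian Restaurant":
            rows.append({
                "ID": venue["id"],
                "Name": venue["name"],
                "Latitude": venue["location"]["lat"],
                "Longitude": venue["location"]["lng"],
            })
    return rows


# Made-up payload in the assumed response shape.
sample_payload = {
    "response": {"groups": [{"items": [
        {"venue": {"id": "a1", "name": "Restaurant A",
                   "location": {"lat": 43.65, "lng": -79.36},
                   "categories": [{"name": "Indian Restaurant"}]}},
        {"venue": {"id": "b2", "name": "Cafe B",
                   "location": {"lat": 43.66, "lng": -79.37},
                   "categories": [{"name": "Coffee Shop"}]}},
    ]}]}
}

indian = extract_indian_restaurants(sample_payload)
print(indian)
```

With real credentials, the payload would come from something like `requests.get(EXPLORE_URL, params=build_explore_params(...)).json()`, called once per neighborhood.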
Exploratory Analysis
Let’s analyse how many Indian restaurants are present in each Borough.
Let’s also analyse how many Indian restaurants are present in each Neighborhood.
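Both counts can be obtained with a pandas group-by. The sketch below uses a made-up restaurant table; the column names and rows are assumptions for illustration, not the project's actual data.

```python
import pandas as pd

# Made-up rows standing in for the fetched restaurant list.
restaurants = pd.DataFrame(
    {
        "Borough": ["Downtown Toronto", "Downtown Toronto",
                    "Central Toronto", "West Toronto"],
        "Neighborhood": ["Church and Wellesley", "Church and Wellesley",
                         "The Annex", "Christie"],
        "Name": ["Restaurant A", "Restaurant B", "Restaurant C", "Restaurant D"],
    }
)

# Number of Indian restaurants per borough, largest first.
per_borough = (
    restaurants.groupby("Borough")["Name"].count().sort_values(ascending=False)
)

# Number of Indian restaurants per neighborhood, largest first.
per_neighborhood = (
    restaurants.groupby("Neighborhood")["Name"].count().sort_values(ascending=False)
)

print(per_borough)
print(per_neighborhood)
```

Either series can then be passed to a bar chart (for example `per_borough.plot(kind="bar")`) to compare areas visually.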
Get Ratings, Likes, Tips of the restaurants using Foursquare API
Next, we fetch the ratings, likes, and tips of each restaurant using the Foursquare API.
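A possible way to pull these three fields out of a venue details response is sketched below. The response shape (a `venue` object with `rating`, `likes.count`, and `tips.count`) reflects my understanding of the Foursquare v2 venue details call and is an assumption; `extract_venue_stats` and the sample payloads are hypothetical.

```python
def extract_venue_stats(payload):
    """Pull name, likes, rating, and tip count from one details response."""
    venue = payload.get("response", {}).get("venue", {})
    return {
        "Name": venue.get("name"),
        # Likes and tips are assumed to be objects with a "count" field.
        "Likes": venue.get("likes", {}).get("count", 0),
        # Rating may be absent for venues with too few votes.
        "Rating": venue.get("rating"),
        "Tips": venue.get("tips", {}).get("count", 0),
    }


# Made-up payloads in the assumed response shape.
rated = {"response": {"venue": {"name": "Restaurant A", "rating": 8.4,
                                "likes": {"count": 25},
                                "tips": {"count": 7}}}}
unrated = {"response": {"venue": {"name": "Restaurant B",
                                  "likes": {"count": 1}}}}

stats_rated = extract_venue_stats(rated)
stats_unrated = extract_venue_stats(unrated)
print(stats_rated)
print(stats_unrated)
```

Returning `None` for a missing rating keeps unrated venues distinguishable from venues with a genuinely low score.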
The Average rating
Next, we compute the average rating of the restaurants in each neighborhood.
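The averaging and thresholding step might look like the sketch below. The ratings are made up, and the cut-off of 8.0 is a hypothetical value chosen for illustration; the threshold actually used is not stated here.

```python
import pandas as pd

# Made-up ratings standing in for the fetched venue statistics.
stats = pd.DataFrame(
    {
        "Neighborhood": ["The Annex", "The Annex", "Christie", "Yorkville"],
        "Name": ["Restaurant A", "Restaurant B", "Restaurant C", "Restaurant D"],
        "Rating": [8.0, 9.0, 7.0, 8.2],
    }
)

RATING_THRESHOLD = 8.0  # hypothetical cut-off, not taken from the project

# Mean rating of the restaurants in each neighborhood.
avg = (
    stats.groupby("Neighborhood", as_index=False)["Rating"]
    .mean()
    .rename(columns={"Rating": "Average Rating"})
)

# Keep neighborhoods at or above the threshold, best first.
top = (
    avg[avg["Average Rating"] >= RATING_THRESHOLD]
    .sort_values("Average Rating", ascending=False)
    .reset_index(drop=True)
)
print(top)
```

Note that a neighborhood with a single highly rated restaurant can outrank one with many good restaurants, so the count per neighborhood is worth checking alongside the average.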
Now we have the list of top-performing restaurants.
Conclusion
Below are the best neighborhoods in which to open an Indian restaurant:
Christie, High Park, The Junction South, The Annex, North Midtown, Yorkville, Church and Wellesley, Queen’s Park, Ontario Provincial Government, St. James Town, Cabbagetown.
Limitations
1. The results are highly dependent on the ratings of the restaurants.
2. The accuracy of the ratings is highly dependent on the Foursquare API.
My GitHub Repository :-