Most Liked Restaurant Neighborhoods in New Delhi

Anushka Mehra — Thu, 30 Apr 2020 21:24:57 GMT

New Delhi, the capital city of India, is a hub for food from many cuisines. In this article, I will walk through the steps taken to cluster neighborhoods in Delhi by how well liked their restaurants are. In this project, we use the Foursquare API to get information about the venues/restaurants in these neighborhoods and to fetch the number of likes for each venue.

Data Set Used:

The data set comes from Kaggle and contains Neighborhood, Borough, Latitude and Longitude information for Delhi, India. It can be downloaded from https://www.kaggle.com/shaswatd673/delhi-neighborhood-data.

Let's Code!

Let's dive into the programming part. The entire notebook can be accessed here: https://github.com/anushkamehra16/Coursera_Capstone/blob/master/Most%20Liked%20Restaurant%20Cluster%20in%20New%20Delhi.ipynb

We start by importing the main libraries: pandas, numpy, folium and matplotlib, along with KMeans from scikit-learn. These are the basic libraries we need to proceed.

Libraries to be imported

Next, we import the Delhi neighborhood CSV file into a pandas data frame. This takes a single line of code in Python.

Reading data set into a pandas data frame

Now that we have the data frame ready, we inspect it by extracting its shape, i.e. the number of rows and the number of columns.
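The loading and inspection steps can be sketched as follows. This is only a minimal illustration, not the notebook's code: the column names follow the description of the data set above and are an assumption about the actual CSV header, and the three rows are a small stand-in with approximate coordinates so that the snippet runs without the downloaded file.

```python
import io

import pandas as pd

# Stand-in for the downloaded Kaggle CSV. The header is assumed from the
# article's description; the rows are illustrative with approximate coordinates.
SAMPLE_CSV = io.StringIO(
    "Borough,Neighborhood,Latitude,Longitude\n"
    "New Delhi,Connaught Place,28.6315,77.2167\n"
    "South Delhi,Hauz Khas,28.5494,77.2001\n"
    "North Delhi,Civil Lines,28.6810,77.2220\n"
)

# With the real file this would be pd.read_csv("<path to the Kaggle CSV>").
delhi_df = pd.read_csv(SAMPLE_CSV)

# shape is a (rows, columns) tuple.
n_rows, n_cols = delhi_df.shape
print(f"{n_rows} rows, {n_cols} columns")
print(delhi_df.head())
```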

We are interested in only two boroughs: New Delhi and South Delhi. We create a subset to get the neighborhoods for these two boroughs.

Subset with Borough New Delhi

Similarly, we create a subset where the Borough is South Delhi. Again, we check the shape of the data frame: we have 54 rows and 4 columns.
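One way to build the two subsets and combine them is sketched below. The toy data frame and its column names are assumptions made for illustration, so the printed shape will not match the 54 rows of the real data.

```python
import pandas as pd

# Toy stand-in for the full neighborhood data frame (illustrative rows only).
delhi_df = pd.DataFrame(
    {
        "Borough": ["New Delhi", "South Delhi", "North Delhi"],
        "Neighborhood": ["Connaught Place", "Hauz Khas", "Civil Lines"],
        "Latitude": [28.6315, 28.5494, 28.6810],
        "Longitude": [77.2167, 77.2001, 77.2220],
    }
)

# One boolean mask per borough of interest.
new_delhi = delhi_df[delhi_df["Borough"] == "New Delhi"]
south_delhi = delhi_df[delhi_df["Borough"] == "South Delhi"]

# Stack the two subsets and renumber the rows from 0.
subset_df = pd.concat([new_delhi, south_delhi], ignore_index=True)
print(subset_df.shape)
```

The same result could be obtained in one step with `delhi_df[delhi_df["Borough"].isin(["New Delhi", "South Delhi"])]`.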

Next, we plot a map of Delhi using the Folium library. The map is interactive, and we can set its initial zoom level. To generate a map, we need the latitude and longitude of the location. These can be obtained with geopy's Nominatim geocoder, which takes our location name as input.

Now that we have New Delhi's latitude and longitude values, we can create a map using Folium.

The neighborhoods in the subset just created for the New Delhi and South Delhi boroughs are then plotted on the map.

Using Foursquare:

To access the Foursquare API, we need to create a developer account and get a Client_ID, a Client_Secret and a version value, as these are used in every API call.

Two API endpoints are used:

1. Explore endpoint to get venues in neighborhoods.

We obtain the results in JSON format and transform them into a pandas data frame containing only the attributes of interest. The venueID is used as input to the likes API endpoint.

2. Likes endpoint to get likes count for each venue.

The likes API returns the number of likes for a venue, given its venueID. We extract the likes count for all food venues using this endpoint.
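The two calls above can be sketched roughly as follows. This is an illustration under assumptions, not the notebook's code: it targets the legacy Foursquare v2 URLs (`/v2/venues/explore` and `/v2/venues/{id}/likes`) and the response layout those endpoints returned, the helper names and output column names are hypothetical, and the credential values are placeholders. `fetch_json` is the only part that touches the network.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

API_ROOT = "https://api.foursquare.com/v2/venues"


def explore_url(credentials, lat, lng, radius=500, limit=100):
    """URL for the explore endpoint around one neighborhood's coordinates."""
    params = dict(credentials, ll=f"{lat},{lng}", radius=radius, limit=limit)
    return f"{API_ROOT}/explore?{urlencode(params)}"


def likes_url(credentials, venue_id):
    """URL for the likes endpoint of a single venue."""
    return f"{API_ROOT}/{venue_id}/likes?{urlencode(credentials)}"


def fetch_json(url):
    """Perform the HTTP GET and decode the JSON body (needs network and valid keys)."""
    with urlopen(url) as response:
        return json.load(response)


def venues_from_explore(payload, neighborhood):
    """Keep only the attributes of interest from an explore response."""
    rows = []
    for item in payload["response"]["groups"][0]["items"]:
        venue = item["venue"]
        categories = venue.get("categories") or [{}]
        rows.append(
            {
                "Neighborhood": neighborhood,
                "Venue": venue["name"],
                "VenueID": venue["id"],
                "Venue Latitude": venue["location"]["lat"],
                "Venue Longitude": venue["location"]["lng"],
                "Category": categories[0].get("name"),
            }
        )
    return rows


def likes_from_payload(payload):
    """Pull the likes count out of a likes response."""
    return payload["response"]["likes"]["count"]


# Placeholder credentials; "v" is a date-style version string.
credentials = {"client_id": "YOUR_CLIENT_ID", "client_secret": "YOUR_CLIENT_SECRET", "v": "20200430"}
print(explore_url(credentials, 28.5494, 77.2001))
print(likes_url(credentials, "VENUE_ID"))
```

Passing the list returned by `venues_from_explore` to `pd.DataFrame(...)` gives a venue data frame, and calling `likes_from_payload(fetch_json(likes_url(credentials, venue_id)))` for each venueID collects the likes counts.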

We now have the likes count for all food venues along with their neighborhood information.

Next, we join the data frame containing the likes count with the delhiVenues data frame, which contains venue information such as Neighborhood, Venue Latitude, Venue Longitude and category.

This data frame is then grouped by Neighborhood and the mean likes count is computed. The resulting data frame is used for clustering.
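The join and aggregation could look like the sketch below. The column names `VenueID` and `Likes`, the name `likes_df` and the toy rows are assumptions for illustration; only `delhiVenues` is a name taken from the text.

```python
import pandas as pd

# Toy venue table (illustrative values only).
delhiVenues = pd.DataFrame(
    {
        "Neighborhood": ["Hauz Khas", "Hauz Khas", "Connaught Place"],
        "VenueID": ["v1", "v2", "v3"],
        "Category": ["Cafe", "Indian Restaurant", "Bakery"],
    }
)

# Toy likes table, one row per venue.
likes_df = pd.DataFrame({"VenueID": ["v1", "v2", "v3"], "Likes": [10, 30, 5]})

# Attach the likes count to each venue through the shared venue ID.
venues_with_likes = delhiVenues.merge(likes_df, on="VenueID", how="inner")

# Average the likes per neighborhood; as_index=False keeps Neighborhood as a column.
mean_likes = venues_with_likes.groupby("Neighborhood", as_index=False)["Likes"].mean()
print(mean_likes)
```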

Clustering:

To find the optimal value for K, we use the elbow method. The K-means algorithm is run for a range of values of K (here, 1 to 10), and an error measure, typically the within-cluster sum of squared distances, is plotted against K. The value at which the curve bends and starts to flatten out is usually adopted as the optimal value for K.
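As a minimal sketch of the elbow method, the loop below fits K-means for K = 1 to 10 and records the inertia (scikit-learn's name for the within-cluster sum of squared distances). The twelve likes values are invented for illustration, and `n_init` and `random_state` are choices made here for reproducibility, not settings taken from the notebook.

```python
import numpy as np
from sklearn.cluster import KMeans

# Invented mean-likes values, one per neighborhood, as a single feature column.
mean_likes = np.array(
    [2, 3, 5, 6, 8, 40, 42, 45, 47, 120, 125, 130], dtype=float
).reshape(-1, 1)

k_values = range(1, 11)
inertias = []
for k in k_values:
    model = KMeans(n_clusters=k, n_init=10, random_state=0).fit(mean_likes)
    inertias.append(model.inertia_)

# The elbow is the K after which these numbers stop dropping sharply.
for k, inertia in zip(k_values, inertias):
    print(k, round(inertia, 1))
```

Plotting `inertias` against `k_values` with matplotlib, for example `plt.plot(list(k_values), inertias, marker="o")`, gives the elbow curve.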

Optimal value for K

From the graph above, our optimal value for K is 3. So, using K = 3, a new model is generated to produce 3 clusters.

Fit value K = 3

After running this, we obtain 3 clusters. These clusters group neighborhoods by how well liked their venues/restaurants are.

Cluster 2

As seen above, cluster 2 contains the most highly liked neighborhoods. Similarly, we can view the values for clusters 1 and 3.

Finally, we concatenate the Borough column with our cluster data frame.

This way, we have obtained clusters of highly liked places in New Delhi.
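Fitting the model with K = 3 and pulling out one cluster can be sketched as below. The neighborhood names and likes values are placeholders. Cluster numbers assigned by K-means are arbitrary, so this sketch finds the most liked cluster from the cluster centers rather than assuming a fixed label.

```python
import pandas as pd
from sklearn.cluster import KMeans

# Placeholder neighborhoods with invented mean-likes values.
mean_likes = pd.DataFrame(
    {
        "Neighborhood": ["A", "B", "C", "D", "E", "F"],
        "Likes": [3.0, 5.0, 42.0, 45.0, 120.0, 130.0],
    }
)

# Fit K-means on the single Likes feature and store each row's cluster label.
model = KMeans(n_clusters=3, n_init=10, random_state=0).fit(mean_likes[["Likes"]])
mean_likes["Cluster"] = model.labels_

# The cluster whose center has the highest likes value is the most liked one.
top_cluster = int(model.cluster_centers_.ravel().argmax())
print(mean_likes[mean_likes["Cluster"] == top_cluster])
```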

Connecting to SQL Database using SQLAlchemy in Python

Anushka Mehra — Mon, 09 Sep 2019 03:34:11 GMT

Python has many libraries for connecting to SQL databases, such as pyodbc and MySQLdb. In this tutorial, I will introduce SQLAlchemy, a library that makes it easy to connect to a SQL database in Python. Its syntax is similar to that of pyodbc. We can follow a few simple steps to connect to our SQL database.

Step 1: Importing libraries

import pyodbc
import sqlalchemy as sal
from sqlalchemy import create_engine
import pandas as pd

Step 2: Establishing connection to the database
# in order to connect, we need server name, database name we want to connect to

# query parameters are joined with & and spaces are written as +
engine = sal.create_engine('mssql+pyodbc://server_name/database_name?driver=SQL+Server&Trusted_Connection=yes')

server_name : server you want to connect to
database_name : database you want to work with
Trusted_Connection = yes, when using Windows authentication. If you have set a separate username and password for your SQL database, use the general form instead:
sal.create_engine('dialect+driver://username:password@host:port/database')

# establishing the connection to the database using engine as an interface
conn = engine.connect()

With this, we have established a connection to our SQL database. Now we can query the database, update its tables, create new tables, etc.
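When a username and password are used, special characters such as @ or / in the password would break a hand-written connection string. A possible alternative, available in SQLAlchemy 1.4 and later, is to build the URL with `URL.create`, which escapes each part. All values below are placeholders, and the driver name is an assumption about what is installed on the machine.

```python
from sqlalchemy.engine import URL

# Placeholder values; replace with your own server details.
connection_url = URL.create(
    "mssql+pyodbc",
    username="username",
    password="p@ss/word",  # special characters are escaped automatically
    host="server_name",
    port=1433,
    database="database_name",
    query={"driver": "ODBC Driver 17 for SQL Server"},  # assumed driver name
)

# engine = create_engine(connection_url)  # needs pyodbc and a reachable server
print(connection_url.render_as_string(hide_password=True))
```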

Step 3: Extracting table names present in the database

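# printing names of the tables present in the database
# (engine.table_names() was removed in SQLAlchemy 2.0; the inspector works in 1.4 and later)
print(sal.inspect(engine).get_table_names())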

Step 4: Extracting table contents

# checking whether the connection was actually established by selecting and displaying contents of table from the database
# (engine.execute() was removed in SQLAlchemy 2.0, so the query runs on the connection)
result = conn.execute(sal.text("select * from tablename"))
for row in result:
    print(row)
result.close()

We can save our table as a pandas data frame. Another way is to query the database table into a data frame directly.

Reading and saving table to a pandas data frame

# reading a SQL query using pandas
sql_query = pd.read_sql_query('SELECT * FROM database_name.dbo.tablename', engine)

# saving SQL table in a pandas data frame
df = pd.DataFrame(sql_query, columns = ['column1','column2',…..])

# printing the dataframe
df

Reading an external file and storing it to a SQL table

We can read a CSV or Excel file and store its content in a SQL table.

df = pd.read_csv('tablename')
# create a new table and append data frame values to this table
df.to_sql('tablename', con=engine, if_exists='append', index=False, chunksize=1000)
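To try the write and read steps above without a SQL Server instance, the same calls can be run against an in-memory SQLite database. This is only a sketch for experimentation: the SQLite URL, the toy data frame and the table name are assumptions, and the `dbo` schema prefix does not apply to SQLite.

```python
import pandas as pd
from sqlalchemy import create_engine, inspect, text

# In-memory SQLite database: nothing is written to disk.
engine = create_engine("sqlite:///:memory:")

# Toy data frame standing in for the contents of a CSV file.
df = pd.DataFrame({"column1": [1, 2, 3], "column2": ["a", "b", "c"]})

with engine.begin() as conn:
    # Create the table if needed and append the rows.
    df.to_sql("tablename", con=conn, if_exists="append", index=False, chunksize=1000)

    # List the tables, then read the new table back into a data frame.
    table_names = inspect(conn).get_table_names()
    read_back = pd.read_sql_query(text("SELECT * FROM tablename"), conn)

print(table_names)
print(read_back)

engine.dispose()
```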

Closing the connection

conn.close()

This is a simple and basic approach to establish a connection to the database and append values to the tables.
