Finding the Nearest Latitude and Longitude Match Using Python

Rahil Ahmed
Published in Analytics Vidhya
May 22, 2020

Using the Haversine distance equation, here is Python code to find the closest location match, based on distance, between two CSV files that contain latitudes and longitudes.

Photo by Andrew Stutesman on Unsplash

BUSINESS NEEDS

Nowadays, it is increasingly common to dive deep into user demographics, especially at customer-centric companies and unicorns like UBER, OYO, GRAB and SWIGGY. In these companies, the app plays a crucial role in understanding user behavior. Age, gender, location, average basket size, active time period and lifetime value are considered key drivers of the business.

Importance of Closest Presence (Distance)

Distance from a member to a hotel (Sai Kiran Gottala, UI/UX Designer at Socar)

Companies like UBER continuously track the distance between the user and the UBER car, and guide their drivers toward high-demand (surge) areas. In the same way, food delivery companies like SWIGGY calculate the distance between the user and the hotel location to maintain their on-time delivery experience and guide their riders accordingly.

In this tutorial, we apply the Haversine distance formula to hotel (restaurant) locations and members in Malaysia. The dataset consists of two separate files: the first contains hotel locations and the second contains members. We want to determine which hotel is nearest to each member in the second dataset.

(It's not mandatory to understand the mathematics behind the equation; we are here to understand the process of replicating the formula in Python.)

What’s in Wikipedia about Haversine Distance

The Haversine formula determines the great-circle distance between two points on a sphere given their longitudes and latitudes.

Source: https://en.wikipedia.org/wiki/Haversine_formula
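For reference, the formula can be written out as below. Here φ denotes latitude and λ denotes longitude (both in radians), r is the radius of the sphere, and d is the great-circle distance. This is the same expression that the Python function in this article evaluates term by term.

```latex
d = 2r \arcsin\left(
      \sqrt{\sin^2\left(\frac{\varphi_2 - \varphi_1}{2}\right)
            + \cos\varphi_1 \cos\varphi_2
              \sin^2\left(\frac{\lambda_2 - \lambda_1}{2}\right)}
    \right)
```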

No need to be afraid of this, as we just need a single piece of code that replicates the same formula in Python. Just remember that the mean radius of the Earth is approximately 6371 km.

Replicating the Haversine distance formula in Python:

from math import radians, cos, sin, asin, sqrt
def dist(lat1, long1, lat2, long2):
    """
    Replicating the same formula as mentioned in Wiki
    """
    # convert decimal degrees to radians
    lat1, long1, lat2, long2 = map(radians, [lat1, long1, lat2, long2])
    # haversine formula
    dlon = long2 - long1
    dlat = lat2 - lat1
    a = sin(dlat/2)**2 + cos(lat1) * cos(lat2) * sin(dlon/2)**2
    c = 2 * asin(sqrt(a))
    # Radius of earth in kilometers is 6371
    km = 6371 * c
    return km

That’s it!!
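A quick sanity check can build confidence in the function before running it on real data. The sketch below repeats the function so that it runs on its own, then checks two cases that can be worked out by hand: the distance from a point to itself is 0, and one degree of latitude along a meridian is 6371 × π/180 ≈ 111.19 km. The Kuala Lumpur and George Town coordinates are approximate and included only for illustration.

```python
from math import radians, cos, sin, asin, sqrt, pi

def dist(lat1, long1, lat2, long2):
    """Great-circle distance in km between two points given in decimal degrees."""
    lat1, long1, lat2, long2 = map(radians, [lat1, long1, lat2, long2])
    dlon = long2 - long1
    dlat = lat2 - lat1
    a = sin(dlat / 2) ** 2 + cos(lat1) * cos(lat2) * sin(dlon / 2) ** 2
    return 6371 * 2 * asin(sqrt(a))

# A point compared with itself should be exactly 0 km away
same_point = dist(3.139, 101.687, 3.139, 101.687)

# One degree of latitude along a meridian: 6371 * pi / 180 km
one_degree = dist(0.0, 0.0, 1.0, 0.0)

# Approximate coordinates of Kuala Lumpur and George Town (illustrative only)
kl_to_george_town = dist(3.139, 101.687, 5.414, 100.329)

print(same_point)            # 0.0
print(round(one_degree, 2))  # 111.19
print(round(kl_to_george_town, 1))
```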

Now it's time to open Jupyter Notebook

PART A — To find the closest Hotel name for each member in the CSV

Reading the Data from CSV File

import pandas as pd
hotels = pd.read_csv("data/hotels.csv")
hotels.head()
members = pd.read_csv("data/members.csv")
members.head()

Rename the columns to shorter names, to reduce the typing needed in the rest of the code

# Renaming the column names
members = members.rename(columns={'latitude': 'lat', 'longitude': 'lon'})
hotels = hotels.rename(columns={'latitude': 'lat', 'longitude': 'lon'})
# To make sure that there are no null values and all are either integer or float values
members.info()
print('\n XXXXXXXXXXXXXXXXXXXXXXX\n')
hotels.info()
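info() shows the dtypes and non-null counts, but it does not flag coordinates that are out of range. As a minimal sketch, assuming the renamed columns lat and lon, the hypothetical helper below returns the rows with missing, non-numeric or impossible coordinates so they can be inspected or dropped before any distance is computed. The sample rows are invented for illustration.

```python
import pandas as pd

def invalid_coordinates(df):
    """Return rows whose lat/lon are missing, non-numeric or out of range."""
    # Non-numeric entries become NaN instead of raising an error
    lat = pd.to_numeric(df['lat'], errors='coerce')
    lon = pd.to_numeric(df['lon'], errors='coerce')
    # NaN fails both range checks, so missing values are caught as well
    ok = lat.between(-90, 90) & lon.between(-180, 180)
    return df[~ok]

# Invented sample: row 2 has a missing latitude, row 3 an impossible
# latitude, and row 4 a non-numeric longitude
sample = pd.DataFrame({
    'id': [1, 2, 3, 4],
    'lat': [3.139, None, 95.0, 5.414],
    'lon': [101.687, 101.7, 100.3, 'abc'],
})

bad = invalid_coordinates(sample)
print(bad['id'].tolist())  # [2, 3, 4]
```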

Using the Haversine equation, we can find the distance between a member's location and every coordinate pair in the hotels data frame.

Now, let's pass the values to our Haversine equation to find the distance

(Just copy and paste the function from above)

from math import radians, cos, sin, asin, sqrt
def dist(lat1, long1, lat2, long2):
    """
    Calculate the great circle distance between two points
    on the earth (specified in decimal degrees)
    """
    # convert decimal degrees to radians
    lat1, long1, lat2, long2 = map(radians, [lat1, long1, lat2, long2])
    # haversine formula
    dlon = long2 - long1
    dlat = lat2 - lat1
    a = sin(dlat/2)**2 + cos(lat1) * cos(lat2) * sin(dlon/2)**2
    c = 2 * asin(sqrt(a))
    # Radius of earth in kilometers is 6371
    km = 6371 * c
    return km

The function above calculates the distance between two points. By applying it to every row of the hotels data frame and using idxmin, you can find the name of the closest hotel:

def find_nearest(lat, long):
    # Distance from the given point to every hotel
    distances = hotels.apply(
        lambda row: dist(lat, long, row['lat'], row['lon']),
        axis=1)
    # Name of the hotel with the smallest distance
    return hotels.loc[distances.idxmin(), 'name']

Once we have the closest hotel name, append the results back to the original members data using:

members['name'] = members.apply(
    lambda row: find_nearest(row['lat'], row['lon']),
    axis=1)
# To check that the data frame has a new column with the hotel name (for each member's location in the list)
members.head()
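Calling apply inside apply evaluates dist once per member-hotel pair in pure Python, which can be slow for large files. One alternative, sketched here with NumPy broadcasting rather than the row-by-row approach used in this article, builds the full members × hotels distance matrix in one step and takes the minimum along each row. It assumes the columns are named lat, lon and name as above. The function name nearest_hotels and the tiny data frames are invented for illustration, and the matrix needs memory proportional to the number of members times the number of hotels.

```python
import numpy as np
import pandas as pd

def nearest_hotels(members, hotels, radius_km=6371):
    """Return (nearest hotel names, distances in km), one entry per member."""
    # Column vectors for members, row vectors for hotels, so that
    # broadcasting produces an (n_members, n_hotels) matrix
    m_lat = np.radians(members['lat'].to_numpy(dtype=float))[:, None]
    m_lon = np.radians(members['lon'].to_numpy(dtype=float))[:, None]
    h_lat = np.radians(hotels['lat'].to_numpy(dtype=float))[None, :]
    h_lon = np.radians(hotels['lon'].to_numpy(dtype=float))[None, :]

    # Same haversine expression as dist(), evaluated for all pairs at once
    a = (np.sin((h_lat - m_lat) / 2) ** 2
         + np.cos(m_lat) * np.cos(h_lat) * np.sin((h_lon - m_lon) / 2) ** 2)
    a = np.clip(a, 0.0, 1.0)  # guard against rounding slightly outside [0, 1]
    km = 2 * radius_km * np.arcsin(np.sqrt(a))

    idx = km.argmin(axis=1)  # position of the closest hotel for each member
    names = hotels['name'].to_numpy()[idx]
    return names, km[np.arange(len(members)), idx]

# Invented sample data
hotels = pd.DataFrame({'name': ['Hotel A', 'Hotel B'],
                       'lat': [3.139, 5.414], 'lon': [101.687, 100.329]})
members = pd.DataFrame({'id': [1, 2],
                        'lat': [3.2, 5.3], 'lon': [101.6, 100.4]})

members['name'], members['distance'] = nearest_hotels(members, hotels)
print(members[['id', 'name', 'distance']])
```

Because this version returns the distance along with the name, the separate distance calculation in Part B becomes optional if you take this route.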

PART B — Finding the “distance” between member location to the closest hotel

Get all the data into a single table: for the data set above, merge in the latitude and longitude of each matched hotel

members = pd.merge(members, hotels[['name', 'lat', 'lon']], on='name', how='left')
# Rename the new columns: both tables have columns named lat and lon, so pandas suffixes them with _x and _y
members = members.rename(columns={'lat_x': 'm_lat', 'lon_x': 'm_lon', 'lat_y': 'h_lat', 'lon_y': 'h_lon'})
members.head()
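One thing to watch in this merge: if two hotels share the same name, each matching member row is duplicated. As a hedged sketch, pd.merge accepts a validate argument that raises an error in that case, and a suffixes argument that names the overlapping columns directly. The suffixes '_m' and '_h' and the sample rows below are assumptions for illustration, not the naming used in this article.

```python
import pandas as pd

# Invented sample data: members already carry the nearest hotel name
members = pd.DataFrame({'id': [1, 2],
                        'lat': [3.2, 5.3], 'lon': [101.6, 100.4],
                        'name': ['Hotel A', 'Hotel B']})
hotels = pd.DataFrame({'name': ['Hotel A', 'Hotel B'],
                       'lat': [3.139, 5.414], 'lon': [101.687, 100.329]})

merged = pd.merge(
    members, hotels[['name', 'lat', 'lon']],
    on='name', how='left',
    suffixes=('_m', '_h'),     # member columns get _m, hotel columns get _h
    validate='many_to_one',    # raises MergeError if a hotel name repeats
)

print(merged.columns.tolist())
# ['id', 'lat_m', 'lon_m', 'name', 'lat_h', 'lon_h']
```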

To find the distance between the two pairs of coordinates, let's use the Haversine distance formula again (on the latest table)

from math import radians, cos, sin, asin, sqrt
def haversine(lon1, lat1, lon2, lat2):
    """
    Calculate the great circle distance between two points
    on the earth (specified in decimal degrees)
    """
    # convert decimal degrees to radians
    lon1, lat1, lon2, lat2 = map(radians, [lon1, lat1, lon2, lat2])
    # haversine formula
    dlon = lon2 - lon1
    dlat = lat2 - lat1
    a = sin(dlat/2)**2 + cos(lat1) * cos(lat2) * sin(dlon/2)**2
    c = 2 * asin(sqrt(a))
    # Radius of earth in kilometers is 6371
    km = 6371 * c
    return km
# Creating a new column to generate the output by passing lat long information to the Haversine equation
members['distance'] = [haversine(members.m_lon[i], members.m_lat[i], members.h_lon[i], members.h_lat[i]) for i in range(len(members))]
members['distance'] = members['distance'].round(decimals=3)
# Printing the data table
members.head()

The distance in the last column is the distance in kilometers between the hotel and the member's location

To write your results to a CSV file (note that sep='\t' makes the file tab-separated)

members.to_csv("output.csv", sep='\t', encoding='utf-8')

PART C — Plotting the results on the map

Please split your output.csv into two files: one for members (Id, Lat, Long) and the other for hotels (name, Lat, Long)
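The split can also be done in pandas. The sketch below assumes the merged table has the columns id, name, m_lat, m_lon, h_lat and h_lon. The id column, the output column names and the sample rows are assumptions, so adjust them to your data. Hotels are de-duplicated because many members can share the same nearest hotel.

```python
import pandas as pd

# Invented stand-in for the merged members table from Part B
members = pd.DataFrame({
    'id': [1, 2, 3],
    'name': ['Hotel A', 'Hotel A', 'Hotel B'],
    'm_lat': [3.2, 3.1, 5.3], 'm_lon': [101.6, 101.7, 100.4],
    'h_lat': [3.139, 3.139, 5.414], 'h_lon': [101.687, 101.687, 100.329],
})

# One row per member with the member's own coordinates
members_out = members[['id', 'm_lat', 'm_lon']].rename(
    columns={'m_lat': 'Lat', 'm_lon': 'Long'})

# One row per distinct hotel with the hotel's coordinates
hotels_out = (members[['name', 'h_lat', 'h_lon']]
              .drop_duplicates()
              .rename(columns={'h_lat': 'Lat', 'h_lon': 'Long'}))

# index=False keeps the pandas row index out of the files
members_out.to_csv('Members.csv', index=False)
hotels_out.to_csv('Hotel.csv', index=False)
```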

Go to https://www.google.com/maps/about/mymaps/ and create a new map

Then import both Members.csv and Hotel.csv. My Maps will plot the coordinates for you

Conclusion: Using the distance information, you can

  1. Define and leverage new expansion opportunities
  2. Manage your on-ground fleet and reduce your operational costs
  3. Manage your SLAs on operational efficiency and maintain higher NPS scores in terms of CX.

Finally, thank you for taking the time to read this article. I hope you enjoy replicating this on your own data set.

Rahil Ahmed
Analytics Vidhya

Business (Analytics+Strategy+Management) at SOCAR | Ex-OYO | IIT Guwahati ✪ https://www.linkedin.com/in/rahilahmed/