Finding the distance between two lists of geographic coordinates

Dana Lindquist
Feb 15, 2020

in Python

Suppose you have two lists of locations and you want to know the distance from every point on one list to every point on the other. The lists could be schools, distribution centers, family members’ homes, or just about anything. In this post I will show you how to do this in Python.

To illustrate, I made a list of four cities starting with the letter A (Albuquerque, Ann Arbor, Aspen & Atlanta) and another list of four cities starting with the letter B (Baltimore, Bellevue, Berkeley & Boston). I will find the distance in miles from each city in list A to each city in list B.

So we want to find these distances:

Albuquerque to Baltimore
Albuquerque to Bellevue
Albuquerque to Berkeley
Albuquerque to Boston

Ann Arbor to Baltimore
Ann Arbor to Bellevue
and so on.

First let’s import some libraries and create the two lists with the latitude and longitude of each city:

import pandas as pd
import numpy as np
import sklearn.neighbors

# Create two dataframes with city names and lat-long in degrees
locations_A = pd.DataFrame({
    'city_A': ['Atlanta', 'Aspen', 'Albuquerque', 'Ann Arbor'],
    'latitude_A': [33.75, 39.19, 35.08, 42.28],
    'longitude_A': [-84.39, -106.82, -106.65, -83.74]
})
locations_B = pd.DataFrame({
    'city_B': ['Boston', 'Baltimore', 'Berkeley', 'Bellevue'],
    'latitude_B': [42.36, 39.29, 37.87, 47.61],
    'longitude_B': [-71.06, -76.61, -122.27, -122.20]
})

The haversine computation used below expects latitude and longitude in radians, so we add radian versions of these columns to both dataframes using np.radians.

# Add columns with latitude and longitude in radians
locations_A[['lat_radians_A','long_radians_A']] = (
    np.radians(locations_A.loc[:,['latitude_A','longitude_A']])
)
locations_B[['lat_radians_B','long_radians_B']] = (
    np.radians(locations_B.loc[:,['latitude_B','longitude_B']])
)
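
A common slip at this step is swapping the latitude and longitude columns, which matters because sklearn's haversine metric expects each point as [latitude, longitude]. Here is a minimal sketch of a guard, assuming plain array-like inputs in degrees. The helper name to_radians_checked is my own and is not part of numpy or sklearn.

```python
import numpy as np

def to_radians_checked(lat_deg, lon_deg):
    """Hypothetical helper: validate degree ranges, then return an
    (n, 2) array of [latitude, longitude] pairs in radians."""
    lat = np.asarray(lat_deg, dtype=float)
    lon = np.asarray(lon_deg, dtype=float)
    if lat.shape != lon.shape:
        raise ValueError("latitude and longitude must have the same length")
    # Latitudes outside [-90, 90] usually mean the columns were swapped
    if np.any(np.abs(lat) > 90):
        raise ValueError("latitude outside [-90, 90]; are the columns swapped?")
    if np.any(np.abs(lon) > 180):
        raise ValueError("longitude outside [-180, 180]")
    # Stack as columns in [lat, lon] order and convert to radians
    return np.radians(np.column_stack([lat, lon]))

# Atlanta and Aspen, in degrees
coords = to_radians_checked([33.75, 39.19], [-84.39, -106.82])
print(coords.shape)
```

The check only catches swaps when a longitude falls outside the valid latitude range, which happens to be true for every city in this post.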

Now the computation can be performed on these two lists. The sklearn library includes a haversine distance metric whose pairwise method takes two lists of lat-long pairs in radians and returns a matrix of distances, with one row for each point in the first list and one column for each point in the second. This is the workhorse of our calculation. To make the output easier to work with, we will convert this matrix to a pandas dataframe.

The distance computed here is a haversine distance. This assumes the earth is a true sphere, which makes for a relatively fast computation. The sklearn computation assumes the radius of the sphere is 1, so to get the distance in miles we multiply its output by 3959, the mean radius of the earth in miles. To get the distance in kilometers, multiply by 6371 instead.
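
To make the formula concrete, here is a minimal sketch of the haversine distance written out by hand for a single pair of points, using only the standard library. The function name haversine_miles is my own. The 3959 mile radius is the same figure used in this post.

```python
import math

EARTH_RADIUS_MILES = 3959  # mean earth radius, as used in this post

def haversine_miles(lat1, lon1, lat2, lon2, radius=EARTH_RADIUS_MILES):
    """Great-circle distance between two points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlam = math.radians(lon2 - lon1)
    # Haversine of the central angle between the two points
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2)
    # Central angle on the unit sphere, scaled by the chosen radius
    return 2 * radius * math.asin(math.sqrt(a))

# Atlanta to Boston, coordinates taken from the lists above
atlanta_boston = haversine_miles(33.75, -84.39, 42.36, -71.06)
print(round(atlanta_boston))
```

Passing radius=1 returns the unit-sphere distance, which is the quantity sklearn reports before it is scaled to miles.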

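# DistanceMetric is in sklearn.metrics as of scikit-learn 1.0;
# older releases exposed it as sklearn.neighbors.DistanceMetric.
from sklearn.metrics import DistanceMetric

# 3959 is the mean radius of the earth in miles (use 6371 for kilometers)
earth_radius_miles = 3959

dist = DistanceMetric.get_metric('haversine')
dist_matrix = dist.pairwise(
    locations_A[['lat_radians_A','long_radians_A']],
    locations_B[['lat_radians_B','long_radians_B']]
) * earth_radius_miles

df_dist_matrix = pd.DataFrame(
    dist_matrix,
    index=locations_A['city_A'],
    columns=locations_B['city_B']
)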

The dataframe df_dist_matrix holds the output in ‘wide format’, with one row for each city in list A and one column for each city in list B. The ‘long format’ of this data is a list of each pair of cities and the distance between them. To do this conversion we unpivot the wide format table.

# Unpivot this dataframe from wide format to long format.
# When you unpivot, the data in the pivot table becomes a
# column named 'value'. Rename this column to 'miles' for clarity.
df_dist_long = (
    pd.melt(df_dist_matrix.reset_index(), id_vars='city_A')
)
df_dist_long = df_dist_long.rename(columns={'value': 'miles'})

And there you have it: a table with every combination of a city beginning with A and a city beginning with B, along with the distance in miles between them.
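
One thing the long format makes easy is picking out the closest city in list B for each city in list A. The sketch below is only an illustration of that idea. So that it runs on its own, it rebuilds the distances with a NumPy version of the haversine formula instead of sklearn. The names haversine_matrix, df_wide, df_long and nearest are my own.

```python
import numpy as np
import pandas as pd

# Same cities and coordinates (in degrees) as above
locations_A = pd.DataFrame({
    'city_A': ['Atlanta', 'Aspen', 'Albuquerque', 'Ann Arbor'],
    'latitude_A': [33.75, 39.19, 35.08, 42.28],
    'longitude_A': [-84.39, -106.82, -106.65, -83.74]
})
locations_B = pd.DataFrame({
    'city_B': ['Boston', 'Baltimore', 'Berkeley', 'Bellevue'],
    'latitude_B': [42.36, 39.29, 37.87, 47.61],
    'longitude_B': [-71.06, -76.61, -122.27, -122.20]
})

def haversine_matrix(lat_a, lon_a, lat_b, lon_b, radius=3959):
    """Pairwise haversine distances; inputs in degrees, output in miles
    for the default radius. Rows follow list A, columns follow list B."""
    # Column vectors for A and row vectors for B, so broadcasting
    # produces one row per A point and one column per B point
    lat_a = np.radians(np.asarray(lat_a, dtype=float))[:, None]
    lon_a = np.radians(np.asarray(lon_a, dtype=float))[:, None]
    lat_b = np.radians(np.asarray(lat_b, dtype=float))[None, :]
    lon_b = np.radians(np.asarray(lon_b, dtype=float))[None, :]
    a = (np.sin((lat_b - lat_a) / 2) ** 2
         + np.cos(lat_a) * np.cos(lat_b) * np.sin((lon_b - lon_a) / 2) ** 2)
    return 2 * radius * np.arcsin(np.sqrt(a))

df_wide = pd.DataFrame(
    haversine_matrix(locations_A['latitude_A'], locations_A['longitude_A'],
                     locations_B['latitude_B'], locations_B['longitude_B']),
    index=locations_A['city_A'],
    columns=locations_B['city_B']
)

# Wide to long: one row per (city_A, city_B) pair
df_long = df_wide.reset_index().melt(
    id_vars='city_A', var_name='city_B', value_name='miles'
)

# For each city in A, keep the row with the smallest distance
nearest = (
    df_long.loc[df_long.groupby('city_A')['miles'].idxmin()]
    .set_index('city_A')
)
print(nearest)
```

Swapping idxmin for idxmax would give the farthest city instead.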
