Let’s assume that you have a data set with multiple rows of latitude and longitude coordinates. You want to calculate the distance between each pair of coordinates in consecutive rows. However, you want to make sure that you get the actual distance of the specific route segment that was followed (not as the crow flies).
I have downloaded a sample data set made available on the UCI Machine Learning Repository. The data set consist of few hundred rows of GPS traces collected by participants using their smartphones. For this example, I reduced the data set to only include the first 20 rows:
The solution that I have provided here is written in Python. Why? Python provides several packages for data manipulation that are easy to use and are supported by a large community of contributors.
Let’s consider the problem again. We have a data set with ‘Latitude’ and ‘Longitude’ values. We are aiming to calculate the distance between coordinates in consecutive rows and store the value in a new column called ‘Distance’.
We are not trying to reinvent the wheel here. So, the easiest way to perform this task is to make use of a mapping web service or API that can do all the grunt work. I recommend using the Google Maps Distance Matrix API for this purpose. The Distance Matrix API is unfortunately NOT free. Please consider the billing structure before using the service.
If you do not have a Google Maps API key yet, check out the link below to setup a project and get your API key:
The Python Script
Import the necessary packages:
- pandas — data analysis tool that helps us to manipulate data; used to create a data frame with columns.
- googlemaps — API for distance matrix calculations.
- itertools — helps to iterate through rows in the data set.
import pandas as pd
from itertools import tee
2. Load CSV file and store in a data frame.
In this step, we load the CSV file into a data frame.
#Read CSV file into data frame named 'df'
#change seperator (sep e.g. ',') type if necessary
df = pd.read_csv('go_track_trackspoints.csv')
3. Set up the API key
Use your Google Maps API key to connect to the Google maps API
#Perform request to use the Google Maps API web service
API_key = 'AIzaSyCiF10X_example'#enter Google Maps API key
gmaps = googlemaps.Client(key=API_key)
4. Iterating Rows in Data Frame
4.1 Pairwise Function
Since the CSV file is already loaded into the data frame, we can loop through the latitude and longitude values of each row using a function I initialized as Pairwise. The pairwise function is using a helper function from the itertools package, called tee. This will make things easier to iterate through consecutive rows concurrently.
4.2 Empty List
An empty list item is created to store the calculated distances. Note that the first value in the list is zero. This is important in order to maintain the same index length as the data frame when we assign the list to a new ‘Distance’ column in the data frame.
4.3 For Loop
The For Loop is used in conjunction with the Pairwise helper function to iterate through the rows. For each iteration in the loop, latitude and longitude values are stored as pairs in the ‘origin’ and ‘destination’ variables, respectively.
4.4 Calling the API
Now that the origin and destination coordinates are known, we can pass these values as parameters into the distance_matrix function to execute the API calls. This step might take up to a minute or two depending on the amount of rows, so a little patience is required.
You can indicate the transport mode that you wish to use. Available modes are: ‘driving”, “walking”, “transit” or “bicycling”. In this example, the API will return the results in meters for ‘walking’ mode.
WARNING: Here is the part where you need to be careful. If you wish to calculate distance using other transport modes, you need to consider the accuracy of your coordinates. The API adheres to the rules of the road. This means, that if there is a slight chance that your coordinates are off by a few meters, the API may consider the coordinate in a oncoming traffic lane (especially on highways). The result is that the API will then select the nearest u-turn and calculate the long way around to the next coordinate. I found that the ‘walking’ transport modes are more consistent for coordinates collected in smaller time intervals.
4.5 Append Individual Distance Result to Empty List
The calculated distance result is then appended to the empty list we created in each iteration.
#pairwise function implemented to iterate through two consecutive rows (pairs) in a data framedef pairwise(iterable):
a, b = tee(iterable)
return zip(a, b)
#empty list - will be used to store calculated distances
list = 
# Loop through each row in the data frame using pairwise
for (i1, row1), (i2, row2) in pairwise(df.iterrows()):
#Assign latitude and longitude as origin/departure points
LatOrigin = row1['Latitude']
LongOrigin = row1['Longitude']
origins = (LatOrigin,LongOrigin)
#Assign latitude and longitude from the next row as the destination point
LatDest = row2['Latitude'] # Save value as lat
LongDest = row2['Longitude'] # Save value as lat
destination = (LatDest,LongDest)
#pass origin and destination variables to distance_matrix function# output in meters
result = gmaps.distance_matrix(origins, destination, mode='walking')["rows"]["elements"]["distance"]["value"]
#append result to list
5. Creating a New ‘Distance’ Column in the Data Frame
The list can be appended to the data frame as a column.
#Add column 'Distance' to data frame and assign to list values
df['Distance'] = list
6. Writing the Final Data Frame to CSV
All the data that we are interested in are now stored in the data frame. As a final step, we can write its contents to a CSV file.
df.to_csv('calculated_distances.csv', sep=';', index=None, header= ['id','Latitude','Longitude','track_id','time','distance'])
That’s it. You should have a CSV file with distances between consecutive rows. You can view the full code on my GitHub page.
If you have any suggestions, feel free to comment below or contact me on LinkedIn.