Primer on GPS Data with Strava and Python

Ryan Richbourg
Published in Analytics Vidhya · 4 min read · Sep 18, 2020
Photo by Markus Spiske on Unsplash

It’s no surprise that as a data science nerd and cycling fanatic, I am terribly addicted to Strava, a social network app for athletes that doubles as a wonderful data tool.

The Strava app allows you to record your rides and saves interesting data such as your speed, distance, time, elevation, power output, energy consumed, heart rate, weather conditions, etc.

Personally, my favorite type of data is the GPS map of where I traveled, because looking at maps of previous rides lets me recall the sights and memories from various points along the way. Some people have even gone so far as to create “Strava art”.

In order to create the maps for each ride, Strava has to aggregate the GPS (Global Positioning System) data recorded by a phone or cycling computer in GPX format.

GPX stands for GPS Exchange Format, an XML-based file format whose structure I compare to a tree with branches and leaves.

Just as a tree has many branches and each branch has many leaves, a GPX file can contain one or more “tracks”, each made up of “segments”, which in turn contain many “points”. A point is the basic unit of information we will be looking at in this article.

When Strava creates GPX files, each point contains the latitude, longitude, elevation, and timestamp* of your device. The frequency at which points are recorded varies depending on your device and cellular network strength, but it is typically high enough to give a reasonable picture of your location over time.
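To make the tree analogy concrete, here is a minimal sketch that walks a tiny, hand-written GPX document using only Python's standard library. The coordinates and the names `SAMPLE_GPX` and `walk_gpx` are made up for illustration and are not taken from the ride in this article.

```python
import xml.etree.ElementTree as ET

# A tiny hand-written GPX document (made-up values, not the real ride).
SAMPLE_GPX = """<gpx version="1.1" creator="example"
     xmlns="http://www.topografix.com/GPX/1/1">
  <trk>
    <name>Sample ride</name>
    <trkseg>
      <trkpt lat="32.7500" lon="-97.3300">
        <ele>180.0</ele>
        <time>2020-09-18T12:00:00Z</time>
      </trkpt>
      <trkpt lat="32.7501" lon="-97.3302">
        <ele>180.4</ele>
        <time>2020-09-18T12:00:01Z</time>
      </trkpt>
    </trkseg>
  </trk>
</gpx>
"""

# GPX 1.1 elements live in this XML namespace.
NS = {"gpx": "http://www.topografix.com/GPX/1/1"}


def walk_gpx(xml_text):
    """Return one dict per point, walking tracks -> segments -> points."""
    root = ET.fromstring(xml_text)
    rows = []
    for trk in root.findall("gpx:trk", NS):
        for seg in trk.findall("gpx:trkseg", NS):
            for pt in seg.findall("gpx:trkpt", NS):
                ele = pt.findtext("gpx:ele", namespaces=NS)
                rows.append({
                    "lat": float(pt.get("lat")),
                    "lon": float(pt.get("lon")),
                    # Elevation and time are optional children of a point.
                    "ele": float(ele) if ele is not None else None,
                    "time": pt.findtext("gpx:time", namespaces=NS),
                })
    return rows


for row in walk_gpx(SAMPLE_GPX):
    print(row)
```

Latitude and longitude are attributes of each point, while elevation and time are child elements, which is why a file can leave the timestamp out entirely.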

On Strava’s website you are able to download the GPX file for an activity, so we’re going to take a look at one of my friend’s rides in Fort Worth, Texas.

For these next steps I’m assuming you already have Python installed on your machine.

First we are going to import a Python library called gpxpy to parse the GPX file's XML tree. To install this library I ran ‘pip install gpxpy’ in my terminal. https://pypi.org/project/gpxpy/

After opening and parsing the GPX file, we get a GPX object containing tracks, segments, and points. We are only interested in the points right now, so after indexing down through the tracks and segments we end up with a list of tuples, one per GPS update (this example has 7273 points).
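A rough sketch of this step might look like the following. It assumes the route lives in the first segment of the first track, which may not hold for every file, and the function names and the `ride.gpx` filename are placeholders of my own.

```python
def points_to_tuples(points):
    """Turn point objects into (longitude, latitude, elevation) tuples."""
    return [(p.longitude, p.latitude, p.elevation) for p in points]


def load_gpx_points(path, track=0, segment=0):
    """Parse a GPX file with gpxpy and return the points of one segment.

    Assumption: the ride is stored in tracks[track].segments[segment].
    """
    import gpxpy  # third-party: pip install gpxpy

    with open(path, "r", encoding="utf-8") as gpx_file:
        gpx = gpxpy.parse(gpx_file)
    return gpx.tracks[track].segments[segment].points


# Usage (the filename is a placeholder):
# points = load_gpx_points("ride.gpx")
# data = points_to_tuples(points)
# print(len(data))  # number of GPS updates in the segment
```

Keeping the tuple conversion separate from the file parsing makes it easy to try the conversion on any objects that expose `longitude`, `latitude`, and `elevation` attributes.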

Next we simply take the list of tuples and put them in a pandas DataFrame with columns for longitude, latitude, and elevation. Finally we can plot latitude against longitude with Matplotlib to get our first visual look at the route.
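As a sketch of the DataFrame and plotting step, assuming the points are already a list of (longitude, latitude, elevation) tuples: the column names, function names, and output filename below are my own choices for illustration.

```python
def build_route_frame(points):
    """Build a DataFrame from (longitude, latitude, elevation) tuples."""
    import pandas as pd  # third-party: pip install pandas

    return pd.DataFrame(points, columns=["Longitude", "Latitude", "Elevation"])


def plot_route(df, out_path="route.png"):
    """Plot latitude against longitude and save the figure to a file."""
    import matplotlib.pyplot as plt  # third-party: pip install matplotlib

    fig, ax = plt.subplots(figsize=(8, 8))
    ax.plot(df["Longitude"], df["Latitude"], linewidth=1)
    ax.set_xlabel("Longitude")
    ax.set_ylabel("Latitude")
    ax.set_title("Route")
    fig.savefig(out_path, dpi=150)
    plt.close(fig)
    return out_path


# Usage with two made-up points:
# df = build_route_frame([(-97.33, 32.75, 180.0), (-97.3302, 32.7501, 180.4)])
# plot_route(df)
```

Longitude goes on the x-axis and latitude on the y-axis so that the plot is oriented like a map, with north at the top.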

Where things get really interesting is mapping our route over Google Maps using a different Python library called gmplot. To install it, run ‘pip install gmplot’ in your terminal. https://pypi.org/project/gmplot/

With gmplot, you’ll want to set up the center coordinates for your map display, instantiate a new GoogleMapPlotter object, fit it to your data, and write the map to an HTML file.
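Those steps could be sketched roughly as follows. Centering on the midpoint of the route's bounding box, the zoom level of 12, the line styling, and the output filename are all assumptions on my part, and Google may require an API key for the map to render cleanly.

```python
def route_center(lats, lons):
    """Midpoint of the route's bounding box, used to center the map."""
    return ((min(lats) + max(lats)) / 2, (min(lons) + max(lons)) / 2)


def draw_route_map(lats, lons, out_path="route_map.html", zoom=12, apikey=""):
    """Draw the route over Google Maps and write it to an HTML file."""
    import gmplot  # third-party: pip install gmplot

    center_lat, center_lon = route_center(lats, lons)
    gmap = gmplot.GoogleMapPlotter(center_lat, center_lon, zoom, apikey=apikey)
    # Draw the route as a connected line through every GPS point.
    gmap.plot(lats, lons, color="blue", edge_width=3)
    gmap.draw(out_path)
    return out_path


# Usage, assuming a DataFrame df with Latitude and Longitude columns:
# draw_route_map(list(df["Latitude"]), list(df["Longitude"]))
```

Opening the resulting HTML file in a browser shows the route drawn over the map.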

Voilà! I hope you enjoyed this short tutorial on visualizing GPS data.

N.B. My friend recorded this route with a Garmin cycling computer instead of his cell phone. I noticed two differences between my own GPX data and his: 1) his Garmin did not include timestamps in the GPX points, probably because 2) his points were recorded exactly once per second (7273 points over 2:01:13 of elapsed time: 121.2167 minutes × 60 seconds per minute ÷ 7273 points = 1.00 seconds per GPS point). That is an advantage over the GPX data from my cell phone, whose points arrive at less regular intervals because they depend on the connection strength between my phone and my cellular network provider. A cycling computer is therefore one way to get a more realistic estimate of your average speed on a bicycle than a cell phone.
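The sampling-interval arithmetic above can be checked with a few lines of Python. The helper names are my own, and the elapsed time and point count are the ones quoted in the note.

```python
def elapsed_seconds(hms):
    """Convert an 'H:MM:SS' string to a number of seconds."""
    hours, minutes, seconds = (int(part) for part in hms.split(":"))
    return hours * 3600 + minutes * 60 + seconds


def seconds_per_point(hms, n_points):
    """Average time between GPS points over the whole ride."""
    return elapsed_seconds(hms) / n_points


# 2:01:13 is 7273 seconds, spread over 7273 points.
print(seconds_per_point("2:01:13", 7273))  # 1.0
```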
