Explore and uncover new insights in your Garmin data using a Data Science approach (.gpx files)

Learn how to use data science to uncover insights from your Garmin data beyond the capabilities of the Garmin app

Peder Ward
CodeX
8 min read · May 20, 2023

Garmin devices strengthened by the possibilities enabled through data extraction and analysis. Picture by pxfuel.

Introduction

In today’s world, it’s common for active individuals to use various devices to track their exercise routines, whether they are aiming for a certain number of daily steps, preparing for a marathon, or pursuing some other goal. These gadgets are not only fun to use, but also generate a wealth of data that can be analyzed from a data science perspective.

One popular brand among fitness enthusiasts is Garmin, which offers a range of wearable devices equipped with GPS and other sensors to track various fitness metrics, including heart rate, distance traveled, calories burned, and more. Although Garmin provides its own app for data visualization, from a data science perspective, I believe there is room for improvement in the platform.

But how can we use all this data to gain insights about our fitness routines and improve our performance? In this article, we’ll explore how you can export the data to your local computer, apply data science techniques, and uncover hidden patterns and trends that can help you achieve your fitness goals. Whether you’re a fitness enthusiast or a data science expert, this article is for you!

Exporting the data

As far as I know, there are a few ways to export your Garmin Connect data. In this article I will use two of them. It is possible to read about the other methods here.

Option 1: Export All Garmin Data Using Account Management Center

To export all Garmin data using the Account Management Center, follow these steps:

  1. Go to your Garmin Data Management Account
  2. Sign in
  3. Select Export Your Data
  4. Select Request Data Export

An email with a download link will be sent to you within 48 hours. Depending on your needs this might be enough, but the next option offers additional features, such as GPX tracks.

Option 2: Export All Garmin Data Using GitHub repo garmin-connect-export

To export all of your Garmin activity data, you can use the garmin-connect-export GitHub repository. Using this solution is very simple — with just a single command line, you can download all of your data to your local machine. To do this, you will need to run the following command:

python gcexport.py --count all --username <YOUR_USERNAME>

Note that this script requires Python 3.7 or newer in order to run properly. In my experience, using both options has been helpful to get as much data as possible.

Preprocessing

For this project, there is not much preprocessing required, but you will need to handle both .csv and .json files. Personally, I prefer to convert everything to a DataFrame for further work, if possible. Towards the end of this article, I will explore how to open and utilize the GPX files.

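To load a .json or a .csv file into a DataFrame, use whichever of the following two lines matches the file (running both would simply overwrite the first result with the second):

import pandas as pd

df_act = pd.read_json('data/activities.json')  # JSON export
df_act = pd.read_csv('data/activities.csv')    # CSV export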

After importing the data to the DataFrame, I converted all the time-related columns to datetime format using the following line of code:

df_act['Start Time'] = pd.to_datetime(df_act['Start Time'], utc=True)
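Columns such as 'Duration (h:m:s)' hold durations rather than points in time, so a timedelta may suit them better than a datetime. Here is a minimal sketch, assuming the values are strings in HH:MM:SS form. The sample values and the `duration_min` column name are made up for illustration.

```python
import pandas as pd

# Toy stand-in for the exported activities table (values are invented)
df_demo = pd.DataFrame({'Duration (h:m:s)': ['01:30:00', '00:45:30']})

# Parse the strings as timedeltas, then express them in minutes
durations = pd.to_timedelta(df_demo['Duration (h:m:s)'])
df_demo['duration_min'] = durations.dt.total_seconds() / 60

print(df_demo)
```

Having the duration as a plain number of minutes makes it straightforward to use on a plot axis or in an aggregation.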

Lastly, note that the dataset may contain NaN values, since every activity type shares the same columns in the DataFrame. For example, swimming has no elevation data the way hiking does. These NaN values can affect your visualizations, so you can either remove the rows that contain them or replace them, if that makes sense for the plot. Use one of the following lines to remove or replace NaN values:

# Option A: drop every row that contains at least one NaN value
df_act.dropna(inplace=True)
# Option B: replace NaN values with 0 instead
df_act.fillna(0, inplace=True)
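Because every activity type shares the same wide set of columns, dropping rows with any NaN can remove far more than intended. The sketch below, on invented toy data, compares a blanket `dropna()` with a version restricted to the columns a given plot actually needs. The column names follow the export shown later in this article.

```python
import pandas as pd

nan = float('nan')
# Toy activities: swimming has no elevation gain (values are invented)
df_demo = pd.DataFrame({
    'Activity Type': ['Hiking', 'Lap Swimming', 'Running'],
    'Distance (km)': [12.4, 1.5, 8.0],
    'Elevation Gain (m)': [850.0, nan, 60.0],
})

# Blanket removal: the swimming row disappears entirely
all_clean = df_demo.dropna()

# Targeted removal: only require the column used in a distance plot
distance_clean = df_demo.dropna(subset=['Distance (km)'])

# Targeted replacement: treat missing elevation gain as 0 for this column only
elevation_filled = df_demo['Elevation Gain (m)'].fillna(0)

print(len(all_clean), len(distance_clean), elevation_filled.tolist())
```

Whether 0 is a sensible replacement depends on the column: it is arguably fine for elevation gain on a swim, but would distort something like average heart rate.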

Exploring the data

Garmin provides some useful visualizations in their Garmin Connect app, but I aim to uncover insights that cannot easily be visualized through the app. With the vast amount of available data, the only limiting factors for your insights are your own imagination and creativity.

To ensure the privacy of the dataset, I have applied data perturbation techniques, which involve adding random values or modifying some of the existing ones. The objective is to protect sensitive information while still maintaining the original characteristics of the dataset. Since I’ve only owned the Garmin device for a few months, the amount of data available for analysis is limited.

In order to keep this article concise, I will only offer a brief description and plot of the data in this chapter. While the plots are not highly polished, they serve to demonstrate the potential insights that can be gained from this dataset. If you are interested in the code used to create these visualizations, it can be found on my GitHub page.

Activity data

The “activities.csv” file contains a substantial amount of information regarding your activities. Once the file is converted into a DataFrame, you can examine the column names to view various pieces of data.

Index(['Start Time', 'End Time', 'Activity ID', 'Activity Name', 'Description',
'Location Name', 'Time Zone', 'Offset', 'Duration (h:m:s)',
'Elapsed Duration (h:m:s)', 'Moving Duration (h:m:s)',
'Activity Parent', 'Activity Type', 'Event Type', 'Device', 'Gear',
'Privacy', 'File Format', 'Distance (km)', 'Average Speed (km/h)',
'Average Speed (km/h or min/km)', 'Average Moving Speed (km/h)',
'Average Moving Speed (km/h or min/km)', 'Max. Speed (km/h)',
'Max. Speed (km/h or min/km)', 'Elevation Gain (m)',
'Elevation Loss (m)', 'Elevation Min. (m)', 'Elevation Max. (m)',
'Elevation Corrected', 'Begin Latitude (°DD)', 'Begin Longitude (°DD)',
'End Latitude (°DD)', 'End Longitude (°DD)', 'Max. Heart Rate (bpm)',
'Average Heart Rate (bpm)', 'Calories', 'VO2max',
'Aerobic Training Effect', 'Anaerobic Training Effect',
'Avg. Run Cadence', 'Max. Run Cadence', 'Stride Length', 'Steps',
'Avg. Cadence (rpm)', 'Max. Cadence (rpm)', 'Strokes', 'Avg. Temp (°C)',
'Min. Temp (°C)', 'Max. Temp (°C)'], dtype='object')

Before conducting any analysis, I added two more features to the dataset, weekday and week number, using the following code:

df_act['week_number'] = df_act['Start Time'].dt.isocalendar().week
df_act['weekday'] = df_act['Start Time'].dt.day_name()
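As an illustration of what the weekday feature enables, the following sketch counts activities per weekday on a few invented start times. Reindexing against a fixed weekday order keeps days without any activity visible as zeros instead of silently dropping them.

```python
import pandas as pd

# Invented start times: two Mondays and one Wednesday
starts = pd.to_datetime(
    ['2023-05-01 07:30', '2023-05-03 18:00', '2023-05-08 06:45'], utc=True
)
df_demo = pd.DataFrame({'Start Time': starts})
df_demo['weekday'] = df_demo['Start Time'].dt.day_name()

weekday_order = ['Monday', 'Tuesday', 'Wednesday', 'Thursday',
                 'Friday', 'Saturday', 'Sunday']

# Count per weekday, in calendar order, with 0 for days that never occur
counts = df_demo['weekday'].value_counts().reindex(weekday_order, fill_value=0)
print(counts)
```

The resulting Series can be passed to a bar plot, for example with `counts.plot.bar()`.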

Using this dataset, we can now create several plots such as histograms, scatter plots, and line plots to visualize the activity data.

The graph displays the number of activities by weekday, revealing an interesting trend where I have never done a workout on a Friday in this subset.
Pie chart showing distribution of activity types in the dataset.
Two subplots displaying average and total duration of each activity type in minutes.
Two subplots displaying average and total distance of each activity type in km.
The average and maximum heart rate are plotted against training duration in minutes for each activity category.
Calories distribution and calories compared with training duration.
Activity start times categorized by activity type.
Heatmaps showing max heart rate and calories burned by week number and weekday.
Acute training load over the last few months.
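The week-by-weekday heatmaps listed above need the data in a grid first. One possible way to build that grid is a pivot table, sketched here on invented values. The choice of `sum` as the aggregation is an assumption, and `max` would be the natural choice for max heart rate.

```python
import pandas as pd

# Invented activities with the two engineered features and a calories column
df_demo = pd.DataFrame({
    'week_number': [10, 10, 11, 11],
    'weekday': ['Monday', 'Wednesday', 'Monday', 'Monday'],
    'Calories': [500, 300, 450, 250],
})

weekday_order = ['Monday', 'Tuesday', 'Wednesday', 'Thursday',
                 'Friday', 'Saturday', 'Sunday']

# Rows: week number, columns: weekday, cells: total calories
heat = df_demo.pivot_table(
    index='week_number', columns='weekday', values='Calories', aggfunc='sum'
).reindex(columns=weekday_order)

print(heat)
```

Cells without any activity stay NaN, which most heatmap functions render as empty.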

As a big fan of backcountry skiing, I have the most data on this type of activity. Therefore, I will conduct further analysis on backcountry skiing in particular.

Left: The scatter plot illustrates the duration versus distance for both backcountry and resort skiing. As can be observed, there are some outliers present in both duration and distance. Right: Comparing elevation gain across backcountry skiing locations.

Sleep data

The “sleepData.json” files contain recorded sleep data from your Garmin device. You can easily convert this data into a pandas DataFrame by using the following code:

import json
import pandas as pd

# Load the JSON data from file
with open('data/sleepData.json', 'r') as f:
    data = json.load(f)

# Build the DataFrame directly from the list of dictionaries,
# one row per dictionary (DataFrame.append was removed in pandas 2.0)
sleep_df = pd.DataFrame(data)

Before analyzing the data, I also want to add more information to the DataFrame. This can be achieved by using the following code:

sleep_df['calendarDate'] = pd.to_datetime(sleep_df['calendarDate'])
sleep_df['weekday'] = sleep_df['calendarDate'].dt.day_name()
sleep_df['day_type'] = sleep_df['weekday'].apply(lambda x: 'Weekend' if x in ['Saturday', 'Sunday'] else 'Regular day')
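With the day type in place, a first comparison could be the average sleep duration on weekends versus regular days. This is a minimal sketch on invented data. The `deepSleepSeconds`, `lightSleepSeconds` and `remSleepSeconds` column names are assumptions about the export and should be checked against your own file.

```python
import pandas as pd

# Invented nights: Friday, Saturday, Sunday, Monday
sleep_demo = pd.DataFrame({
    'calendarDate': ['2023-05-05', '2023-05-06', '2023-05-07', '2023-05-08'],
    'deepSleepSeconds': [3600, 5400, 4800, 3000],
    'lightSleepSeconds': [14400, 16200, 15000, 13800],
    'remSleepSeconds': [5400, 7200, 6600, 4800],
})
sleep_demo['calendarDate'] = pd.to_datetime(sleep_demo['calendarDate'])
sleep_demo['weekday'] = sleep_demo['calendarDate'].dt.day_name()
is_weekend = sleep_demo['weekday'].isin(['Saturday', 'Sunday'])
sleep_demo['day_type'] = is_weekend.map({True: 'Weekend', False: 'Regular day'})

# Total sleep in hours = sum of the sleep stages (awake time excluded)
stage_cols = ['deepSleepSeconds', 'lightSleepSeconds', 'remSleepSeconds']
sleep_demo['sleep_hours'] = sleep_demo[stage_cols].sum(axis=1) / 3600

avg_by_type = sleep_demo.groupby('day_type')['sleep_hours'].mean()
print(avg_by_type)
```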

It is now possible to explore the sleep data in more detail, for example with the following graphs. Remember that this is modified and randomized data.

Overall sleeping score compared with sleeping duration over time.
Sleep patterns over time: Analysis of Deep, Light, REM, and Awake sleep hours.
Sleep Chart: Comparison of sleep start and end times for regular day and weekends.
Average sleep duration for each weekday categorized on sleep stage.
Heatmaps showing sleeping duration and overall sleeping score based on week number and weekdays.

I can also determine my average sleep and wake-up time using these few lines of code:

# df_sleep_dur holds one row per night with 'start_time' and 'end_time' columns
# (infer_datetime_format is deprecated since pandas 2.0, where inference is the default)
df_sleep_dur['timestamp_start'] = pd.to_datetime(df_sleep_dur['start_time'])
df_sleep_dur['timestamp_end'] = pd.to_datetime(df_sleep_dur['end_time'])

print(f"Average start sleeping time: {df_sleep_dur['timestamp_start'].mean().time()}")
print(f"Average wake up time: {df_sleep_dur['timestamp_end'].mean().time()}")

Average start sleeping time: 23:14:38
Average wake up time: 07:18:56
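Averaging full timestamps folds the calendar date into the result, and bedtimes on either side of midnight are awkward to average as plain numbers. As a cross-check, the clock time alone can be averaged with a circular mean. This is a different technique from the one used above, and the function name `mean_time_of_day` and the sample bedtimes are made up for illustration.

```python
import math
from datetime import datetime, time

SECONDS_PER_DAY = 24 * 60 * 60

def mean_time_of_day(timestamps):
    """Circular mean of the clock time of the given datetimes."""
    sin_sum = cos_sum = 0.0
    for ts in timestamps:
        seconds = ts.hour * 3600 + ts.minute * 60 + ts.second
        # Map the clock time onto a circle so 23:59 and 00:01 are neighbours
        angle = 2 * math.pi * seconds / SECONDS_PER_DAY
        sin_sum += math.sin(angle)
        cos_sum += math.cos(angle)
    mean_angle = math.atan2(sin_sum, cos_sum) % (2 * math.pi)
    mean_seconds = round(mean_angle / (2 * math.pi) * SECONDS_PER_DAY)
    mean_seconds %= SECONDS_PER_DAY  # guard against rounding up to 24:00:00
    return time(mean_seconds // 3600, (mean_seconds % 3600) // 60, mean_seconds % 60)

# Invented bedtimes: 23:00 one night and 01:00 on another
bedtimes = [datetime(2023, 5, 1, 23, 0), datetime(2023, 5, 3, 1, 0)]
print(mean_time_of_day(bedtimes))
```

For these two bedtimes the circular mean lands on midnight, whereas a plain average of the hour values 23 and 1 would suggest noon.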

Exploring .gpx files

By choosing option 2 to export your Garmin data, you gain access to .gpx files. GPX (GPS Exchange Format) is an XML-based format designed as a common GPS data format for software applications. You can convert a .gpx file to a DataFrame by using the following code:

import gpxpy
import pandas as pd

# `act` is the ID of the activity to load
gpx_name = "act_data/activity_" + str(act) + ".gpx"
with open(gpx_name, 'r') as gpx_file:
    gpx = gpxpy.parse(gpx_file)

route_info = []

# Collect every track point across all tracks and segments
for track in gpx.tracks:
    for segment in track.segments:
        for point in segment.points:
            route_info.append({
                'latitude': point.latitude,
                'longitude': point.longitude,
                'elevation': point.elevation,
                'time': point.time
            })
df = pd.DataFrame(route_info)
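Once the track points are available as latitude/longitude pairs, the length of a route can be estimated by summing the great-circle distance between consecutive points. The sketch below uses the haversine formula with a mean Earth radius of 6,371 km. The function names are hypothetical, and the result ignores elevation changes.

```python
import math

EARTH_RADIUS_M = 6_371_000.0  # mean Earth radius, an approximation

def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in metres between two points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    d_phi = phi2 - phi1
    d_lambda = math.radians(lon2 - lon1)
    a = (math.sin(d_phi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(d_lambda / 2) ** 2)
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))

def track_length_m(points):
    """Sum of the distances between consecutive (latitude, longitude) pairs."""
    return sum(haversine_m(*p, *q) for p, q in zip(points, points[1:]))

# Invented three-point track
demo_points = [(61.000, 8.000), (61.001, 8.000), (61.002, 8.001)]
print(round(track_length_m(demo_points), 1))
```

With the DataFrame from above, the points could be passed in as `list(zip(df['latitude'], df['longitude']))`.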

Folium is a great Python library for working with GPS data. It builds on the Leaflet.js JavaScript library and enables interactive map visualizations in Python. With the data from the .gpx files I can, for example, plot the start location of every activity:

Start location of every activity.

In the example below, I have mapped the route to the top of a mountain that I have ascended multiple times. Multiple tracks are displayed with different colors.

Multiple tracks displayed in a Folium map with different colors
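A map like the one above could be assembled roughly as follows. This is a sketch rather than the code behind the figure: the helper names `map_center` and `build_track_map`, the colour list and the zoom level are all assumptions, and each track is expected as a list of (latitude, longitude) pairs.

```python
def map_center(tracks):
    """Mean latitude and longitude over all points of all tracks."""
    points = [point for track in tracks for point in track]
    lat = sum(p[0] for p in points) / len(points)
    lon = sum(p[1] for p in points) / len(points)
    return [lat, lon]

def build_track_map(tracks, colors=('red', 'blue', 'green', 'purple', 'orange')):
    """Draw each track as a coloured line with a marker at its start."""
    import folium  # third-party; imported here so map_center works without it

    fmap = folium.Map(location=map_center(tracks), zoom_start=13)
    for i, track in enumerate(tracks):
        # Cycle through the colours if there are more tracks than colours
        folium.PolyLine(track, color=colors[i % len(colors)], weight=3).add_to(fmap)
        folium.Marker(track[0], popup=f"Start of track {i + 1}").add_to(fmap)
    return fmap
```

Calling `build_track_map(tracks).save('tracks.html')` writes an interactive HTML page that can be opened in a browser.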

Since we have the GPS points, including the elevation of each point, we can also plot the elevation profile of each activity. In the following example, I have plotted the elevation over time for my backcountry skiing activities:

Elevation over time on backcountry skiing activities.
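To compare several activities on one elevation plot, it can help to express time as minutes since the start of each activity. The same track points also allow a rough estimate of the total climb. Here is a minimal sketch on invented points. Note that raw GPS elevation tends to be noisy, so summing every small rise may overstate the real gain.

```python
import pandas as pd

# Invented track points with the same columns as the GPX DataFrame
track = pd.DataFrame({
    'elevation': [100.0, 110.0, 105.0, 120.0],
    'time': pd.to_datetime([
        '2023-03-01 09:00', '2023-03-01 09:10',
        '2023-03-01 09:15', '2023-03-01 09:30',
    ]),
})

# Minutes since the first point, usable as a shared x-axis across activities
track['elapsed_min'] = (track['time'] - track['time'].iloc[0]).dt.total_seconds() / 60

# Keep only the upward steps between consecutive points and add them up
climb = track['elevation'].diff().clip(lower=0)
total_gain_m = climb.sum()

print(track[['elapsed_min', 'elevation']])
print(total_gain_m)
```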

Conclusion

In conclusion, I have shown that there is a ton of information to be extracted from the data generated by Garmin devices. While the Garmin app provides some basic visualizations, this article demonstrated that with the right tools and techniques, it’s possible to go much further and gain deeper insights into your fitness and activity patterns.

It’s worth noting that the examples in this article show just one way to approach the analysis of Garmin data, and the possibilities are really only limited by your creativity. So if you’re a data enthusiast, I encourage you to dive in and start exploring your own data. Who knows what kind of insights you might uncover!

Let me know in the comments below if you find any interesting insights or create any cool visualizations using the data from your Garmin device.

LinkedIn Profile — Peder Ward

GitHub repo — Garmin Analytics Project


Data Scientist, MSc Cybernetics and robotics. LinkedIn Peder Ward