Explore your Strava data with GCP

Cyrille D · Published in Renault Digital · 7 min read · May 30, 2020

You have probably heard before that data is everywhere, that we all produce massive amounts of data, and that some companies are making fortunes with what we offer them… but what about taking back control?

The idea here is obviously not to bring down our GAFA friends and other BATX, but simply to demonstrate that data exploration solutions are becoming more and more accessible.

To support this demonstration I will use:
- One of my passions: cycling

- Strava

Like millions of people, I share my data with Strava: a social network for athletes who are keen to keep track of their activities, compare them with professional athletes, draw their next ride on a map… To give you an idea of Strava's success, just two figures: 20 activities are uploaded every second and 1 million athletes join every 30 days.

- Google Cloud Platform (GCP)

A public cloud platform coming with many user-friendly services to store, process, visualize and analyse data… and a lot more.

Now that context is set, let’s start!

Source of data: Strava API

A good thing about Strava is that it comes with quite well-documented APIs.

Let's say that you want to fetch the details of all your activities: the endpoint to use is /athlete/activities.

If you have never played with an API before, you may have a look at how to call one with Postman. At the same time you will get familiar with token authentication.
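Before the automated flow described below can run, you need an initial refresh token. A minimal sketch of the one-time authorization-code exchange (the `code` value is the query parameter Strava appends to your redirect URI after you approve the app in a browser; client id and secret come from "My API Application" in your Strava profile):

```python
import requests

def build_token_payload(client_id, client_secret, code):
    """Payload for the one-time exchange of an authorization code."""
    return {
        'client_id': client_id,
        'client_secret': client_secret,
        'code': code,
        'grant_type': 'authorization_code',
    }

def exchange_code_for_tokens(client_id, client_secret, code):
    """POST to Strava's token endpoint; the JSON response contains
    'access_token', 'refresh_token' and 'expires_at'."""
    resp = requests.post(
        'https://www.strava.com/oauth/token',
        data=build_token_payload(client_id, client_secret, code),
    )
    resp.raise_for_status()
    return resp.json()
```

The `refresh_token` returned here is what you store in Cloud Storage in the setup below; every later run renews it automatically.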

Data platform: GCP

GCP is accessible at this URL: https://console.cloud.google.com/
As you will notice at the top of the page, Google offers you $300 of credit. That will be more than enough for our exercise 😅.

What we want to achieve is to:
- fetch regularly and automatically Strava data through API
- store that data
- build some nice graphs to analyse the data

There are several ways on GCP to meet those goals. I will present one I consider quite light, fully automated, cost-efficient… and that allows me to play with multiple GCP services.

Let's now go through this solution step by step.

Strava data acquisition

There are multiple ways for your data to arrive in Strava:

  • using Strava mobile app to record your activities
  • setting your GPS in sync with your Strava app
  • uploading activity files from Strava website

Strava's documentation covers these well enough that I won't go into more detail here.

GCP process triggering

As described in the next section, the Strava API call is done via the Cloud Functions service. The thing is that Cloud Functions does not come with its own scheduling capability, but a function can be triggered by several sources: HTTP, Pub/Sub, Storage… For this example I therefore used Cloud Scheduler to push a dummy message into a Pub/Sub topic, which in turn triggers the function… yes, it may sound a bit overkill, but it is actually quite light to implement.

When creating your Cloud Function, you indicate Pub/Sub as the trigger and select your topic. If you don't have one yet, you can create it from the drop-down menu.

In Cloud Scheduler it is then a matter of a few clicks to create a job that pushes a dummy message into the topic previously declared.

Just note that the Scheduler job frequency is specified in cron format.
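The same setup can also be done from the command line. A sketch with `gcloud` (topic name, job name and schedule are placeholders to adapt to your project):

```shell
# 1. Create the Pub/Sub topic the Cloud Function will listen on
gcloud pubsub topics create strava-trigger

# 2. Create a Scheduler job that pushes a dummy message every day at 05:00
#    (cron format: minute hour day-of-month month day-of-week)
gcloud scheduler jobs create pubsub strava-daily-job \
    --schedule="0 5 * * *" \
    --topic=strava-trigger \
    --message-body="go"
```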

Fetching Strava data on GCP

Cloud Functions is a serverless way to run code, probably one of the easiest ways to execute a triggered piece of code in the cloud.
For this demo, the code was written in Python. Thanks, by the way, to Alexandre Crayssac for the assistance.

Here is how code is structured:

part 1: lib import and var setting

import sys
import json
import gzip
import logging
import requests
import os
from google.cloud import bigquery
from google.cloud import storage
CLIENTID = '12345'
CLIENTSECRET = '123456789012345678901234567890'
GCSSTORAGE = 'strava-storage'
logging.basicConfig(stream=sys.stdout, level=logging.INFO)

CLIENTID and CLIENTSECRET can be found in the “My API Application” section of your Strava profile.
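Hardcoding the secret works for a demo, but for anything shared you may prefer to read it from an environment variable (which you can set with `--set-env-vars` when deploying the Cloud Function). A small sketch, with the variable names being my own convention:

```python
import os

def get_secret(name, default=''):
    """Fetch a secret from the environment, falling back to a default."""
    return os.environ.get(name, default)

# Falls back to the demo values when the variables are not set
CLIENTID = get_secret('STRAVA_CLIENT_ID', '12345')
CLIENTSECRET = get_secret('STRAVA_CLIENT_SECRET', '')
```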

part 2: fetch authentication token function

def fetch_strava_access_token():
    logging.info('Fetching Strava Refresh Token from GCS...')
    client = storage.Client()
    bucket = client.get_bucket(GCSSTORAGE)
    strava_refresh_token_file = 'strava_refresh_token.txt'
    token_file = bucket.get_blob(strava_refresh_token_file)
    # download_as_string() returns bytes, decode to get a clean string
    REFRESHTOKEN = token_file.download_as_string().decode('utf-8').strip()

    logging.info('Fetching Strava Access Token ...')
    resp = requests.post(
        'https://www.strava.com/api/v3/oauth/token',
        params={
            'client_id': CLIENTID,
            'client_secret': CLIENTSECRET,
            'grant_type': 'refresh_token',
            'refresh_token': REFRESHTOKEN
        }
    )
    response = resp.json()

    logging.info('Pushing Strava Refresh Token to GCS...')
    # Strava rotates refresh tokens: persist the new one for the next run
    token_file.upload_from_string(response['refresh_token'])
    return response['access_token']

As you can see, Google Cloud Storage (GCS) is used to store the Strava refresh token. This token allows access rights to be renewed automatically even after the previous access token has expired.

part 3: strava API call function

def fetch_strava_activities(STRAVA_ACCESS_TOKEN):
    page, activities = 1, []
    logging.info('Fetching Strava Activities ...')
    while True:
        logging.info(f'Fetching page #{page} ...')
        resp = requests.get(
            'https://www.strava.com/api/v3/athlete/activities',
            headers={'Authorization': f'Bearer {STRAVA_ACCESS_TOKEN}'},
            params={'page': page, 'per_page': 200}
        )
        data = resp.json()
        activities += data
        if len(data) < 200:
            break
        page += 1

    logging.info(f'Fetched {len(activities)} activities')
    return activities

Activities are fetched in pages of at most 200 activities, hence the loop until fewer than 200 are returned.

part 4: push activities into BigQuery dataset function

def activites_to_bq(jsonl_lines, project, dataset, table):
    bq_client = bigquery.Client()
    job_config = bigquery.job.LoadJobConfig()
    job_config.source_format = bigquery.job.SourceFormat.NEWLINE_DELIMITED_JSON
    job_config.write_disposition = bigquery.job.WriteDisposition.WRITE_TRUNCATE  # Overwrite
    job_config.create_disposition = bigquery.job.CreateDisposition.CREATE_IF_NEEDED
    job_config.autodetect = True
    logging.info(f'Starting import to {project}.{dataset}.{table} ...')
    job = bq_client.load_table_from_json(
        json_rows=jsonl_lines,
        destination=f'{project}.{dataset}.{table}',
        job_config=job_config
    )
    logging.info(f'Launched job id: {job.job_id}')
    return job.job_id

part 5: call the part 2/3/4 functions

def run(data, context=None):
    STRAVA_ACCESS_TOKEN = fetch_strava_access_token()
    activities = fetch_strava_activities(STRAVA_ACCESS_TOKEN)
    activites_to_bq(activities, 'my-project', 'my-dataset', 'strava_activities')

In addition, these two dependencies should be declared in requirements.txt:

google-cloud-bigquery
google-cloud-storage

Once your Cloud Function is deployed and its trigger set, that's it: your activities will be automatically loaded into your BigQuery table.
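For reference, deploying the function from the command line might look like the following sketch (function and topic names are placeholders; the entry point matches the `run` function defined above, and the command is run from the directory containing main.py and requirements.txt):

```shell
gcloud functions deploy strava-fetch \
    --runtime=python39 \
    --entry-point=run \
    --trigger-topic=strava-trigger
```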

Display and analyse data

Below are a few examples, among many, of how the data can be displayed and analyzed.

  • Google proposes a simple and quite user-friendly dataviz solution: Data Studio. It does not necessarily allow visualisations as complex as solutions such as QlikView or Tableau do, but it covers more than the basics. Its main advantage is obviously that it makes the most of the BigQuery database.
    Once your Data Studio report is linked to your Strava activities table, you can easily produce graphs summarizing activities over a period: distance, elevation climbed, average speed trend, activities location…
  • The activity API also returns a special polyline field which holds the trace of each of your activities. Once that information is stored in an .html file as below, you can get all your activities displayed on OpenStreetMap (more on this solution on Mark Needham's blog).
<html>
  <head>
    <title>Strava Activities</title>
    <link rel="stylesheet" href="http://cdn.leafletjs.com/leaflet-0.7/leaflet.css" />
    <!-- html/body need an explicit height, otherwise the 100% map div collapses -->
    <style>html, body, #map { height: 100%; margin: 0; }</style>
  </head>
  <body>
    <script src="http://cdn.leafletjs.com/leaflet-0.7/leaflet.js"></script>
    <script type="text/javascript" src="https://rawgit.com/jieter/Leaflet.encoded/master/Polyline.encoded.js"></script>
    <div id="map" style="width: 100%; height: 100%"></div>
    <script>
      var map = L.map('map').setView([48.817862, 2.218328], 13);
      L.tileLayer('http://{s}.tile.openstreetmap.org/{z}/{x}/{y}.png', {
        maxZoom: 15,
      }).addTo(map);
      // Replace with the encoded polyline values from your activities
      var encodedRoutes = [
        "<polyline_value_1>",
        "<polyline_value_2>",
        "<polyline_value_3>"
      ];
      for (let encoded of encodedRoutes) {
        var coordinates = L.Polyline.fromEncoded(encoded).getLatLngs();
        L.polyline(coordinates, {
          color: 'blue',
          weight: 8,
          opacity: 1,
          lineJoin: 'round'
        }).addTo(map);
      }
    </script>
  </body>
</html>
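As a complement to the Data Studio graphs mentioned above, a monthly summary can also be computed with a plain BigQuery query (usable directly in the console or as a Data Studio custom query). A sketch, where the table path is the placeholder used earlier and the field names (distance and total_elevation_gain in metres, start_date) are those of the Strava activity summaries loaded by the function:

```python
# BigQuery Standard SQL kept as a string so it can be pasted anywhere.
# SUBSTR over the casted start_date keeps this robust whether autodetect
# typed the column as STRING or TIMESTAMP.
MONTHLY_SUMMARY_SQL = """
SELECT
  SUBSTR(CAST(start_date AS STRING), 1, 7) AS month,
  ROUND(SUM(distance) / 1000, 1) AS distance_km,
  ROUND(SUM(total_elevation_gain)) AS elevation_m,
  COUNT(*) AS activities
FROM `my-project.my-dataset.strava_activities`
GROUP BY month
ORDER BY month DESC
"""
```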

That’s it for the demo. 🙂

If you are after simpler solutions, you may have a look at other blogs presenting how to link Data Studio directly to Strava.
