Creating our Cross Platform Analytics Pipeline

Published in Peloton-Engineering

6 min read · Oct 25, 2018

Measuring user behavior in a growing product

When Peloton started, our primary product was the Bike. Since then, our product line has grown to span many platforms, including our new Tread and a growing number of Digital apps.

We use Segment’s client-side SDKs to track analytics events and batch them to a Segment Source. From its online dashboard, Segment lets you configure a list of destinations that each Source sends events to. We define a separate Source for each platform and set two destinations for every Source: an Amazon Redshift database and an Amplitude project.

Segment has always been a great abstraction over the tools we use to store and visualize our data, but each development team was sending a different set of events to its Segment Sources.

We needed to measure user engagement across the Peloton experience as a whole, rather than viewing a user on a platform-by-platform basis. If a user starts a workout on the Bike, we should be able to measure it in exactly the same way as if they had taken the workout on iOS.

Consolidating analytics into a cross-platform schema

The first step was to decide which events we wanted to measure across all of our platforms and which were specific to the user experience on a given platform. We ended up creating a set of our most important metrics that are shared across all platforms. Along with a consistent set of event names, we agreed on a group of core properties to attach to every event on every platform, as well as the properties (key-value pairs) that each individual event should include.
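As a rough illustration of how core and event-specific properties might combine into one payload, here is a minimal Python sketch. The `CoreProperties` fields, the `build_event` helper, and all sample values are hypothetical and not our actual schema; only the event name and the two event-specific property names are taken from the Amplitude example in this post.

```python
from dataclasses import dataclass, asdict


@dataclass(frozen=True)
class CoreProperties:
    """Properties attached to every event (field names are illustrative)."""
    platform: str
    app_version: str
    device_serial: str


def build_event(name: str, core: CoreProperties, event_properties: dict) -> dict:
    """Merge core properties with event-specific ones into one payload."""
    core_dict = asdict(core)
    clashes = core_dict.keys() & event_properties.keys()
    if clashes:
        # Event-specific keys must not silently overwrite core keys.
        raise ValueError(f"event properties shadow core properties: {sorted(clashes)}")
    return {"event": name, "properties": {**core_dict, **event_properties}}


# Sample values below are placeholders, not real data.
core = CoreProperties(platform="bike", app_version="1.0.0", device_serial="SERIAL-PLACEHOLDER")
payload = build_event(
    "Completed Activation",
    core,
    {"Time to Activation": 42, "Activation Method": "qr_code"},
)
```

Rejecting key clashes is one possible design choice; it keeps a core property such as the platform from meaning different things on different events.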

Completed Activation Event with Properties shown in Amplitude

In the image above, the different types of serial numbers and the information about the platform are part of our “core properties”. Other properties, like “Time to Activation” and “Activation Method”, are event-specific and only available on this event.

Segment also helps identify users in your data by exposing an identify function in each of its SDKs. We agreed on a common set of properties, which Segment calls “User Traits”, that every platform attaches when it identifies the user on login or session start.

This group of core events, properties, and user traits is what we refer to as our analytics Schema.

The challenges with a cross-platform approach

Schema Migration

Our first challenge was planning the transition from our existing set of analytics events to the new schema. The migration was not going to be quick, so we needed to keep sending what we called our V1 event schema while we built our cross-platform V2 schema.

Some clients, like our web team, were implementing events for the first time using the new V2 schema. But our hardware teams that support the Bike and Tread were already sending hundreds of events. We made two key decisions.

First, we cleaned up our analytics code, which had gathered a lot of dust over the years. Multiple modules defined their own rules for using Segment, and the events were scattered across the codebase in many different forms. We took this opportunity to provide a single interface that initializes Segment in a consistent way. We also moved all of our events into a consistent package structure and created a separate class for each event, where it defines its name and list of properties.

Second, we used Segment to create a new V2 Source for each platform. If we already had existing events for that platform, we kept the old code in place and added a new call to send our V2 event to the V2 Segment Source. This meant that we could test all our new events as we added them without worrying about changing the existing ones.
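The two decisions above can be sketched roughly as follows. This is a minimal Python illustration, not our client code: the `Analytics` wrapper, the `AnalyticsEvent` base class, the legacy event name `workout_start`, and the property keys are all hypothetical, and the senders are in-memory stand-ins for Segment SDK instances pointed at the V1 and V2 Sources.

```python
from typing import Callable, Dict, List, Optional, Tuple

# A sender takes (user_id, event_name, properties). In a real client it would
# wrap a Segment SDK instance configured for one Source; here it is a stand-in.
Sender = Callable[[str, str, Dict[str, object]], None]


class AnalyticsEvent:
    """Base class: every V2 event declares its own name and properties."""

    name: str = ""

    def properties(self) -> Dict[str, object]:
        return {}


class StartedWorkout(AnalyticsEvent):
    name = "Started Workout"

    def __init__(self, workout_id: str) -> None:
        self.workout_id = workout_id

    def properties(self) -> Dict[str, object]:
        return {"workout_id": self.workout_id}  # hypothetical property key


class Analytics:
    """Single entry point that owns both senders during the migration."""

    def __init__(self, v2_sender: Sender, v1_sender: Optional[Sender] = None) -> None:
        self._v2_sender = v2_sender
        self._v1_sender = v1_sender

    def track_v1(self, user_id: str, name: str, properties: Dict[str, object]) -> None:
        """Legacy path, kept so existing V1 call sites keep working."""
        if self._v1_sender is not None:
            self._v1_sender(user_id, name, properties)

    def track(self, user_id: str, event: AnalyticsEvent) -> None:
        """V2 path: the event class supplies its name and properties."""
        self._v2_sender(user_id, event.name, event.properties())


# In-memory senders so the sketch runs without any SDK.
Record = Tuple[str, str, Dict[str, object]]
v1_log: List[Record] = []
v2_log: List[Record] = []
analytics = Analytics(
    v2_sender=lambda user, name, props: v2_log.append((user, name, props)),
    v1_sender=lambda user, name, props: v1_log.append((user, name, props)),
)

# The existing V1 call stays in place (hypothetical legacy name) ...
analytics.track_v1("user-1", "workout_start", {"id": "w-42"})
# ... and the new V2 call is added next to it.
analytics.track("user-1", StartedWorkout("w-42"))
```

Because the V1 sender is optional in this sketch, a platform that starts directly on V2 can construct the wrapper without it, and the legacy calls can be deleted later without touching the V2 path.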

Supporting Future Releases and Schema Changes

Another challenge was reducing churn when we update our core event schema in the future. If we change an event name or property key, we don’t want to have to release an update to each of our applications.

To handle this, we decided to send as many core events as possible from the server rather than the client. This was a good fit because core events should always be shared across multiple platforms, and many of these events were the result of interactions with our API. We came up with a general set of rules for which events belong on the client vs. the server.

Server-Side:

The event is being sent as a direct result of interacting with an endpoint
Most of the properties for an event can be retrieved easily by the API
Multiple platforms will be implementing the same event

Client-Side:

No side effects happen in a database as a result of the event
It is only a UI interaction, for example a “Viewed Profile Screen” type of event
Most of the properties are client related and aren’t shared amongst multiple events
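One possible way to turn the rules above into a quick checklist is sketched below in Python. The `EventProfile` fields mirror the three server-side criteria; the majority-vote threshold and the `suggest_origin` name are assumptions made for illustration, since in practice these rules are applied by judgment rather than by a formula.

```python
from dataclasses import dataclass


@dataclass
class EventProfile:
    triggered_by_endpoint: bool    # sent as a direct result of an API call
    properties_known_to_api: bool  # most properties are retrievable by the API
    shared_across_platforms: bool  # multiple platforms implement the event


def suggest_origin(profile: EventProfile) -> str:
    """Return "server" when most server-side criteria hold, else "client"."""
    score = sum(
        [
            profile.triggered_by_endpoint,
            profile.properties_known_to_api,
            profile.shared_across_platforms,
        ]
    )
    # Assumed threshold: at least two of the three criteria.
    return "server" if score >= 2 else "client"


# A workout start goes through the API and exists on every platform.
started_workout = EventProfile(True, True, True)
# A screen view is a pure UI interaction with client-only properties.
viewed_profile_screen = EventProfile(False, False, False)

workout_origin = suggest_origin(started_workout)
screen_origin = suggest_origin(viewed_profile_screen)
```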

Segment also has a very helpful article about when to send events from the client vs. the server.

Receiving client context for Server-side events

One difficult case was when we wanted to send an event from the API but also needed to include properties that only the client had access to and the API couldn’t fetch. A good example is our “Started Workout” event. It is probably our most important business metric, so it must be measured accurately and consistently across all platforms. That makes it an ideal fit for sending from the backend, but we also want to capture how the user started their workout. Which screen in the app did they select it from: the featured screen or the on-demand screen? Were they browsing classes with a group of filters selected, and if so, which filters were they using?

The Peloton-Client-Details header key points to the encoded analytics values

The solution we found was to encode any client-side context into the HTTP request headers sent to the API. The API can then parse the values sent by the client and attach each key-value pair as an event property, in addition to any data the API fetches from the database.
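The header round trip can be sketched as follows. The `Peloton-Client-Details` header name comes from the caption above, but the encoding is an assumption: this Python sketch uses URL-safe base64 over JSON, and the helper names, property keys, and sample values are hypothetical.

```python
import base64
import json

HEADER_NAME = "Peloton-Client-Details"  # header key named in this post


def encode_client_details(details: dict) -> str:
    """Client side: pack context into a header-safe string (assumed format)."""
    raw = json.dumps(details, separators=(",", ":")).encode("utf-8")
    return base64.urlsafe_b64encode(raw).decode("ascii")


def decode_client_details(headers: dict) -> dict:
    """Server side: unpack the header, tolerating missing or malformed values."""
    # Real HTTP header names are case-insensitive; this sketch does an exact lookup.
    value = headers.get(HEADER_NAME)
    if not value:
        return {}
    try:
        decoded = json.loads(base64.urlsafe_b64decode(value.encode("ascii")))
    except ValueError:
        # Covers bad base64, bad JSON, and bad text encoding alike.
        return {}
    return decoded if isinstance(decoded, dict) else {}


def build_event_properties(headers: dict, api_properties: dict) -> dict:
    """Merge client context with API data; API values win on key clashes (assumed)."""
    properties = decode_client_details(headers)
    properties.update(api_properties)
    return properties


# Hypothetical client context for a "Started Workout" request.
headers = {
    HEADER_NAME: encode_client_details(
        {"source_screen": "on_demand", "filters": ["cycling", "20 min"]}
    )
}
properties = build_event_properties(headers, {"workout_id": "w-42"})
```

Returning an empty dict for a missing or malformed header means a client bug in this sketch costs only the extra context, not the event itself.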

Where We Landed

We successfully launched our new analytics pipeline, and we’ve been able to see our users’ behavior across all our platforms in a consistent way. We’re using this information to better understand how users engage with our current features and to help us build better cross-platform features in the future.

In addition, our development teams have been enjoying a more consistent format for events, as well as being able to use our API to consolidate analytics that would otherwise have had to be duplicated across platforms.