Shaping Your Application Tracking Layer — An Iterative Approach

May 8, 2020

In this article we discuss application tracking, a topic that is rarely covered at conferences or in articles.

Implemented poorly, tracking can seriously damage your application architecture: tracking calls in different, isolated parts of your application each need their own data, so that data ends up being passed through many places.

This article focuses on customer-facing applications and proposes an alternative way to organise the tracking layer of your application.

We will walk through the iterations we went through at Joyn while solving this problem. The article assumes the reader has some experience reading code and understands basic software architecture concepts.

Code examples are provided in JavaScript.

Challenges

One of our main challenges was constantly changing requirements, which is to be expected. At the beginning of the project we did not yet have all the business and data requirements.

Our approach was to rely on previous experience to establish a simple, universal foundation first, and then, once the requirements became concrete, to iterate on it step by step until it satisfied the business requirements.

Low-level API

Let’s start with a quick look at the low-level API we had in place. We use Segment, whose SDK already provides basic functions for the different kinds of tracking calls. The API includes the following methods:

track - method to track user actions such as button-clicked or playback-started.
page - method to track page views.
identify - method to set user traits such as id, email, etc.

More information about these functions can be found in Segment’s documentation.

In principle, these methods cover every use case. However, if you call them directly in many places in your code (e.g. in onClick handlers on <a> tags), you end up with a lot of repeated code.

For us, the most repeated data was video-asset information such as asset_id, name, genre, channel and other properties (for example, additional fields when an asset is an episode or a live stream).

Repeating these properties in every call is error-prone: it leads to bugs and increases maintenance cost.
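One common way to contain this repetition is to build the shared asset properties in a single helper. The sketch below is illustrative only: `analytics` is a stub that records calls instead of Segment's real client (Segment's analytics.js does expose `analytics.track(event, properties)`), and `toAssetProperties`, `trackAssetEvent` and the episode/live-stream fields are hypothetical names.

```javascript
// Stub standing in for Segment's analytics.js client; it only records calls.
const analytics = {
  calls: [],
  track(event, properties) {
    this.calls.push({ event, properties });
  },
};

// Hypothetical helper: the repeated asset properties are built in one place.
function toAssetProperties(asset) {
  const properties = {
    asset_id: asset.id,
    name: asset.name,
    genre: asset.genre,
    channel: asset.channel,
  };
  // Some fields only apply to certain asset types (field names are assumptions).
  if (asset.type === 'episode') {
    properties.season = asset.season;
    properties.episode = asset.episode;
  }
  if (asset.type === 'live') {
    properties.is_live = true;
  }
  return properties;
}

// Hypothetical wrapper: call sites only add what is specific to the event.
function trackAssetEvent(event, asset, extra = {}) {
  analytics.track(event, { ...toAssetProperties(asset), ...extra });
}

// Illustrative data, not taken from a real catalogue.
const asset = { id: 42, type: 'episode', name: 'Pilot', genre: 'Drama', channel: 'Channel A', season: 1, episode: 1 };
trackAssetEvent('playback-started', asset, { position: 0 });
```

Every call site now passes the asset object instead of spelling out its properties, so adding or renaming a property happens in one function.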

Tracking through Redux

Our first attempt at a central place for tracking was a custom Redux middleware. This approach let us hook into all application actions and made it easy to remove or replace the tracking logic later.

A very basic example of a middleware that tracks an application action as well as LOCATION_CHANGE events, which are dispatched when a user navigates between pages:

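// trackAction, trackPage and mapData are assumed to come from our tracking layer
const trackingMiddleware = (store) => (next) => (action) => {
  if (action.type === 'doA') {
    trackAction({
      data: mapData(action.payload),
    });
  }

  if (action.type === 'LOCATION_CHANGE') {
    trackPage(action.payload.name);
  }

  // Always pass the action on so the reducers still receive it
  return next(action);
};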

This concept works if you want to intercept actions and do tracking calls in response to the actions.
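To see the interception in isolation, the middleware can be exercised without a full Redux store. This is a hypothetical test harness: trackAction, trackPage and mapData are stubs that record calls, and the stub store and next function take the place of what Redux's applyMiddleware would normally provide.

```javascript
// Stubs standing in for the real tracking layer (hypothetical).
const tracked = [];
const trackAction = (event) => tracked.push({ kind: 'action', ...event });
const trackPage = (name) => tracked.push({ kind: 'page', name });
const mapData = (payload) => ({ ...payload });

const trackingMiddleware = (store) => (next) => (action) => {
  if (action.type === 'doA') {
    trackAction({ data: mapData(action.payload) });
  }
  if (action.type === 'LOCATION_CHANGE') {
    trackPage(action.payload.name);
  }
  return next(action);
};

// Stub store and `next`; in a real app Redux wires these up.
const store = { getState: () => ({}), dispatch: () => {} };
const reduced = []; // action types that reached the "reducers"
const next = (action) => {
  reduced.push(action.type);
  return action;
};
const dispatch = trackingMiddleware(store)(next);

dispatch({ type: 'LOCATION_CHANGE', payload: { name: 'home' } });
dispatch({ type: 'doA', payload: { buttonName: 'play' } });
dispatch({ type: 'SOMETHING_ELSE' });
```

Only the two matching actions produce tracking calls, while all three still reach the reducers, which is why the middleware can be removed later without affecting application state.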

However, the requirements changed, and it became clear that our existing application actions were not enough to capture everything the user was doing.

A quick fix was to add a dedicated action and dispatch it whenever we needed to track something. So we started adding code like this all over the place:

const TRACK_ACTION = 'TRACK_ACTION';

// action creator for a dedicated tracking action
const trackAction = (actionData) => ({
  type: TRACK_ACTION,
  payload: actionData,
});

const mapDispatchToProps = (dispatch) => ({
  trackClick: (name, trackData) => {
    dispatch(
      trackAction({
        name,
        data: trackData,
      }),
    );
  },
});

We quickly realised that this was a lot of boilerplate for a single track call, and that we had to reconsider the solution.

Tracking with HOC

What all these custom track calls had in common was that each component was wrapped with connect (the Redux HOC) and had to import the trackAction action creator.

Our next iteration was to extract this pattern into a Higher Order Component (HOC) to reduce the boilerplate inside our components.

So we created a withTrack HOC that provides two additional props to the wrapped component: trackAction and trackView. Whenever a component needs to track something, we simply wrap it with withTrack and use these two props, which removed most of the boilerplate imports.
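The core idea of such a HOC can be sketched without React or Redux: it is a function that takes a component and returns a new one with extra props injected. In this simplified, hypothetical version, components are plain functions of props and `dispatchTrack` stands in for Redux's dispatch plus the tracking middleware; a real React implementation would render the wrapped component (and use connect) instead of calling it directly.

```javascript
// Hypothetical sink standing in for Redux dispatch + tracking middleware.
const sentEvents = [];
const dispatchTrack = (type, payload) => sentEvents.push({ type, payload });

// withTrack: takes a component (here a plain function of props) and returns
// a new component that also receives trackAction and trackView as props.
const withTrack = (WrappedComponent) => {
  const trackAction = (data) => dispatchTrack('TRACK_ACTION', data);
  const trackView = (name) => dispatchTrack('TRACK_VIEW', { name });
  return (props) => WrappedComponent({ ...props, trackAction, trackView });
};

// Illustrative component: returns a description of a button instead of markup.
const SignupButton = ({ label, trackAction }) => ({
  label,
  onClick: () => trackAction({ buttonName: 'sign_up' }),
});

const TrackedSignupButton = withTrack(SignupButton);
const button = TrackedSignupButton({ label: 'Sign up' });
button.onClick();
```

The wrapped component no longer imports any action creator or calls connect itself; everything tracking-related arrives through props.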

Our components started to look like this:

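import React from 'react';
// withTrack is our tracking HOC; its import path depends on the project

class LoginForm extends React.Component {
    // ... (rest of the component omitted)

    componentDidMount() {
        // Track that the login form was shown
        this.props.trackView('login_form');
    }

    handleSignupClick = () => {
        // Track the click on the sign-up button
        this.props.trackAction({
            buttonName: 'sign_up'
        });
    }

    // ...
}

export default withTrack(LoginForm);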

This already looked better and was more reusable, so we stuck with this approach for a while.

Tracking subsequent events

Our working tracking solution did not last very long: new requests arrived from our data scientists 🎉.

We were asked to include some data from earlier tracking events in the events that follow them.

This is not a problem when the related track calls happen on a single page or in a single component, but what if they happen in very different places? Essentially, it means we have to keep the tracked data in some storage (in our case, that could be the Redux store).

But putting data into the store adds the complexity of invalidating stale tracked data. Solving this led to a broader discussion about improving the whole tracking concept in our app. The main areas for improvement were:

Deprecating Redux, which we used purely for tracking, to get a simpler chain
Finding a way to get the data required for tracking (e.g. asset metadata) other than passing it directly from components
Allowing subsequent track calls to access data from previous track calls

That seemed like a lot of work, so let’s tackle the points one by one!

1. Switch from Redux

This goal was relatively easy: we simply needed another abstraction that we could import into our components and whose methods send the tracking calls.

We ended up with just two functions, trackPage and trackAction, imported directly from our tracking module. Under the hood, each call puts an event into an asynchronous queue, so the call is non-blocking and the application stays responsive to the user. Because the queue preserves insertion order, events are later processed in the order they arrived.
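A minimal sketch of such an ordered, non-blocking queue follows. All names are hypothetical, `processEvent` is a placeholder for the real work (resolving data and sending the request), and `sent` stands in for the HTTP calls to the tracking service.

```javascript
const queue = [];
const sent = []; // stands in for requests to the tracking service
let processing = false;

// Placeholder for the real work; simulates an asynchronous step.
async function processEvent(event) {
  await new Promise((resolve) => setTimeout(resolve, 5));
  sent.push(event);
}

// Process events strictly one after another, oldest first.
async function drainQueue() {
  while (queue.length > 0) {
    const event = queue.shift();
    try {
      await processEvent(event);
    } catch (error) {
      // A failing event must not block the events queued behind it.
      console.error('tracking failed', error);
    }
  }
  processing = false;
}

function enqueue(event) {
  queue.push(event);
  if (!processing) {
    processing = true;
    // Start after the current task has finished, so the caller never waits.
    setTimeout(drainQueue, 0);
  }
}

function trackAction(event) {
  enqueue({ type: 'action', ...event });
}

function trackPage(name, data = {}) {
  enqueue({ type: 'page', name, data });
}

trackPage('home');
trackAction({ name: 'playback-requested', data: { assetId: 1 } });
trackAction({ name: 'playback-started', data: { assetId: 1 } });
// Nothing has been sent yet at this point; the queue drains asynchronously.
```

The `processing` flag ensures only one drain loop runs at a time, so events added while an earlier one is being processed simply wait their turn.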

2. Populate tracking data outside of components

For the second point, we had to find a way to populate our track calls with the required data. In our project we use Apollo Client to fetch data, and it provides caching out of the box. We came up with the idea of custom resolvers, so that our components only specify the resolver name and its parameters (e.g. id or type, depending on what the resolver needs).

Afterwards, this record gets processed and the required data is automagically attached to the actual HTTP track call.

Let’s look at how this structure appears in code. Here is a regular call of the trackAction function:

trackAction({
  name: 'playback-started',
  resolvers: [
    {
      name: 'asset',
      args: {
        assetId: 1,
        assetType: 'Movie',
      },
    },
  ],
});

This is how the data looks after the resolvers have been processed:

{
  name: 'playback-started',
  data: {
    asset: {
      id: 1,
      __typename: 'Movie',
      title: 'Mission: Impossible',
      duration: 6600,
      releaseDate: '1996-08-07T12:00:00.000Z',
      director: {
        name: 'Brian De Palma',
      },
    },
  },
}

As you can see, a new property asset has been added to the event data; its name comes from the resolver name in the trackAction call. This was a significant improvement: our components no longer need to fetch the asset data and only have to pass assetId and assetType as resolver arguments. When resolving the asset we use the cache-first fetch policy, so Apollo Client reads the data from its cache if it is already there and only sends a request to the GraphQL API otherwise.
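The resolver step can be sketched as a registry of named async functions whose results are merged into the event data under the resolver's name. This is a hypothetical illustration: `fakeClient` is a stub with the same call shape as Apollo Client's `client.query({ query, variables, fetchPolicy })`, it serves results from an in-memory cache under `cache-first`, and `'ASSET_QUERY'` is a placeholder for a real GraphQL document.

```javascript
// Stub mimicking the call shape of Apollo Client's query(); data is illustrative.
const fakeClient = {
  cache: new Map(),
  networkRequests: 0,
  async query({ variables, fetchPolicy }) {
    const key = `${variables.assetType}:${variables.assetId}`;
    if (fetchPolicy === 'cache-first' && this.cache.has(key)) {
      return { data: { asset: this.cache.get(key) } };
    }
    this.networkRequests += 1; // would be a request to the GraphQL API
    const asset = { id: variables.assetId, __typename: variables.assetType, title: 'Mission: Impossible' };
    this.cache.set(key, asset);
    return { data: { asset } };
  },
};

// Hypothetical registry: resolver name -> function that fetches the data.
const resolvers = {
  asset: async (client, { assetId, assetType }) => {
    const { data } = await client.query({
      query: 'ASSET_QUERY', // placeholder for a real GraphQL document
      variables: { assetId, assetType },
      fetchPolicy: 'cache-first',
    });
    return data.asset;
  },
};

// Replace the event's `resolvers` list with resolved data, keyed by resolver name.
async function processResolvers(client, event) {
  const { resolvers: requested = [], ...rest } = event;
  const entries = await Promise.all(
    requested.map(async ({ name, args }) => {
      const resolve = resolvers[name];
      if (!resolve) throw new Error(`Unknown resolver: ${name}`);
      return [name, await resolve(client, args)];
    }),
  );
  return { ...rest, data: { ...rest.data, ...Object.fromEntries(entries) } };
}

const event = {
  name: 'playback-started',
  resolvers: [{ name: 'asset', args: { assetId: 1, assetType: 'Movie' } }],
};
processResolvers(fakeClient, event).then((resolved) => {
  console.log(resolved.data.asset.title); // → Mission: Impossible
});
```

Resolving the same asset again would be served from the stub's cache without another request, mirroring what the cache-first policy does in Apollo Client.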

3. Subsequent events

To give subsequent events access to previous data, we implemented a mechanism that looks up earlier events by name and retrieves their data.

For example, we had two events, playback-requested and playback-started, where playback-started needed some data from playback-requested. Our solution was a very simple dictionary (a Map) with the event name as the key and the tracked data as the value.

Besides being simple, this approach avoids duplicate entries: every new playback-requested event overwrites the previously stored one, so only the latest is kept.

Another advantage is that the Map stays small: instead of accumulating events over time, we overwrite existing entries, so its size is bounded by the number of distinct event names we store.
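A minimal sketch of this history, assuming a hypothetical configuration that lists which events are stored (`EVENTS_TO_STORE`) and which events depend on earlier ones (`DEPENDENCIES`). The `previous` key under which the earlier data is attached is also an assumption.

```javascript
const EVENTS_TO_STORE = new Set(['playback-requested']);
const DEPENDENCIES = {
  'playback-started': ['playback-requested'],
};

// Event name -> data of the most recent event with that name.
const history = new Map();

function remember(event) {
  if (EVENTS_TO_STORE.has(event.name)) {
    history.set(event.name, event.data); // overwrites the previous entry
  }
}

// Attach the stored data of every event this event depends on.
function enrichFromHistory(event) {
  const previous = {};
  for (const name of DEPENDENCIES[event.name] ?? []) {
    if (history.has(name)) {
      previous[name] = history.get(name);
    }
  }
  return { ...event, data: { ...event.data, previous } };
}

remember({ name: 'playback-requested', data: { assetId: 1, requestedAt: 100 } });
remember({ name: 'playback-requested', data: { assetId: 2, requestedAt: 200 } });

const started = enrichFromHistory({ name: 'playback-started', data: { assetId: 2 } });
console.log(history.size); // 1
console.log(started.data.previous['playback-requested'].requestedAt); // 200
```

The second playback-requested replaces the first, so the Map holds one entry and playback-started sees only the latest request.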

Where we ended up

To summarise, let’s take a high-level look at the structure we created to see how all the pieces connect.

Below is a diagram showing the end-to-end flow of each tracking event:

The steps for each track call were:

1. Put the track data in the queue.
2. Apply the queue processing logic:
   - If the queue is empty, wait for the next job.
   - If the queue is not processing any event, wait until the current event loop task is done, then start processing the next job.
   - If the queue is processing an event, wait until the processing of that event is done.
3. When an event is picked up for processing, check whether this type of event depends on data from a previous event. If so, enrich the current event with data from the previous one.
4. Process the resolvers to enrich the track data.
5. Structure the result according to the tracking service’s needs.
6. Check whether the current event needs to be stored in the history object.
7. Send the data to the tracking service.
8. Signal that the queue can now process the next element.

In a nutshell, these are the steps every event goes through before it is sent to the tracking service.
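The whole flow can be condensed into one sketch that follows the steps in order. It is a simplified, hypothetical illustration rather than our production code: `RESOLVERS`, `DEPENDENCIES`, `EVENTS_TO_STORE` and `sendToService` are stand-ins, and the asset resolver returns fixed data instead of querying Apollo Client.

```javascript
// Hypothetical configuration and stand-ins.
const RESOLVERS = {
  asset: async ({ assetId }) => ({ id: assetId, title: 'Mission: Impossible' }),
};
const DEPENDENCIES = { 'playback-started': ['playback-requested'] };
const EVENTS_TO_STORE = new Set(['playback-requested']);

const history = new Map();
const delivered = [];
const sendToService = async (payload) => {
  delivered.push(payload); // would be the HTTP call to the tracking service
};

const queue = [];
let busy = false;

async function processEvent(event) {
  // Step 3: attach data from earlier events this event depends on.
  const previous = {};
  for (const name of DEPENDENCIES[event.name] ?? []) {
    if (history.has(name)) previous[name] = history.get(name);
  }
  // Step 4: run the resolvers requested by the call.
  const resolved = {};
  for (const { name, args } of event.resolvers ?? []) {
    resolved[name] = await RESOLVERS[name](args);
  }
  // Step 5: shape the payload for the tracking service.
  const payload = { event: event.name, properties: { ...event.data, ...resolved, previous } };
  // Step 6: store the event if later events may need it.
  if (EVENTS_TO_STORE.has(event.name)) history.set(event.name, payload.properties);
  // Step 7: send it.
  await sendToService(payload);
}

async function runQueue() {
  while (queue.length > 0) {
    const event = queue.shift();
    try {
      await processEvent(event);
    } catch (error) {
      console.error('tracking failed', error);
    }
    // Step 8: the loop moves on only after the current event is finished.
  }
  busy = false;
}

function trackAction(event) {
  queue.push(event); // Step 1
  if (!busy) {
    busy = true;
    setTimeout(runQueue, 0); // Step 2: start after the current task
  }
}

trackAction({ name: 'playback-requested', data: { source: 'home' }, resolvers: [{ name: 'asset', args: { assetId: 1 } }] });
trackAction({ name: 'playback-started', resolvers: [{ name: 'asset', args: { assetId: 1 } }] });
```

Because events are processed strictly in order, playback-started is guaranteed to find the stored playback-requested data when it is enriched.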

Conclusion

One of our main learnings is to always re-evaluate and challenge the current approach, even when it already looks good enough.

Don’t try to solve a simple problem with a complex approach from the very beginning. The requirements will soon look completely different, and much of that effort would simply be thrown away. Start simple, and when new requirements arrive, evaluate and iterate.

This article is not a silver bullet for tracking, but rather one perspective on how some product and data use cases can be solved and improved iteratively.

P.S.

Enjoyed reading it? Do you want to take part in similar challenges and shape the future of video on demand? Come and Joyn us — https://careers.joyn.de/