How we rebuilt our session recording feature from scratch

Ludwig DUBOS
The AB Tasty Tech & Product Blog
4 min read · Dec 19, 2019

When I arrived at AB Tasty a few months ago, I was put in charge of our session recording feature and asked to improve it. Little had been done since the 2016 acquisition of Nirror, a piece of software that tracks visitor journeys. The tool was integrated into our platform, but our team was having a hard time scaling it: the technologies used to build it were outdated, and the service suffered from performance and accessibility problems. My diagnosis was clear: we had to rebuild it entirely.

session recording player

Recording user sessions: the basics…

When I started working on this session recording tool, I thought about what it meant in terms of development. To record a user session, we needed to rebuild the web pages the user browses and all the elements that compose them:

  • DOM
  • Resources (Images, CSS, etc.)
  • Events (clicks, scroll, etc.)

Here comes the first challenge. Today, many websites prevent third parties from exporting their resources from a different domain. We had to find a way to capture the DOM and resources directly on the websites themselves without impacting page performance, all while keeping in mind the second part of the job: being able to play back the recording.

To avoid wasting time, we based our work on the open-source project Clarity, even though it only tackled the front-end part of the problem: how to collect the DOM. Its main advantage: Clarity doesn’t rely on any external library, so it’s very lightweight. Moreover, Clarity is flexible, as it can easily be extended with plugins.

How does it work? The first step consists of recording a reference DOM during what we call a “Discover” phase. This DOM is serialized without all of its properties to limit its size, and each node is assigned an identifier so that we can find it later. We then attach a MutationObserver to the page, which notifies us of any change in the DOM. The integrity of the records is checked on our servers. Finally, to avoid consuming too much bandwidth on the client side, we fetch the resources referenced by the DOM on the server side. But not all of them: we record images, CSS, and fonts, leaving aside JavaScript, favicons, and videos.
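To make the “Discover” phase concrete, here is a minimal sketch of the node-ID assignment and property whitelisting described above. It operates on a plain-object tree standing in for real DOM nodes (the `{ tag, attrs, children }` shape and the attribute whitelist are illustrative assumptions, not the production format):

```javascript
// Sketch of the "Discover" phase: walk a DOM-like tree, assign each
// node an incrementing id, and keep only a whitelist of attributes
// to limit the payload size.
let nextId = 1;

function pickAttrs(attrs, whitelist) {
  const kept = {};
  for (const key of whitelist) {
    if (key in attrs) kept[key] = attrs[key];
  }
  return kept;
}

function serializeNode(node) {
  const id = nextId++;
  return {
    id,
    tag: node.tag,
    // Keep only the attributes the player needs to rebuild the page.
    attrs: pickAttrs(node.attrs || {}, ['id', 'class', 'src', 'href']),
    children: (node.children || []).map(serializeNode),
  };
}

// Example: a tiny page fragment.
const tree = {
  tag: 'div',
  attrs: { class: 'hero', 'data-internal': 'dropped' },
  children: [{ tag: 'img', attrs: { src: '/logo.png' } }],
};
const snapshot = serializeNode(tree);
// snapshot.id is 1, snapshot.children[0].id is 2, and the
// non-whitelisted 'data-internal' attribute is not in the payload.
```

Later mutation messages can then reference nodes by these stable identifiers instead of re-sending the whole tree.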

Thus, we built our own tool inspired by Clarity, with some modifications to meet our needs. For instance, we switched to Jest for the unit tests, but we kept the same batch system as Clarity for data transport. As for the back-end (the part that transforms the batches of data into readable videos), different solutions existed, but we opted for Node, as I was a full-stack JavaScript developer.

A second version of the solution was ready to be tested.

… and the challenges

Of course, it couldn’t be that easy.

We soon realized that the volume of data was going to explode if we didn’t find a way to filter out unnecessary information. The first way to cut back the amount of data collected concerned the recording of web pages themselves. Websites don’t evolve continuously: while the DOM can change from one user to another, the resources are only modified from time to time. By checking a resource’s file size and hash, we could tell whether it had changed or not. We implemented an automatic check of the resources: a GET request retrieves the page’s headers, including the file size (the request is closed as soon as we receive the headers). Every 12 hours, we check the entire file by comparing hashes. As long as a resource hadn’t changed, the session recording tool didn’t need to collect it again.

Likewise, continuously recording users’ behavior on a web page would require enormous resources to process and store the data. We decided to record a discrete history of cursor positions instead, based on either the gap between two positions or the time spent in the same area. This also allowed us to detect simple behaviors, such as rage clicks or error clicks.
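A minimal sketch of that discrete history: a cursor sample is kept only if it moved far enough from the last kept point, or if enough time has passed in the same area. The thresholds below are illustrative, not the values we use in production:

```javascript
// Thin a continuous cursor trail into a discrete history: keep a
// sample only on significant movement or after a dwell interval.
const MIN_DISTANCE = 20;  // pixels (illustrative threshold)
const MIN_INTERVAL = 500; // milliseconds (illustrative threshold)

function thinCursorTrail(samples) {
  const kept = [];
  for (const sample of samples) {
    const last = kept[kept.length - 1];
    if (!last) {
      kept.push(sample); // always keep the first sample
      continue;
    }
    const distance = Math.hypot(sample.x - last.x, sample.y - last.y);
    const elapsed = sample.t - last.t;
    if (distance >= MIN_DISTANCE || elapsed >= MIN_INTERVAL) {
      kept.push(sample);
    }
  }
  return kept;
}

// Usage: small jitters are dropped; real moves and long dwells are kept.
const trail = thinCursorTrail([
  { x: 0, y: 0, t: 0 },
  { x: 2, y: 1, t: 50 },   // jitter near the last point: dropped
  { x: 40, y: 5, t: 100 }, // real move: kept
  { x: 41, y: 6, t: 700 }, // long dwell in the same area: kept
]);
// trail keeps 3 of the 4 samples
```

The same kept samples can feed simple behavior detection, e.g. counting rapid repeated clicks in one area to flag a rage click.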

Checking and selecting the videos

And that’s not all: the videos still needed some filtering. When recording the DOM, some messages can get lost or arrive out of order. We had to build a message-queuing system to delay, filter, and sort the incoming messages, making the resulting videos coherent and relevant for the customer. This filtering also helps detect uninteresting videos, such as those that are too short (bounces) or too long (abandoned pages), or those with little user behavior.
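The reordering part of that queue can be sketched like this, assuming each batch carries a sequence number (the `{ seq, payload }` envelope is an assumed shape, not our production format): out-of-order batches are held back until the missing ones arrive, then released as a contiguous run.

```javascript
// Reorder buffer: release message payloads strictly in sequence order,
// holding back anything that arrived ahead of a missing batch.
class ReorderBuffer {
  constructor() {
    this.expected = 0;          // next sequence number we can release
    this.pending = new Map();   // seq -> payload, waiting for its turn
  }

  // Returns the (possibly empty) list of payloads now deliverable in order.
  push({ seq, payload }) {
    this.pending.set(seq, payload);
    const ready = [];
    while (this.pending.has(this.expected)) {
      ready.push(this.pending.get(this.expected));
      this.pending.delete(this.expected);
      this.expected++;
    }
    return ready;
  }
}

// Usage: batch 1 arrives before batch 0 and is held until 0 lands.
const buffer = new ReorderBuffer();
const first = buffer.push({ seq: 1, payload: 'mutation-b' });
const second = buffer.push({ seq: 0, payload: 'mutation-a' });
// first is empty; second releases both payloads in order.
```

A production version would also need a timeout to give up on batches that never arrive, which is where the delay-and-filter behavior comes in.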

All in all, the biggest challenge of this session recording tool is clearly the quantity of information received and our ability to store it. Today, we are still working on improving our database and optimizing the performance of our session recording tool. And to decrease our bandwidth use when collecting the resources of a web page, we are considering moving from the batch system to a WebSocket, using an open connection to record the sessions. This would better secure the data transport, and tell us when a web page is closed. The work continues…
