Performance monitoring

Egor Skorobogatov
Zattoo’s Tech Blog
14 min read · Oct 12, 2021

Performance monitoring is an important practice that allows you to assess how fast your application works, which parts of it are slow, and whether each new development cycle brought improvements or degradations compared to previous versions.

Performance metrics can be used to compare how the app performs on different devices or to measure the performance across different releases.

Automated tests, which run on each pre-release environment, can produce enough performance metrics to compare against the data already collected across previous releases. In this way, a version that shows performance degradation can be flagged and stopped before it reaches the live environment. Performance must be one of the top factors that determine the success of the app. Poor performance will be an obstacle to user engagement even if the app has a perfect UI and provides a lot of features.

The metrics we want to use should meet the following criteria:

  • To be relevant to our goals
  • To be collected and measured in a consistent manner
  • To be analyzed in a format that is acceptable to any stakeholder

In this article, I am going to share the outcome of adding performance monitoring to the OTT (“Over-the-top” media service) at Zattoo.

Evaluation of different services

The first idea that comes to mind is to use one of the existing services available on the market to track performance. This approach has its pros and cons, and that comparison is beyond the scope of this article. For us it didn’t work for several reasons:

  • None of them provide sufficient functionality out of the box. Some lack good analysis capabilities, which would require connecting another tool to query and visualize the data, and not every service that measures performance has a connector to a visualization dashboard.
  • We need consistent storage with access to at least the past year’s worth of data. Unfortunately, with most of the services we would have to export the aggregated data and restore it in a new place to be able to compare against it in the future.
  • Most of these solutions that work out of the box are paid, and we took into consideration their price vs the cost of our own development effort.

We made the decision to build our custom solution based on the Performance API on the client side, while on the backend side we took advantage of our existing database and visualization service. Below I would like to concentrate on the metrics that are relevant and meaningful for an OTT application, and on the technical implementation in the client.

What to measure

A little bit of introduction to what the Zattoo OTT service is. It is a web-hosted application running in a webview on a TV or an STB (set-top box) which allows a user to stream TV channels or watch VOD (video on demand) content. When the application starts, the last watched TV channel should already be streaming, with the video playing full screen. However, when the app is opened for the first time, it shows a login/signup page with a form, a dialog, or some pictures which prompt the user to proceed to the next step.

Long story short — the faster we provide feedback that the page is loading, the better the user experience we establish. Here is the first relevant metric — first-paint:

  • first-paint — The time between navigation starting and when the browser renders the first pixels to the screen.

It’s the moment when we definitely know that rendering has started. Our further goal will be to have this moment happen as early as possible, which brings us to the next event:

  • first-contentful-paint — Time when the browser renders the first bit of content from the DOM, providing the first feedback to the user that the page is actually loading.

This is an important point where we attract the user’s attention to further changes on the screen and can tell them that the desired content is coming soon. At this point, we can already show a loader or placeholders to make this transition smoother.

Where can we get these metrics from? The answer is the PerformancePaintTiming interface of the Paint Timing API. Using the Performance API we can obtain these metrics. Further below, we will have a closer look at how and in what format to get this information.
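
As a quick preview, a minimal sketch of reading the paint entries could look like this:

```js
// Read the paint entries recorded by the browser (Paint Timing API).
// Each entry has a name and a startTime in milliseconds from the time origin.
const paintEntries = performance.getEntriesByType('paint');

paintEntries.forEach((entry) => {
  // entry.name is either 'first-paint' or 'first-contentful-paint'
  console.log(entry.name, entry.startTime);
});
```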

Above I slightly touched on the topic of DOM rendering and the start of navigation. But what does navigation mean in this context, and what moment in time are these metrics measured against? To better understand this, let’s have a look at this image:

This image illustrates timing metrics defined by the PerformanceNavigationTiming interface which provides properties related to the browser’s document navigation events. This interface extends the PerformanceResourceTiming interface, which defines metrics for network events.

  • startTime — returns 0 and marks the start of performance measurements. The property represents the moment after the prompt for unload terminates on the previous document in the same browsing context. Literally, it’s the moment when a user presses Enter after a new URL has been typed into the address bar. In case it is an initial load and there was no previous document, this moment is equal to fetchStart.
  • unloadEventStart — this property represents the moment when the browser starts unloading the previous document. This is equal to 0 if there was no previous document.
  • unloadEventEnd — this property represents the moment when the browser finishes unloading the previous document. This is equal to 0 if there was no previous document.
  • redirectStart — this property stores the moment when the browser starts redirecting to the new navigation URL. This is equal to 0 if there is no redirect.
  • redirectEnd — this property stores the moment when the browser finishes redirecting to the new navigation URL. This is equal to 0 if there is no redirect.
  • fetchStart — this property represents a time right before the browser starts fetching the requested resource.
  • domainLookupStart — this property represents the moment right before the browser starts the domain name lookup for the resource.
  • domainLookupEnd — this property represents the moment right after the browser finishes the domain name lookup for the resource.
  • connectStart — this property represents the moment when the browser sends the request to open a connection to the server.
  • connectEnd — this property represents the moment when the browser has established the connection to the server and is ready to request the resource.
  • requestStart — this property represents the moment right before the browser starts requesting the resource.
  • responseStart — this property represents the moment right after the browser has started receiving the response.
  • responseEnd — this property represents the moment right after the browser completely received the response.
  • domInteractive — this property represents the moment when the browser starts parsing the document. It sets the document.readyState property to interactive and fires a readystatechange event at the document.
  • domContentLoadedEventStart — this property represents the moment right before the browser fires the DOMContentLoaded event at the document. This event fires when the initial HTML document has been completely loaded and parsed. It doesn’t wait until stylesheets, images, or asynchronous scripts (with async attributes) are loaded. However, it does wait until all other scripts are loaded and executed, without taking their order into account. For instance, if the defer attribute is applied to the script, it won’t block the browser from parsing the page but the DOMContentLoaded event will be fired only after this script is fully loaded and executed.
  • domContentLoadedEventEnd — this property represents the moment right after the browser fires the DOMContentLoaded event at the document. This event fires when the initial HTML document has been completely loaded and parsed. It doesn’t wait until stylesheets, images, or asynchronous scripts (with async attributes) are loaded.
  • domComplete — this property represents the moment when the browser finishes parsing the document. This event fires when all parts of the HTML document have been completely loaded and parsed, which includes stylesheets, images, and asynchronous scripts (with async attribute). It sets the document.readyState property to complete and fires a readystatechange event at the document.
  • loadEventStart — this property represents the moment right before the load event is fired at the document. The whole page is loaded at this time, including stylesheets, images, and asynchronous scripts (with async attributes).
  • loadEventEnd — this property represents the moment right after the load event is fired at the document.

In addition to these events, we need to know how well our app performs when our code starts running. Since our app is providing playback content it would be nice to define the moment when the app requests the first frame of playback.

  • first-frame-requested — represents the moment when the app is ready to play the content and requests the first frame; playback will start as soon as the response with the first frame has been processed in the browser.

We could also measure when the first frame response arrives, but in that case the event would capture the latency of the round trip to the server in addition to the time spent waiting for the response to be produced. Thus the measurement from this event would include the network connection and the server’s work. Since we are interested in how well the client performs, this event won’t bring us much value at this stage.

Apart from the player on the main screen, where we measure how fast the app requests the first frame, there are other pages that a user can switch between. The next important metric for us is how much time passes when navigating from one page to another. Since the app is a single page application, we exclude the round trip to the server from the measurement:

  • page-open — represents the duration between the moment when a user started the navigation to a new page and when the new page is open.

The pages contain UI elements that are organized in rows, columns, etc. Navigating between them might require animation to give the user visual feedback while the focus context is changing, or while the next UI components are being preloaded so they are in place when the user continues navigating down. All these actions require resources, and the longer the context shift lasts, the worse the user experience the app provides. In the example below we measure how long it takes to move the focus on vertical scrolling (row-to-row) and on horizontal scrolling (tile-to-tile):

  • content-selected — represents the duration between the moment when a user starts navigation from the currently selected context and when the animation is completed and the new context is selected.

We defined the custom metrics we would like to measure and the basic ones that are applicable for any web application. Let’s see how we can retrieve them.

How to measure

The easiest way to get the performance metrics is using Performance API which provides access to different interfaces like:

  • PerformanceNavigationTiming — provides methods and properties to store and retrieve metrics regarding the browser’s document navigation events. This is a newer technology that is not supported in all browsers, so code using this interface should have a fallback in case it isn’t available. It can be accessed via window.performance.getEntriesByType('navigation'), which returns an array with a single PerformanceEntry item describing all navigation events that occurred during the page’s loading.
  • PerformanceTiming — is a legacy interface kept for backward compatibility and contains properties that offer performance timing information for various events that occur during current page loading and use. The PerformanceTiming object describing the navigation events is retrieved from the window.performance.timing property. The major difference from PerformanceNavigationTiming is the timestamp format of each value of the navigation events. PerformanceTiming formats the values in Unix time, while PerformanceNavigationTiming returns a DOMHighResTimeStamp.
  • PerformanceResourceTiming — interface enables retrieval and analysis of detailed network timing data regarding the loading of an application’s resources.
  • PerformancePaintTiming — grants timing information about “paint” (also known as “render”) operations during web page construction.

Knowing all these interfaces, all we need to do is get the entries each of them exposes. To collect as many performance metrics as possible, we have to check which interface for navigation events is available in the browser. If PerformanceNavigationTiming is available, it is preferable over PerformanceTiming, which is deprecated and should no longer be used when the modern one is present. In older browsers, however, PerformanceTiming can still give access to navigation events:
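
A sketch of such a check could look like the following (the getNavigationEvents helper name is illustrative, not a browser API):

```js
// Prefer the modern PerformanceNavigationTiming entry when the browser
// supports it; otherwise fall back to the legacy PerformanceTiming object.
function getNavigationEvents() {
  if (typeof performance.getEntriesByType === 'function') {
    const [navigationEntry] = performance.getEntriesByType('navigation');
    if (navigationEntry) {
      // Values are DOMHighResTimeStamps relative to the time origin.
      return { isModern: true, events: navigationEntry.toJSON() };
    }
  }
  // Legacy fallback: values are absolute UNIX timestamps in milliseconds.
  return { isModern: false, events: performance.timing };
}

const navigationEvents = getNavigationEvents();
```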

However, in the example above, we are getting the values in different formats, each with its own definition of how to interpret the time values. To solve this we need to define a reference point that would be the same for both interfaces.

PerformanceNavigationTiming provides values which represent the time passed after the time origin. The time origin is a moment in time that is considered as the start of the document’s lifetime.

PerformanceTiming stores measurements as UNIX Epoch timestamps. Here is a comparison of the same metrics received from the two sources:
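
For example (the numbers in the comments are invented, purely to illustrate the difference in format):

```js
// Modern interface: milliseconds relative to the time origin.
const [nav] = performance.getEntriesByType('navigation');
console.log(nav.startTime, nav.fetchStart, nav.domComplete);
// e.g. 0  2.5  1205.7

// Legacy interface: absolute UNIX Epoch timestamps in milliseconds.
const legacy = performance.timing;
console.log(legacy.navigationStart, legacy.fetchStart, legacy.domComplete);
// e.g. 1634025600000  1634025600003  1634025601206
```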

The example above doesn’t show all available properties, but it’s enough to show the difference in some fields. PerformanceNavigationTiming has the startTime field that represents the time origin. Literally, it’s when the navigation starts, which is the moment when the browsing context was created if the current document is the first one in the window. PerformanceTiming has the navigationStart field, which represents the same point in the timeline but in UNIX Epoch format. Knowing this, we can convert all values we got from the PerformanceTiming interface so that they use the same reference point as the numbers from the PerformanceNavigationTiming interface. The navigationStart property is taken as 0 and all the other values are normalized by subtracting the navigationStart time from each of them, as follows:
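
A minimal sketch of this normalization (the helper name is illustrative):

```js
// Normalize legacy PerformanceTiming values so that they use the same
// reference point (the time origin) as PerformanceNavigationTiming.
function normalizeLegacyTiming(timing) {
  const origin = timing.navigationStart;
  const normalized = {};

  for (const key in timing) {
    const value = timing[key];
    if (typeof value === 'number') {
      // Events that did not happen are reported as 0 and are kept as 0.
      normalized[key] = value > 0 ? value - origin : 0;
    }
  }

  return normalized;
}

const normalizedTiming = normalizeLegacyTiming(performance.timing);
```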

After this conversion is applied to all fields, a device that doesn’t support the modern interface will also send data to the backend using the time origin as a reference point. As another option, this conversion can be done later on the backend, to avoid extra calculations on the client side. In this case, before sending the values in UNIX Epoch format, we need to mark the payload as containing old-format data.

In order to measure custom events, we can create custom PerformanceEntry objects using the Performance API’s mark and measure methods. This functionality is part of Performance Timeline Level 2, so for browsers that don’t support it, a polyfill is required to use performance.mark() and performance.measure() directly.

The performance.mark() function creates an entry of type mark with a given name and stores it in the browser’s performance entry buffer. The entry contains a startTime property — a timestamp measured from the time origin. The method requires one argument — the name of the entry. Here is an example:
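
A sketch (the PERFORMANCE_START name and the numbers in the comments are illustrative):

```js
// Create a custom mark on the performance timeline.
performance.mark('PERFORMANCE_START');

// Read the entry back from the performance entry buffer.
const [markEntry] = performance.getEntriesByName('PERFORMANCE_START');
console.log(markEntry);
// e.g. { name: 'PERFORMANCE_START', entryType: 'mark',
//        startTime: 1520.3, duration: 0 }
```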

The performance.measure() function creates an object with an entry type of measure. It has one required argument, name, and two optional arguments, startMark and endMark. It returns a PerformanceMeasure entry. When only name is provided, the output will be as follows:
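
For instance (the measure name and the numbers are illustrative):

```js
// Measure against the time origin: no startMark or endMark is provided.
const measureEntry = performance.measure('APP_READY');
console.log(measureEntry);
// e.g. { name: 'APP_READY', entryType: 'measure',
//        startTime: 0, duration: 2034.8 }
```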

The performance.mark() output shown earlier is similar, except for the contents of the startTime and duration properties: here, for the measure, the duration is the time passed since the time origin, while startTime is 0. Why is it like that? The performance.measure() method typically measures the time between two marks. When the two optional arguments, startMark and endMark, are not provided, the measurement is made against the time origin. Let’s look at another example:
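
A sketch, reusing the PERFORMANCE_START mark from the first example (numbers are again illustrative):

```js
// Measure from the PERFORMANCE_START mark to "now" (no endMark is provided).
const sinceStart = performance.measure('SINCE_PERFORMANCE_START', 'PERFORMANCE_START');
console.log(sinceStart);
// e.g. { name: 'SINCE_PERFORMANCE_START', entryType: 'measure',
//        startTime: 1520.3, duration: 514.5 }
```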

This time we passed the startMark argument, which is considered the starting point of the measurement. As you can see, startTime has the value from the first example. The duration property represents the time passed between the start time of the PERFORMANCE_START mark and the current time (the moment when performance.measure() was called).

If we provide the third argument, endMark, we can measure the time between the marks we previously placed on the “performance timeline”. Both startMark and endMark can be the name of a PerformanceTiming property. In the example below, we add the STREAM_START mark and later measure it against domComplete from PerformanceTiming:
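
A sketch of how this could look (the measure name and the numbers are illustrative):

```js
// Mark the moment when the app requests the first frame of the stream...
performance.mark('STREAM_START');

// ...and measure it against the domComplete navigation event.
const streamMeasure = performance.measure(
  'DOM_COMPLETE_TO_STREAM_START',
  'domComplete',
  'STREAM_START'
);
console.log(streamMeasure);
// e.g. { name: 'DOM_COMPLETE_TO_STREAM_START', entryType: 'measure',
//        startTime: 1205.7, duration: 829.1 }
```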

The startTime represents the domComplete property (when the browser finished parsing the document) and the duration is the time passed from that moment till we marked the STREAM_START event.

If the browser is really old and PerformanceTiming is not available but performance still needs to be measured, Date.now() can be used instead. Since Date.now() returns the number of milliseconds elapsed since January 1, 1970 00:00:00 UTC, it can’t provide a timestamp as precise as performance.now(). The timestamp created by Date.now() represents a specific moment in time, which can be considered the time origin for measurements against the time elapsed since the UNIX Epoch. However, this value can only approximate the domLoading or domInteractive events, which are the earliest events that can be determined this way. All subsequent custom metrics will consider this value as the time origin and will be measured against it. Unfortunately, these calculations can’t have the same precision as when the navigationStart or startTime events are available (which one is used depends on the interface, as described above). To capture these events, the script in the example below must be added to the app’s main HTML file:
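
A sketch of such an inline script, assuming a hypothetical __ZATTOO_PERF__ global (in practice the identifier would be generated and injected at build time):

```js
// Inline <script> placed as early as possible in the app's main HTML file.
// window.__ZATTOO_PERF__ is a hypothetical global; its name should be a
// unique identifier so that it cannot be accidentally overwritten.
window.__ZATTOO_PERF__ = {
  // The earliest observable point, roughly corresponding to domLoading.
  timeOrigin: Date.now()
};

document.addEventListener('readystatechange', function () {
  // Store timestamps for the 'interactive' and 'complete' readyState changes.
  window.__ZATTOO_PERF__[document.readyState] = Date.now();
});
```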

It stores several timestamps for the events that describe the document navigation in a global variable. To be safe, that variable needs a unique identifier that guarantees it won’t be overwritten. Additionally, to access it later in the application code, the same unique identifier should be injected into the code at the build stage.

In order to perform a measurement between two marks when the modern interfaces are not available, a simple calculation of the difference between two Date.now() timestamps can be applied:
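
For example (the mark names and helper functions are illustrative):

```js
// Fallback measurement when neither PerformanceNavigationTiming nor
// PerformanceTiming is available: the duration is simply the difference
// between two Date.now() timestamps.
const marks = {};

function mark(name) {
  marks[name] = Date.now();
}

function measureBetween(startName, endName) {
  return marks[endName] - marks[startName];
}

mark('PAGE_OPEN_START');
// ... the new page renders ...
mark('PAGE_OPEN_END');

const pageOpenDuration = measureBetween('PAGE_OPEN_START', 'PAGE_OPEN_END');
```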

Conclusion

Performance monitoring is a good guard that repeatedly proves the app keeps working in a sustainable way with each new release. The following key points sum up what is important to remember while working on this feature:

  • Well-defined metrics of what should be measured. They should stay consistent over time so that the results are comparable across different versions of the app.
  • Combined usage of the different interfaces provided by browsers. The data from these sources must be aligned against a common reference point.
  • All these analytics calculations should not cause further performance degradation, which would lead to a poor user experience. It might be better to send the raw collected metrics to the backend and perform the calculations later, when it comes to data analysis. A Web Worker can also be an option to reduce the execution load on the main thread and let it run without being blocked, which is key for an application where the UI plays a crucial role.

Thanks for reading this article! Sign up for the Zattoo Tech Blog to get the latest updates about developing, engineering, and designing the future of television.

Thanks to Tom Bridgwater for his thorough review.
