The art of correlation and performance metrics
At Insider Inc., we have been ramping up our focus on synthetic testing as an important way to test web performance. A more performant user experience translates to user satisfaction. Faster news sites have lower bounce rates, more page views, longer session lengths, and higher retention rates.
What is Synthetic Testing?
Synthetic testing can be done with tools such as WebPageTest and Lighthouse, and at Insider we also leverage Rigor and SpeedCurve. Vendors provide remote (often global) infrastructure that visits a website periodically and records performance data for each run. The measured traffic does not come from actual users; it is generated synthetically to collect data on page performance. Synthetic testing provides the full HAR (HTTP Archive) with detailed information on network activity, and allows one to run expensive operations such as calculating Speed Index.
A challenge when using synthetic testing is justifying our page choices and test frequency to business partners. They often ask how often tests run, on which pages, and how we know the tests are valid relative to the site's traffic. These are great and important questions. To find our answers, we start with RUM (real user monitoring).
What is RUM (Real User Monitoring)?
RUM collects performance data from the browsers of actual visitors as they use the site, giving us a picture of what real users experience across devices, networks, and geographies.
Step One: Gather a set of your most popular pages
The first step is to establish a set of the most popular page types and content. If you run a blog, you probably have a fairly consistent template and content structure. Our pages break down mainly into articles, slideshows, and video pages. Our most popular desktop browser is Chrome, and our most popular mobile devices run iOS. Using this data, we gather a subset of articles that represents the majority of our user experience and set up our synthetic tests accordingly.
What about variance within articles? Our articles vary in length and content: some have widgets, some have many images, some are short form, some are long form. The resulting difference in HTML file size is small but possibly statistically significant, so we gather a subset of articles representing the longest, average, and shortest lengths.
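The selection step above can be sketched in a few lines. This is an illustrative example, not Insider's actual tooling; the article objects and `wordCount` field are assumptions for the sake of the sketch.

```javascript
// Pick representative articles for synthetic testing by word count:
// the shortest, the median ("average"), and the longest.
function pickRepresentativeArticles(articles) {
  // Sort a copy by length so we can grab the extremes and the median.
  const sorted = [...articles].sort((a, b) => a.wordCount - b.wordCount);
  return {
    shortest: sorted[0],
    average: sorted[Math.floor(sorted.length / 2)],
    longest: sorted[sorted.length - 1],
  };
}

// Hypothetical article list for demonstration.
const sample = pickRepresentativeArticles([
  { url: '/a', wordCount: 1200 },
  { url: '/b', wordCount: 300 },
  { url: '/c', wordCount: 650 },
]);
console.log(sample.shortest.url, sample.average.url, sample.longest.url);
```

In practice you would feed this your analytics export rather than a hard-coded list, and you might bucket by template type first so each page type gets its own short/average/long trio.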
What about variance in widgets and images? The answer here is lazy loading. Typically our articles start with an image, so we load that one as part of the document's HTML markup. Everything else within the post content container is lazy loaded. This means the only variation is text length and the first image's file size, which we control via file type and compression within our proprietary CMS.
How do we lazy load?
When the page fires the onload event, we scan it for any element with a "post load" class. Each match is registered as a "loadable": an object, keyed by the element's unique id, that stores its id, type, top offset, and so on. When the scroll position (plus a buffer) reaches a loadable's top, we load the element.
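The core of that check can be sketched as a pure function. This is a hedged sketch of the approach described above, not Insider's actual code; the `loadable` shape, `shouldLoad` name, and buffer size are assumptions.

```javascript
const BUFFER_PX = 200; // assumed preload buffer below the viewport

// A "loadable" registered at onload: the element's id, type, and
// top offset on the page.
function shouldLoad(loadable, scrollTop, viewportHeight) {
  // Load once the element's top enters the viewport plus the buffer.
  return loadable.top <= scrollTop + viewportHeight + BUFFER_PX;
}

// Hypothetical widget sitting 1800px down the page.
const widget = { id: 'related-posts', type: 'widget', top: 1800 };
console.log(shouldLoad(widget, 0, 800));   // top of page: out of range
console.log(shouldLoad(widget, 900, 800)); // scrolled down: within buffer
```

In a real page this predicate would run inside a (throttled) scroll handler, removing each loadable from the registry once it has been loaded.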
What about ads? Post loading helps here too, but the best solution in this case may be to use a large set of metrics, custom ad load timings, and Speed Index, and to monitor as closely as possible. Working with third-party content can be challenging!
Step Two: Evaluate the correlation between synthetic and RUM
The proof lies in the data, so let's look at some visuals to back up the story. We do this by comparing our KPIs (key performance indicators) from RUM against tests run synthetically. Our top-priority metrics are time to first byte (TTFB), start render, custom metrics, and Speed Index.
Let’s take a look at the numbers.
TTFB — Time To First Byte
Backend timing is tricky: results could be served from cache in RUM but not in synthetic tests (unless you configure a cached repeat view in WebPageTest). Here we see where synthetic variance begins, though recent work seems to have reduced some of it.
After a month of testing we see some variance but fairly consistent results, meaning our synthetic tests reflect real user data. We will continue monitoring and experimenting to validate this correlation.
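One simple way to sanity-check synthetic TTFB against RUM is to compare medians and flag when they drift too far apart. This is an illustrative sketch; the sample values and the 20% tolerance are made up for the example.

```javascript
// Median of a numeric array.
function median(values) {
  const sorted = [...values].sort((a, b) => a - b);
  const mid = Math.floor(sorted.length / 2);
  return sorted.length % 2 ? sorted[mid] : (sorted[mid - 1] + sorted[mid]) / 2;
}

// Is the synthetic median within `tolerance` (fraction) of the RUM median?
function withinTolerance(syntheticMs, rumMs, tolerance = 0.2) {
  return Math.abs(syntheticMs - rumMs) / rumMs <= tolerance;
}

// Fabricated TTFB samples in milliseconds.
const syntheticTtfb = median([180, 210, 195, 205]); // per synthetic run
const rumTtfb = median([170, 220, 190, 300, 185]);  // per real user
console.log(withinTolerance(syntheticTtfb, rumTtfb));
```

Medians are a reasonable choice here because RUM distributions are long-tailed; comparing means would let a handful of very slow real-user sessions dominate the check.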
Step Three: Evaluate the correlation of your top metrics to one another within synthetic
Speed Index is an expensive operation and difficult to test via RUM, but we can compare our top metrics within synthetic. Here I downloaded a large set of data from Rigor into Excel to evaluate.
Variance can creep in at multiple stages along the timeline, but it is clear that there is solid correlation between TTFB, start render time, and Speed Index. Each metric affects the next, and all are meaningful performance indicators for our business.
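The correlation itself can be computed directly from an exported data set instead of eyeballing a chart. Below is an illustrative sketch of a Pearson correlation over two metric columns; the sample numbers are fabricated, not Insider's actual data.

```javascript
// Pearson correlation coefficient between two equal-length samples.
function pearson(xs, ys) {
  const mean = (a) => a.reduce((s, v) => s + v, 0) / a.length;
  const mx = mean(xs), my = mean(ys);
  let cov = 0, vx = 0, vy = 0;
  for (let i = 0; i < xs.length; i++) {
    cov += (xs[i] - mx) * (ys[i] - my);
    vx += (xs[i] - mx) ** 2;
    vy += (ys[i] - my) ** 2;
  }
  return cov / Math.sqrt(vx * vy);
}

// Fabricated TTFB and Speed Index values from paired runs (ms).
const ttfb = [180, 220, 200, 260, 240];
const speedIndex = [2400, 2900, 2600, 3400, 3100];
console.log(pearson(ttfb, speedIndex));
```

A coefficient near 1 indicates the kind of strong positive relationship described above; a value near 0 would suggest the metrics move independently and should be tracked separately.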
To summarize:
- Establish a set of the most popular page types and content. Represent your typical pages, browsers, platforms, and network configurations while synthetic testing. Lazy load anything you can below the fold.
- Validate synthetic test data to ensure the tests represent your actual users by comparing synthetic data to RUM data.
- Evaluate the correlation of your top metrics to one another within synthetic. Establish a set of metrics and configurations that closely represent your audience and needs.
I hope this article helps anyone interested in validating synthetic testing, and offers some insight into the relationships among metrics within and between synthetic and RUM processes.