Understanding the different Google Core Web Vitals Sources 🤔

WebVitalize.io
Nov 22, 2021 · 6 min read


Imagine this: your team has been tasked with “Getting to Green” on Google’s Core Web Vitals. The SEO/Digital Marketing team has pegged some significant goals against Google’s latest hyped-up metrics, so you dive in and start trying to find a baseline score.

When faced with this situation, it is often useful to ask: how bad is the current predicament, and how much work would it take to get to green?

There seems to be no simple answer to this question.

Understanding the various Sources

There are quite a few websites and tools that can report your Core Web Vitals score; unfortunately, their results often vary. Understanding the nuances of how each one reports your score will help you understand why. There are a few areas in which they differ, so let’s look at them:

Sample size:
Most real-time solutions (excluding RUM) have a very small sample size. A Lighthouse-powered test, or a browser plugin that reports scores, can have a sample size as low as 1. Other sources, such as Google’s CrUX or correctly configured RUM tracking, can include up to 100% of your website traffic in their data set. Obviously, a larger data set is preferable to a smaller one.

Synthetic (Lab) vs Real Users (Field):
Do real human interactions drive the data set, or is it simulated by some type of digital/synthetic user? Automated testing frameworks, including Lighthouse, are considered “Lab” data, meaning they’re synthetic. Real User-driven data, from your actual users (not just your dev team), would be considered “Field” data.

Field data can be relied on as the final authority for what your users are actually experiencing. Lab data is useful for debugging and reproducibility, but it has limited accuracy as a measure of real-world experience.

Field data is valuable because it takes into account differences such as screen size, network speed, device type, etc.

Feedback Loop & Granularity:
It is important to consider what time range you are reporting on. It’s useful to see scores as they are right now in production, but it’s equally important to be able to see the granularity, intervals, and trends over that reporting period.

For example, Google’s CrUX data set reports over 28 days, which sounds super useful. But it does not show you each day’s score. Rather, the calculation covers a rolling 28-day window.

This means you cannot isolate yesterday’s score (after, for example, making a big change) from the previous 27 days’ scores, and it can take up to 28 days for the full impact of a change to be reflected. By that point, the 27 days following the change have also been rolled into the same aggregate score.
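To make that dilution concrete, here is a minimal sketch (illustrative only, assuming roughly even traffic per day; this is not how CrUX itself is implemented) of how much of a rolling 28-day window actually reflects a change shipped a given number of days ago:

```typescript
// Illustrative sketch: the share of a rolling 28-day window made up of
// "post-change" days, assuming roughly even traffic per day.
const WINDOW_DAYS = 28;

function shareOfWindowAfterChange(daysSinceChange: number): number {
  return Math.min(daysSinceChange, WINDOW_DAYS) / WINDOW_DAYS;
}

console.log(shareOfWindowAfterChange(1));  // ~0.036: one day later, only ~3.6% of the window reflects the fix
console.log(shareOfWindowAfterChange(14)); // 0.5: two weeks later, still only half
console.log(shareOfWindowAfterChange(28)); // 1: the full impact is finally visible
```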

Tracking trends over time is invaluable. Trends give context and can be used to determine whether your score is improving or degrading.

Trends can also be used to ask certain important questions such as:

  • When did your scores change?
  • How far are we from our historical baseline?

How are the Core Web Vitals scores calculated?
Google uses a 75th percentile (p75) calculation, and for good reason. (Curious how this works? Read our deep dive into percentiles, coming soon.)

Other DIY reporting tools, including Google Analytics, often report on Averages rather than Percentiles.
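To see why that choice matters, here is a small sketch (illustrative sample values; not a specification of how Google computes its numbers) comparing an average to a nearest-rank 75th percentile over the same set of LCP samples:

```typescript
// Illustrative LCP samples in milliseconds; one slow outlier is included.
const lcpSamplesMs = [1200, 1400, 1500, 1600, 1800, 2100, 2600, 6500];

function average(values: number[]): number {
  return values.reduce((sum, v) => sum + v, 0) / values.length;
}

// Nearest-rank percentile; other tools may use different interpolation methods.
function percentile(values: number[], p: number): number {
  const sorted = [...values].sort((a, b) => a - b);
  return sorted[Math.max(0, Math.ceil((p / 100) * sorted.length) - 1)];
}

// The single outlier drags the average well past the p75, while the p75
// reports a value that 75% of the sampled page loads met or beat.
console.log(average(lcpSamplesMs));        // 2337.5
console.log(percentile(lcpSamplesMs, 75)); // 2100
```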

Origin or Page Level Reporting?
Origin Scores are reported for all pages in a single domain/origin. Page scores are for a single page on an origin. To understand origin score vs page-level scores better, let’s use an example scenario:

Imagine a site with 3 pages, a home page, page-1, and page-2.

www.example.com : Will have its own page-level score, even though it’s the home page.
www.example.com/page-1 : Will have its own page-level score for the page-1 route.
www.example.com/page-2 : Will have its own page-level score for the page-2 route.

But the three scores together will combine into the data set used for the origin score of the entire domain — example.com.

In addition, more popular pages will count more towards your origin score than less popular pages. So if page-2 is 100x more popular than page-1, it might have 100x more data points in the data set than page-1. For this reason, your most popular pages can help (or hurt) you more than your least popular pages.
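Here is a rough sketch of that idea (illustrative numbers only; CrUX’s real aggregation pipeline is more involved): page-level scores come from each page’s own samples, while the origin score is computed over all samples pooled together, so the high-traffic page dominates it.

```typescript
// Illustrative: per-page p75 vs an origin-level p75 over the pooled samples.
// /page-2 has far more traffic, so it dominates the origin-level number.
const samplesByPage: Record<string, number[]> = {
  "/":       [1500, 1700, 1900],
  "/page-1": [1200, 1250, 1300],
  "/page-2": Array.from({ length: 300 }, () => 3000 + Math.random() * 500),
};

function p75(values: number[]): number {
  const sorted = [...values].sort((a, b) => a - b);
  return sorted[Math.ceil(0.75 * sorted.length) - 1];
}

for (const [page, samples] of Object.entries(samplesByPage)) {
  console.log(`${page} p75: ${Math.round(p75(samples))} ms`);
}

const pooled = Object.values(samplesByPage).flat();
console.log(`origin p75: ${Math.round(p75(pooled))} ms`); // sits in /page-2's range
```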

Device Types:
Core Web Vital scores often differ significantly depending on the device type in your data set. Some tools allow for the segmentation of Desktop and Mobile devices, others may include Tablet as a specific device type. Be sure to know which device types you’re reporting on and optimizing your site for.

The Holy Grail:
The best-case scenario is to have:
- the largest possible sample size,
- driven by real users on your site,
- with control over the granularity and intervals,
- displayed as a trend rather than a single number,
- calculated with a 75th Percentile calculation,
- with additional filters for devices, pages, and location.

Now that we understand some of the differences, let’s meet the most common data sources.

LIGHTHOUSE: Every development team has run a Lighthouse test. It runs on a synthetic device against a current production or development endpoint. Run repeatedly with the same parameters, it should produce reasonably consistent results. Lighthouse allows control over running in mobile or desktop mode, and with specific network speeds.
The output of a Lighthouse test might also include snippets of the CrUX data set, so be careful not to treat everything in a Lighthouse report as real-time test data. Some metrics come from the current test run, while others might be a 28-day score pulled from CrUX.
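If you want those lab numbers on every build rather than from ad hoc DevTools runs, something like the following can sit in a CI step. This is a rough sketch based on Lighthouse’s documented Node API; exact option names and import style can differ between versions, and www.example.com is a placeholder.

```typescript
// Sketch: run a Lighthouse performance audit from Node (e.g. in CI).
// Verify the option names against the lighthouse version you install.
import lighthouse from "lighthouse";
import * as chromeLauncher from "chrome-launcher";

async function runLabAudit(url: string) {
  const chrome = await chromeLauncher.launch({ chromeFlags: ["--headless"] });
  const result = await lighthouse(url, {
    port: chrome.port,
    onlyCategories: ["performance"],
    output: "json",
  });

  // lhr holds the lab metrics from this single synthetic run.
  const lcp = result?.lhr.audits["largest-contentful-paint"]?.numericValue;
  console.log(`Lab LCP for ${url}: ${Math.round(lcp ?? 0)} ms`);

  await chrome.kill();
}

runLabAudit("https://www.example.com");
```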

CrUX: Google uses the Chrome browser to report back on the browsing experience of real users. If you inspect your network tab while you’re browsing, you might see these performance scores being sent back to Google. These data points are stored in a data set referred to as CrUX (the Chrome User Experience Report), and they form the basis of all of Google’s Core Web Vitals scoring tools (including, but not limited to, the CrUX BigQuery data set, Search Console, and PageSpeed Insights, and they’re even included in Lighthouse reports).

The origin-level CrUX score is the best source for how Google currently rates your site, but it is not very useful for development because of the slow feedback loop (a rolling 28-day period with a p75 calculation). This feedback loop is not sustainable for modern development workflows, so while it can be used as a target to reach, it can’t really be used to quickly test the impact of changes or improvements.
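That CrUX field data is also queryable directly, which is handy for comparing it against other sources. A hedged sketch follows (the endpoint and field names follow the public CrUX API docs at the time of writing; verify them yourself, and YOUR_API_KEY is a placeholder):

```typescript
// Sketch: query the CrUX API for an origin-level p75 LCP.
const API_KEY = "YOUR_API_KEY"; // placeholder: supply your own key

async function fetchOriginLcpP75(origin: string): Promise<number | undefined> {
  const response = await fetch(
    `https://chromeuserexperience.googleapis.com/v1/records:queryRecord?key=${API_KEY}`,
    {
      method: "POST",
      headers: { "Content-Type": "application/json" },
      body: JSON.stringify({
        origin,                       // use "url" instead for a page-level record
        formFactor: "PHONE",          // or "DESKTOP" / "TABLET"
        metrics: ["largest_contentful_paint"],
      }),
    }
  );
  const data = await response.json();
  // p75 over the rolling 28-day collection period.
  return data.record?.metrics?.largest_contentful_paint?.percentiles?.p75;
}

fetchOriginLcpP75("https://www.example.com").then((p75) =>
  console.log(`CrUX origin LCP p75: ${p75} ms`)
);
```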

BROWSER plugins: There are dozens of these around. Generally, they give you quick feedback on sites as you browse them. They’re fun and useful, but the accuracy of the data is very limited because the sample size is so small (often just you).

RUM: Real User Monitoring/Real User Metrics is a catch-all for many different approaches. What they should all have in common is that they monitor real users, making them a good source of field data with a larger sample size. The tooling, calculations, and sample size will differ based on the implementation; in short, not all RUM solutions are created equal.
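As one concrete example of rolling your own field data, the open-source web-vitals library reports metrics from real sessions in the browser. A minimal sketch follows (function names vary between library versions, older releases expose getCLS/getLCP/getFID instead, and /analytics is a hypothetical collection endpoint):

```typescript
// Sketch: collect field data from real users with the `web-vitals` library and
// beacon it to your own endpoint, where you can aggregate p75 values and trends.
import { onCLS, onFID, onLCP } from "web-vitals";

function sendToAnalytics(metric: { name: string; value: number; id: string }) {
  const body = JSON.stringify({
    name: metric.name,
    value: metric.value,
    id: metric.id,
    page: location.pathname, // lets you aggregate per page as well as per origin
  });

  // sendBeacon survives page unloads better than a plain fetch.
  if (navigator.sendBeacon) {
    navigator.sendBeacon("/analytics", body); // hypothetical endpoint
  } else {
    fetch("/analytics", { method: "POST", body, keepalive: true });
  }
}

onCLS(sendToAnalytics);
onFID(sendToAnalytics);
onLCP(sendToAnalytics);
```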

INTRODUCING WebVitalize.io

At WebVitalize, we chose to maximize the visibility of all useful data sources. We collect CrUX origin-level, CrUX page-level, and Lighthouse scores for multiple page types on your domain, and we do so frequently (every hour).

We show all of these as trends over time, with different time ranges for you to select from. In addition, we make it extremely easy for you to send up to 100% of your RUM traffic through an affordable, enterprise-level solution, and we apply the correct 75th percentile calculation to all relevant data sets.

👋🏽 bye for now,
Author: Dylan Harbour | Linkedin | Head of Engineering

Want to find out more?
Contact us:
https://www.webvitalize.io/ | team@webvitalize.io
https://twitter.com/WebVitalizeTeam

