Pagespeed Framework: How We Measure Page Speed

Abraham Mathew Muthalaly
8 min read · Nov 1, 2022


You may have seen my recent LinkedIn post on “What is Pagespeed?”. (Link at the end of this article). I promised I would share details on the framework we’ve built and the reasons for doing so.

Like all journeys, it started with an idea, which quickly became our vision: to ensure all Emirates Group customers, internal or external, get best-in-class Pagespeed.

We knew we couldn’t realise this vision if only the performance engineers were responsible for performance (one performance engineer for a team of ~100). It had to be a collaborative effort, and every member of the team, whether a developer, a quality engineer, a performance engineer or an SRE, had to be empowered to deliver Pagespeed improvements to our customers.

We therefore came up with a mission:

  • Continuous measurement and improvement: give all teams the ability to continually measure Pagespeed where they need it
  • Autonomy: ensure teams are empowered to deliver Pagespeed improvements themselves
  • Self-sufficiency: give teams the best tools for the job

We used the principles of design thinking to frame what we needed to meet our vision and our mission. We kept it human-centered, and had numerous sessions with our customers to understand their frustrations and pain points. Through these sessions, we were able to clearly see what the gaps were, what these meant to our stakeholders, and therefore what we needed to build.

Figure 1: Canvas depicting where we were and where we needed to be

We knew that all teams cared about Pagespeed and everyone wanted to ensure their products provided best-in-class performance, but there was no consistency in the practices adopted. Some of the key findings from the discussions were:

  • Tests were infrequent, and were mostly ad hoc manual tests
  • As most tests were manual, no performance metric trends were available, making it difficult to determine improvements or degradations
  • No ability to determine the performance impact of a particular release or change
  • Chrome User Experience Report data was not captured
  • While the existing tool had interesting features, enhancements were near impossible
  • Limited dashboards to compare and analyse performance
  • Test execution time significantly increased in proportion to the number of tests in the pipeline

Therefore, at a minimum, any solution we came up with needed to have the following features:

  • Measure performance of our pages over time
  • Measure and trend Core Web Vitals
  • Identify performance regressions and pinpoint their cause
  • A single source of truth for Pagespeed
  • Near real-time feedback on performance
  • A data-driven approach to performance

Tool Selection

Once we had a clear idea of what our users needed in order to address their challenges, we spent some time investigating what tools were available, and which among those would meet our needs.

Key requirements

  • A “cloud-ready” solution which could be deployed into our existing cloud accounts
  • The ability to use ephemeral instances and on-demand pricing
  • The ability to test both internal/intranet (on-premises/behind firewall) and external (public facing) sites
  • To be able to integrate into our CI/CD pipelines to meet our objective of being able to shift-left (to give teams fast feedback on what they build)
  • Low Total Cost of Ownership (TCO), given that we have a lot of teams who will be using this solution
  • The ability to run tests using real browsers — both for simple pages, and for more complex scripted journeys

What did we find?

There are numerous tools available in the market, both open source and commercial, including SaaS solutions. But we knew we needed a solution that would help us both shift-left (run in lower environments) and shift-right (run in production). We quickly realised that SaaS tools would not be feasible: the overall management would be cumbersome and would require significant effort in both set-up and maintenance. As a large organisation we have a tight security environment, and it’s simply not possible to expose all our non-production environments to the outside world.

We performed an analysis of the likely number of tests which our teams would run, and then produced a costing spreadsheet. We were able to compare the TCO for commercial (including pay per test) solutions versus open source solutions. This helped us to confirm that the tool we selected would provide value for money, and that costs would not escalate as we on-boarded more teams and implemented data persistence for long-term trending.
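A simplified version of that comparison, using purely illustrative placeholder numbers (not our actual volumes or any vendor’s real pricing), looks something like this:

```
// Purely illustrative TCO model: every number used below is a placeholder,
// not our actual test volume or any vendor's real pricing.
interface Usage {
  testsPerMonth: number;   // synthetic test runs across all teams
  avgTestMinutes: number;  // wall-clock time per test run
}

// Pay-per-test (SaaS-style) model: cost scales linearly with volume.
function saasTco(usage: Usage, pricePerTest: number, months: number): number {
  return usage.testsPerMonth * pricePerTest * months;
}

// Self-hosted model: ephemeral, on-demand workers plus a maintenance overhead.
function selfHostedTco(
  usage: Usage,
  instanceHourlyRate: number,
  maintenanceHoursPerMonth: number,
  engineerHourlyRate: number,
  months: number
): number {
  const instanceHours = (usage.testsPerMonth * usage.avgTestMinutes) / 60;
  const compute = instanceHours * instanceHourlyRate;
  const people = maintenanceHoursPerMonth * engineerHourlyRate;
  return (compute + people) * months;
}

// Placeholder values only, to show how the comparison behaves as volume grows.
const usage: Usage = { testsPerMonth: 10_000, avgTestMinutes: 5 };
console.log('Pay-per-test, 12 months:', saasTco(usage, 1.0, 12));
console.log('Self-hosted, 12 months:', selfHostedTco(usage, 0.1, 20, 60, 12));
```

The useful part of a model like this isn’t the absolute numbers; it’s seeing how each curve behaves as the number of tests (and the data you retain) grows.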

What tool did we select?

We found that no single tool, out of the box, met all our needs. We therefore decided on a combination of an open-source tool (Sitespeed.io) and our own framework built on top of it. By adopting this approach, we could customise the features based on what our users really wanted.

Sitespeed.io is an open-source solution that already had a host of the features we needed. It can persist data to a time-series database, and comes with rich, customisable dashboards. The tool also creates nice HTML reports and gives a number of recommendations on potential improvements.
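To give a flavour of the scripted-journey support, the sketch below shows the shape of a sitespeed.io (Browsertime) user-journey script. The URL and link text are placeholders, and the commands should be checked against the scripting documentation for your sitespeed.io version.

```
// Sketch of a sitespeed.io user-journey script (Browsertime scripting).
// The URL and link text are placeholders; check the scripting docs for the
// commands available in your version.
module.exports = async function (context, commands) {
  // Measure the start page directly (navigates and collects metrics).
  await commands.measure.start('https://www.example.com/');

  // Measure a second step of the journey, triggered by a click.
  await commands.measure.start('secondStep');
  await commands.click.byLinkTextAndWait('Book a flight');
  return commands.measure.stop();
};
```

The script is then passed to sitespeed.io in place of a plain URL; the scripting documentation covers the exact invocation for multi-page tests.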

What we’ve built

We decided to build our framework in such a way that we could leverage our existing AWS accounts, which have access to both our internal/intranet and external sites. Using cloud meant we could easily spin up multiple Sitespeed worker instances to run tests in parallel, and give our users the answers they need, rapidly.

We also designed the framework so that users could express their tests in code, with configuration specifying where to run the test, which devices to use and under what network conditions. This way, we could keep all the underlying tool configuration away from them.

Sitespeed.io is (almost) infinitely configurable. We love this, but our users would find it quite overwhelming, so instead we give them a much simpler set of configuration options and do the heavy lifting in the backend.

The most important aspect of our framework is ensuring our tests run under consistent, lab-based conditions. This means we use very specific viewports and pixel densities, as well as packet-level throttling to simulate mobile and Wi-Fi networks. To deliver consistency across all of our tests, we’ve simplified the parameters teams can use; the choices available to them are shown below.

Figure 2: Parameters for test execution
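Under the hood, a simplified choice like “mobile on 3G” gets expanded into the full sitespeed.io configuration before the test runs. The sketch below is illustrative rather than our exact mapping: the viewport values, connectivity profiles and the Graphite hostname are assumptions, and the nested option names should be verified against the sitespeed.io configuration docs.

```
// Illustrative only: expand a simplified, team-facing choice into a
// sitespeed.io configuration object. Option names mirror the CLI structure
// (verify against your sitespeed.io version); values and the Graphite host
// are placeholders, not our real parameters.
type Device = 'desktop' | 'mobile';
type Network = 'cable' | '3g';

interface TeamConfig {
  urls: string[];
  device: Device;
  network: Network;
  iterations?: number;
}

function toSitespeedConfig(cfg: TeamConfig) {
  return {
    browsertime: {
      iterations: cfg.iterations ?? 3,
      // Fixed viewport per device class keeps lab conditions consistent.
      viewPort: cfg.device === 'mobile' ? '360x640' : '1366x768',
      connectivity: {
        // Throttling profile applied to every run of this test.
        profile: cfg.network,
      },
    },
    graphite: {
      // Hypothetical host where the metrics are persisted for trending.
      host: 'graphite.pagespeed.internal',
    },
  };
}

// What a team writes vs. what actually gets executed:
const teamConfig: TeamConfig = {
  urls: ['https://www.example.com/'],
  device: 'mobile',
  network: '3g',
};
console.log(JSON.stringify(toSitespeedConfig(teamConfig), null, 2));
```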

In addition to the synthetic tests, we also pull data from Google’s Chrome User Experience Report (CrUX), so that we get metrics on our customers’ experience out in the field. All of this data goes into a time-series database (Graphite) for long-term trending, and we use Grafana to visualise it. We also have a RUM tool, but that’s for another day!
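For the field data, the CrUX API can be queried per origin and the aggregated metrics pushed into the same time-series database. A minimal sketch, assuming you have a CrUX API key; sendToGraphite() is a hypothetical helper, not a real library call:

```
// Minimal sketch: pull Core Web Vitals field data from the Chrome UX Report
// API. The API key and origin are placeholders; sendToGraphite() is a
// hypothetical helper representing our Graphite ingestion.
const CRUX_ENDPOINT =
  'https://chromeuxreport.googleapis.com/v1/records:queryRecord';

async function fetchCruxLcp(origin: string, apiKey: string) {
  const response = await fetch(`${CRUX_ENDPOINT}?key=${apiKey}`, {
    method: 'POST',
    headers: { 'Content-Type': 'application/json' },
    body: JSON.stringify({ origin, formFactor: 'PHONE' }),
  });
  if (!response.ok) {
    throw new Error(`CrUX API error: ${response.status}`);
  }
  const data = await response.json();
  // The 75th percentile is what Core Web Vitals assessments are based on.
  return data.record.metrics.largest_contentful_paint?.percentiles?.p75;
}

// Usage (placeholder values):
// const lcpP75 = await fetchCruxLcp('https://www.example.com', 'MY_API_KEY');
// sendToGraphite('pagespeed.crux.homepage.lcp.p75', lcpP75);
```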

Framework Architecture

The diagram and table below give an overview of the main components of our Pagespeed framework, and the purpose they serve.

Figure 3: Conceptual Architecture
Table 1: Components of the Pagespeed framework

Our Approach

We’ll share another article on how we use the framework day to day, but at a high level, we have given teams the ability to run tests shifted left, which means setting up tests in their CI/CD pipelines. Why is this important? Well, it gives them fast feedback within their development sprint. If you’re a developer working on a feature, you’ll know as soon as you’ve built your code whether it has impacted performance or breached an NFR, which means a poorly performing page should never reach one of our customers.
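As a sketch of what that pipeline gate can look like: the step runs the tests and fails the build if sitespeed.io reports a breached budget (it exits with a non-zero code when a configured budget fails). The image tag, file names and URL below are placeholders.

```
// Sketch of a shift-left gate in a CI/CD pipeline step. The Docker image tag,
// config/budget file names and URL are placeholders; sitespeed.io exits with a
// non-zero code when a configured budget is breached, which fails the build.
import { spawnSync } from 'node:child_process';

const result = spawnSync(
  'docker',
  [
    'run', '--rm',
    '-v', `${process.cwd()}:/sitespeed.io`,
    'sitespeedio/sitespeed.io:latest',
    '--config', 'pagespeed-config.json',   // expanded from the simplified config
    '--budget.configPath', 'budget.json',  // the team's NFR thresholds
    'https://test-env.example.com/',       // placeholder test-environment URL
  ],
  { stdio: 'inherit' }
);

if (result.status !== 0) {
  console.error('Pagespeed budget breached: failing the build.');
  process.exit(result.status ?? 1);
}
```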

Teams can also shift right, which means running tests in production to capture key Pagespeed metrics. Running tests in production lets us see what our customers see, albeit in a controlled lab environment, and drives performance improvements based on the vast amount of data the tests produce. This is particularly important when business colleagues create new tags or add new content that doesn’t call for a technical deployment.

Teams can also define performance budgets so that they stay focused on all areas of performance. This is a serious level of maturity and will really allow teams to be best-in-class for Pagespeed. More on performance budgets in an upcoming article!
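To give a flavour ahead of that article: sitespeed.io reads a JSON budget file, and the sketch below writes one out. The thresholds are purely illustrative (2,500 ms is the public “good” LCP threshold, not necessarily our NFR), and the exact metric names supported should be checked against the budget documentation for your version.

```
// Illustrative performance budget, serialised to the budget.json file that
// sitespeed.io reads via --budget.configPath. Thresholds are examples only;
// check the budget docs for the metric names your version supports.
import { writeFileSync } from 'node:fs';

const budget = {
  budget: {
    timings: {
      // Milliseconds. 2500 ms matches the public "good" LCP threshold.
      firstContentfulPaint: 1800,
      largestContentfulPaint: 2500,
    },
    transferSize: {
      // Total bytes transferred for the page.
      total: 2_000_000,
    },
    requests: {
      total: 100,
    },
  },
};

writeFileSync('budget.json', JSON.stringify(budget, null, 2));
```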

The Outcome

With so much data and the power of Grafana, we can visualise our metrics in many ways. We can create leaderboards which show how we rank against our competitors; our business colleagues love this. We can also rank teams against each other and create a bit of healthy competition amongst product owners!

Figure 4: Leaderboard

I mentioned before that we capture data from the Chrome User Experience Report. We can use that to trend our actual Core Web Vitals over time. As Core Web Vitals now impact search rankings, this data is super useful to our marketing and SEO teams.

Figure 5: Core Web Vitals Dashboard

If we want to do root cause analysis for issues, we can build a much more detailed dashboard where we correlate specific page metrics. This is where our performance engineers spend most of their time. Teams are also able to annotate graphs to mark significant events like releases and upgrades, which helps correlate performance regressions or improvements with the event that caused them.
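One way to automate those annotations is to have the pipeline call Grafana’s annotations HTTP API whenever a release goes out; the Grafana URL, token and tags below are placeholders.

```
// Sketch: push a release annotation to Grafana so it shows up on the
// Pagespeed dashboards. The Grafana URL, API token and tags are placeholders.
async function annotateRelease(text: string, tags: string[]) {
  const response = await fetch('https://grafana.example.internal/api/annotations', {
    method: 'POST',
    headers: {
      'Content-Type': 'application/json',
      Authorization: `Bearer ${process.env.GRAFANA_API_TOKEN}`,
    },
    body: JSON.stringify({
      time: Date.now(), // epoch milliseconds
      tags,             // e.g. ['release', 'booking-flow']
      text,             // e.g. 'Release 2022.42 deployed'
    }),
  });
  if (!response.ok) {
    throw new Error(`Failed to create annotation: ${response.status}`);
  }
}

// annotateRelease('Release 2022.42 deployed', ['release', 'booking-flow']);
```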

Figure 6: Detailed page statistics dashboard

As we run more and more tests, we need real-time feedback on performance regressions. With the power of Grafana, we can use alerts to notify us if a metric for a page has deviated from its moving average, or if it has breached an absolute threshold.
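The underlying check is simple: compare the latest value of a metric with its own moving average and flag it when the deviation is too large. In practice this lives in Grafana alert rules, but a rough sketch of the idea against Graphite’s render API (the host, metric path and threshold are placeholders) looks like this:

```
// Rough sketch of a moving-average deviation check using Graphite's render
// API. The Graphite host, metric path and threshold are placeholders; in our
// setup this logic is expressed as Grafana alert rules rather than code.
const GRAPHITE = 'https://graphite.example.internal';

type Series = { target: string; datapoints: [number | null, number][] };

const lastValue = (s: Series) =>
  [...s.datapoints].reverse().find((d) => d[0] !== null)?.[0] ?? NaN;

async function checkDeviation(metric: string, threshold = 0.2) {
  const targets =
    `target=${encodeURIComponent(metric)}` +
    `&target=${encodeURIComponent(`movingAverage(${metric},'1hour')`)}`;
  const res = await fetch(`${GRAPHITE}/render?${targets}&from=-2h&format=json`);
  const [series, smoothed] = (await res.json()) as Series[];

  const current = lastValue(series);
  const average = lastValue(smoothed);
  const deviation = (current - average) / average;

  if (deviation > threshold) {
    console.warn(
      `${metric} is ${(deviation * 100).toFixed(0)}% above its moving average`
    );
  }
}

// checkDeviation('pagespeed.homepage.mobile.largestContentfulPaint');
```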

Figure 7: Alerts based on Moving Average

As we trend over time, we can see the impact of our changes. See if you can spot where we made a fix to improve our Largest Contentful Paint (it is not that tough 😄).

Figure 8: Continuous performance improvement

In Summary

This really is just the beginning. While building this framework was so much fun, the beauty is in the vast amount of data we capture and analyse. This is the foundation for building a performance-driven culture, while also ensuring that teams remain autonomous.

#CodeEmirates #CodeBetter #Pagespeed #PerformanceEngineering

Authors: Shahil Bhimji, Barry Perez, Abraham Mathew Muthalaly

LinkedIn Post on Pagespeed: What is Pagespeed?

For more information on Sitespeed, check Sitespeed.io

We’re also always on the lookout for great talent. If performance engineering is your passion, check the Emirates Group careers site for current vacancies.

