Creating Airbnb’s Page Performance Score

Andrew Scheuermann
Nov 18 · 6 min read

Learn how Airbnb built the Page Performance Score, a 0–100 score that measures multiple performance metrics from real users on any platform.

Two men playing guitar, one man playing an oboe.

Performance is important at Airbnb and part of our Commitment to Craft. A fast experience is good for business and critical to our mission to “create a world where anyone can belong anywhere”.

Before we can create a fast experience we need to agree on what “fast” means. Web, iOS, and Android each have different platform-specific performance metrics. For product engineers it can be challenging to understand which of these metrics to prioritize, and for management it’s difficult to compare platforms and keep progress reports succinct.

We’ve developed a new performance measurement system called the Page Performance Score that allows us to track multiple performance metrics from real customers across different platforms with ease. This post describes that system, and in the following weeks we’ll be publishing deep dives into the specifics for Web, iOS, and Android.

Early Performance Measurement Efforts

When Airbnb first started measuring performance, we used a single metric called “Time To Airbnb Interactive” (TTAI) that measured the time from page start to when content became visible and interactive. This approach had many positive outcomes. We built performance tracking architecture, fixed latency issues, and cultivated a company culture that valued performance.

However, TTAI also had shortcomings. Different platforms had different baselines and goals. Page comparisons were difficult because the “interactive” definition could change between similar pages. In some situations TTAI improved but engagement metrics did not. Most importantly, TTAI was a single metric and a single metric cannot capture the full spectrum of our customers’ performance expectations. Our definition of “fast” was incomplete and limited our overall performance efforts.

Introducing the Page Performance Score

We needed a nuanced view of performance while maintaining the simplicity of tracking a single number, so we created the Page Performance Score (PPS).

  • Page: The entire customer journey on Airbnb is divided into different pages.
  • Performance: A page contains multiple performance metrics.
  • Score: Every day, on each platform, we formulate a given page’s performance data into a 0–100 score.

PPS allows us to combine multiple input metrics into an easily comparable score. PPS is a step-function improvement over our prior single-metric approach.

The Metrics

The metrics that we measure differ by platform, but the general approach of measuring multiple metrics and formulating a 0–100 score is the same. All of the metrics are user-centric and fall into two general categories:

  1. Initial Load Metrics measure the time from “page start” to content visible.
  2. Post Load Metrics measure page responsiveness after the initial load.
The Airbnb app opens, shows a loader, then the final meaningful page content.
The Airbnb homepage displays the loader and then meaningful content.

Initial Load Metrics

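Time To First Contentful Paint (TTFCP, Web) and Time To First Layout (TTFL, Native) measure the time from “page start” until the first piece of content is visible, which is commonly a loader.

Time To First Meaningful Paint (TTFMP, Web) and Time To Initial Load (TTIL, Native) measure the time from “page start” until the meaningful content is displayed.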

Initial Load Metrics are visualized on the left.

Post Load Metrics

First Input Delay (FID, Web) measures the delay between user interaction and when the browser begins to respond. Delays of 50ms or longer are perceptible to the user.

Total Blocking Time (TBT, Web) and Thread Hangs (TH, Native) measure main-thread blocking, which causes the app to lag during layout, animations, and scrolling.

Additional Load Time (ALT, Native) measures the average time that additional loaders are displayed within a page, such as during pagination.

Rich Content Load Time (RCLT, Native) measures the average time for images and videos to load.

Cumulative Layout Shift (CLS, Web) measures layout instability weighted by the size and distance of the element shift.

The Formula

After measuring the metrics we distill that information into a single number using the PPS Formula, which was forked from the Lighthouse Formula. For each metric we identified Good, Moderate, and Poor thresholds based on internal and industry data. We created a scoring curve by assigning the Good range a score above 0.7, the Poor range below 0.5, and the Moderate range in between.

A log normal curve with X values from 0 to 1, and Y values from 0 to 100,000.
A 10,000ms metric value would score ~0.9 in this example curve.
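
As a rough illustration of the scoring curve described above, the sketch below fits a log-normal curve through two control points so that a Good threshold scores 0.7 and a Poor threshold scores 0.5. This is an assumption-laden sketch, not Airbnb’s or Lighthouse’s actual code: the function name `make_curve`, the choice to fit through exactly these two points, and the threshold values of 1,000ms and 3,000ms are all hypothetical.

```python
from math import log
from statistics import NormalDist


def make_curve(good, poor, good_score=0.7, poor_score=0.5):
    """Return a function mapping a metric value (lower is better) to a 0-1 score.

    The curve is the complementary CDF of a log-normal distribution, fitted
    so that `good` maps to `good_score` and `poor` maps to `poor_score`.
    """
    if not 0 < good < poor:
        raise ValueError("expected 0 < good < poor")
    if not 0 < poor_score < good_score < 1:
        raise ValueError("expected 0 < poor_score < good_score < 1")

    std_normal = NormalDist()
    # z-values at which the standard normal CDF equals 1 - score.
    z_good = std_normal.inv_cdf(1 - good_score)
    z_poor = std_normal.inv_cdf(1 - poor_score)
    # Solve z = (ln(x) - mu) / sigma for the two control points.
    sigma = (log(poor) - log(good)) / (z_poor - z_good)
    mu = log(poor) - sigma * z_poor

    def curve(value):
        if value <= 0:
            return 1.0  # nothing can be faster than zero
        return 1 - std_normal.cdf((log(value) - mu) / sigma)

    return curve


# Hypothetical thresholds in milliseconds, chosen only for illustration.
ttfcp_curve = make_curve(good=1000, poor=3000)
print(round(ttfcp_curve(1000), 2))  # 0.7
print(round(ttfcp_curve(3000), 2))  # 0.5
```

Anchoring the curve at the Good and Poor thresholds keeps the 0.7 and 0.5 boundaries meaningful for every metric, whatever its unit.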

Every day we calculate the capped average value of each metric on a given page from millions of real-user page loads. We map that capped average against the metric’s curve to get a 0–1 score. We then combine the metric scores into a composite PPS score by multiplying each metric score by its weight and summing the results. We chose the weights by examining our performance-focused A/B tests and ensuring that the weights are maximally aligned with Airbnb’s internal engagement metrics.
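
The post does not define “capped average” precisely. One plausible reading, assumed here, is that each sample is clamped to an upper limit before averaging so that a few extreme page loads cannot dominate the result. The function name, the cap of 10,000ms, and the sample values below are hypothetical.

```python
def capped_average(values, cap):
    """Average the samples after clamping each one to at most `cap`."""
    if not values:
        raise ValueError("need at least one sample")
    return sum(min(v, cap) for v in values) / len(values)


# Hypothetical page-load samples in milliseconds; one outlier at 60 seconds.
samples = [800, 1200, 950, 60000]
print(capped_average(samples, cap=10000))  # 3237.5
```

Without the cap, the same samples would average 15,737.5ms, which says more about the single outlier than about the typical experience.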

Web Metric Weights

A percentage stacked bar chart with values TTFCP 35%, TTFMP 15%, FID 30%, TBT 15%, and CLS 5%.

Native Metric Weights

A percentage stacked bar chart with values TTFL 10%, TTIL 50%, TH 10%, ALT 15%, and RCLT 15%.

The resulting PPS formula can be expressed as:

PPS = curve(metric_1) * weight_1 + curve(metric_2) * weight_2 …

For example, on Web:

PPS = curve(TTFCP) * 35% + curve(TTFMP) * 15% + curve(FID) * 30% + curve(TBT) * 15% + curve(CLS) * 5%
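
The Web formula above can be sketched in a few lines of Python. The weights come from the chart in this post. The per-metric scores in `example_scores` are made up for illustration, the function name is hypothetical, and multiplying by 100 to reach the 0–100 range is an assumption about how the 0–1 curve outputs are scaled.

```python
# Web metric weights as listed in this post.
WEB_WEIGHTS = {"TTFCP": 0.35, "TTFMP": 0.15, "FID": 0.30, "TBT": 0.15, "CLS": 0.05}


def page_performance_score(metric_scores, weights):
    """Combine 0-1 metric scores into a 0-100 composite using a weighted sum."""
    if set(metric_scores) != set(weights):
        raise ValueError("metric scores and weights must cover the same metrics")
    if abs(sum(weights.values()) - 1.0) > 1e-9:
        raise ValueError("weights must sum to 1")
    return 100 * sum(metric_scores[name] * weight for name, weight in weights.items())


# Hypothetical curve outputs for one page on one day.
example_scores = {"TTFCP": 0.9, "TTFMP": 0.7, "FID": 0.8, "TBT": 0.6, "CLS": 1.0}
print(round(page_performance_score(example_scores, WEB_WEIGHTS), 1))  # 80.0
```

Because the weights sum to 1, adding or replacing a metric only requires rebalancing the weights; the output range stays 0–100.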

PPS Evolutions

Migrating the company from a single metric to PPS was organizationally challenging. We had to train the company to stop viewing performance as a single seconds-based number, which is a paradigm shift that requires cross-functional alignment. To help ease the transition we mapped the old TTAI ranges to the new PPS ranges.

  • Good Speed: TTAI less than 3 seconds, PPS greater than 70.
  • Average Speed: TTAI 3 to 5 seconds, PPS 50 to 70.
  • Slow Speed: TTAI above 5 seconds, PPS less than 50.

Once the company understood PPS, improving on it was comparatively easy. We simply add or replace metrics as our understanding of performance improves and the 0–100 score remains constant. PPS was designed to evolve. For example, in 2019 the Chrome team introduced Cumulative Layout Shift, which was a perfect candidate for Web PPS. It was a user-centric metric, had good browser coverage, and could be measured on direct and client-routed page loads. We instrumented the metric, validated the data, and then incorporated it into the next version of PPS. Easy!

Weighted Average Score

In addition to tracking individual pages’ PPS scores we track the entire organization’s overall performance progress by creating a Weighted Average Score (WAS). Consider these example PPS scores and traffic for three common pages:

(73 * 5,000,000 + 84 * 20,000,000 + 75 * 10,000,000) / 35,000,000 = ~80

If these were the only pages at Airbnb our WAS would be ~80. Airbnb has hundreds of pages, so a WAS helps us prioritize the highest-traffic pages by weighting each page in proportion to its traffic.
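
The same calculation as a minimal sketch, using the three example scores and traffic figures from the formula above. The function name and the `(score, traffic)` tuple layout are assumptions made for illustration.

```python
def weighted_average_score(pages):
    """Traffic-weighted average of PPS scores.

    `pages` is a list of (pps_score, traffic) pairs.
    """
    total_traffic = sum(traffic for _, traffic in pages)
    if total_traffic <= 0:
        raise ValueError("total traffic must be positive")
    return sum(score * traffic for score, traffic in pages) / total_traffic


# Example scores and traffic taken from the formula in this post.
pages = [(73, 5_000_000), (84, 20_000_000), (75, 10_000_000)]
print(round(weighted_average_score(pages), 1))  # 79.9
```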

Conclusion

With PPS our engineers and data scientists now have a multitude of user-centric performance metrics to understand and improve their products. We can clearly compare the performance progress of different pages, different organizations, and even different platforms. PPS allows teams to set simple goals and determine which individual metrics to prioritize. PPS can evolve: metrics can be replaced, weights can change, targets can tighten, and yet the 0–100 score remains constant.

Changing our definition of “fast” has been well worth the effort. The company has evolved from viewing performance as a single metric to a 0–100 score that represents the rich, complex realities of performance. We have leveled up our performance measurement system and hope that you apply these learnings in your organization as well.

Acknowledgments

Thank you to everyone who has helped build PPS over the years: Aditya Punjani, Alper Kokmen, Antonio Niñirola, Ben Weiher, Charles Xue, Egor Pakhomov, Eli Hart, Elliot Sachs, Gabe Lyons, Guy Rittger, Jean-Nicolas Vollmer, Josh Nelson, Josh Polsky, Luping Lin, Mark Giangreco, Matt Schreiner, Nick Miller, Nick Reynolds, Noah Martin, Xiaokang Xin, and everyone else who helped along the way.

The Airbnb Tech Blog