At Airbnb, we created the Page Performance Score to give our engineers and data scientists a set of user-centric performance metrics for understanding and improving our products. In this post, we dive deeper into how we define these metrics and instrument them on iOS.
The entire customer journey on Airbnb is divided into different pages, each with its own measured Page Performance Score (PPS). To support this page-based performance tracking system, we built a standardized infrastructure that lets engineers configure pages representing their features.
On iOS, a page is associated with a UIViewController. We collect performance data throughout a UIViewController’s lifecycle and only emit the logging event on viewDidDisappear. This logging event cannot be created or sent without a PageName, a universal page identifier.
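The rule that an event cannot exist without a PageName can be enforced by the type system. The sketch below assumes a hypothetical `PageName` wrapper, a `PPSLoggingEvent` struct, and a `PageEventCollector` class standing in for the per-view-controller bookkeeping. None of these are Airbnb's actual types, and the `send` closure is injected so the example runs without UIKit or a real logging pipeline.

```swift
/// Hypothetical wrapper so a page identifier is never just any String.
struct PageName: Hashable {
    let rawValue: String
}

/// A minimal logging event: the non-optional `pageName` means the event
/// cannot be constructed without one.
struct PPSLoggingEvent {
    let pageName: PageName
    let durations: [String: Double]
}

/// Stand-in for per-view-controller bookkeeping (hypothetical).
final class PageEventCollector {
    private let pageName: PageName?
    private let send: (PPSLoggingEvent) -> Void
    private var durations: [String: Double] = [:]

    init(pageName: PageName?, send: @escaping (PPSLoggingEvent) -> Void) {
        self.pageName = pageName
        self.send = send
    }

    /// Data is collected throughout the view controller's lifecycle...
    func record(_ metric: String, _ value: Double) {
        durations[metric] = value
    }

    /// ...and an event is emitted only from viewDidDisappear.
    func viewDidDisappear() {
        guard let pageName = pageName else { return }  // no PageName, no event
        send(PPSLoggingEvent(pageName: pageName, durations: durations))
        durations = [:]
    }
}
```

A page that was never given a name simply produces nothing when it disappears, instead of sending an event that cannot be attributed to any page.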
Due to the many edge cases and complexities involved in instrumenting these metrics, we created a Page Performance Score state machine class, called PPSStateMachine. This class encapsulates all the logic to track and compute the performance metrics and generate logging events. Any engineer who wants to log a PPS event can do so by obtaining the PPSStateMachine associated with their UIViewController and calling the relevant methods during the UIViewController’s lifecycle events. To make things even simpler, we’ve built additional tooling and infrastructure so engineers only need to provide a name for their page and the state of the content — e.g., loading, loaded, or error.
When measuring performance, we record all times in nanoseconds and then convert them to milliseconds. We give nanoseconds (UInt64) and milliseconds (Float64) their own typealiases so the unit appears in every signature, which makes developers think about scale when converting to more commonly used types such as Int or Float.
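A minimal sketch of such aliases, assuming the names `Nanoseconds` and `Milliseconds` (the post does not give the exact identifiers). Note that a Swift typealias does not create a distinct type: the compiler still treats `Nanoseconds` and `UInt64` as interchangeable, so the benefit is that the unit is spelled out at every use site rather than enforced by the compiler.

```swift
// Assumed alias names; a typealias documents intent but is not a new type.
typealias Nanoseconds = UInt64
typealias Milliseconds = Float64

/// Converts a nanosecond reading to milliseconds in one explicit step.
func milliseconds(from nanoseconds: Nanoseconds) -> Milliseconds {
    Milliseconds(nanoseconds) / 1_000_000
}

/// Narrowing to Int for display requires an explicit rounding choice
/// (here: round half away from zero).
func roundedMilliseconds(_ value: Milliseconds) -> Int {
    Int(value.rounded())
}
```

For example, `milliseconds(from: 1_500_000)` yields `1.5`, and `roundedMilliseconds(1.5)` yields `2`.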
When reading the current time, we use a monotonic clock: a clock whose value only ever increases and continues to increment while the system is asleep. The value is a 64-bit count of nanoseconds.
To mark the start and end of a duration, we use a computed property that returns the current time in milliseconds. Performing the nanosecond-to-millisecond conversion in this one place avoids most of the accuracy and precision errors that scattered casts would introduce.
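The clock read and the computed property might look like the following sketch. It assumes Darwin's `clock_gettime_nsec_np(CLOCK_MONOTONIC)`, whose documented behavior matches the description above (monotonic, and it keeps counting during sleep). `PerformanceClock` is a hypothetical name, and the Linux branch is only an assumed fallback so the sketch compiles off-device.

```swift
#if canImport(Darwin)
import Darwin
#elseif canImport(Glibc)
import Glibc
#endif

typealias Nanoseconds = UInt64
typealias Milliseconds = Float64

/// Hypothetical namespace for performance timestamps.
enum PerformanceClock {
    /// Current monotonic time as a 64-bit nanosecond count.
    static var nowNanoseconds: Nanoseconds {
        #if canImport(Darwin)
        // On Darwin, CLOCK_MONOTONIC keeps incrementing while the system sleeps.
        return clock_gettime_nsec_np(CLOCK_MONOTONIC)
        #else
        // Off-device fallback (assumption): Linux CLOCK_MONOTONIC does not count
        // suspended time; it is here only so the sketch builds elsewhere.
        var ts = timespec()
        clock_gettime(CLOCK_MONOTONIC, &ts)
        return Nanoseconds(ts.tv_sec) * 1_000_000_000 + Nanoseconds(ts.tv_nsec)
        #endif
    }

    /// Current time in milliseconds; the only place the conversion happens.
    static var nowMilliseconds: Milliseconds {
        Milliseconds(nowNanoseconds) / 1_000_000
    }
}
```

A duration is then `end - start`, where both values come from `PerformanceClock.nowMilliseconds`.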
Every UIViewController has an associated PPSStateMachine, which a developer can override to measure a series of pages under one name. Because the PPSStateMachine is associated with a UIViewController, it can be found from any UIView by crawling the view's responder chain.
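On iOS, `UIResponder.next` links a view to its superview, and a view controller's root view to that controller. Walking `next` from any view therefore eventually reaches the owning UIViewController. The sketch below models this with hypothetical stand-in classes (`Responder`, `View`, `ViewController`) so it runs without UIKit. It is an illustration of the lookup, not Airbnb's implementation.

```swift
/// Stand-in for UIResponder: only the `next` link matters here.
class Responder {
    var next: Responder?
    init(next: Responder? = nil) { self.next = next }
}

/// Hypothetical, heavily simplified state machine.
final class PPSStateMachine {
    let pageName: String
    init(pageName: String) { self.pageName = pageName }
}

/// Stand-in for UIViewController; the state machine is replaceable so
/// several pages can share one name.
final class ViewController: Responder {
    var ppsStateMachine: PPSStateMachine
    init(pageName: String) {
        ppsStateMachine = PPSStateMachine(pageName: pageName)
        super.init()
    }
}

/// Stand-in for UIView.
final class View: Responder {}

extension Responder {
    /// Crawls the responder chain until it reaches a view controller.
    var owningPPSStateMachine: PPSStateMachine? {
        var current: Responder? = self
        while let responder = current {
            if let controller = responder as? ViewController {
                return controller.ppsStateMachine
            }
            current = responder.next
        }
        return nil
    }
}
```

A deeply nested view, such as an image view inside a container, finds the same state machine as its controller. Overriding `ppsStateMachine` redirects every view on the page at once.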
Declaring lifecycle and semantic methods in the PPS protocol lets us abstract away how the score is calculated. Apart from entirely new metrics, such as video performance, most updates to the PPS formula require no changes from developers in their features. Behind the scenes, we first test any major change to the formula by placing the candidate value in the logged event's metadata. Once the candidate value is validated, it can be promoted to an official value that affects the page's performance score.
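To illustrate the abstraction, here is a sketch assuming a hypothetical `PPSTracking` protocol and a toy scoring setup in which a formula depends only on time to initial load. The real PPS formula is not described here, and the formulas in the usage example are placeholders. Feature code talks only to the protocol, and a candidate formula rides along in metadata.

```swift
typealias Milliseconds = Float64

enum ContentState { case loading, loaded, error }

/// Hypothetical protocol: features report lifecycle and semantic events only.
protocol PPSTracking: AnyObject {
    func viewDidLoad(at time: Milliseconds)
    func contentStateChanged(to state: ContentState, at time: Milliseconds)
    func viewDidDisappear()
}

struct PPSEvent {
    let score: Double
    let metadata: [String: Double]
}

/// Placeholder: a formula over time to initial load only.
typealias ScoreFormula = (Milliseconds) -> Double

final class FormulaBackedPPS: PPSTracking {
    private let officialFormula: ScoreFormula
    private let candidateFormula: ScoreFormula?
    private var loadStart: Milliseconds?
    private var timeToInitialLoad: Milliseconds?
    private(set) var emittedEvents: [PPSEvent] = []

    init(official: @escaping ScoreFormula, candidate: ScoreFormula? = nil) {
        officialFormula = official
        candidateFormula = candidate
    }

    func viewDidLoad(at time: Milliseconds) { loadStart = time }

    func contentStateChanged(to state: ContentState, at time: Milliseconds) {
        guard state == .loaded, timeToInitialLoad == nil, let start = loadStart else { return }
        timeToInitialLoad = time - start
    }

    func viewDidDisappear() {
        // Simplification: pages that never loaded emit nothing in this sketch.
        guard let ttil = timeToInitialLoad else { return }
        var metadata: [String: Double] = [:]
        if let candidate = candidateFormula {
            metadata["candidate_score"] = candidate(ttil)  // validated before promotion
        }
        emittedEvents.append(PPSEvent(score: officialFormula(ttil), metadata: metadata))
    }
}

/// Feature code depends only on the protocol, never on the formula.
func runFeature(with tracker: PPSTracking) {
    tracker.viewDidLoad(at: 0)
    tracker.contentStateChanged(to: .loading, at: 10)
    tracker.contentStateChanged(to: .loaded, at: 500)
    tracker.viewDidDisappear()
}
```

Swapping either formula changes only the `FormulaBackedPPS` construction. An entirely new metric, by contrast, would need a new protocol method, which is why such changes do reach feature code.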
Time to First Layout (TTFL)
TTFL starts during the UIViewController’s viewDidLoad and ends after the UIViewController’s first viewDidLayoutSubviews.
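A minimal TTFL tracker might look like this. It assumes a hypothetical `TimeToFirstLayoutTracker` whose clock is injected as a closure so the logic can be exercised without a device. On iOS, `now` would read the monotonic clock, and the two methods would be called from `viewDidLoad` and `viewDidLayoutSubviews`.

```swift
typealias Milliseconds = Float64

/// Hypothetical TTFL tracker with an injected clock.
final class TimeToFirstLayoutTracker {
    private let now: () -> Milliseconds
    private var viewDidLoadTime: Milliseconds?
    private(set) var timeToFirstLayout: Milliseconds?

    init(now: @escaping () -> Milliseconds) { self.now = now }

    /// Call from UIViewController.viewDidLoad.
    func viewDidLoad() { viewDidLoadTime = now() }

    /// Call from UIViewController.viewDidLayoutSubviews; only the first call counts.
    func viewDidLayoutSubviews() {
        guard timeToFirstLayout == nil, let start = viewDidLoadTime else { return }
        timeToFirstLayout = now() - start
    }
}
```

Because `viewDidLayoutSubviews` fires repeatedly, the `timeToFirstLayout == nil` guard is what restricts the metric to the first layout pass.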
Time to Initial Load (TTIL)
TTIL starts during the UIViewController’s viewDidLoad and ends one render cycle after loaded content has been set.
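The "one render cycle after" step can be modeled by handing the end-of-measurement work to a scheduler that runs after the next frame. The sketch assumes a hypothetical `TimeToInitialLoadTracker` with an injected `afterNextRender` closure. Which iOS mechanism best approximates "after the next render" (for example, a CADisplayLink callback) is an assumption left open here. Only the `.loaded` state ends TTIL in this sketch.

```swift
typealias Milliseconds = Float64

enum ContentState { case loading, loaded, error }

/// Hypothetical TTIL tracker; clock and render hook are injected.
final class TimeToInitialLoadTracker {
    private let now: () -> Milliseconds
    private let afterNextRender: (@escaping () -> Void) -> Void
    private var viewDidLoadTime: Milliseconds?
    private var didScheduleEnd = false
    private(set) var timeToInitialLoad: Milliseconds?

    init(now: @escaping () -> Milliseconds,
         afterNextRender: @escaping (@escaping () -> Void) -> Void) {
        self.now = now
        self.afterNextRender = afterNextRender
    }

    /// Call from UIViewController.viewDidLoad.
    func viewDidLoad() { viewDidLoadTime = now() }

    /// Call when the feature sets its content state.
    func contentStateDidChange(to state: ContentState) {
        guard state == .loaded, !didScheduleEnd, let start = viewDidLoadTime else { return }
        didScheduleEnd = true
        // Stop the clock only once the loaded content has been rendered.
        afterNextRender { [weak self] in
            guard let self = self else { return }
            self.timeToInitialLoad = self.now() - start
        }
    }
}
```

Deferring the end timestamp means TTIL includes the frame that actually shows the content, not just the moment the data arrived.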
Scroll Thread Hangs (STH)
An STH is a frame whose duration is at least twice the display's refresh interval, that is, twice the maximum expected frame duration. We report each STH as the difference between the hitch's duration and that maximum frame duration.
We observe STHs with a CADisplayLink, which accurately catches most of them. The display link runs in RunLoop.Mode.tracking, the run loop mode active while the user is scrolling.
Each time the display link fires, we compare the previous frame's timestamp with the current one to compute the frame's duration.
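The per-frame calculation can be kept free of UIKit so it is easy to test. The sketch below assumes a hypothetical `ScrollHangAccumulator` that receives timestamps in milliseconds. On device, these might come from CADisplayLink's `timestamp`, and the expected frame duration from `targetTimestamp - timestamp` (both `CFTimeInterval` values in seconds, so multiply by 1,000). The link itself would be registered with `add(to: .main, forMode: .tracking)`. The reset at the end of a scroll is an assumption of this sketch.

```swift
typealias Milliseconds = Float64

/// Hypothetical STH accumulator, fed once per display link callback.
struct ScrollHangAccumulator {
    private var previousTimestamp: Milliseconds?
    private(set) var totalHang: Milliseconds = 0
    private(set) var hangCount = 0

    mutating func displayLinkFired(timestamp: Milliseconds,
                                   expectedFrameDuration: Milliseconds) {
        defer { previousTimestamp = timestamp }
        guard let previous = previousTimestamp else { return }  // first frame of a scroll
        let frameDuration = timestamp - previous
        // Frames shorter than twice the expected duration are not hangs.
        guard frameDuration >= 2 * expectedFrameDuration else { return }
        // Report only the time beyond one expected frame.
        totalHang += frameDuration - expectedFrameDuration
        hangCount += 1
    }

    /// Assumed reset: without it, the idle gap between two scroll gestures
    /// would look like one enormous frame.
    mutating func scrollDidEnd() { previousTimestamp = nil }
}
```

With a 16 ms expected frame, frames at 0, 16, 32, 100, and 116 ms contain one hitch (32 to 100 ms). That hitch is reported as 68 − 16 = 52 ms.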
Main Thread Hang (MTH) tracking is possible on iOS, but tracking MTH accurately imposes a small but consistent performance cost. In our tests, MTH tracking kept the CPU from sleeping and drained the battery, and the metric did not tell us significantly more about visually perceived performance than STH did. As a result, we decided not to measure MTH on iOS.
Additional Load Time (ALT)
ALT starts when a loader is shown and ends one render cycle after the loader is gone and content is set.
To illustrate this metric, consider infinite scroll. If the user reaches the bottom before the next page has loaded, the recorded ALT is the time the loader (or bottom) is visible until the next page loads. If the bottom is never reached, for instance because of prefetching, an ALT of zero is logged. To log accurately, we need the scroll percentage, whether the bottom loader is visible, and a state machine that tracks the previous state.
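The infinite-scroll case reduces to a small state machine. The sketch assumes a hypothetical `AdditionalLoadTimeTracker` that is told when the bottom loader's visibility changes, as derived from scroll position, and when the next page has rendered. It simplifies one point: once the loader has appeared, scrolling back up does not pause the measurement.

```swift
typealias Milliseconds = Float64

/// Hypothetical ALT tracker for an infinite-scroll list.
struct AdditionalLoadTimeTracker {
    private var loaderVisibleSince: Milliseconds?
    private(set) var loggedValues: [Milliseconds] = []

    /// Call when scrolling changes whether the bottom loader is on screen.
    mutating func bottomLoaderVisibilityChanged(isVisible: Bool, at time: Milliseconds) {
        // Only the first appearance starts the measurement.
        if isVisible && loaderVisibleSince == nil {
            loaderVisibleSince = time
        }
    }

    /// Call one render cycle after the next page's content has been set.
    mutating func nextPageRendered(at time: Milliseconds) {
        if let start = loaderVisibleSince {
            loggedValues.append(time - start)
        } else {
            // The loader was never seen (e.g. prefetching won): log zero.
            loggedValues.append(0)
        }
        loaderVisibleSince = nil
    }
}
```

A page whose loader was visible from 100 ms until rendering at 350 ms logs 250 ms. A prefetched page that renders before the user reaches the bottom logs 0.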
Rich Content Load Time (RCLT)
RCLT is handled entirely inside URLImageView, our view abstraction for displaying an image from a URL, so engineers never have to deal with it directly.
RCLT tracks only the time during which a loader or placeholder is visible. If an image is hidden while it is still loading, hiding it marks the end of the RCLT.
On every URLImageView state change, the view finds its PPSStateMachine by crawling its responder chain and tells it whether the image has loaded. The PPSStateMachine calculates the duration and, if that duration is under a specified threshold, drops the URL and keeps only the duration so that logs do not grow too large.
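The state machine's side of RCLT can be sketched as follows. The names `RichContentLoadTracker`, `RichContentLoadEvent`, `viewID`, and `urlRetentionThreshold` are all hypothetical. Each URLImageView is identified by an integer here so that two views showing the same URL do not collide. Both a finished load and hiding a still-loading view end the measurement, and the URL is kept only when the load was slow.

```swift
typealias Milliseconds = Float64

struct RichContentLoadEvent: Equatable {
    let duration: Milliseconds
    let url: String?  // nil when the load was fast enough to drop the URL
}

/// Hypothetical RCLT bookkeeping inside a PPSStateMachine.
struct RichContentLoadTracker {
    let urlRetentionThreshold: Milliseconds
    private var placeholderShownAt: [Int: (url: String, time: Milliseconds)] = [:]
    private(set) var events: [RichContentLoadEvent] = []

    init(urlRetentionThreshold: Milliseconds) {
        self.urlRetentionThreshold = urlRetentionThreshold
    }

    /// An image view started showing its loader or placeholder.
    mutating func placeholderShown(viewID: Int, url: String, at time: Milliseconds) {
        placeholderShownAt[viewID] = (url: url, time: time)
    }

    /// The image loaded, or the still-loading view was hidden: either ends RCLT.
    mutating func placeholderEnded(viewID: Int, at time: Milliseconds) {
        guard let start = placeholderShownAt.removeValue(forKey: viewID) else { return }
        let duration = time - start.time
        // Under the threshold, keep only the duration to keep logs small.
        let keepURL = duration >= urlRetentionThreshold
        events.append(RichContentLoadEvent(duration: duration, url: keepURL ? start.url : nil))
    }
}
```

With a 1,000 ms threshold, a 200 ms load is logged as a bare duration. A 1,500 ms load keeps its URL, which makes it possible to investigate the slow image.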
Our current implementation of PPS on iOS has allowed engineers to quickly implement and receive real performance data. We are continually evolving and expanding our tooling and infrastructure. We hope that you can apply and advance our learnings in your company.
Thank you to everyone who has helped build PPS on Native: Luping Lin, Antonio Niñirola, Bryan Keller, Noah Martin, Andrew Scheuermann, Josh Nelson, Josh Polsky, Jean-Nicolas Vollmer, Wensheng Mao and everyone else who helped along the way.