Performance Testing Firefox OS

Eli Perelman
Aug 13, 2016

When we talk about performance for the web, a number of familiar questions may come to mind:

  • Why does this page take so long to load?
  • How can I optimize my JavaScript to be faster?
  • If I make some changes to this code, will that make this app slower?

I’ve been working on making these types of questions easier to answer for Gaia, the UI layer for Firefox OS, a completely web-centric mobile device OS. Writing performant web pages for the desktop has its own idiosyncrasies, and writing native applications using web technologies takes the challenge up an order of magnitude. I want to introduce the challenges I’ve faced in making performance an easier topic to tackle in Firefox OS, as well as document my solutions and expose holes in the web’s APIs that need to be filled.

From now on, I’ll refer to web pages, documents, and the like as applications, and while web “documents” typically don’t need the performance attention I’m going to give here, the same techniques could still apply.

Fixing the lifecycle of applications

How long did the app take to load?

Tough question, as we can’t be sure we are speaking the same language. Based on UX and my own research at Mozilla, I’ve tried to adopt this definition for determining the time it takes an application to load:

The amount of time it takes to load an application is measured from the moment a user initiates a request for the application to the moment the application appears ready to interact with.

On mobile devices, this is generally from the time the user taps on an icon to launch an app, until the app appears visually loaded; it looks like a user can start interacting with the application. Some of this time is delegated to the OS to get the application to launch, which is outside the control of the application in question, but the bulk of the loading time should be within the app.

So window load, right?

For Firefox OS, I helped develop a series of conventional moments, relevant to almost every application, that signify its loading lifecycle (also documented on MDN):

navigation loaded

The application designates that its core chrome or navigation interface exists in the DOM and has been marked as ready to be displayed, e.g. the element is no longer display: none or hidden by any other means.

navigation interactive

The application designates that the core chrome or navigation interface has its events bound and is ready for user interaction.

visually loaded

The application designates that it is visually loaded, e.g. the “above-the-fold” content exists in the DOM and has been marked as ready to be displayed, again not display: none or hidden by any other means.

content interactive

The application designates that it has bound the events for the minimum set of functionality to allow the user to interact with the “above-the-fold” content made available at visually loaded.

fully loaded

The application has been completely loaded, i.e. any relevant “below-the-fold” content and functionality has been injected into the DOM, been marked visible, and is ready for user interaction. Any required startup background processing is complete and should exist in a stable state barring any further user interaction.

The important moment is visually loaded as this correlates to what the user perceives as something “being ready”. As an added bonus, using this metric pairs nicely with camera-based performance verifications.

Denoting moments

The User Timing API gives us a simple way to denote these moments in time:

window.performance.mark( string markName )

Specifically during a startup:

performance.mark('navigationLoaded');
performance.mark('navigationInteractive');
performance.mark('visuallyLoaded');
performance.mark('contentInteractive');
performance.mark('fullyLoaded');
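To make the sequence concrete, here is a minimal sketch of where a hypothetical app might dispatch each mark during startup; renderChrome, bindChromeEvents, renderAboveTheFold, bindContentEvents, and loadBelowTheFold are made-up placeholders for an app’s own startup work:

// Hypothetical startup sequence; the render/bind functions are placeholders
renderChrome();
performance.mark('navigationLoaded');

bindChromeEvents();
performance.mark('navigationInteractive');

renderAboveTheFold();
performance.mark('visuallyLoaded');

bindContentEvents();
performance.mark('contentInteractive');

// Inject any remaining "below-the-fold" content and finish background work
loadBelowTheFold().then(function () {
  performance.mark('fullyLoaded');
});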

You can even use the measure() method to create a measurement between the current time and another mark, or even between two other marks:

// Denote point of interaction
performance.mark('tapOnButton');

loadUI();

// Capture time from now (sectionLoaded) to tapOnButton
performance.measure('sectionLoaded', 'tapOnButton');

And fetching these performance metrics is pretty straightforward with getEntries, getEntriesByName, or getEntriesByType, which fetch a collection of the entries.
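For example, a quick sketch using only these standard User Timing accessors:

// All marks recorded so far
var marks = performance.getEntriesByType('mark');

// A specific entry by name; duration is always 0 for marks
var visuallyLoaded = performance.getEntriesByName('visuallyLoaded')[0];
console.log(visuallyLoaded.startTime);

// Measures are fetched the same way
var sectionLoaded = performance.getEntriesByName('sectionLoaded', 'measure')[0];
console.log(sectionLoaded.duration);

The purpose of this article isn’t to cover the usage of User Timing though, so I’ll move on.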

Armed with the moment an application is visually loaded, we know how long it took the application to load because we can just compare it to…oh, wait, no. We don’t know the moment of user intent to launch.

While desktop sites may be able to easily procure the moment at which a request was initiated, doing this on Firefox OS isn’t as simple. In order to launch an application, a user will typically tap an icon on the Homescreen. The Homescreen lives in a process separate from the app being launched, and we can’t communicate performance marks between them.

Solving problems with Raptor

Raptor was built with a few goals in mind:

  1. Performance test Firefox OS without affecting performance. We shouldn’t need polyfills, test code, or hackery to get realistic performance metrics.
  2. Utilize web APIs as much as possible, filling in gaps through other means as necessary.
  3. Be flexible enough to cater to the many different architectural styles of applications.
  4. Be extensible for performance testing scenarios outside the norm.

Problem: Determining moment of user intent to launch

The first breakdown in our performance APIs is that there is no way to relate a performance mark in one application to one in another. Raptor overcomes this rather crudely: log parsing.

Yes, you read that correctly. Every time Gecko receives a performance mark, it logs a message (visible via adb logcat, for example), and Raptor streams and parses the log looking for these markers. A typical log entry looks something like this (we will decipher it later):

I/PerformanceTiming( 6118): Performance Entry: clock.gaiamobile.org|mark|visuallyLoaded|1074.739956|0.000000|1434771805380

The important thing to notice in this log entry is its origin: clock.gaiamobile.org, or the Clock app; here the Clock app created its visually loaded marker. In the case of the Homescreen, we want to create a marker that is intended to be for a different context altogether. This is going to need some additional metadata to go along with the marker, but unfortunately this API does not yet have that ability. In Gaia, we have adopted an @ convention to override the context of a marker, and with it, let’s mark the moment of app launch as determined by when the user taps the icon:

performance.mark(`appLaunch@${appOrigin}`)

Launching the Clock from the Homescreen and dispatching this marker, we get the following log entry:

I/PerformanceTiming( 5582): Performance Entry: verticalhome.gaiamobile.org|mark|appLaunch@clock.gaiamobile.org|80081.169720|0.000000|1434771804212

With Raptor we change the context of the marker if we see this @ convention.
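As an illustration (this is a simplified sketch, not Raptor’s actual implementation), parsing such a line amounts to splitting the logged entry on its | delimiters and, when the entry name contains an @, overriding the entry’s context with whatever follows it:

// Sketch: parse a "Performance Entry" logcat line and apply the @ convention
function parsePerformanceEntry(line) {
  var match = line.match(/Performance Entry: (.+)$/);
  if (!match) {
    return null;
  }

  // origin|entryType|name|startTime|duration|epoch
  var parts = match[1].split('|');
  var entry = {
    context: parts[0],
    entryType: parts[1],
    name: parts[2],
    startTime: parseFloat(parts[3]),
    duration: parseFloat(parts[4]),
    epoch: parseInt(parts[5], 10)
  };

  // "appLaunch@clock.gaiamobile.org" belongs to the Clock app,
  // not to the Homescreen process that logged it
  var at = entry.name.indexOf('@');
  if (at !== -1) {
    entry.context = entry.name.slice(at + 1);
    entry.name = entry.name.slice(0, at);
  }

  return entry;
}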

Problem: Incomparable numbers

User Timing’s high-resolution timestamps are measured relative to each document’s own time origin, while Date.now() at any given moment returns the number of milliseconds that have elapsed since January 1st, 1970; marks recorded in two different processes are therefore not directly comparable. Raptor had to make an important trade-off: abandon the precision of high-resolution time for the comparability of the Unix epoch. Looking at the previous log entry, let’s break down what is output. Notice the correlation of certain pieces to their User Timing counterparts:

  • Log level and tag: I/PerformanceTiming
  • Process ID: 5582
  • Base context: verticalhome.gaiamobile.org
  • Entry type: mark, but could be measure
  • Entry name: appLaunch@clock.gaiamobile.org, the @ convention overriding the mark’s context
  • Start time: 80081.169720
  • Duration: 0.000000, this is a mark, not a measure
  • Epoch: 1434771804212

For every performance mark and measure, Gecko also captures the epoch of the mark, and we can use this to compare times from across processes.
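For example, taking the two log entries above and assuming they came from the same launch, the time from tapping the icon to the Clock being visually loaded is just the difference of their epochs:

// Epochs taken from the two log entries above
var appLaunch = 1434771804212;      // Homescreen: appLaunch@clock.gaiamobile.org
var visuallyLoaded = 1434771805380; // Clock: visuallyLoaded

console.log(visuallyLoaded - appLaunch); // 1168 ms from tap to visually loaded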

Pros and Cons

  • We trade high-resolution times for millisecond resolution so we may compare numbers across processes.
  • We trade JavaScript APIs for log parsing so we can access data without injecting custom logic into every application which would affect its performance.
  • We currently trade a high-level interaction API, i.e. Marionette, for low-level interactions using Orangutan behind the scenes. While this lets us have transparent events for the platform, it also makes writing rich tests difficult. There are plans to improve this in the future by adding Marionette integration.

Log Parsing?

Raptor Flow

  1. Install the Raptor profile on the device; this configures various settings to assist with performance testing. Note that this is a different profile and installing it will reset Gaia, so keep that in mind if you have any particular settings stored:
make raptor
  2. Choose a test to run. Currently tests are stored in tests/raptor in the Gaia tree, so some manual discovery is needed. There are plans to improve the command-line API soon.
  3. Run the test. For example, you can performance test the cold launch of the Clock app using the following command, specifying the number of runs to launch it:
APP=clock RUNS=5 node tests/raptor/launch_test

  4. Observe the console output. At the end of the test, you will be given a table of test results with some statistics about the performance runs completed. Example:

[Cold Launch: Clock Results]
Results for clock.gaiamobile.org

Metric                            Mean     Median   Min      Max      StdDev  p95
--------------------------------  -------  -------  -------  -------  ------  -------
coldlaunch.navigationLoaded       214.100  212.000  176.000  269.000  19.693  247.000
coldlaunch.navigationInteractive  245.433  242.000  216.000  310.000  19.944  274.000
coldlaunch.visuallyLoaded         798.433  810.500  674.000  967.000  71.869  922.000
coldlaunch.contentInteractive     798.733  810.500  675.000  967.000  71.730  922.000
coldlaunch.fullyLoaded            802.133  813.500  682.000  969.000  72.036  928.000
coldlaunch.rss                    10.850   10.800   10.600   11.300   0.180   11.200
coldlaunch.uss                    0.000    0.000    0.000    0.000    0.000   n/a
coldlaunch.pss                    6.190    6.200    5.900    6.400    0.114   6.300

Having access to raw performance data is helpful for getting a quick glance at how long something takes, or to determine if a change you made causes a number to increase, but it’s not very helpful for capturing changes over time. Raptor has two methods for visualizing performance data over time to improve this area.

Official Metrics

The first is a set of official metrics dashboards that graph Raptor’s performance data over time. Looking at these graphs, you can drill down into specific apps, focus or expand your time query, and do advanced query manipulation to gain insight into performance. Watching trends over time, you can even use them to pick out regressions that snuck into Firefox OS.

Local visualization

Building pretty graphs that display metrics is all well and fine, but finding trends in data or signal within noise can be difficult. Graphs help us understand data and make it accessible for others to easily communicate around the topic, but using them for finding regressions in performance is reactive; we should be proactive about keeping things fast.

Regression hunting on CI


Recap and next steps

Raptor has been an exciting tool to build, while at the same time helping us drive the web forward in the realm of performance. We plan on using it to keep Firefox OS fast and to be proactive about protecting its performance.


Originally published at eliperelman.com.

Written by Eli Perelman
Software Engineer at Mozilla

JavaScript and Node.js Obsessionalist™, ❤️ open source and experimenting with electronic music production. Child of the ’90s web. Skeptic.