Firefox data platform & tools update, Q3 2017

Mission Control showing content crashes per 1k usage hours.

As the data platform & tools team, we provide other teams with core tools for working with data. This spans from Firefox Telemetry through data storage and analysis to central data viewing tools. To make new developments more visible, we publish a quarterly update on our work.

In the last quarter we continued focusing on decreasing data latency, supporting analytics and experimentation workflows, improving stability and building Mission Control.

Let’s go faster

To enable faster decision making, we worked on improving latency for important use-cases.

Most notably, main pings, which power most of our central dashboards and analyses, now arrive much faster. The new rule of thumb is two days from activity in the browser until 95% of the main ping data is available for analysis.

In Firefox Telemetry, add-ons can now record new probes without riding the trains, which greatly reduces shipping times for instrumentation. This first becomes available for events in Firefox 56 and for scalars in Firefox 58.

The new update ping provides a lower-latency signal for when updates are staged and successfully applied. It is queryable through the telemetry_update_parquet dataset.

Similarly, the new-profile ping is a signal for new profiles and installations, which is now queryable through the telemetry_new_profile_parquet dataset.
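As a sketch of how these datasets can be used, the query below counts staged updates per channel, the kind of thing one might run in Redash. The column names (`reason`, `channel`, `submission_date`) are assumptions for illustration, and the dataset is mocked here with an in-memory sqlite3 table rather than the real Parquet store:

```python
import sqlite3

# Mock a tiny slice of the telemetry_update_parquet dataset.
# Column names are illustrative assumptions, not the real schema.
conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE telemetry_update_parquet (
        submission_date TEXT,
        reason TEXT,   -- 'ready' when an update is staged, 'success' when applied
        channel TEXT
    )
""")
conn.executemany(
    "INSERT INTO telemetry_update_parquet VALUES (?, ?, ?)",
    [
        ("20170901", "ready", "release"),
        ("20170901", "success", "release"),
        ("20170901", "ready", "beta"),
    ],
)

# Count staged updates per channel.
rows = conn.execute("""
    SELECT channel, COUNT(*) AS staged_updates
    FROM telemetry_update_parquet
    WHERE reason = 'ready'
    GROUP BY channel
    ORDER BY channel
""").fetchall()
print(rows)  # [('beta', 1), ('release', 1)]
```

The same SELECT shape carries over to the real dataset in Redash or Spark SQL, with the actual schema substituted in.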

The new first-shutdown ping helps us better understand users who churn after the first session by sending the first session's data immediately on Firefox shutdown.

Enabling experimentation

This year saw a lot of cross-team work on enabling experimentation workflows in Firefox. A focus was on supporting various SHIELD studies.

The experiments viewer, which provides a front-end view for inspecting how various metrics perform in an experiment, saw a lot of improvements.

An experiments dataset is now available in Redash and Spark, which includes data for SHIELD opt-in experiments and is based on the main_summary dataset.
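The basic shape of an analysis on such a dataset is to split rows by experiment branch and compare a metric across branches. As a minimal sketch in plain Python, with made-up field names (`experiment_branch`, `active_ticks`) standing in for the real schema:

```python
from collections import defaultdict

# Toy rows standing in for the experiments dataset (derived from main_summary);
# field names here are illustrative assumptions, not the real schema.
rows = [
    {"client_id": "a", "experiment_branch": "control", "active_ticks": 120},
    {"client_id": "b", "experiment_branch": "control", "active_ticks": 80},
    {"client_id": "c", "experiment_branch": "treatment", "active_ticks": 150},
    {"client_id": "d", "experiment_branch": "treatment", "active_ticks": 110},
]

# Average the metric per branch -- the core of a branch comparison.
totals = defaultdict(lambda: [0, 0])  # branch -> [metric sum, client count]
for row in rows:
    acc = totals[row["experiment_branch"]]
    acc[0] += row["active_ticks"]
    acc[1] += 1

means = {branch: s / n for branch, (s, n) in totals.items()}
print(means)  # {'control': 100.0, 'treatment': 130.0}
```

In practice this aggregation would run in Spark or Redash over the full dataset, with proper statistical treatment on top of the per-branch summaries.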

The experiment_aggregates dataset now includes metadata about the experiment, and its reliability and speed have improved significantly.

Other use-cases can build on the ping data from most experiments using experiment annotations; this data is available within 15 minutes in the telemetry-cohort data source.

Tools for exploring data

Our data tools make it easier to access and query the data we have. Our Redash installation saw many improvements, including:

  • Query revision control and reversion.
  • Better security and usability for templated queries.
  • Schema browser and autocomplete usability and performance improvements.
  • Better support for Athena data sources.

Mission Control is a new tool that makes key measures of a Firefox release, like crash counts, available with low latency. An early version of it is now available.

On the Firefox side, about:telemetry got a major redesign, which makes it easier to navigate, adds a global search, and aligns it with the Photon design.

Powering data analysis

To make analysis more effective, two new datasets were added:

  • clients_daily, which summarizes main ping data into one row per client and day.
  • heavy_users, which has a similar format but contains only clients that match our definition of “heavy users”.
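The per-client-per-day summarization behind clients_daily can be sketched in a few lines: many main-ping rows collapse into one row per client and day. The field names below (`subsession_hours` and the output `total_hours`) are assumptions for illustration, not the real column names:

```python
from collections import defaultdict

# Toy main-ping rows; clients_daily summarizes rows like these into
# one row per client and day. Field names are illustrative assumptions.
pings = [
    {"client_id": "a", "date": "2017-09-01", "subsession_hours": 1.5},
    {"client_id": "a", "date": "2017-09-01", "subsession_hours": 0.5},
    {"client_id": "b", "date": "2017-09-01", "subsession_hours": 2.0},
]

# Aggregate usage per (client, day) key.
daily = defaultdict(float)
for p in pings:
    daily[(p["client_id"], p["date"])] += p["subsession_hours"]

clients_daily = [
    {"client_id": c, "date": d, "total_hours": h}
    for (c, d), h in sorted(daily.items())
]
print(clients_daily)
```

The real dataset carries many more aggregated columns per row, but the one-row-per-client-day grouping is the key property that makes downstream queries cheap.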

For analysis jobs run through ATMO, reliability was greatly improved, resulting in far fewer job failures.

Also, support was added for using different EMR versions with different ATMO installations, allowing us to test changes to our EMR configuration much more thoroughly before deployment.

What is up next?

Some of the things that we will work on in the next months include:

  • Firefox 56 saw data preference changes in the UI; we will follow up to align some Telemetry behavior.
  • Databricks is being actively evaluated, with the goal of improving analysis productivity and reliability.
  • Further usability improvements to the current Experiments Viewer, and significant work on a ground-up rewrite.
  • Providing a dataset for “one day retention” analysis.
  • A generic HTTP endpoint, moz_ingest, will be available to accept non-telemetry data. Data can be posted in any format but if it is JSON it can automatically tie into our schema validation capabilities.
  • Collaboration with the Activity Stream team on bringing our event pipelines together.
  • The Activity Stream team is cross-checking & augmenting their experiment pipeline’s results with the experiments_summary dataset.
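To illustrate the generic HTTP endpoint mentioned above, a submission would be an ordinary HTTP POST with a JSON body. The URL and path below are made up for illustration (the real moz_ingest URL scheme was not yet published), and the request is only prepared, not sent:

```python
import json
import urllib.request

# Hypothetical endpoint URL and namespace path -- illustrative only.
url = "https://incoming.example.mozilla.org/submit/my-namespace/my-doctype/1"

payload = {"event": "page_load", "duration_ms": 342}
body = json.dumps(payload).encode("utf-8")

# Prepare a POST request; a JSON payload is what would let the pipeline
# tie the submission into schema validation server-side.
req = urllib.request.Request(
    url,
    data=body,
    headers={"Content-Type": "application/json"},
    method="POST",
)
print(req.get_method(), req.full_url)
```

Calling `urllib.request.urlopen(req)` would perform the actual submission once a real endpoint exists.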

Get in touch

Please reach out to us with any questions or concerns.

  • You can find us on IRC in #telemetry and #datapipeline.
  • We are available on slack in #fx-metrics.
  • The main mailing list for data topics is fx-data-dev.
  • Bugs can be filed in one of these components.

You can also find us on Twitter as @MozTelemetry.