Firefox data platform & tools update, Q3 2017
As the data platform & tools team, we provide other teams with core tools for using data. Our work spans Firefox Telemetry, data storage and analysis, and some central data viewing tools. To make new developments more visible we publish a quarterly update on our work.
In the last quarter we continued focusing on decreasing data latency, supporting analytics and experimentation workflows, improving stability and building Mission Control.
Let’s go faster
To enable faster decision making, we worked on improving latency for important use-cases.
Most notably, the main pings, which power most of our main dashboards and analyses, now arrive much faster. The new rule of thumb is that 95% of the main ping data is available for analysis within 2 days of the activity in the browser.
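To make the rule of thumb concrete, here is a minimal sketch of how such a latency figure could be computed from pairs of timestamps. The function name, the input shape and the sample data are assumptions for illustration only; this is not how our pipeline actually measures latency.

```python
from datetime import datetime, timedelta


def latency_at_percentile(pings, percentile=95):
    """Return the delay within which `percentile` percent of pings arrived.

    `pings` is a list of (activity_time, available_time) pairs. This input
    shape is hypothetical and chosen only for this example.
    """
    delays = sorted(available - activity for activity, available in pings)
    if not delays:
        raise ValueError("no pings given")
    # Nearest-rank method with integer arithmetic: the smallest delay that
    # covers at least `percentile` percent of all pings.
    rank = (percentile * len(delays) + 99) // 100
    return delays[max(rank, 1) - 1]


# Made-up sample: 20 pings that become available 1 to 20 hours after activity.
base = datetime(2017, 9, 1)
sample = [(base, base + timedelta(hours=h)) for h in range(1, 21)]
print(latency_at_percentile(sample))
```

With real data, a result of at most 2 days for the 95th percentile would match the rule of thumb above.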
In Firefox Telemetry we can now record new probes from add-ons without having to ride the trains, which greatly reduces shipping times for instrumentation. This is available for events starting with Firefox 56 and for scalars starting with Firefox 58.
The new first-shutdown ping helps us better understand users who churn after the first session, by sending a user's first-session data immediately on Firefox shutdown.
This year saw a lot of cross-team work on enabling experimentation workflows in Firefox. A focus was on enabling various SHIELD studies.
The experiments viewer, which provides a front-end view for inspecting how various metrics perform in an experiment, saw a lot of improvements.
An experiments dataset is now available in Redash and Spark, which includes data for SHIELD opt-in experiments and is based on the main_summary dataset.
The experiment_aggregates dataset now includes metadata about the experiment, and its reliability and speed have improved significantly.
Tools for exploring data
Our data tools make it easier to access and query the data we have. Our Redash installation at sql.telemetry.mozilla.org saw many improvements, including:
- Query revision control and reversion.
- Better security and usability for templated queries.
- Schema browser and autocomplete usability and performance improvements.
- Better support for Athena data sources.
On the Firefox side, about:telemetry got a major redesign that makes it easier to navigate, adds a global search and aligns it with the Photon design.
Powering data analysis
To make analysis more effective, two new datasets were added:
- clients_daily, which summarizes main ping data into one row per client and day.
- heavy_users, which has a similar format, but contains only clients that match our definition of “heavy users”.
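The idea of one row per client and day can be sketched as a simple group-and-aggregate step. The field names below (client_id, submission_date, active_hours) and the aggregates are assumptions for illustration and are not the actual clients_daily schema.

```python
from collections import defaultdict


def summarize_clients_daily(pings):
    """Collapse a list of ping dicts into one row per (client, day).

    The input fields and the aggregates are hypothetical; the real
    dataset summarizes many more main ping fields.
    """
    rows = defaultdict(lambda: {"ping_count": 0, "active_hours": 0.0})
    for ping in pings:
        key = (ping["client_id"], ping["submission_date"])
        rows[key]["ping_count"] += 1
        rows[key]["active_hours"] += ping["active_hours"]
    # Emit rows sorted by client and day so the output is deterministic.
    return [
        {"client_id": client, "submission_date": day, **aggregates}
        for (client, day), aggregates in sorted(rows.items())
    ]


# Made-up pings: client "a" sent two pings on the same day, client "b" one.
example_pings = [
    {"client_id": "a", "submission_date": "20170901", "active_hours": 1.5},
    {"client_id": "b", "submission_date": "20170901", "active_hours": 3.0},
    {"client_id": "a", "submission_date": "20170901", "active_hours": 0.5},
]
for row in summarize_clients_daily(example_pings):
    print(row)
```

A dataset restricted to heavy users could then be thought of as the same rows filtered by whatever threshold defines a heavy user, which is not specified here.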
The reliability of analysis jobs run through ATMO was greatly improved, resulting in far fewer job failures.
Also, support was added for using different EMR versions with different ATMO installations, allowing us to test changes to our EMR configuration much more thoroughly prior to deployment.
What is up next?
Some of the things that we will work on in the next months include:
- Firefox 56 saw data preference changes in the UI; we will follow up to align some Telemetry behavior.
- Databricks is being actively evaluated, with the goal of improving analysis productivity and reliability.
- Further usability improvements to the current Experiments Viewer, and significant work on a ground-up rewrite.
- Providing a dataset for “one day retention” analysis.
- A generic HTTP endpoint, moz_ingest, will be available to accept non-telemetry data. Data can be posted in any format but if it is JSON it can automatically tie into our schema validation capabilities.
- Collaboration with the Activity Stream team on bringing our event pipelines together.
- Activity Stream is cross-checking & augmenting their experiment pipeline’s results with the experiments_summary dataset.
Get in touch
Please reach out to us with any questions or concerns.
- You can find us on IRC in #telemetry and #datapipeline.
- We are available on Slack in #fx-metrics.
- The main mailing list for data topics is fx-data-dev.
- Bugs can be filed in one of these components.
You can also find us on Twitter as @MozTelemetry.