Firefox data platform & tools update, Q1 2017
The data platform and tools teams work on our core Telemetry system and the data pipeline, provide core datasets, and maintain some central data viewing tools.
To make new work more visible, we provide quarterly updates.
What’s new in the last few months?
On the data collection side, scalars are now supported through the pipeline, so new flag and count histograms are disallowed on Desktop in favour of boolean and uint scalars.
Event Telemetry is now ready for first adoption. A general events table is available, a sync events table is coming up, and further uses are being looked at.
For documentation, we re-worked the guide for adding new Telemetry and extended the detailed data collection documentation. The prototype for making probe history more discoverable now has daily updates and supports Nightly too.
For filing or finding bugs, there is now a new Data Platform and Tools product. Note that client-side bugs still go into the separate Toolkit::Telemetry component.
The data pipeline work powers results for re:dash and custom analysis among other things. Notable recent work here includes:
- Providing efficient lookup of client histories using HBase.
- Experimental support for Zeppelin, a new notebook type that improves on Jupyter.
- The TMO dashboard is now faster thanks to a dedicated read replica and client-side caching.
- The Dataset API now has a select method to return a subset of fields.
- Providing a framework for testable Python ETL jobs generated from a template.
- Direct-to-parquet is in production, making it easier to build datasets from incoming pings.
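To illustrate what returning a subset of fields means in practice, here is a minimal standalone sketch. It is not the Dataset API or its implementation: the `select` function, the dotted-path syntax, and the sample pings are all hypothetical and only show the idea of projecting a few fields out of nested ping records.

```python
def select(records, **fields):
    """Yield one small dict per record, keeping only the requested fields.

    Hypothetical helper for illustration only: each keyword maps an output
    name to a dotted path into the nested record.
    """
    for record in records:
        row = {}
        for name, path in fields.items():
            value = record
            for key in path.split("."):
                # Walk one level down; a missing key yields None.
                value = value.get(key) if isinstance(value, dict) else None
                if value is None:
                    break
            row[name] = value
        yield row


# Made-up sample records, loosely shaped like nested pings.
pings = [
    {"clientId": "a", "environment": {"system": {"os": {"name": "Linux"}}}},
    {"clientId": "b", "environment": {}},
]

rows = list(select(pings, client="clientId", os="environment.system.os.name"))
```

Keeping only the fields an analysis needs means less data to move and hold in memory, which is the general motivation for a field-selection step.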
The data tools work powers tools that make data analysis more accessible across Mozilla. Updates here are:
- For re:dash, the UI was improved to make the dashboard list more accessible.
- re:dash query issues were reduced by handling failing queries using exponential back-off.
- There is also a Python re:dash client (h/t to emtwo), allowing programmatic generation of queries and dashboards.
- The distribution viewer is now live, making distributions of a set of important Firefox metrics available.
- The analysis service gained features like persistent cluster storage and the ability to extend cluster lifetimes.
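For readers unfamiliar with the exponential back-off mentioned in the re:dash item above, the general technique can be sketched as follows. This is a generic illustration, not the actual re:dash change: `run_with_backoff`, its parameters, and the default values are assumptions chosen for the example.

```python
import random
import time


def run_with_backoff(run_query, max_attempts=5, base_delay=1.0,
                     max_delay=60.0, jitter=0.1, sleep=time.sleep):
    """Call run_query(), retrying failures with exponentially growing waits.

    Hypothetical helper: the wait doubles after every failed attempt
    (base_delay, 2 * base_delay, 4 * base_delay, ...) up to max_delay,
    plus a small random jitter so many clients do not retry in lockstep.
    """
    for attempt in range(max_attempts):
        try:
            return run_query()
        except Exception:
            if attempt == max_attempts - 1:
                # Out of attempts: surface the last error to the caller.
                raise
            delay = min(max_delay, base_delay * 2 ** attempt)
            sleep(delay + random.uniform(0, delay * jitter))
```

A caller would wrap whatever function issues the query, for example `run_with_backoff(lambda: execute(sql))`, where `execute` stands in for the real query call. Spacing retries out this way gives an overloaded backend time to recover instead of hitting it again immediately.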
What is up next?
For the next few months, interesting projects in the pipeline include:
- Work to decrease data latency by sending the last ping of a Firefox session immediately. We will also start sending timely pings for new users and updates.
- Rebooting documentation, providing guidance as well as tying existing documentation together.
- Supporting new data collection from add-ons in Telemetry, beginning with events.
How to contact us.
Please reach out to us with any questions or concerns.
- You can find us on IRC in #telemetry and #datapipeline.
- The main mailing list for data topics is fhr-dev.
- Bugs can be filed in one of these components.
- You can also find us on Twitter as @MozTelemetry.