Firefox data platform update

Georg Fritzsche
Dec 23, 2016

The data platform team works on our core Telemetry system and the data pipeline, and provides core datasets, with support from the Firefox data engineering and Data tools teams.

To make new features more visible, we intend to provide quarterly updates, starting with this one.

What’s new in the last few months?

On the data collection side, we added scalars and built engagement measures on top of them. We now support recording histograms in child processes, and we added categorical histograms.

We improved documentation, starting with the Telemetry wiki page, and updated the onboarding material. Further work happened on:
- improved client-side documentation
- an introduction on analyzing telemetry data
- a guide on choosing a dataset for analysis
- updates to examples for the longitudinal dataset
- exploring how to make Firefox data more discoverable

The data pipeline work powers results for re:dash and custom analysis, among other things. Notable recent work here includes:
- the cross-sectional dataset is operational in re:dash
- socorro crash data is now available in re:dash
- the new dataset API improves querying raw ping data
- the self-serve analysis portal was relaunched with improved UX
- self-serve real-time analysis is now easy to access
- a knowledge repository was launched to make reports discoverable in one place and easier to review

Coming soon

For the next few months, interesting projects in the pipeline include:
- event telemetry, which enables recording event data into Telemetry in a common format
- work to decrease data latency, allowing us to make decisions faster
- adding Telemetry support for Add-ons
- making it easy for new pings to show up in re:dash thanks to direct-to-parquet
- enabling efficient lookup of client histories using HBase
- improved alerting for Telemetry probes
- creating a new crash summary dataset, to make it easier to analyze crash data

Contact us

Please reach out to us with any questions or concerns.

You can find us on IRC in #telemetry and #datapipeline.
The main mailing list for data topics is fhr-dev.
Bugs can be filed in one of these components.
You can also find us on Twitter as @MozTelemetry.
