Introducing Glean — Telemetry for humans

Georg Fritzsche
Sep 5, 2019
Glean logo — subtitled “telemetry for humans”

When Firefox Preview shipped, it was also the official launch of Glean, our new mobile product analytics & telemetry solution true to Mozilla’s values. This post goes into how we got there and what its design principles are.

Background

In the last few years, Firefox development has become increasingly data-driven. Mozilla’s larger data engineering team builds & maintains most of the technical infrastructure that makes this possible, from the Firefox telemetry code to the Firefox data platform and hosting analysis tools. While data about our products is crucial, Mozilla has a rare approach to data collection, following our privacy principles. This includes requiring data review for every new piece of data collection to ensure we are upholding our principles, even when it makes our jobs harder.

One great success story for us is having the Firefox telemetry data described in machine-readable and clearly structured form. This encourages best practices like mandatory documentation, steers us towards lean data practices and enables automatic data processing, from generating tables to powering tools like our measurement dashboard or the Firefox probe dictionary.

However, we also learned lessons about what didn’t work so well. While the data types we used were flexible, they were hard to interpret. For example, we use plain numbers to store counts, generic histograms to store multiple timespan measures and allow for custom JSON submissions for uncovered use-cases. The flexibility of these data types means it takes work to understand how to use them for different use-cases & leaves room for accidental error on the instrumentation side. Furthermore, it requires manual effort in interpreting & analysing these data points. We noticed that we could benefit from introducing higher-level data types that are closer to what we want to measure — like data types for “counters” and “timing distributions”.
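
To make the difference concrete, here is a minimal Kotlin sketch. It is not Glean or Firefox telemetry code: `RawNumberProbe` and `Counter` are hypothetical names chosen for illustration. A plain number lets callers overwrite a count by accident, while a dedicated counter type only exposes the one operation that matches what is being measured.

```kotlin
// Hypothetical illustration, not part of any Mozilla SDK.

// A generic numeric probe: flexible, but nothing prevents misuse.
class RawNumberProbe {
    var value: Int = 0
}

// A higher-level "counter": the only way to change it is to add a positive amount.
class Counter {
    var value: Int = 0
        private set

    fun add(amount: Int = 1) {
        // Ignore non-positive amounts so the count can never go down.
        if (amount > 0) value += amount
    }
}

fun main() {
    val raw = RawNumberProbe()
    raw.value += 1
    raw.value = 0 // an accidental reset compiles without complaint

    val tabsOpened = Counter()
    tabsOpened.add()
    tabsOpened.add(2)
    // tabsOpened.value = 0 // would not compile: the setter is private
    println("raw=${raw.value}, counter=${tabsOpened.value}")
}
```

Because the type carries the semantics, analysis tooling can also assume that such a value is a count that is safe to sum, without reading the instrumentation code first.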

What about our mobile telemetry?

Another factor was that our mobile product infrastructure was not yet well integrated with the Firefox telemetry infrastructure above. Different products used different analytics solutions & different versions of our own mobile telemetry code, across Android & iOS. Also, our own mobile telemetry code did not describe its metrics in machine-readable form. This meant analysis was potentially different for each product & new instrumentations were higher effort. Integrating new products into the Firefox telemetry infrastructure meant substantial manual effort.

From reviewing the situation, one main question came up: What if we could provide one consistent telemetry SDK for our mobile products, bringing the benefits of our Firefox telemetry infrastructure but without the above-mentioned drawbacks?

Introducing Glean

In 2018, we looked at how we could integrate Mozilla’s mobile products better. Putting together what we learned from our existing Firefox Telemetry system, feedback from various user interviews and what we found mattered for our mobile teams, we decided to reboot our telemetry and product analytics solution for mobile. We took input from a cross-functional set of people across data science, engineering, product management, QA and others to form a complete picture of what was required.

From that, we set out to build an end-to-end solution called Glean, consisting of different pieces:

  • Product-side tools — The data enters our system here through the Glean SDK, which is what products integrate and record data into. It provides mobile APIs and aims to hide away the complexities of reliable data collection.
  • Services — This is where the data is stored and made available for analysis, building on our Firefox data platform.
  • Data Tools — Here our users are able to look at the data, performing analysis and setting up dashboards. This goes from running SQL queries, visualizing core product analytics to data scientists digging deep into the raw data.

Our main goal was to support our typical mobile analytics & engineering use-cases efficiently, which came down to the following principles:

  • Basic product analytics are collected out-of-the-box in a standardized way. A baseline of analysis is important for all our mobile applications, from counting active users to retention and session times. This is supported out-of-the-box by our SDK and works consistently across mobile products that integrate it.
  • No custom code is required for adding new metrics to a product. To make our engineers more productive, the SDK keeps the amount of instrumentation code required for metrics as small as possible. Engineers only need to specify what they want to instrument, with which semantics and then record the data using the Glean SDK.
  • New metrics should be available for basic analysis without additional effort. Once a released product is enabled for Glean, getting access to newly added metrics shouldn’t require a time-consuming process. Instead they should show up automatically, both for end-to-end validation and basic analysis through SQL.

To make sure that what we build is true to Mozilla’s values, encourages best practices and is sustainable to work with, we added these principles:

  • Lean data practices are encouraged through SDK design choices. It’s easy to limit data collection to only what’s necessary and documentation can be generated easily, aiding both transparency & understanding for analysis.
  • Use of standardized data types & registering them in machine-readable files. By having collected data described in machine-readable files, our various data tools can read them and support metrics automatically, without manual work, including schema generation, etc.
  • Introduce high-level metric types, so APIs & data tools can better match the use-cases. To make the choice easier for which metric type to use, we introduced higher-level data types that offer clear and understandable semantics — for example, when you want to count something, you use the “counter” type. This also gives us opportunities to offer better tooling for the data, both on the client and for data tooling.
  • Basic semantics on how the data is collected are clearly defined by the library. To make it easier to understand the general semantics of our data, the Glean SDK will define and document when which kind of data will get sent. This makes data analysis easier through consistent semantics.
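
As a rough illustration of why machine-readable definitions matter, the following Kotlin sketch derives a documentation line and a storage column type from nothing but a metric's definition. All names here (`MetricDefinition`, `docLine`, `columnType`) and the column type mapping are assumptions made for this example; they do not describe how the Glean tooling is actually implemented.

```kotlin
// Hypothetical sketch: these types are illustrative, not the Glean implementation.

enum class MetricType { COUNTER, TIMING_DISTRIBUTION, EVENT }

// One entry of a machine-readable metrics registry.
data class MetricDefinition(
    val category: String,
    val name: String,
    val type: MetricType,
    val description: String
)

// Derive a documentation line purely from the definition.
fun docLine(m: MetricDefinition): String =
    "${m.category}.${m.name} (${m.type.name.lowercase()}): ${m.description}"

// Derive a storage column type from the metric type (the mapping is an assumption).
fun columnType(m: MetricDefinition): String = when (m.type) {
    MetricType.COUNTER -> "INTEGER"
    MetricType.TIMING_DISTRIBUTION -> "HISTOGRAM"
    MetricType.EVENT -> "RECORD"
}

fun main() {
    val registry = listOf(
        MetricDefinition(
            "browser.usage", "tab_opened", MetricType.COUNTER,
            "Count how often a new tab is opened."
        )
    )
    for (m in registry) {
        println(docLine(m))
        println("column ${m.name}: ${columnType(m)}")
    }
}
```

Since the set of metric types is closed, the `when` expression has to cover every type, so a tool written this way cannot silently skip a kind of metric.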

One crucial design choice here was to use higher-level metric types for the collected metrics, while not supporting free-form submissions. This choice allows us to focus the Glean end-to-end solution on clearly structured, well-understood & automatable data and enables us to scale analytics capabilities more efficiently for the whole organization.

Let’s count something

So how does this work out in practice? To have a more concrete example, let’s say we want to introduce a new metric to understand how many times new tabs are opened in a browser.

In Glean, this starts from declaring that metric in a YAML file. In this case we’ll add a new “counter” metric:

browser.usage:
  tab_opened:
    type: counter
    description: Count how often a new tab is opened. …

Now from here, an API is automatically generated that the product code can use to record when something happens:

import org.mozilla.yourApplication.GleanMetrics.BrowserUsage

override fun tabOpened() {
    // Increment the counter by one each time a new tab is opened.
    BrowserUsage.tabOpened.add()
}

That’s it, everything else is handled internally by the SDK — from storing the data, packaging it up correctly and sending it out.
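
The names in the Kotlin snippet follow from the names in the YAML file: the `browser.usage` category becomes `BrowserUsage` and the `tab_opened` metric becomes `tabOpened`. The sketch below reproduces that mapping with two hypothetical helper functions. They are written only to mirror the naming seen in this example and are not the actual code generator.

```kotlin
// Hypothetical helpers that mirror the naming in the example above.
// They are not taken from the Glean code generator.

// "browser.usage" -> "BrowserUsage"
fun categoryToObjectName(category: String): String =
    category.split('.', '_').joinToString("") { part ->
        part.replaceFirstChar { it.uppercase() }
    }

// "tab_opened" -> "tabOpened"
fun metricToPropertyName(name: String): String {
    val parts = name.split('_')
    return parts.first() + parts.drop(1).joinToString("") { part ->
        part.replaceFirstChar { it.uppercase() }
    }
}

fun main() {
    val accessor = categoryToObjectName("browser.usage") + "." +
        metricToPropertyName("tab_opened")
    println(accessor) // prints "BrowserUsage.tabOpened"
}
```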

This new metric can then be unit-tested or verified in real-time, using a web interface to confirm the data is coming in. Once the product change is live, data starts coming in and shows up in standard data sets. From there it is available to query using SQL through Redash, our generic go-to data analysis tool. Other tools can also later integrate it, like the measurement dashboard or Amplitude.

Of course there is a set of other metric types available, including events, dates & times and other typical use cases.

Want to see how this looks in code? You can take a look at the Glean Android sample app, especially the metrics.yaml file and its main activity.

What’s next?

The first version of the Glean solution went live to support the launch of Firefox Preview, with initial SDK support for Android applications & a priority set of data tools. iOS support for the SDK is already planned for 2019, as is improved & expanded integration with different analysis tools. We are also actively considering support for desktop platforms, to make Glean a true cross-platform analytics SDK.

If you’re interested in learning more, you can check out:

We’ll certainly expand on more technical details in upcoming blog posts.

Special thanks

While this project took contributions from a lot of people, I especially want to call out Frank Bertsch (data engineering lead), Alessio Placitelli (Glean SDK lead) and Michael Droettboom (data engineer & SDK engineer). Without their substantial contributions to design & implementation, this project would not have been possible.
