TDS Archive

An archive of data science, data analytics, data engineering, machine learning, and artificial intelligence writing from the former Towards Data Science Medium publication.

A brief history of the metrics store

Nick Handel
Dec 15, 2021 · 8 min read


The term “metrics store” is creating some buzz in the data community. And “buzzy” things are usually new, right? Actually, much like business intelligence (BI), the concept of a metrics store has been around longer than you think.

Large technology companies like Airbnb, Uber, and LinkedIn saw value in a metrics layer before it was cool. I’ve even talked about my experience at Airbnb in a past Medium post. These companies found that in order to understand their business, conduct experiments, and share insights, they needed a centralized location for metric definitions, governance, and context. Data and analytics projects only work when people trust the data and everyone is aligned on how numbers are reported across the company, from small teams all the way up to the C-level.

In this article, we’ll discuss the origins of the metrics store concept, how companies have approached this technology in the past, and the future of the metrics store. Now, let’s take a journey to the past and the future (*cue the DeLorean*).

Anyone seen Doc Brown? We’re taking it all the way back to the year 2000. Photo by Joel Muniz on Unsplash

Origins of the metrics store: A timeline of events

The idea for a metrics store came from a common problem: messy data. You’ve heard it time and time again: as the demand for more data grows, so does the complexity.

It doesn’t take long before your metric logic is scattered all over the place — a data analyst’s worst nightmare.

This image shows how metric logic can easily be scattered across data analytics tools. Image by author.
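To make this concrete, here is a hypothetical illustration of the problem: the same “weekly active users” metric defined slightly differently in two tools. The table and column names below are invented for illustration, not taken from any real system.

```python
# Two versions of "weekly active users" that have drifted apart across
# tools. Table and column names are hypothetical.

# Version embedded in a BI dashboard: counts any event in the last 7 days.
bi_dashboard_sql = """
SELECT COUNT(DISTINCT user_id) AS weekly_active_users
FROM events
WHERE event_time >= CURRENT_DATE - INTERVAL '7 days'
"""

# Version embedded in an experimentation tool: only counts "core" actions
# and excludes internal users, so it quietly reports a different number.
experiment_tool_sql = """
SELECT COUNT(DISTINCT user_id) AS weekly_active_users
FROM events
WHERE event_time >= CURRENT_DATE - INTERVAL '7 days'
  AND event_type IN ('search', 'purchase')
  AND is_internal_user = FALSE
"""
```

Neither query is wrong on its own, but the two tools will never agree, and nothing in either tool points at the discrepancy.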

So how did we get here?

A high-level timeline of events leading up to the creation of the metrics store. Image by author.

The early days of data modeling (1996 - 2010)

Before self-service BI, companies relied heavily on people and processes to manage datasets for consumption. The tools available to most BI engineers were built around cron jobs and SQL statements, which made it challenging to orchestrate data pipelines. It was also expensive to store large quantities of data, so folks had to be careful about what they extracted, run careful transformations, and then load small datasets for consumption. This took lots of care to manage correctly so that data cubes could be useful for driving business decisions. Orchestrating long chains of manual Excel edits and data entry was another challenge for early BI practitioners, and IT teams were called in to build technical steps and processes to keep everything in sync.


Self-service BI tools emerge (early 2010s)

After 2010, interactive dashboards took the industry by storm, paving the way for more collaboration between data teams and business teams. The idea of “dashboards-as-a-service” emerged as interactive reports became the primary output of data analyst teams, helping business stakeholders consume their own data. But this also led to complications: analysts had limited time and couldn’t work through the “data breadlines” as fast as new requests came in. Worse, most companies had more than one BI tool because end users preferred different interfaces, which meant duplicating metric logic across them.


This is when tech companies like Airbnb started conceptualizing the idea of a metrics repository for reporting and prepping metrics for analysis.

Advancements in data platforms (2015 - 2020)

Around 2015, the concept of the OLAP cube fell out of fashion as people struggled to maintain these nice, clean slice-and-dice interfaces for their BI tools. Instead, data platforms aimed to ingest, process, analyze, and present data in one place, bringing disparate data together and making it easier to manage directly from the data warehouse.


But as more companies invested in data, the tools grew more advanced and, in some cases, specialized; not all of them could live within monolithic data platforms. Each step of the data pipeline became more streamlined, and the tooling became more fragmented. This made data governance complicated and unruly.

Specialization leads to new data applications (2017 - today)

In recent years, this specialization has gone even further, producing novel tools that enable new applications. It has also created new requirements for governance and new challenges around visibility, lineage, and the operational health of the business. Inconsistent datasets left analysts wading through hundreds of lines of SQL to make sure the answers they were providing were accurate.

Today, teams extract data from their data warehouse(s) or data lake and bring that data into a variety of tools, including multiple business intelligence and experimentation platforms.

In recent years, we’ve streamlined each step of the pipeline, and as a result, we now have more tools than ever. This creates two big problems: varied data modeling practices and complicated governance.

The Metrics Store (2020 - future)

Now, metrics stores have emerged as their own SaaS offering, ultimately bringing back the governance of an OLAP cube while reducing data and logic duplication and enabling new data applications.

The metrics store (or metrics layer) sits in between an organization’s data warehouse and other downstream tools.
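To picture what “sitting in between” means, here is a minimal sketch of a metrics store client, assuming a hypothetical interface (the `MetricsStore` class and its methods are invented for illustration, not any vendor’s actual API). Downstream tools ask for a metric by name, and the store compiles the single governed definition into SQL for the warehouse:

```python
# Minimal sketch of a metrics store: one governed definition per metric,
# compiled to SQL on demand for any downstream consumer.
# All names and signatures here are hypothetical.
from dataclasses import dataclass, field


@dataclass
class MetricsStore:
    # metric name -> (aggregation expression, source table)
    definitions: dict = field(default_factory=dict)

    def define(self, name: str, expression: str, table: str) -> None:
        self.definitions[name] = (expression, table)

    def query(self, metric: str, group_by: str) -> str:
        # Every consumer (BI, experimentation, notebooks) gets SQL
        # generated from the same definition, so the numbers agree.
        expression, table = self.definitions[metric]
        return (
            f"SELECT {group_by}, {expression} AS {metric}\n"
            f"FROM {table}\n"
            f"GROUP BY {group_by}"
        )


store = MetricsStore()
store.define("revenue", "SUM(order_amount)", "orders")
print(store.query("revenue", group_by="order_date"))
```

The design point is that logic lives in exactly one place: a BI tool and an experimentation platform both call `query("revenue", ...)` instead of each embedding their own SQL.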

This means that all organizations will now have access to the sophisticated technology that was once reserved for the large tech companies that could muster the resources to enable these new data applications with in-house tools. Let’s talk about those tech companies that have successfully created internal metrics tools and what they’ve learned.

Gaining traction: How tech companies built internal metrics stores

Since a metrics store like Transform wasn’t commercially available until recently, organizations solved the issue of disparate metric definitions with in-house tools. This is a complex problem to solve, and multiple companies that attempted to build metrics platforms failed in the process. Unfortunately, there aren’t many blog posts about these attempts, but the lessons are out there for those who are interested.

The organizations that successfully built these tools are typically large tech companies with big, innovative in-house data teams. They experienced the challenges of governance and the growing variety of data sources and consumption points before the broader industry did, and they had the resources to pursue in-house tooling to solve these problems.

Airbnb: Minerva metric platform

Airbnb’s metric platform, Minerva, is a popular example of an in-house metrics store. Airbnb’s metrics tooling goes back to 2014, when the company set out to scale A/B testing. The company had invested heavily in high-quality data modeling for its most critical data tables, called `core_data`, but analysts still spent too much time pulling together datasets for analysis and often struggled to report the same numbers. To solve this, they created Minerva, which “takes fact and dimension tables as inputs, performs data denormalization, and serves the aggregated data to downstream applications.”¹
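The quoted pipeline is easy to picture with a toy example: join a fact table to a dimension table (the denormalization step), then aggregate a metric for downstream use. This is a minimal pandas sketch with invented data, not Airbnb’s actual tables or Minerva’s implementation:

```python
# Toy version of "fact and dimension tables in, aggregated data out".
# All data and column names are invented for illustration.
import pandas as pd

# Fact table: one row per booking.
bookings = pd.DataFrame({
    "listing_id": [1, 1, 2, 3],
    "nights": [2, 3, 1, 4],
})

# Dimension table: attributes of each listing.
listings = pd.DataFrame({
    "listing_id": [1, 2, 3],
    "market": ["Paris", "Paris", "Tokyo"],
})

# Denormalize: attach dimension attributes to every fact row...
denormalized = bookings.merge(listings, on="listing_id")

# ...then serve the aggregated metric to downstream applications.
nights_by_market = denormalized.groupby("market")["nights"].sum()
print(nights_by_market)
```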

Uber: uMetric to tackle metric discrepancies

Uber understands that metrics play a critical role in its decision making. One of Uber’s well-known metrics is Driver Acceptance Rate, expressed as accepted requests by drivers divided by total offers to drivers, which is key to their customer and driver experience. But defining these types of metrics was just one piece of the puzzle. With increasing data democratization, Uber’s data team also saw a common challenge: teams each had their own data pipelines and consumption tools, leading to disparate metric logic and values. Uber created uMetric with a goal to “build engineering solutions to tackle the discrepancies in business-critical metrics.”²

LinkedIn: Unified Metrics Platform for a single source of truth

LinkedIn wrote an article on their engineering blog about their Unified Metrics Platform, which “serves as the single source of truth for all business metrics at LinkedIn by providing a centralized metrics processing pipeline (as-a-service), a metrics computation template, a set of tools and process to facilitate metrics lifecycle in a consistent, trustful, and streamlined way.”³

Spotify: Scaling experimentation and analysis

These companies aren’t alone. Spotify also announced their new experimentation platform that includes a metrics catalog to run “SQL pipelines to ingest metrics into a data warehouse, from where data can be served with sub-second latency to UIs and notebooks.”⁴

There is a lot to learn from these organizations as they strive to balance wider access to data with standardized metric logic across all of their tools.

The future: New technology is paving the way

Up until recently, if organizations wanted a centralized location for metrics, they had to build it on their own. This required heavy infrastructure investment and sometimes years of work from engineering teams.

Now the metrics store is gaining traction as its own category in the modern data stack. This technology provides some key benefits:

  • Metrics become the language of data: You can build metric logic and support various data models all in one place. Metrics are already the language of the business, so why not use them as the model for how you interact with data and surface insights?
  • Eliminate secondary sources of truth: Consolidate your metrics in one place so they stay consistent across all upstream and downstream tools.
  • Build a knowledge hub around metrics: Add context to your metrics so that the data team isn’t stuck answering the same questions over and over; the answers and context are already captured and accessible to data teams and business users (see the sketch after this list).
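As a sketch of what the first and third points might look like together, here is one hypothetical metric spec that carries both the logic and the context; the field names are invented for illustration, not any particular product’s schema:

```python
# Hypothetical metric spec: the definition and its documentation live
# together, so every tool reads the same logic and the same context.
# All field names and values are invented for illustration.
metric_spec = {
    "name": "driver_acceptance_rate",
    "description": "Share of ride requests accepted by drivers.",
    "owner": "marketplace-analytics",
    "type": "ratio",
    "numerator": "SUM(accepted_requests)",
    "denominator": "SUM(total_offers)",
    "source_table": "driver_offers",
    "dimensions": ["city", "week"],
    "faq": [
        "Requests cancelled after acceptance still count as accepted.",
        "Internal test drivers are excluded upstream.",
    ],
}
```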

I see a future where metrics stores are available to everyone, regardless of an organization’s size or industry.

[1]: Amit Pahwa, Cristian Figueroa, Donghan Zhang, Haim Grosman, John Bodley, Jonathan Parks, Maggie Zhu, Philip Weiss, Robert Chang, Shao Xie, Sylvia Tomiyama, Xiaohui Sun. (June 1, 2021). How Airbnb Standardized Metric Computation at Scale.
https://medium.com/airbnb-engineering/airbnb-metric-computation-with-minerva-part-2-9afe6695b486

[2]: Xiaodong Wang, Wenrui Meng, Will Yu, and Yun Wu. (January 12, 2021). The Journey Towards Metric Standardization.
https://eng.uber.com/umetric/

[3]: LinkedIn Engineering. Unified Metrics Platform (UMP).
https://engineering.linkedin.com/unified-metrics-platform

[4]: Johan Rydberg. (October 29, 2020). Spotify’s New Experimentation Platform (Part 1).
https://engineering.atspotify.com/2020/10/29/spotifys-new-experimentation-platform-part-1/

Written by Nick Handel

Nick is the CEO and Co-Founder at Transform (www.transform.co). Before Transform, he held lead Data roles at Branch, Airbnb and BlackRock. @nick_handel