Understanding the Metrics Store

Joanna He
Kyligence
9 min read · Feb 9, 2022
Photo by Austin Distel on Unsplash

The evolution of the data landscape

Over the past 20 years, data architecture has changed dramatically: from the traditional on-premises BI/DW (Business Intelligence and Data Warehouse) architecture to the distributed big data architecture (Hadoop) that emerged around 2010, and later, with the rise of cloud computing, to cloud-native architecture. Today the market mainly promotes an emerging architecture that differs from both previous generations. Often called the modern data stack, it usually revolves around cloud data warehouses (Snowflake, Amazon Redshift, and Google BigQuery), cloud data lakes (Databricks), or their cousin, the cloud data lakehouse.

The first two generations, traditional data warehouses and distributed big data architectures such as Hadoop, played a critical role in proving that real value can be extracted from massive data. Still, their overall technical complexity ultimately limited adoption to a small group of enterprises.

The modern data stack

Today, the rise of the modern data stack is the fundamental reason the data market has grown so quickly over the past few years. Because cloud data warehouses and data lakes (or lakehouses) store large amounts of data cost-effectively, do not require technical experts to maintain, and offer consumption-based (pay-as-you-go) pricing, they have become a fundamental need for every company that wants to become a data company.

The modern data stack has opened up an entire ecosystem around itself: new market segments have emerged to fit the new data landscape and the needs of modern companies, such as reverse ETL, the metrics store, and the data catalog, to name a few. (You can learn more from our blog, 7 Must-Know Data Buzzwords in 2022.)

As Matt Turck puts it in his well-known blog, “Red Hot: The 2021 Machine Learning, AI and Data (MAD) Landscape”:

Because the cloud data warehouse solves the fundamental storage layer, data warehouses liberate companies to start focusing on high-value projects that appear higher in the hierarchy of data needs.

Among these trends, we find the metrics store one of the most intriguing, and it is the main topic of this blog.

What is a metrics store?

A metrics store is, in the simplest terms, a middle layer between upstream data warehouses or data sources and downstream business applications. It is also called the Metrics Platform, Headless BI, the Metrics Layer, or the Metrics Store; they are ultimately the same thing.

Unlike traditional BI reporting, a metrics store decouples metric definitions from BI reports and visualizations. The teams that own the metrics define them once in the metrics store, forming a single source of truth, and can then reuse them consistently across BI, automation tools, business workflows, and even advanced analytics.
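To make the idea of decoupling concrete, here is a minimal sketch in Python. It is not how any particular metrics store works: the `Metric` class, its fields, the `orders` table, and the `net_revenue` metric are all hypothetical names chosen for illustration. The point is only that the metric logic lives in one definition, and each consumer asks that definition for a query instead of rewriting the logic.

```python
import sqlite3
from dataclasses import dataclass


@dataclass(frozen=True)
class Metric:
    """A metric defined once, outside any BI tool (hypothetical schema)."""
    name: str
    expression: str      # SQL aggregate, e.g. "SUM(amount)"
    table: str
    where: str = "1=1"   # business rule that belongs to the metric itself

    def to_sql(self, group_by=None):
        """Render the definition as a query, optionally sliced by one dimension."""
        dims = f"{group_by}, " if group_by else ""
        tail = f" GROUP BY {group_by} ORDER BY {group_by}" if group_by else ""
        return (f"SELECT {dims}{self.expression} AS {self.name} "
                f"FROM {self.table} WHERE {self.where}{tail}")


# The single definition that every downstream consumer shares.
NET_REVENUE = Metric("net_revenue", "SUM(amount)", "orders", "status = 'paid'")

# Toy data standing in for the upstream warehouse.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (region TEXT, amount REAL, status TEXT)")
conn.executemany("INSERT INTO orders VALUES (?, ?, ?)", [
    ("EU", 100.0, "paid"), ("EU", 40.0, "refunded"),
    ("US", 250.0, "paid"), ("US", 50.0, "paid"),
])

# Two different consumers, one definition.
total = conn.execute(NET_REVENUE.to_sql()).fetchone()[0]
by_region = conn.execute(NET_REVENUE.to_sql(group_by="region")).fetchall()
print(total, by_region)
```

If the business rule changes (say, partially refunded orders should count), only `NET_REVENUE` is edited, and both the total and the regional breakdown follow.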

Benn Stancil from Mode, in his blog “The missing piece of the modern data stack,” has a nice diagram that clearly shows the state of metrics reporting today. Without a centralized metrics store, metric logic is defined repeatedly in different tools, causing inconsistencies and discrepancies.

Image source: Benn Stancil

Some investors and practitioners have sensed the market opportunity of the metrics store and have written about their perspectives.

In the same article above, Benn Stancil considers the metrics layer (aka metrics store) the missing piece of the modern data stack.

As Ankur Goyal and Alana Anderson describe in their article “Headless Business Intelligence,” a truly scalable “Headless BI” (aka metrics store) is a massive open opportunity.

Image source: basecase

Current problem statement of metrics reporting

In the past, metrics were usually defined in data warehouses or BI applications, but this causes growing pain for enterprises as data volume and complexity increase. The rise of the metrics store is essentially an attempt to solve the challenges these enterprises are encountering:

  • Inconsistent definitions of key metrics across business units lead to discrepancies in decision-making: different teams get entirely different numbers for very simple business questions. To make matters worse, no one knows exactly which number is correct.
  • Defined metrics cannot be reused in business applications beyond BI dashboards: for example, to reduce user churn, the product growth team wants timely information about users who have been inactive in the past 30 days so that it can adopt activation strategies, such as giving users a free renewal. Defining and analyzing metrics only in BI cannot serve such scenarios, which involve feeding metrics to business applications such as CRM systems.
  • Business users find it difficult to define metrics with SQL: as Ankur Goyal and Alana Anderson put it in their article “Headless Business Intelligence”:

Simple tasks like user sessionization, funnel analysis, and data deduplication often require 1,000+ line SQL queries which must be written by expert data engineers or generated programmatically.

  • The high complexity of data architecture and pipelines makes data analytics inefficient: materializing metrics in the data warehouse layer is a commonly used solution today. The data warehouse supports defining metrics in views, which other tools then query.

Many companies I’ve worked with currently use views to solve last-mile queries. The problem is that views can be materialized only for some query requirements. When requirements are numerous, the data engineering team needs to prepare a large number of views. As a result, development and maintenance costs are extremely high; what’s worse, the data pipeline becomes complicated and error-prone.
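A rough way to see why the number of views gets out of hand is to count them. The sketch below assumes a worst case that the paragraph above does not spell out: one view per metric for every combination of dimensions a report might group by. The metric and dimension names are hypothetical.

```python
from itertools import combinations

# Hypothetical metrics and dimensions, for illustration only.
metrics = ["revenue", "active_users", "churn_rate"]
dimensions = ["region", "product", "channel", "month"]


def dimension_groupings(dims):
    """Every non-empty combination of dimensions a report might group by."""
    return [combo
            for size in range(1, len(dims) + 1)
            for combo in combinations(dims, size)]


groupings = dimension_groupings(dimensions)

# Assumption: one view per metric per grouping (worst case).
views_needed = len(metrics) * len(groupings)
print(f"{len(groupings)} groupings x {len(metrics)} metrics = {views_needed} views")
```

With only 3 metrics and 4 dimensions the worst case is already 45 views, and each extra dimension roughly doubles it. Real teams prune this heavily, but the growth pattern is what makes a view-per-requirement approach expensive to maintain.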

How does a metrics store solve these problems?

Achieve metrics reusability

The concept advocated by the metrics store is that “metrics can be defined once and then reused anywhere.” That means the metrics store can be consumed flexibly through BI visualizations, SaaS integrations, and APIs, opening up many new use cases that were not previously possible with BI reporting.

Achieve metrics consistency

In the current solution, the tight coupling between the metrics layer and the BI system that consumes it limits the value of metrics in other application scenarios. If the metrics layer is decoupled from BI to create a standalone metrics store, it becomes the single source of truth, and metrics consistency is achieved because all downstream systems consume the same unified metrics.

Image source: Lori Lu
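The consistency argument can be shown with a toy example. In the sketch below, the order data, the two "tools", and the `net_revenue` rule are all made up for illustration. Before decoupling, each tool embeds its own idea of revenue and the numbers disagree; afterwards, both read the same definition from a shared registry.

```python
# Toy order data (hypothetical).
orders = [
    {"amount": 120.0, "status": "paid"},
    {"amount": 80.0, "status": "refunded"},
    {"amount": 200.0, "status": "paid"},
]


# Before: each tool carries its own copy of the logic.
def bi_dashboard_revenue(rows):
    """BI dashboard: counts only paid orders."""
    return sum(r["amount"] for r in rows if r["status"] == "paid")


def crm_report_revenue(rows):
    """CRM report: counts every booked order, refunds included."""
    return sum(r["amount"] for r in rows)


before = (bi_dashboard_revenue(orders), crm_report_revenue(orders))


# After: one definition in a shared registry, consumed by both tools.
def net_revenue(rows):
    """The agreed business rule: only paid orders count as revenue."""
    return sum(r["amount"] for r in rows if r["status"] == "paid")


METRICS = {"net_revenue": net_revenue}


def bi_dashboard(rows):
    return METRICS["net_revenue"](rows)


def crm_report(rows):
    return METRICS["net_revenue"](rows)


after = (bi_dashboard(orders), crm_report(orders))
print("before:", before, "after:", after)
```

Neither of the "before" functions is wrong in isolation. The discrepancy comes from having two definitions, which is exactly what a standalone metrics store removes.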

Achieve self-service metrics definition

Metrics are designed for the business rather than for data or engineering functions, so the metrics store must enable EVERYONE to become a data analyst regardless of age, data literacy, or technical skill. To make that happen, the metrics store needs to provide an extremely intuitive user interface that allows non-technical business users to define and analyze their business metrics. We have seen this design pattern in one of our most successful customer cases, which you can read about in the customer story section below.

Achieve metrics scalability

The ideal metrics store no longer serves only canned BI dashboards as in the old days. Instead, it completes the foundation on which operational BI and exploratory data science both live. As operational, data science, and business self-service use cases kick in, users add up, and query volume grows exponentially, the metrics store needs a strong computation engine behind it to deliver the scalability and concurrency that metrics consumption requires.

Metrics store example

You may have heard about the idea of the metrics store here and there, but actual examples are rare. With some research, you will find that the few examples in the market come from Airbnb, Uber, and LinkedIn.

That’s why I am writing this article: to introduce some “new” examples from Kyligence customers who have successfully implemented and operated an Enterprise Metrics Platform (yes, they call it a metrics platform).

Several of our Kyligence customers built their metrics platforms around 2019, even before the current discussion and hype around the metrics store.

Pandora — Self-service metrics platform designed for everyone

In December 2019, a top commercial bank successfully rolled out Pandora, a self-service metrics platform, to democratize data across the bank.

Image source: Lori Lu

Previously, as illustrated, it typically took 12 workdays to deliver a data product embedded with 50 metrics. Like the traditional dashboard delivery process, it has five phases: requirement clarification, data sourcing, pipeline implementation, dashboard creation, and UAT. The most frustrating part of the entire workflow is that IT engineers have to communicate back and forth to align various business units and get each data owner’s approval to access and collect data. In addition, they are inundated with tedious, one-off projects and repetitive work since the outdated BI architecture was not designed for metrics reuse.

Since Pandora went live, the end-to-end delivery time has been reduced to 5 workdays: 30 of the 50 metrics are already available in the repository and ready to ship, and another 15 can be derived from existing ones by applying simple filtering or mathematical transformations. BI engineers therefore only need to create the 5 new metrics instead of implementing all 50. The improved efficiency also comes from applying the concept of Universal Design (Designing for Everyone) in Pandora.

Pandora has an extremely intuitive user interface that allows non-technical folks to drag and drop ready-to-use metrics to assemble dashboards while IT experts engineer the 5 new metrics for them. Creating dashboards is now “delegated” to business end-users; as a result, IT departments are free to drive new value.

Bonus: the 15 derived metrics and 5 new metrics, 20 in total, will be added to the metrics repo as assets, and other business users can reuse them out of the box in the future.
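The numbers in this story can be checked with a few lines of Python, along with a sketch of what "derived by simple filtering or mathematical transformations" might look like. The counts and workdays come from the text above; the transaction data and the metric names (`total_amount`, `mobile_amount`, `average_amount`) are hypothetical and are not taken from Pandora.

```python
# Figures quoted in the Pandora story.
already_available, derived, brand_new = 30, 15, 5
total_metrics = already_available + derived + brand_new   # the 50 requested
added_to_repo = derived + brand_new                        # new assets for reuse
days_before, days_after = 12, 5
reduction_pct = round((days_before - days_after) / days_before * 100, 1)

# Hypothetical base metrics already in the repository.
transactions = [
    {"channel": "mobile", "amount": 40.0},
    {"channel": "branch", "amount": 100.0},
    {"channel": "mobile", "amount": 60.0},
]


def total_amount(rows):
    return sum(r["amount"] for r in rows)


def transaction_count(rows):
    return len(rows)


# Derived by filtering: the same base metric on a subset of rows.
def mobile_amount(rows):
    return total_amount([r for r in rows if r["channel"] == "mobile"])


# Derived by a mathematical transformation: a ratio of two base metrics.
def average_amount(rows):
    return total_amount(rows) / transaction_count(rows)


print(f"{total_metrics} metrics, {brand_new} engineered from scratch, "
      f"{added_to_repo} added to the repo, delivery time down {reduction_pct}%")
print(mobile_amount(transactions), round(average_amount(transactions), 2))
```

Neither derived metric needs a new pipeline, since both are built from definitions that already exist. That is why only 5 of the 50 metrics require engineering work.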

The facts about the Pandora metrics platform — Dec 2021

  • Total active metrics: 9700+
  • Total self-service dashboards created: 7k+
  • Daily unique visitors: 5k+
  • Daily query count: 40k+
  • Daily query performance: 90% of queries complete in less than 5s

Note: The Pandora metrics platform blog post series was originally written by Lori Lu; you are welcome to read her original blog series below.

Why do enterprises need a metrics store?

The metrics store brings significant value to the enterprise: once a data analysis is complete, a BI report may reach the end of its lifecycle. In contrast, enterprises will stick with a metrics store that integrates tightly with their business workflows and can be reused anywhere, generating more possibilities.

I am personally bullish on the future of the metrics store, as we have witnessed the tremendous value it has brought to our customers. In the example above, our customers chose to build their metrics store in-house and leverage Kyligence as the metrics computation engine, and we are confident that in the near future there will be more mature “off-the-shelf” metrics stores that serve the needs of every business.

That’s why we are very excited to announce the release of Kyligence Zen. Kyligence Zen is an Intelligent Metrics Platform to align business goals and key metrics. It is a platform to centrally manage metrics, ensure metrics consistency, align organizational business goals, and improve organizational productivity. It automates data pipelines from data lakes or data warehouses to its multidimensional OLAP database, to deliver metrics consistency, reusability, and data trust in a cost-effective way.

Interested in learning more about Kyligence Zen? Learn more here or request a trial of Kyligence Zen.

References

[1] Coco Li, 7 Must-Know Data Buzzwords in 2022 (2022), https://medium.com/kyligence/7-must-know-data-buzzwords-in-2022-9d3d977a43f4

[2] Matt Turck, John Wu, Red Hot: The 2021 Machine Learning, AI and Data (MAD) Landscape (2021), https://mattturck.com/data2021/

[3] Ankur Goyal, Alana Anderson, Headless Business Intelligence (2021), https://basecase.vc/blog/headless-bi

[4] Benn Stancil, The missing piece of the modern data stack (2021), https://benn.substack.com/p/metrics-layer

[5] Amit Pahwa, Cristian Figueroa, Donghan Zhang, Haim Grosman, John Bodley, Jonathan Parks, Maggie Zhu, Philip Weiss, Robert Chang, Shao Xie, Sylvia Tomiyama, Xiaohui Sun, How Airbnb Achieved Metric Consistency at Scale (2021), https://medium.com/airbnb-engineering/how-airbnb-achieved-metric-consistency-at-scale-f23cc53dea70

[6] Unified Metrics Platform (UMP), https://engineering.linkedin.com/unified-metrics-platform

[7] Xiaodong Wang, Wenrui Meng, Will Yu, and Yun Wu, The Journey Towards Metric Standardization (2021), https://eng.uber.com/umetric/

[8] Lori Lu, BI Dashboards are Creating a Technical Debt Black Hole (2022), https://medium.com/kyligence/bi-dashboards-are-creating-a-technical-debt-black-hole-31be41ee96f

[9] Lori Lu, Enterprise Metric Platform in Action (2022), https://medium.com/kyligence/enterprise-metric-platform-in-action-1-6f6e6bb866f8

Joanna He
Kyligence

Open Source to Commercial Product Manager; Data Product Marketer