Stemma: The Leading Modern Data Catalog, Made Easy

Bogomil Balkansky
Sequoia Capital Publication
3 min read · Jun 2, 2021

We at Sequoia are proud to announce a new seed-stage partnership with Stemma, the fully managed data catalog powered by Amundsen.

In the first inning of the digital era, software, famously, ate the world. In the second, data is quickly becoming the lifeblood of every company because it is used to support a wide range of human decision-making. Historically, data was processed primarily in batch mode, via a data stack with three components: ETL to extract, transform and load data; a warehouse to store it; and a business intelligence tool to visualize it. But with the advent of real-time data processing and machine learning, data is increasingly powering real-time applications and customer-facing experiences.

Tristan Handy, founder and CEO of Sequoia portfolio company Fishtown Analytics (best known for its flagship product, the data-transformation tool DBT), describes this phenomenon as a “Cambrian explosion” of innovation in the modern data stack. While the modern data stack is still very much anchored by data warehouses such as Snowflake, data lakes and query-processing engines are now part of the equation. DBT has made data transformations a standalone discipline. And alongside ETL, reverse ETL offerings such as Census have emerged, and so have data-quality observability platforms like Bigeye.

These new data technologies not only facilitate the movement and processing of vast amounts of data, but also whet organizations’ appetite for more data. So the volume of data continues to grow, and so, too, does the number of people who work with it, creating an ever-increasing diversity of data-driven roles. Navigating this cornucopia becomes more and more difficult, and analysts and data scientists begin spending most of their time simply looking for the right data source — or repeatedly redoing analyses because they used the wrong one. Such daily roadblocks lead to lost efficiency, and to incorrect conclusions that undermine trust in company decision-making. When a BI dashboard reveals a change, the immediate response is often to question the validity of the data rather than accept the conclusion. To quote Tristan again: more data = more chaos = less trust.

Enter the modern data catalog.

Some of the most forward-thinking, data-driven companies in Silicon Valley encountered these problems, and built their own internal data catalogs. That’s how Amundsen was born at Lyft. Where earlier catalogs had catered to an organization’s “data steward,” whose full-time job was to be a custodian of assets, the modern platforms recognized the radical shift toward democratization of data and responded accordingly. They serve not just the people who have “data” or “analysis” in their titles, but many other roles: from the product manager trying to understand customer behavior, to the growth marketer trying to optimize campaign spend, to the finance person preparing the P&L, to the customer success manager analyzing the health of accounts. A modern data catalog facilitates productivity by helping people in all data-driven roles find the right information.

As we at Sequoia surveyed the landscape of modern data catalogs, it quickly became apparent that Amundsen had emerged as the market leader in the category. The Amundsen team, led by Mark Grover, had best internalized that what matters most is the speed and ease with which people in data-driven roles can access what they need. Under the hood, they built the right architecture to harvest and process metadata from thousands of data sources. But equally important, they built an elegant user experience.

Before it was open-sourced, Amundsen was an immensely successful internal product at Lyft, with 750 weekly active users. Today, organizations including Square, ING Bank, Workday, Asana and Instacart are early adopters, and Amundsen boasts both the largest open-source community and the largest ecosystem of data source integrations in the space.

And now, Mark and co-founder Dorian Johnson are launching Stemma, a fully managed data catalog powered by Amundsen. Stemma provides enterprise capabilities — such as richer automated metadata through intelligence based on common usage patterns — that will enable many more organizations to harness the power and flexibility of the industry-leading platform on which it’s based. Mark, Dorian and their team are building one of the critical pieces of the modern data stack, and we are thrilled to support them on their journey.

Bogomil Balkansky
Sequoia Capital Publication

Partner at @Sequoia investing in enterprise software. 20+ yrs product and marketing leadership @VMware, @GoogleCloud. Diver, cook, photographer.