Bigeye: Must-Have Quality Monitoring for the New Data Stack

Bogomil Balkansky
Sequoia Capital Publication
3 min read · Apr 15, 2021

As a partner at Sequoia, asking questions is a big part of my job. But as a former product manager, I always had plenty of questions for my friends and colleagues even before I became a VC — and one of my favorites has long been, “What is most difficult in our world?”

When I posed that question during a conversation with Jennifer Anderson, with whom I'd worked at VMware and bebop and who was then leading the data platform team at Uber, her immediate response was "data quality." So when I joined Sequoia last year, data quality monitoring became one of the first topics to fascinate me.

At the same time, I was exploring the future of observability (the software engineering variety), another friend's answer to my "What is most difficult?" question. As I researched, I recognized a clear parallel, and it became clear to me that data quality monitoring will be as important for data as observability has been for software systems. But I also noticed one important difference: software code does not mutate on its own after it's pushed to production, whereas data is dynamic, changing and evolving all the time. To me, this made data quality monitoring an even more complex and interesting challenge.

Following that thread quickly led me to Kyle Kirwan and Egor Gryaznov, who had worked on Jennifer’s team at Uber, and their company Bigeye (then Toro). It was clear to me from our first meeting that I wanted to be part of their journey, and they quickly became my go-to advisors on where the world of data was going. Not only can Kyle and Egor geek out at any depth about data quality metrics and anomaly detection, but they also have a unique clarity of vision about the data infrastructure space. Many VCs maintain lists of companies they’re interested in — and Bigeye has been at the top of mine for the last year.

For us at Sequoia, this new partnership with Bigeye is also part of our broader thesis about the modern data stack. While there is a bewildering array of data infrastructure technologies, a repeatable stack is emerging in data-driven organizations, centered on a cloud data warehouse like Snowflake. Around that warehouse, a rich ecosystem is taking shape: ELT (extract, load and transform) technologies move data out of operational systems, while dbt transforms it; data scheduling and workflow engines like Dagster create and manage chains of data operations; and data catalogs help teams navigate which data to use for which purpose, keeping up with growth in the volume and complexity of both the data itself and the people and systems that consume it. Finally, data is made available to analytical suites, or served back to operational systems by reverse ETL platforms like Census. And at every step in the chain, as data travels from one place to another and is joined, aggregated and transformed, Bigeye can ensure that it stays in the shape that users and systems expect.
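To make "staying in the shape that users and systems expect" concrete, here is a deliberately toy sketch of table-level quality checks. The metric names, columns and thresholds are hypothetical illustrations of the general idea, not Bigeye's actual API:

```python
# Illustrative only: a toy version of the kind of table-level checks a
# data quality monitor runs. Metric names and thresholds are hypothetical.
from datetime import datetime, timedelta, timezone

def null_rate(rows, column):
    """Fraction of rows where `column` is None or absent."""
    if not rows:
        return 0.0
    return sum(1 for r in rows if r.get(column) is None) / len(rows)

def hours_since_latest(rows, ts_column, now):
    """Hours elapsed since the newest timestamp in `ts_column`."""
    return (now - max(r[ts_column] for r in rows)).total_seconds() / 3600

def check_table(rows, now, max_null_rate=0.05, max_staleness_hours=24):
    """Return {check_name: passed} for two common data quality checks:
    a completeness check and a freshness check."""
    return {
        "null_rate(email)": null_rate(rows, "email") <= max_null_rate,
        "freshness(updated_at)":
            hours_since_latest(rows, "updated_at", now) <= max_staleness_hours,
    }
```

A production monitor would push such metrics down to the warehouse as SQL and track their history over time; the point here is only that each check reduces to a metric plus an expectation.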

It’s no surprise that customers, who have benefited from Kyle and Egor’s foresight and rich experience, love Bigeye. They praise everything from the fast time to value (the platform can identify an anomaly within 15 minutes of setup), to the more than 50 out-of-the-box data quality metrics, to its easy integration with their existing infrastructure and a rich set of data sources, including Snowflake, Redshift, BigQuery, Presto, MySQL, Postgres, SQL Server, SAP HANA and Databricks.
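For readers curious what "identifying an anomaly" in a metric can mean in practice, here is a deliberately simple z-score test of my own; it is a generic statistical sketch, not a description of Bigeye's actual detection models:

```python
# Illustrative sketch: flag a metric value that deviates sharply from
# its own history. Real systems use far more robust models than this.
from statistics import mean, stdev

def is_anomalous(history, latest, z_threshold=3.0):
    """Return True when `latest` sits more than `z_threshold` standard
    deviations from the mean of `history` (needs >= 2 prior points)."""
    mu, sigma = mean(history), stdev(history)
    if sigma == 0:
        return latest != mu  # any change to a constant series is anomalous
    return abs(latest - mu) / sigma > z_threshold
```

Feed it the daily row count or null rate of a table, and a sudden spike or drop trips the check while normal day-to-day variation does not.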

On behalf of everyone at Sequoia, we are thrilled to partner with this exceptional team that is building the best data-quality product on the market, and we’re excited to support them as they expand the platform — and grow the company — in the months and years to come.
