Operational Machine Learning for the Modern Data Stack
It is hard to quantify the magnitude of impact that machine learning has had on the world over the past decade. Domains like translation, image processing, search, natural language understanding, robotics, and signal processing have been completely transformed as a result of modern ML techniques. Yet, in many respects, the adoption of machine learning is still extremely early. Despite immense investment in machine learning tooling and research over the past decade, the number of companies building and deploying operational machine learning systems today remains remarkably small.
This duality is due to the significant complexity that still exists in building and productionizing machine learning systems. Data preparation, data pipelines, feature engineering, infrastructure management, and the general software engineering workflow for stitching this all together (versioning, model management, A/B testing, monitoring, retraining) come together to form an almost insurmountable barrier for most teams. This has, for the most part, resulted in a market of “haves” and “have-nots”. While tech giants like DoorDash, Netflix, and Uber can hire the rare machine learning engineers capable of doing all of this to build systems like the one shown below, most companies are left incapable of moving beyond offline models in Jupyter notebooks.
Tristan Zajonc felt this quite viscerally as the CTO of Cloudera’s machine learning division. He would work with countless companies that saw immense potential value from machine learning but who also felt paralyzed by the perceived complexity of systems like this. Indeed, even the simple use cases where off-the-shelf models can easily get you very far — revenue forecasting, churn prediction, demand planning, and similar — became bogged down in complexity as customers attempted to move them toward production.
Yet, at the same time, Tristan saw a foundational shift happening in the data space. Cloud data warehouses were a new substrate which massively reduced infrastructure complexity, and a new breed of tools like dbt were leveraging them to completely transform the field of data engineering. What used to require a team of engineers experienced in distributed systems and Spark could now be done by someone who just knew SQL. The software engineering rigor of the previous generation of tooling had been preserved, but accessibility was dramatically increased. As a result, data analysts were suddenly upleveled in terms of their impact; they could now build production-grade data pipelines, and the number of companies who could actually do large-scale data modeling work grew enormously.
Tristan realized that there was no reason why you couldn’t leverage the same trends and principles to similarly democratize access to operational machine learning systems. And so, he partnered up with his longtime friend Tyler Kohn, the former CTO of RichRelevance, and they decided to start Continual. At the simplest level, Continual is the AI layer for the modern data stack. It sits on top of your data warehouse and allows data analysts who know SQL to train, deploy, and productionize machine learning models which continuously retrain and re-run inference, all without you needing to manage a single piece of infrastructure. The product is ridiculously simple, yet it has enough functional richness around testing, version control, git-like workflows, model management, feature storage, and similar that it can power truly operational models — a balance that many of the previous generations of AutoML tools have failed to achieve.
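To make this workflow concrete, a declarative setup in this style might look something like the sketch below. This is a hypothetical illustration, not Continual's actual configuration syntax: the field names, schedules, and table names are all assumptions. The key idea is that the analyst declares *what* to predict with SQL and a little configuration, and the platform handles training, deployment, and continuous retraining.

```yaml
# Hypothetical sketch of a declarative ML definition on a warehouse —
# not Continual's actual syntax. A SQL query defines the training data;
# the platform owns training, deployment, and scheduled retraining.
type: Model
name: customer_churn
query: |
  SELECT customer_id, tenure_months, monthly_spend, support_tickets,
         churned                # label column
  FROM analytics.customer_features
target: churned                 # column to predict
index: customer_id              # one row per customer
train:
  schedule: "@weekly"           # retrain automatically on fresh warehouse data
predict:
  schedule: "@daily"            # write predictions back to the warehouse
```

In a design like this, predictions land back in the warehouse as an ordinary table, so downstream dbt models and BI dashboards can consume them like any other dataset.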
If you’re used to dbt, Continual’s core workflow will immediately resonate with you. Many of Continual’s early customers, like Veronica Beard, Enverus, and Aurora Solar, are dbt users who are trying to further uplevel themselves, moving beyond descriptive analytics to predictive systems that power business decisions. Continual gives these analytics teams machine learning superpowers, allowing them to infuse predictive analytics across sales, marketing, customer success, and operations.
But the story doesn’t stop there. Continual has built a product that radically simplifies the end-to-end user experience for productionizing machine learning models by cleanly separating the declarative business layer from the operational MLOps layer. This has just as much utility for data science teams as for analytics engineers. Few data scientists want to deal with Kubernetes or stitch together 5+ MLOps tools for each machine learning use case. Indeed, even very large companies with extremely talented ML engineering teams, such as Apple and Uber, have built similar declarative ML systems internally to empower all users to build models faster and more reliably. We believe that Continual has the opportunity to become a unified substrate for MLOps in many organizations, both for the SQL analyst solving a simpler use case with an off-the-shelf model and for the data scientist who wants to bring their own model or empower their entire organization with new standardized ML capabilities.
As such, we couldn’t be more excited to announce our lead investment in Continual’s Series A round. Tristan and Tyler have rich backgrounds in data engineering and machine learning, and we couldn’t imagine a better duo to build this company. Joining us in the round are a number of notable players in the modern data stack, including Tristan Handy, CEO of dbt Labs; Allison Pickens, board member at dbt Labs; Tomer Shiran, founder of Dremio; Amplify Ventures; and DataCouncil. This investment represents a continuation of our focus on data warehouse-native companies, such as Panther, Revsure, and Eppo, and more broadly highlights our continued belief in the immense impact machine learning will have on the world.