Add Machine Learning to Your dbt Workflows with Continual

Jordan Volz
9 min read · Dec 16, 2021

Today we’re pleased to announce Continual Integration for dbt. We believe this is a radical simplification of the machine learning (ML) process for users of dbt and presents a well-defined path that bridges the gap between data analytics and data science. Read on to learn more about this integration and how you can get started.

What is Continual?

Continual is an automated operational AI platform built for the Modern Data Stack. It offers a streamlined process for deploying AI use cases into production for users of cloud data warehouses. Continual is easily accessible to all data users, enables quick prototyping and versioning of ML work, and provides a simple production workflow that follows software engineering best practices. Whether you are an ML expert or new to AI, Continual allows you to iterate quickly on your problems and operationalize your results without rigging up complicated data pipelines, getting lost in notebook diffs, or becoming a Kubernetes expert.

The modern data stack is rapidly democratizing data and analytics, but deploying AI at scale into business operations, products, or services remains a challenge for most companies. Continual provides a fresh take on this problem. Powered by a declarative approach to operational AI and end-to-end automation, Continual enables modern data and analytics teams to build continually improving machine learning models directly on their cloud data warehouse without complex engineering.

We believe that AI should be pervasive in all organizations and with the right tooling, i.e. Continual, this can be a reality for every organization.

What is dbt?

From the dbt website:

dbt is a transformation workflow that lets teams quickly and collaboratively deploy analytics code following software engineering best practices like modularity, portability, CI/CD, and documentation. Now anyone who knows SQL can build production-grade data pipelines.

dbt has emerged as a linchpin of the modern data stack, and it is easy to see why. Simply put:

  1. It tears down barriers in data analytics processes. It allows all data workers to speak the same language and use a workflow they can all agree on. It’s simple yet effective, and whether you’re a data analyst or a software engineer, there is something in it to like and enough common ground to collaborate happily.
  2. It is built for the cloud. Whether utilizing dbt core or the spiffier dbt Cloud, it’s easy to adopt dbt into your cloud-native workflow and it imposes no maintenance burden on users.
  3. It is easily operational. In a part of the data world that was sorely lacking operational options prior to its arrival, dbt emerged as a breath of fresh air and has an opinionated take on how best to execute your data pipelines. The tool was originally pitched as “making data analysts feel like software engineers,” and we think they have delivered on that promise.

Why Continual Integration for dbt?

The field of data science is still relatively new but it’s been rapidly and continuously growing, largely independently from the analytics crowd. This creates several issues in a company’s data organization:

  1. Data analysts and data scientists speak different languages and use different tools. This creates unnecessary complexity in data workflows and builds barriers between different parts of the organization, which prevent fluidity in processes and hinder collaboration.
  2. These teams now likely spend a non-trivial amount of time duplicating the same work. Even if both teams are good at being efficient within their own team, the barriers to collaboration mean that insights inevitably fall through the cracks and people inevitably reinvent the wheel instead of working together as a single team.
  3. Operationalizing ML is difficult. Instead of following the best practices laid down by the analytics team, the data science team is more likely than not to take on the maintenance of complex systems of data pipelines to productionize their work. This is a brittle process that accumulates significant technical debt and is generally at odds with the approach of the analytics team.

In conversations with organizations where data science teams do in fact use dbt, we’ve still noticed that there is often an awkward transition between dbt workflows and ML workflows. This creates unnecessary friction that we believe can be resolved with the right tooling. Continual integration for dbt bridges the gap between analytics and ML for data teams and establishes a common workflow across roles. Not only does this provide tight integration between data engineering/analytics and machine learning workflows, but it also means that users of all types can begin harnessing the power of ML in their own work.

Furthermore, Continual is co-designed with the modern data stack and is aligned with the core principles of dbt. We believe we synergize in several key areas:

  1. Both tools are declarative. The declarative nature of the dbt project is one of the key reasons for its success. Likewise, Continual is built on a declarative foundation, and, better yet, dbt users can directly integrate with Continual by just adding a few meta tags to their existing dbt projects. It couldn’t be simpler. (Details below!)
  2. Both tools are operational. Combining a declarative interface with a slick CLI means users can control their entire workflow with a few quick commands. This makes downstream integrations into CI/CD pipelines and similar a breeze. And, yes, these statements are true of both dbt and Continual.
  3. Both tools follow software engineering best practices. dbt users often tell me that the GitOps-friendly workflow is one of the key reasons they leverage the tool. Whether it’s versioning your transformations, easily switching profiles, or the ease of use of integrating dbt core or dbt Cloud into an existing CI/CD pipeline: there’s a lot to love (including for the compliance-minded). Similarly, Continual has the exact same features, and can even make use of your dbt profiles in constructing isolated environments in Continual.

With dbt and Continual, now anyone who knows SQL can build production-grade machine learning pipelines!

“dbt was built on the idea that the unlock for data teams is a collaborative workflow that brings more people into the knowledge creation process. Continual brings this same viewpoint to machine learning, adding new capabilities to the analytics engineers’ tool belt. We’re excited to partner with Continual to help bring operational AI to the dbt community.”

— Nikhil Kothari, Head of Technology Partnerships at dbt Labs

How does it work?

Let’s get to the nuts and bolts of the integration. If you’re new to Continual you’ll need to complete a few simple steps to get started:

  1. Sign up for an account at https://cloud.continual.ai
  2. Create your first project.
  3. Install the Continual client and log in to Continual.

We’re now ready to use Continual with dbt! Using Continual on a dbt project is easy. Just follow these steps:

  1. Annotate your dbt project with Continual configuration
  2. Execute dbt run [OPTIONS] to build your dbt project.
  3. Execute continual run [OPTIONS] to build your Continual project.

Configuration for the Continual integration is defined in dbt meta fields, which can be set in three places:

  1. The model block in the dbt_project.yml file.
  2. A model schema yml file in your /models directory (such as schema.yml)
  3. As configuration directly in your dbt data model.

For example, here is the configuration included in a schema.yml file:

    models:
      - name: customer_churn
        description: "historic customer churn information"
        meta:
          continual:
            type: "Model"
            index: "ID"
            target: "churn"

However, we could similarly define this in the customer_churn.sql file:

    {{ config(
        meta = {
            'continual': {
                'type': 'Model',
                'index': 'ID',
                'target': 'churn',
            }
        }
    ) }}

    SELECT …

In the above example, we’re telling Continual that the table created by dbt represents a predictive model. In this model, we’ll be using the column ‘churn’ as the target, and the column ‘ID’ as the index for the model. When this information is passed to Continual, it will be able to fetch the data from the data warehouse, run experiments and select a winning ML model, and generate predictions back in the data warehouse. Users can additionally provide more advanced configurations in the meta fields that control everything from how frequently models and predictions are refreshed to how the AutoML engine operates. Refer to our documentation for full details.
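
For completeness, here is how the same configuration might look in the first location listed above, the models block of dbt_project.yml. This is only a sketch: the project name my_dbt_project is hypothetical, and it assumes Continual reads meta set this way exactly as it reads the schema.yml version. The +meta key itself is standard dbt configuration.

```yaml
# dbt_project.yml (illustrative sketch; my_dbt_project is a hypothetical name)
name: my_dbt_project

models:
  my_dbt_project:
    customer_churn:            # applies only to the customer_churn model
      +meta:
        continual:
          type: "Model"
          index: "ID"
          target: "churn"
```

Setting meta here keeps all Continual configuration in one file, while the schema.yml and in-model variants keep it next to the model it describes.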

Users can execute Continual directly on top of a configured dbt project by using the Continual CLI. The command is as follows (Note: you’ll want to make sure you execute ‘dbt run’ prior to executing continual run!):

continual run [OPTIONS]

This is meant to mirror dbt run, and indeed many of the same parameters are supported. In particular, users can use common options such as:

  • --target: Overrides the default target found in profiles.yml. Will be used by Continual to build an isolated environment in Continual.
  • --profiles-dir: The directory containing your profiles.yml file.
  • --project-dir: The dbt project directory, i.e. the directory containing your dbt_project.yml file.
  • --profile: Overrides the default profile found in dbt_project.yml.

There are, of course, Continual-specific options as well. The following are the most important:

  • --project: The continual project to use. Overrides the currently set project.
  • --continual-dir: The subdirectory in your dbt project in which to save Continual yaml files. By default, this is set to the target-path in the dbt_project.yml file.

Continual supports the use of isolated environments. These are conceptually similar to branches in git. For dbt users, Continual will automatically create and execute the workflow in a branch tied to your dbt profile configuration. In particular, the environment name in Continual will be the same as the profile name in dbt, and Continual will use the schema defined in the profile to build out your Continual resources (like predictions). If you don’t want to mix your dbt schemas with the predictions created by Continual, we recommend setting a new profile specifically for Continual. The use of environments is crucial to building out a coherent production process and makes it very simple to integrate Continual with your CI/CD workflows.
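
To make the recommendation about a dedicated profile concrete, a profiles.yml might contain two profiles like the sketch below. Everything here is an assumption for illustration: the Snowflake adapter, the profile names, and the account, database, warehouse, and schema values are placeholders, not values Continual requires.

```yaml
# profiles.yml (illustrative sketch; every name and value is a placeholder)

# Profile used by dbt run for the regular dbt project.
my_dbt_project:
  target: dev
  outputs:
    dev:
      type: snowflake                 # assumed warehouse; use your own adapter
      account: my_account
      user: my_user
      password: "{{ env_var('DBT_PASSWORD') }}"
      database: analytics
      warehouse: transforming
      schema: dbt_dev                 # where dbt builds its models

# Separate profile for Continual, so predictions land in their own schema.
continual_ml:
  target: dev
  outputs:
    dev:
      type: snowflake
      account: my_account
      user: my_user
      password: "{{ env_var('DBT_PASSWORD') }}"
      database: analytics
      warehouse: transforming
      schema: continual_dev           # where Continual would write its resources
```

With a layout like this, passing --profile continual_ml to continual run would select the second profile, since --profile overrides the default profile found in dbt_project.yml.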

Advanced dbt integration

Something that we believe deserves special attention is that Continual has some advanced integration with dbt via exposures and sources. These can be enabled either project-wide or on a model-by-model basis with the configuration options ‘create_exposures’ and ‘create_sources’. The former will create an exposure file in your dbt project that contains all the dependencies for your ML model. When building your documentation, dbt will now include this as a resource in your project and you will be able to easily review the lineage of your models, as seen below (Note: you can always additionally review lineage in the Continual UI at any time):

It’s not uncommon for users to want to run some analysis on predictions after they are created. We’ve made this process easy by allowing you to tell Continual to build a source file in your dbt project. This makes it easy to reference the tables created by Continual and to start including these resources in your dbt documentation and lineage. In addition, you can use ‘create_stubs’ to create stub .sql files in your dbt project. These files act as skeletons that refer to the new source files, and you can use them to begin running analyses on your predictions.
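
For readers unfamiliar with these dbt resource types, the sketch below shows roughly what an exposure and a source look like in dbt YAML. It is not the output Continual generates: the resource names, owner details, schema, and the predictions table name are all hypothetical, and the real files may be structured differently.

```yaml
# Illustrative dbt resources (hypothetical names; not Continual's actual output)
version: 2

exposures:
  - name: customer_churn_ml
    type: ml                          # dbt's exposure type for ML use cases
    description: "Churn model trained on the customer_churn dbt model"
    depends_on:
      - ref('customer_churn')         # dependency shown in dbt lineage
    owner:
      name: Data Team
      email: data-team@example.com

sources:
  - name: continual
    schema: continual_dev             # assumed schema holding Continual's tables
    tables:
      - name: customer_churn_predictions   # hypothetical predictions table
```

A downstream dbt model could then select from {{ source('continual', 'customer_churn_predictions') }} to analyze predictions, which is the kind of reference a stub file would contain.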

Worked Example

To see a worked example of the dbt integration, please refer to our documentation.

Get Started Today

With Continual and dbt, users can start tackling AI use cases and scaling their workloads. dbt is a fabulous tool that has done wonders for the data analytics field and is often credited with turning data analysts into analytics engineers. Similarly, we believe Continual is another step in the process that turns analytics engineers into machine learning engineers. Don’t take our word for it, though. Sign up for a free trial today and experience the power of dbt and Continual!

(Note: This article was originally featured on the Continual blog.)

Jordan Volz

Jordan primarily writes about AI, ML, and technology. Sometimes with a humorous slant. Opinions here are his own.