Lightning-Fast dbt Model Deployment

Published in Shipyard, Jan 5, 2021

Data Build Tool, better known as dbt, is steadily sweeping the data world off its feet. As many organizations shift their data operations from ETL to ELT, dbt makes it easier for analysts who work with SQL every day to define and document the organization's data models. It's becoming a staple technology for transforming massive amounts of raw data into usable tables and views for the whole organization.

While dbt is easy to start using, we’ve found that for many teams, getting dbt set up and running effectively can be a cumbersome process. Running models tied to a team member’s local laptop isn’t sustainable. Configuration on custom servers often requires DevOps knowledge to ensure a sustainable, error-free setup. Executing with existing workflow tools requires you to learn new proprietary setups and workarounds to get dbt working.

These complexities make it clear why the makers of dbt offer dbt Cloud, a service dedicated to running dbt. However, most data teams need dbt to be interconnected with the rest of their data operations. Each set of dbt models relies on specific data sources being loaded successfully, and each resulting table or view powers reporting, ML models, and last-mile actions. We wanted to create an easier way to launch dbt in the cloud and connect it to your entire data stack.

Introducing the dbt Blueprint

Today, we’re excited to tackle this problem head-on with the launch of our new dbt Blueprint. This Blueprint will streamline the steps required to get dbt up and running in the cloud, allowing data teams to deploy their latest dbt models to production rapidly with an incredible amount of control and visibility over their setup. Team members can execute dbt code by simply providing the command to run. There’s no need to fiddle with infrastructure or touch the underlying code.

Deploy to Production 10x Faster

To use the new dbt Blueprint, first sync your dbt repository to Shipyard with our GitHub Code Sync integration; from there, you can start automating all of your data models in the cloud in minutes. Once synced, the dbt Blueprint lets your team run any dbt CLI command against the up-to-date code living on GitHub. Then it's just a matter of scheduling your dbt run and dbt test commands to run independently or as part of your larger data workflows.
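Shipyard doesn't spell out what the Blueprint executes internally, so the following is only a minimal sketch of the same idea in plain Python. It assumes dbt is installed, on the PATH, and invoked from the project root. It runs dbt run and then dbt test, skipping the tests if the build fails. dbt exits with a non-zero code when a model or test fails, and that exit code is what the sketch keys on. The names run_dbt_steps, subprocess_runner, and Runner are hypothetical, chosen for illustration.

```python
import subprocess
from typing import Callable, List, Sequence

# A runner executes one command (an argument list) and returns its exit code.
Runner = Callable[[Sequence[str]], int]


def subprocess_runner(args: Sequence[str]) -> int:
    """Run a command in a child process and return its exit code.

    Output is not captured, so dbt's log lines go straight to the console.
    """
    return subprocess.run(list(args), check=False).returncode


def run_dbt_steps(commands: List[List[str]], runner: Runner = subprocess_runner) -> List[int]:
    """Run commands in order, stopping after the first non-zero exit code.

    Returns the exit codes of the commands that actually ran.
    """
    codes: List[int] = []
    for args in commands:
        code = runner(args)
        codes.append(code)
        if code != 0:
            # A failed `dbt run` means there is no point testing its models.
            break
    return codes


# Usage (assumes dbt is on PATH and the working directory is the project root):
#   codes = run_dbt_steps([["dbt", "run"], ["dbt", "test"]])
```

Passing the runner in as a parameter keeps the ordering logic separate from process handling. That makes the stop-on-failure behavior easy to check without a warehouse connection.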

For in-depth instructions, follow our guide on deploying dbt with Shipyard.

Optimize your CI/CD Flow

Thanks to our unique integration with GitHub, the dbt Blueprint lets your team keep building and updating data models with its existing git flow, while Shipyard handles the ongoing execution of that work in the background.

This means your version control continues to live in GitHub, while Shipyard helps you keep track of how your dbt code is being used and how it connects to the larger picture of your data operations. If you want to know which version of the code ran at any point in time, Shipyard shows the commit hash alongside runtime metadata and dbt's logged output.
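For teams running dbt outside Shipyard, one way to get similar traceability is to capture the commit with git rev-parse HEAD and store it next to each run's outcome. The sketch below assumes the script runs inside the git checkout of the dbt project. The record layout and the function names build_run_record and read_stdout are hypothetical, and this is not how Shipyard stores its metadata.

```python
import subprocess
from datetime import datetime, timezone
from typing import Callable, Dict, Sequence

# Executes a command and returns its standard output as text.
OutputRunner = Callable[[Sequence[str]], str]


def read_stdout(args: Sequence[str]) -> str:
    """Run a command and return stdout; raises CalledProcessError on failure."""
    return subprocess.run(list(args), check=True, capture_output=True, text=True).stdout


def build_run_record(command: Sequence[str], exit_code: int,
                     runner: OutputRunner = read_stdout) -> Dict[str, object]:
    """Bundle a dbt run's outcome with the commit it ran against and a UTC timestamp."""
    commit = runner(["git", "rev-parse", "HEAD"]).strip()
    return {
        "command": " ".join(command),
        "exit_code": exit_code,
        "commit": commit,
        "recorded_at": datetime.now(timezone.utc).isoformat(),
    }


# Usage (run inside the git checkout of the dbt project):
#   record = build_run_record(["dbt", "run"], exit_code=0)
```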

Connect All Of Your Data Tools

With the introduction of the dbt Blueprint, you can quickly connect the execution of dbt to any other script you write in Bash or Python. You can also connect it to other common processes that run against external data tools (Snowflake, Redshift, BigQuery, etc.) using our Blueprint Library.

Shipyard is designed to automate and connect any code and packages you might be using, not just dbt. With the Shipyard platform, your team has greater flexibility to create a pipeline where each step shares data and communicates with the others, rather than relying on flimsy timing-based pipelines between siloed systems.

Here are a few examples of how you can connect dbt to other services to create a seamless solution for your data team.

Run a Fleet that kicks off data loading jobs with Fivetran and immediately starts running your dbt projects upon their completion.
Tie your action-based scripts (reporting, ML models, API updates, etc.) to the status of individual dbt models. Develop a Fleet with conditional paths where downstream Vessels won't execute unless the upstream dbt models run successfully.
Run a Fleet that sends custom emails or Slack messages to vendors when dbt returns data issues.
Run a Fleet that executes all of your dbt models and stores the logs externally.
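Both the conditional-path and alerting examples depend on knowing which dbt nodes failed. One hedged way to get that information is to read target/run_results.json, the artifact dbt writes after a run or test. Its schema varies between dbt versions, so treat the field names used below (results, unique_id, status) and the status values as assumptions to check against your version. The function names are hypothetical, and sending the actual email or Slack message is left out.

```python
import json
from pathlib import Path
from typing import Dict, List

# Statuses in run_results.json treated as blocking (model errors, failed tests).
BLOCKING_STATUSES = {"error", "fail"}
# Statuses treated as "safe to continue downstream".
OK_STATUSES = {"success", "pass"}


def load_run_results(path: Path = Path("target/run_results.json")) -> Dict:
    """Read the artifact dbt writes after `dbt run` or `dbt test`."""
    return json.loads(path.read_text())


def blocking_nodes(run_results: Dict) -> List[str]:
    """Return unique_ids of models or tests whose status is blocking."""
    return [r["unique_id"] for r in run_results.get("results", [])
            if r.get("status") in BLOCKING_STATUSES]


def should_run_downstream(run_results: Dict, required: List[str]) -> bool:
    """Allow a downstream step only if every required node succeeded or passed.

    A required node missing from the results counts as not OK.
    """
    status = {r["unique_id"]: r.get("status") for r in run_results.get("results", [])}
    return all(status.get(node) in OK_STATUSES for node in required)


def alert_text(run_results: Dict) -> str:
    """Plain-text summary for an email or chat message; empty if nothing failed."""
    bad = blocking_nodes(run_results)
    if not bad:
        return ""
    return "dbt reported issues in: " + ", ".join(bad)
```

Treating a missing node as a failure is a deliberately conservative choice. If a model was never selected, the downstream step waits rather than running on stale data.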

Scale your dbt Usage

Your team can use the same dbt Blueprint repeatedly, tracking each distinct usage across the organization. If you want to run different commands, create a new Vessel with the same Blueprint. If you want to use a slightly different version of the code, or different environment variables, duplicate the Blueprint and make adjustments.
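Reusing one codebase with different commands and environment variables works because dbt can read environment variables at run time. For example, a profiles.yml value can be written as "{{ env_var('DBT_TARGET_SCHEMA') }}". The sketch below shows that pattern outside Shipyard. The variable name DBT_TARGET_SCHEMA, the VESSELS table, and the helper functions are all hypothetical, and it assumes dbt is on the PATH.

```python
import os
import subprocess
from typing import Dict, List, Mapping, Optional, Tuple


def build_env(overrides: Mapping[str, str],
              base: Optional[Mapping[str, str]] = None) -> Dict[str, str]:
    """Copy the base environment (os.environ by default) and apply overrides on top."""
    env = dict(os.environ if base is None else base)
    env.update(overrides)
    return env


def run_vessel(command: List[str], overrides: Mapping[str, str]) -> int:
    """Run one dbt command with its own environment variables; return the exit code."""
    return subprocess.run(command, env=build_env(overrides), check=False).returncode


# Hypothetical Vessels reusing one codebase with different commands and variables.
# A profiles.yml could pick the variable up with "{{ env_var('DBT_TARGET_SCHEMA') }}".
VESSELS: Dict[str, Tuple[List[str], Dict[str, str]]] = {
    "nightly_build": (["dbt", "run"], {"DBT_TARGET_SCHEMA": "analytics"}),
    "nightly_tests": (["dbt", "test"], {"DBT_TARGET_SCHEMA": "analytics"}),
    "qa_build": (["dbt", "run"], {"DBT_TARGET_SCHEMA": "analytics_qa"}),
}

# Usage (assumes dbt is on PATH):
#   for name, (command, overrides) in VESSELS.items():
#       run_vessel(command, overrides)
```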

This setup makes it easy to scale how effectively your team uses dbt.

Split out your dbt commands to run subsets of models in your projects simultaneously, each with its own multi-threading. With Shipyard’s dynamically scaling infrastructure, there’s no longer a need to run your entire project as a single operation.
Build out projects and workflows that are specific to subsets of your dbt models, empowering your team to better model end-to-end data touchpoints while reducing errors downstream.
Run different versions of your dbt project (QA, development, production, etc.) on the same infrastructure without any setup changes. Quickly test and compare how code updates affect the overall output.
Update all of your dbt Vessels to the latest tagged release with one click. Made a mistake? Roll back your changes in minutes.
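Splitting a project into subsets and running them side by side maps onto real dbt flags: a node selector, --threads for per-process concurrency, and --target to pick a profiles.yml output such as dev or prod. The selector flag is --select in dbt 0.21 and later; older versions use --models (-m). Below is a minimal sketch, with hypothetical tag names and helper functions. One caveat: concurrent invocations in a single checkout share the target/ directory and can overwrite each other's artifacts, so in practice each process would typically get its own working copy.

```python
import subprocess
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict, List, Sequence

# A runner executes one command (an argument list) and returns its exit code.
Runner = Callable[[Sequence[str]], int]


def dbt_command(selector: str, threads: int, target: str) -> List[str]:
    """Build one `dbt run` invocation for a subset of models.

    --select needs dbt 0.21+; older versions use --models (-m) instead.
    """
    return ["dbt", "run", "--select", selector,
            "--threads", str(threads), "--target", target]


def subprocess_runner(args: Sequence[str]) -> int:
    """Run a command in a child process and return its exit code."""
    return subprocess.run(list(args), check=False).returncode


def run_subsets(selectors: Dict[str, int], target: str,
                runner: Runner = subprocess_runner) -> Dict[str, int]:
    """Launch one dbt process per selector concurrently; map selector -> exit code."""
    with ThreadPoolExecutor(max_workers=max(1, len(selectors))) as pool:
        futures = {sel: pool.submit(runner, dbt_command(sel, threads, target))
                   for sel, threads in selectors.items()}
        return {sel: fut.result() for sel, fut in futures.items()}


# Usage (hypothetical tags; assumes dbt is on PATH):
#   results = run_subsets({"tag:finance": 4, "tag:marketing": 2}, target="qa")
```

Threads are enough to coordinate here because each dbt invocation is a separate OS process. The Python side only waits on exit codes.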

The dbt Blueprint is now available to all subscribers and can be tested with any trial account. Shipyard is making it easier than ever to automate your dbt repository in the cloud. Sign up for a free 14-day trial to get started automating your dbt projects, and follow our guide for deploying dbt in the cloud.

We're looking forward to seeing how users take advantage of this new Blueprint to implement dbt in production quickly and deploy data solutions across their modern data stack.

About Shipyard:
Shipyard is a serverless data workflow platform that helps Data Teams launch, monitor, and share their solutions 10x faster. Driven by a mission to simplify every company’s data operations, they are creating an ecosystem where organizations can break down data silos and move beyond dashboards towards a future of fully automated, data-driven actions. The founding team draws on their previous experience at top agencies and media companies, handling high-throughput digital advertising and inventory data for Fortune 500 companies. For more information, visit www.shipyardapp.com or get started with a 14-day free trial.

Originally published at https://www.shipyardapp.com on January 5, 2021.