Unleashing the Power of dbt: Advanced Techniques and Features

Rasiksuhail · Apr 30, 2023

dbt (Data Build Tool) is an open-source command-line tool for modern data transformation and engineering. It allows data teams to transform, test, and deploy data models using version control and the principles of software engineering. With dbt, analysts and data engineers can collaborate on building data pipelines, modeling complex data structures, and ensuring data quality. dbt is SQL-based and connects to a wide range of data warehouses through adapters. It’s gaining popularity among data professionals as a preferred tool for data transformation and modeling.

Before starting this, go through my other dbt articles (links attached at the end).

Today, we are going to look at some advanced dbt topics.

Advanced dbt is an extension of the basic features of dbt, designed for more complex use cases and customized solutions. It includes advanced topics such as custom plugins, data modeling, data governance, and deployment management. Advanced dbt requires a deep understanding of SQL, data modeling principles, and the ability to write and maintain custom code.

We will cover the basics of the following topics:
1. Using the dbt Cloud API to programmatically manage deployments
2. Writing custom dbt plugins to extend functionality
3. Implementing row-level security (RLS) in dbt

Let’s start.

Using the dbt Cloud API to programmatically manage deployments

Using the dbt Cloud API to programmatically manage deployments and other tasks can greatly streamline your workflow and increase efficiency. Instead of manually triggering deployments or performing other routine tasks, you can automate them using the API. This can save you time and reduce the risk of human error.

To get started, you will need to obtain an API key from dbt Cloud. Once you have the API key, you can use it to make requests to the dbt Cloud API. Some common tasks that can be automated using the API include triggering deployments, running tests, and managing resources.

Here are two code snippets to help you get started with the dbt Cloud API:

  1. To trigger a deployment (in dbt Cloud, a deployment is carried out by a job, and the v2 API scopes requests by account):

import requests

api_key = 'your_api_key'
account_id = 'your_account_id'
job_id = 'your_job_id'  # the job that performs the deployment

headers = {
    'Authorization': f'Token {api_key}',
    'Content-Type': 'application/json'
}
url = f'https://cloud.getdbt.com/api/v2/accounts/{account_id}/jobs/{job_id}/run/'
# The trigger endpoint expects a JSON body with a "cause"
response = requests.post(url, headers=headers, json={'cause': 'Triggered via API'})
print(response.json())

2. To run a test (a job configured to run dbt test is triggered the same way, and you can then poll the run for its status):

import requests

api_key = 'your_api_key'
account_id = 'your_account_id'
job_id = 'your_test_job_id'  # a job configured to run `dbt test`

headers = {
    'Authorization': f'Token {api_key}',
    'Content-Type': 'application/json'
}
url = f'https://cloud.getdbt.com/api/v2/accounts/{account_id}/jobs/{job_id}/run/'
response = requests.post(url, headers=headers, json={'cause': 'Running tests via API'})
run_id = response.json()['data']['id']

# Poll the run's status
status_url = f'https://cloud.getdbt.com/api/v2/accounts/{account_id}/runs/{run_id}/'
print(requests.get(status_url, headers=headers).json())

Using these code snippets, you can automate deployments and testing in dbt Cloud, saving you time and effort.
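The same pattern extends to managing resources. As a minimal sketch, assuming only the standard v2 list endpoint, you can enumerate the jobs in an account before deciding which one to trigger:

import requests

api_key = 'your_api_key'
account_id = 'your_account_id'
headers = {'Authorization': f'Token {api_key}'}

# List all jobs in the account and print their IDs and names
url = f'https://cloud.getdbt.com/api/v2/accounts/{account_id}/jobs/'
for job in requests.get(url, headers=headers).json()['data']:
    print(job['id'], job['name'])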

Writing custom dbt plugins to extend functionality

Custom dbt plugins can be extremely useful in extending the functionality of dbt and making it more tailored to your specific needs. Plugins can be used to define custom macros, hooks, adapters, or even custom node types.

To create a custom plugin, you’ll first need to create a new Python package. Within that package, you’ll define your custom plugin logic. Let’s take a look at a few examples:

  1. Creating a custom macro:

dbt macros are written in Jinja rather than Python, even when they ship inside a plugin package, so the macro lives in a .sql file:

-- custom_plugin/include/macros/custom_macro.sql
{% macro my_custom_macro(arg1, arg2) %}
    {% if arg1 != arg2 %}
        {{ exceptions.raise_compiler_error('arg1 and arg2 must be equal') }}
    {% endif %}
    ({{ arg1 }}, {{ arg2 }})
{% endmacro %}

In this example, we define a custom macro called my_custom_macro that takes two arguments and renders them as a formatted pair. If the two arguments are not equal, a compilation error is raised via dbt’s built-in exceptions.raise_compiler_error.
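A model can then call the macro like any built-in; a minimal usage sketch:

-- models/example.sql
select {{ my_custom_macro(1, 1) }} as pair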

2. Creating a custom adapter:

# dbt/adapters/custom/__init__.py
from dbt.adapters.base import AdapterPlugin
from dbt.adapters.sql import SQLAdapter
from dbt.adapters.postgres import PostgresConnectionManager, PostgresCredentials

class CustomAdapter(SQLAdapter):
    # Reuse Postgres connection handling
    ConnectionManager = PostgresConnectionManager

# dbt imports dbt.adapters.<type> and looks for a module-level Plugin
Plugin = AdapterPlugin(
    adapter=CustomAdapter,
    credentials=PostgresCredentials,
    include_path='custom_plugin/include',
)

In this example, we define a custom adapter called CustomAdapter that subclasses SQLAdapter and reuses the Postgres connection manager and credentials. The module-level Plugin object, an instance of AdapterPlugin, is what dbt picks up: dbt imports dbt.adapters.<type> for the type named in your profile, so the package must install into that namespace. (A complete adapter also overrides a handful of methods, such as date_function, that SQLAdapter leaves abstract; this is a minimal sketch.)

Once you’ve defined your custom plugin, you’ll need to make it available to dbt. An adapter plugin is installed as a Python package (for example, pip install -e . during development) and selected by its type in profiles.yml, rather than in dbt_project.yml:

# profiles.yml
my_profile:
  target: dev
  outputs:
    dev:
      type: custom  # the adapter type your plugin registers
      # ... connection details for your warehouse ...

With the package installed and the type set, dbt loads your plugin whenever it runs against that profile.
Custom dbt plugins are incredibly powerful and can be used to extend dbt in many ways. With a bit of Python knowledge, you can create custom macros, adapters, and more to make dbt work exactly the way you need it to.

Implementing row-level security (RLS) in dbt

Row-level security (RLS) is a security feature that restricts access to certain rows of data based on the user who is accessing it. In dbt, you can implement RLS to control access to sensitive data.

To implement RLS in dbt, you first need to define a policy in your database. A policy defines the conditions under which a user can access a row of data. Here’s an example (Postgres syntax) that restricts access to a “sensitive_data” table:

-- RLS must be enabled on the table before policies take effect
ALTER TABLE sensitive_data ENABLE ROW LEVEL SECURITY;

CREATE POLICY sensitive_data_policy ON sensitive_data
    USING (user_name = current_user);

This policy only allows a user to access rows in the “sensitive_data” table whose user_name column matches the database user they are connected as (current_user is a Postgres built-in).
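As a quick illustration of the behavior, assuming roles alice and bob exist and rows are tagged with their names:

-- Connected as alice: only alice's rows are visible
SET ROLE alice;
SELECT * FROM sensitive_data;  -- returns rows where user_name = 'alice'

-- Connected as bob: a different slice of the same table
SET ROLE bob;
SELECT * FROM sensitive_data;  -- returns rows where user_name = 'bob'

Note that table owners bypass RLS by default; set FORCE ROW LEVEL SECURITY on the table if the policy should apply to them too.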

Once you have defined your policy, you can make use of it in your dbt models. One approach is a pre-hook that sets the connecting role’s search_path so that unqualified references resolve to the schema containing the protected table. Hooks are configured in dbt_project.yml (or in a model’s config block), while column descriptions and tests stay in the model’s properties file:

# dbt_project.yml
models:
  my_project:
    sensitive_data:
      +pre-hook: 'ALTER ROLE "{{ env_var(''DBT_USER'') }}" SET search_path = schema_with_policy, public'

# models/schema.yml
models:
  - name: sensitive_data
    columns:
      - name: user_id
        description: The ID of the user who owns this data.
        tests:
          - unique

In this example, the pre-hook runs before the model is built and adjusts the search_path so that the schema containing the policy-protected relation is resolved first, ensuring the policy is in force when the model’s SQL executes.

You can also use the post-hook functionality in your models. A post-hook is a SQL statement executed immediately after the model is materialized, which makes it a natural place to enable RLS and attach the policy to the freshly built table (a table materialization drops and recreates the relation on each run, discarding any previously attached policies):

-- models/sensitive_data.sql
{{ config(
    materialized='table',
    post_hook=[
        "ALTER TABLE {{ this }} ENABLE ROW LEVEL SECURITY",
        "CREATE POLICY sensitive_data_policy ON {{ this }} USING (user_name = current_user)"
    ]
) }}

select *
from my_table
where sensitive_data = true

In this example, the config block materializes the model as a table and registers two post-hooks: the first turns row-level security on for the new table, and the second attaches the policy, so access is restricted as soon as the model finishes building.

If your warehouse does not support policies, you can approximate RLS with a view that filters on the warehouse’s current-user function. Because a view’s SQL runs at query time, the function resolves to whoever is querying, not to the dbt user that built the model:

-- models/my_table_secure.sql
{{ config(materialized='view') }}

select *
from my_table
where sensitive_data = false
   or user_name = current_user  -- Postgres; on BigQuery, session_user() returns the user's email

In this example, current_user resolves to the logged-in user each time the view is queried, and the where clause only exposes sensitive rows whose user_name column matches that user. Capturing the user at build time instead (for example with run_query in Jinja) would bake the dbt service user into the compiled SQL, which is usually not what you want.

By implementing RLS in dbt, you can ensure that sensitive data is only accessible to users who are authorized to view it, while still allowing multiple teams or users to access the same database.

Cool. We have learned the basics of a few advanced dbt techniques as part of this dbt series.

To unlock the full potential of dbt, users must be willing to explore advanced topics such as using the dbt Cloud API, writing custom plugins, and implementing row-level security. With these advanced skills, users can take their data transformation processes to the next level and truly unleash the power of their data.

Start learning, folks

Explore my other articles on dbt to get started.

Thank you for reading my article. I post about Data, AI, Startups, Leadership, Writing & Culture.

Stay tuned for my next blog!!

Have fun learning :-)
