Integrating Azure Synapse with DBT

Sayedur Rahaman
5 min readDec 3, 2021

Data Build Tool (DBT) has gained a lot of respect in the last few years among Data Engineers. In plain English, It’s a data transformation tool to generate analytics. DBT supports a variety of Data warehouses. The core DBT has support for Postgres, Sqlserver, Snowflake, Redshift, and Bigquery. Recently, I explored DBT’s capability to connect and generate analytics with Azure Synapse.

Problem Statement

Integrating synapse with DBT includes 3 combined steps.
I. Setting up a local environment with compatible python, DBT, and adapter versions.
II. Creating a synapse workspace and dedicated pool to connect.
III. Connecting via synapse adapter.

Yah!! That’s no rocket science. But, finding an appropriate version combination for python, dbt, pip, and synapse can be time-consuming.

I) Setting up Local Environment.

Install python 3.9.6 on the local machine. If you have another version installed previously, uninstall that first. The latest version of python does not support a synapse adapter. Pip- a python package manager is included with the setup file. We can use PowerShell to install dependency with PIP. Run the following commands as Administrator:

1. Upgrade pip to latest: pip install --upgrade pip
2. Install and Upgrade DBT: pip install dbt, pip install --upgrade dbt
Check DBT version : dbt --version
3. Install dbt-synaps: pip install dbt-synaps
This adapter is community supported :
dbt-msft/dbt-synapse: dbt adapter for Azure Synapse (née Azure Data Warehouse) (github.com)

Pip : 21.3.1, Dbt : 0.21.0, Dbt-synapse: 0.21.0

In my journey, This combination of versions was pretty solid to work on.

4. There is a dependency to make sure. Install ODBC driver 13/17 in local machine. This adapter is used to connect the Azure SQL Server.
How to check the ODBC SQL Server driver version (Windows) | Microsoft Docs.

II. Creating an Azure Synapse workspace and dedicated pool to connect.

1. Create a resource group to keep things isolated. Add a synapse workspace to the group. By default, a datalake storage is added to the workspace. You can use or delete that later.

2. After deployment, navigate to the Synapse Workspace. In the manage option, we need to create a dedicated SQL pool(database). Select the minimum performance level to optimize the cost. The built-in serverless has a different purpose . That does not support the synapse adapter we’re gonna use.

3. In the security tab put username & password for sql server. We’re gonna use this credential in the synaps adapter profile.

III) Connecting via Synapse adapter.

1. As we installed DBT there is a folder created named .dbt in the user profile folder. This folder contains the profile.yml file. This file is used to connect the Data warehouse. It contains the connection string, username, password, etc. Kept outside the project scope for security. We’re gonna create a profile for Synapse here in the profile.yml.

Path : C:\Users\UserName\.dbt\profile.yml

2. Firstly, we need to create a DBT project via CLI using PowerShell or vs code.

Create project : dbt init “project-name”

At the root of the project directory, we have dbt_project.yml. It’s the main config file of the project. We need to match the profile “synapse_profile” name with the root key of profile.yml. At profile.yml we set a target and declare the target to connect Synapse SQL database.
Microsoft Azure Synapse DW Profile | dbt Docs (getdbt.com)

3. The type is “synapse” named after the adapter we installed previously. Collect server name from Synapse connection string. Then replace that at the server key in profile.yml. For this, We need to go to the Manage section in the Synapse workspace. Click on the newly created “dedicated_pool”. A property window will open with a connection string for the different adapters. Select “SQL authentication” from the ODBC section. The connection string below contains all the info to populate the “azure_dedicated” profile.

4. Check the connection with cli. run the command in vs code terminal.

dbt debug

5. There are two default models already in the created project. You can push them SQL Database with the deploy command.

dbt run

Concluding thoughts

DBT has loads of built-in functions to transform data. Look into the docs for various options. Generating analytics via synapse is fun with DBT. Here is a demo project to seed some data from dbt-labs. This will help you a jump-start.

dbt-labs/jaffle_shop: A self-contained dbt project for testing purposes (github.com).

We can import data from here to Synapse dedicated SQL server. Now they will be ready to transform and generate analytics.

Thank you for following along!

I think 14 Peaks just got released. Now it’s time for some Netflix and Coffee.

--

--

Sayedur Rahaman

Full Stack Developer by day, Amateur astronomer at night.