Loading from Azure Data Lake Store Gen 2 into Azure Synapse Analytics (Azure SQL DW) via Azure Databricks

Figure 1: Interaction between Azure Databricks, SQL DW and Azure Data Lake G2 for data transfer.
  • Spark Driver to SQL DW
  • Spark Driver and Executors to Azure Data Lake G2
  • SQL Data Warehouse to Azure Data Lake G2

Spark Driver to SQL DW

jdbc:sqlserver://<server-name>.database.windows.net:1433;database=<dbname>;user=<user-name>;password=<password>;encrypt=true;trustServerCertificate=false;hostNameInCertificate=*.database.windows.net;loginTimeout=30;
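As a sketch of how this might be wired up (assuming a Databricks secret scope named "jdbc" and hypothetical key names), the URL can be assembled on the driver from values held in Databricks Secrets rather than embedded in notebook code:

server_name = "<server-name>"   # logical SQL Server name
db_name = "<dbname>"
# Hypothetical secret scope and key names; substitute your own.
sql_user = dbutils.secrets.get(scope="jdbc", key="sqldw-user")
sql_password = dbutils.secrets.get(scope="jdbc", key="sqldw-password")
sql_dw_jdbc_url = ("jdbc:sqlserver://" + server_name + ".database.windows.net:1433;"
                   + "database=" + db_name + ";"
                   + "user=" + sql_user + ";password=" + sql_password + ";"
                   + "encrypt=true;trustServerCertificate=false;"
                   + "hostNameInCertificate=*.database.windows.net;loginTimeout=30;")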

Spark Driver and Executors to Azure Data Lake G2

spark.conf.set("fs.azure.account.auth.type." + <storage-account-name> + ".dfs.core.windows.net", "OAuth")
spark.conf.set("fs.azure.account.oauth.provider.type." + <storage-account-name> + ".dfs.core.windows.net", "org.apache.hadoop.fs.azurebfs.oauth2.ClientCredsTokenProvider")
spark.conf.set("fs.azure.account.oauth2.client.id." + <storage-account-name> + ".dfs.core.windows.net", <application-id>)
spark.conf.set("fs.azure.account.oauth2.client.secret." + <storage-account-name> + ".dfs.core.windows.net", <application-secret>)
spark.conf.set("fs.azure.account.oauth2.client.endpoint." + <storage-account-name> + ".dfs.core.windows.net", "https://login.microsoftonline.com/" + <tenant_id> + "/oauth2/token")
spark.conf.set("fs.azure.createRemoteFileSystemDuringInitialization", "true")
tempDir = "abfss://" + <file-system> + "@"+ <storage-account-name> + ".dfs.core.windows.net/tempDirs"
  • The values supplied would be parameterized, i.e. the code above would be wrapped in a function (see the sketch after this list).
  • The credentials supplied would be stored in Databricks Secrets (typically in a secret scope backed by Azure Key Vault).
  • The credentials would be retrieved from Databricks Secrets by key at the time of use.
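A minimal sketch of such a function, assuming the service principal credentials sit in a Key Vault-backed secret scope (the scope and key names below are hypothetical placeholders):

def configure_adls_gen2_oauth(storage_account_name, file_system, tenant_id):
    # Hypothetical secret scope and key names; substitute your own.
    application_id = dbutils.secrets.get(scope="adls-gen2", key="sp-application-id")
    application_secret = dbutils.secrets.get(scope="adls-gen2", key="sp-application-secret")
    account_suffix = storage_account_name + ".dfs.core.windows.net"
    # OAuth (service principal) access to the ADLS Gen 2 account.
    spark.conf.set("fs.azure.account.auth.type." + account_suffix, "OAuth")
    spark.conf.set("fs.azure.account.oauth.provider.type." + account_suffix,
                   "org.apache.hadoop.fs.azurebfs.oauth2.ClientCredsTokenProvider")
    spark.conf.set("fs.azure.account.oauth2.client.id." + account_suffix, application_id)
    spark.conf.set("fs.azure.account.oauth2.client.secret." + account_suffix, application_secret)
    spark.conf.set("fs.azure.account.oauth2.client.endpoint." + account_suffix,
                   "https://login.microsoftonline.com/" + tenant_id + "/oauth2/token")
    spark.conf.set("fs.azure.createRemoteFileSystemDuringInitialization", "true")
    # Staging directory used by the SQL DW connector for PolyBase loads.
    return "abfss://" + file_system + "@" + account_suffix + "/tempDirs"

tempDir = configure_adls_gen2_oauth("<storage-account-name>", "<file-system>", "<tenant-id>")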

SQL Data Warehouse to Azure Data Lake G2

Steps for Databricks to write to Azure SQL DW using MSI over PolyBase:

  1. Create the MSI (managed service identity) for the SQL Server hosting the DW using the Azure CLI.
  2. Grant that MSI access to the storage account:
  • Select Access Control (IAM) in the left pane of your Data Lake Gen 2 storage account.
  • Select Add a role assignment.
  • Select Storage Blob Data Contributor from the Role drop-down list, search for the server name created above and select its MSI. This grants read, write and delete access to the file system and its data.
  3. In the SQL DW, create a database scoped credential that uses the MSI:
CREATE DATABASE SCOPED CREDENTIAL msi_cred WITH IDENTITY = 'Managed Service Identity';
With the credential in place, the DataFrame can be written to the target table in SQL DW, staging the data in Azure Data Lake G2:

tempDir = "abfss://<your-container-name>@<your-storage-account-name>.dfs.core.windows.net/<your-directory-name>"

targetDf.write \
  .format("com.databricks.spark.sqldw") \
  .option("url", <sql-dw-Jdbc-Url>) \
  .option("useAzureMSI", "true") \
  .option("maxStrLength", "4000") \
  .option("preActions", <pre-action-SQL>) \
  .option("dbTable", <target-table-name>) \
  .option("tempDir", tempDir) \
  .mode("overwrite") \
  .save()
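For completeness, a hedged sketch of reading the loaded table back through the same connector to verify the load. Here sql_dw_jdbc_url and tempDir stand for the same JDBC URL and staging directory used in the write above, and "dbo.TargetTable" is a hypothetical table name:

loaded_df = (spark.read
    .format("com.databricks.spark.sqldw")
    .option("url", sql_dw_jdbc_url)
    .option("useAzureMSI", "true")
    .option("tempDir", tempDir)
    .option("dbTable", "dbo.TargetTable")
    .load())
print(loaded_df.count())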
