Azure Synapse Analytics Dedicated SQL Pool Tables and Views Migration using Data Factory

Azure Synapse Analytics (formerly known as Azure SQL Data Warehouse) has changed how the industry manages and implements data warehousing. With the performance and storage it provides, it has become one of the strongest tools in the Microsoft offering for business workloads that need to scale.

Proper resource management is one of the keys to sustaining a well-built architecture, and resources may change over time based on cost and business factors.

As more and more projects were created in the data lake, we found it increasingly challenging to manage the resources properly, not to mention juggling repeated start/stop operations via the API to keep costs down. So today I am covering a solution we implemented in the past: migrating and combining multiple Synapse instances into a single, centralized one to keep resource management in good shape. We believe that consolidating multiple projects into one business-functional Synapse dedicated SQL pool reduces the footprint and further reduces the complexity of administration and management.

We have been using Data Factory as our orchestration and data-copy tool, so we built a pipeline that loops through every table and view in the source Synapse instance and copies each of them into the newly created instance.

Azure Data Factory pipeline to look up and copy every table and view from the source to the target Synapse instance

Copy Data for all tables:

1. Create a Lookup activity to get all table names (excluding external tables) from the source Synapse instance.

SELECT TABLE_SCHEMA, TABLE_NAME FROM information_schema.TABLES a LEFT JOIN sys.external_tables b ON a.TABLE_NAME = b.name WHERE b.name IS NULL AND a.TABLE_TYPE = 'BASE TABLE'

2. Create a ForEach activity to loop through all the items returned from the Lookup activity in Step #1.

@activity('Get Table Names from Synapse').output.value

3. Inside the ForEach activity, create a Copy Data activity whose source uses a custom SQL query built from the current table name in the loop, so each table is copied from the source to the target Synapse instance.

SELECT * FROM [@{item().TABLE_SCHEMA}].[@{item().TABLE_NAME}]

4. At the sink of the Copy Data activity, select the Auto create table option and enter a pre-copy script that truncates the table if it already exists at the target Synapse instance (a JSON sketch of this setup follows the list).

IF EXISTS (SELECT * FROM information_schema.tables WHERE table_schema = '@{item().TABLE_SCHEMA}' AND table_name = '@{item().TABLE_NAME}') TRUNCATE TABLE [@{item().TABLE_SCHEMA}].[@{item().TABLE_NAME}]
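
To make the steps above more concrete, the table-copy half of the pipeline could look roughly like the activities fragment below when viewed as JSON. The activity and dataset names (SourceSynapseDs, TargetSynapseDs) are placeholders, and in practice the target dataset's schema and table name would also need to be parameterized with item().TABLE_SCHEMA and item().TABLE_NAME so that the auto-created table lands in the right place; treat this as a sketch of the configuration rather than an exact pipeline export.

{
  "name": "Get Table Names from Synapse",
  "type": "Lookup",
  "typeProperties": {
    "source": {
      "type": "SqlDWSource",
      "sqlReaderQuery": "SELECT TABLE_SCHEMA, TABLE_NAME FROM information_schema.TABLES a LEFT JOIN sys.external_tables b ON a.TABLE_NAME = b.name WHERE b.name IS NULL AND a.TABLE_TYPE = 'BASE TABLE'"
    },
    "dataset": { "referenceName": "SourceSynapseDs", "type": "DatasetReference" },
    "firstRowOnly": false
  }
},
{
  "name": "ForEach Table",
  "type": "ForEach",
  "dependsOn": [
    { "activity": "Get Table Names from Synapse", "dependencyConditions": [ "Succeeded" ] }
  ],
  "typeProperties": {
    "items": {
      "value": "@activity('Get Table Names from Synapse').output.value",
      "type": "Expression"
    },
    "activities": [
      {
        "name": "Copy Table to Target",
        "type": "Copy",
        "inputs": [ { "referenceName": "SourceSynapseDs", "type": "DatasetReference" } ],
        "outputs": [ { "referenceName": "TargetSynapseDs", "type": "DatasetReference" } ],
        "typeProperties": {
          "source": {
            "type": "SqlDWSource",
            "sqlReaderQuery": {
              "value": "SELECT * FROM [@{item().TABLE_SCHEMA}].[@{item().TABLE_NAME}]",
              "type": "Expression"
            }
          },
          "sink": {
            "type": "SqlDWSink",
            "tableOption": "autoCreate",
            "preCopyScript": {
              "value": "IF EXISTS (SELECT * FROM information_schema.tables WHERE table_schema = '@{item().TABLE_SCHEMA}' AND table_name = '@{item().TABLE_NAME}') TRUNCATE TABLE [@{item().TABLE_SCHEMA}].[@{item().TABLE_NAME}]",
              "type": "Expression"
            }
          }
        }
      }
    ]
  }
}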

Re-create all views:

1. Create a Lookup activity to get all view names from the source Synapse instance.

SELECT TABLE_SCHEMA, TABLE_NAME FROM information_schema.TABLES WHERE TABLE_TYPE = 'VIEW'

2. Create a ForEach activity to loop through all the items returned from the Lookup activity in Step #1.

@activity('Get View Names from Synapse').output.value

3. Within the ForEach activity, create a Lookup activity against the source instance to retrieve the view definition of each view in the loop.

SELECT VIEW_DEFINITION FROM INFORMATION_SCHEMA.VIEWS WHERE TABLE_SCHEMA = '@{item().TABLE_SCHEMA}' AND TABLE_NAME = '@{item().TABLE_NAME}'

4. Prior to view creation, create another Lookup activity against the target instance to drop the view in case it already exists.

DROP VIEW IF EXISTS [@{item().TABLE_SCHEMA}].[@{item().TABLE_NAME}]; SELECT 1;

5. Using the view definition retrieved in Step #3, create another Lookup activity that executes that statement at the target Synapse instance to recreate the view.

@activity('Get View definition').output.firstRow.VIEW_DEFINITION

6. Add a 1-second Wait activity on the failure path of the Lookup activity from Step #5. Since a SELECT cannot be appended to the CREATE VIEW statement to return results back to ADF, that Lookup reports a failure even though the view is created; the Wait activity handles the failure so the loop continues (a sketch of these inner activities follows).
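
Put together, the inner activities of the view-recreation ForEach could look roughly like the fragment below. As before, activity and dataset names are placeholders, and wiring the Wait activity to the Failed dependency of the create-view Lookup is an assumption about how the original pipeline was connected (a Completed dependency would behave similarly); treat it as a sketch rather than an exact export.

{
  "name": "Get View definition",
  "type": "Lookup",
  "typeProperties": {
    "source": {
      "type": "SqlDWSource",
      "sqlReaderQuery": {
        "value": "SELECT VIEW_DEFINITION FROM INFORMATION_SCHEMA.VIEWS WHERE TABLE_SCHEMA = '@{item().TABLE_SCHEMA}' AND TABLE_NAME = '@{item().TABLE_NAME}'",
        "type": "Expression"
      }
    },
    "dataset": { "referenceName": "SourceSynapseDs", "type": "DatasetReference" },
    "firstRowOnly": true
  }
},
{
  "name": "Drop View at Target",
  "type": "Lookup",
  "dependsOn": [
    { "activity": "Get View definition", "dependencyConditions": [ "Succeeded" ] }
  ],
  "typeProperties": {
    "source": {
      "type": "SqlDWSource",
      "sqlReaderQuery": {
        "value": "DROP VIEW IF EXISTS [@{item().TABLE_SCHEMA}].[@{item().TABLE_NAME}]; SELECT 1;",
        "type": "Expression"
      }
    },
    "dataset": { "referenceName": "TargetSynapseDs", "type": "DatasetReference" }
  }
},
{
  "name": "Create View at Target",
  "type": "Lookup",
  "dependsOn": [
    { "activity": "Drop View at Target", "dependencyConditions": [ "Succeeded" ] }
  ],
  "typeProperties": {
    "source": {
      "type": "SqlDWSource",
      "sqlReaderQuery": {
        "value": "@activity('Get View definition').output.firstRow.VIEW_DEFINITION",
        "type": "Expression"
      }
    },
    "dataset": { "referenceName": "TargetSynapseDs", "type": "DatasetReference" }
  }
},
{
  "name": "Absorb Create View Failure",
  "type": "Wait",
  "dependsOn": [
    { "activity": "Create View at Target", "dependencyConditions": [ "Failed" ] }
  ],
  "typeProperties": { "waitTimeInSeconds": 1 }
}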

You should be able to copy and recreate all tables and views easily through this process in Data Factory. The key takeaway is that Data Factory has no native activity for running arbitrary DDL/DML statements against SQL databases or Synapse Analytics, so the workaround is to use a Lookup activity and force it to return a result by appending SELECT 1 to the desired statement (which is not possible when creating a view, since CREATE VIEW has to stand alone in its batch).
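
For statements that T-SQL does allow to be followed by another statement in the same batch, the same trick applies generally. As a purely illustrative example (the table name below is hypothetical), a Lookup source configured like this would run the update and still hand one row back to ADF:

"source": {
  "type": "SqlDWSource",
  "sqlReaderQuery": "UPDATE [dbo].[MigrationLog] SET CompletedOn = GETDATE(); SELECT 1 AS Done;"
}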

Aside from that slight drawback, the process works as we intended and has met our expectations.
