Dynamic SQL in Databricks and SQL Server

Dian Germishuizen · Published in Learning SQL · 4 min read · May 22, 2022

What is dynamic SQL?

Dynamic SQL is a programming technique where you build a general-purpose query as a string, then alter parts of that string at runtime to change the actions it performs, the data it returns, or the objects it acts on, before finally executing it.
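For instance, here is a minimal sketch on SQL Server (the table name is just an illustration) where the object a query reads from is only decided at runtime:

DECLARE @TableName SYSNAME = N'Customers'; -- illustrative; imagine this arrives at runtime
DECLARE @SqlCommand NVARCHAR(MAX);

-- Build the statement as a string, quoting the runtime object name with QUOTENAME.
SET @SqlCommand = N'SELECT COUNT(*) FROM ' + QUOTENAME(@TableName) + N';';

-- Execute the generated statement.
EXEC sp_executesql @SqlCommand;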

Some form of dynamic SQL execution exists in every technology that supports the SQL language.

In this article, I will focus on Microsoft SQL Server (On-Premises or Azure) and Databricks.

Use Cases

  1. In Azure Synapse Analytics (Serverless and Dedicated Pool) you can dynamically generate an external table pointing to a data lake location that is determined at runtime. This is ideal when you want the external table to point only to the latest file available in the lake, not all historical files.
  2. Maintenance: you can perform the same maintenance task on a large set of objects by making the maintenance code dynamic in terms of the objects it acts on. These actions can be to maintain indexes, drop unneeded tables, clear database caches and so on (see the sketch after this list).
  3. Pivot and Unpivot: you can create a stored procedure to pivot or unpivot a table structure based on the column names sent in as parameters. This means you never have to google how to do the actual pivot or unpivot again. Joy!
  4. JSON Normalization: you can normalize a JSON object or array by dynamically extracting all the attributes from the JSON string and generating the code to persist the normalized structure.
  5. Optimized data retrieval: this is very useful for databases that serve requests from multiple users at a time. You can create a dynamic query that filters a column based on the value sent in from a front-end application at runtime. This returns a reduced set of data, improving performance and security.
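To make use case 2 concrete, here is a minimal sketch, assuming SQL Server 2017 or later for STRING_AGG: it generates an ALTER INDEX ... REBUILD statement for every user table and runs the whole batch.

-- Generate one ALTER INDEX ... REBUILD statement per user table.
DECLARE @SqlCommand NVARCHAR(MAX);

SELECT @SqlCommand = STRING_AGG(
           CAST(N'ALTER INDEX ALL ON ' + QUOTENAME(s.name) + N'.' + QUOTENAME(t.name)
                + N' REBUILD;' AS NVARCHAR(MAX)),
           CHAR(13))
FROM sys.tables AS t
JOIN sys.schemas AS s
    ON s.schema_id = t.schema_id;

-- Inspect the generated batch before executing it.
PRINT @SqlCommand;
EXEC sp_executesql @SqlCommand;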

These are but a few. I am not going to list every use case; I do have a life, you know…

Benefits

The main benefit is that you can increase the level of automation in your application by having code generated dynamically, saving your developers the time of writing it all out manually. This is especially true when the majority of the query structure stays the same, with only minor differences here and there.

Considerations

  1. Dynamic SQL can be hard to debug. I typically print out the generated SQL statement before ever adding the EXEC statement, so I can inspect the generated code and run it manually before letting the engine run it for me.
  2. It is vulnerable to SQL injection attacks, especially if you rely on input from users rather than input from secure backend sources the users don't interact with (see the sketch after this list).
  3. It can be slower to execute, because the execution plans the SQL engine builds for the statement may have to be generated anew on each run, depending on how much of the query text changes between executions.
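For consideration 2, here is a minimal sketch of the safer pattern on SQL Server, assuming a [dbo].[Customers] table with a [LastName] column: user-supplied values go through sp_executesql parameters instead of being concatenated into the query text.

DECLARE @SqlCommand NVARCHAR(MAX);
DECLARE @SearchName NVARCHAR(100) = N'O''Brien'; -- imagine this came from a front-end application

-- The parameter marker stays inside the string; the value never becomes part of the SQL text.
SET @SqlCommand = N'SELECT * FROM [dbo].[Customers] WHERE [LastName] = @LastName';

EXEC sp_executesql @SqlCommand,
                   N'@LastName NVARCHAR(100)',
                   @LastName = @SearchName;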

SQL Server Example

In this example, the query will filter the table based on a dynamic value passed in at runtime.
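A minimal sketch of such a stored procedure, assuming a [dbo].[Customers] table with an integer [CustomerID] column (the procedure and variable names are illustrative):

CREATE OR ALTER PROCEDURE [dbo].[GetCustomerById]
    @CustomerID INT
AS
BEGIN
    DECLARE @SqlCommand NVARCHAR(MAX);

    -- Build the query as a string, concatenating the runtime value into the filter.
    SET @SqlCommand = N'SELECT * '
                    + N'FROM [dbo].[Customers] '
                    + N'WHERE [Customers].[CustomerID] = '
                    + CAST(@CustomerID AS NVARCHAR(20));

    -- Print the generated statement first so it can be inspected and run manually.
    PRINT @SqlCommand;

    -- Execute the generated statement.
    EXEC sp_executesql @SqlCommand;
END;

Concatenating the value is acceptable here only because @CustomerID is typed as an integer; for string inputs, prefer the parameterized sp_executesql pattern shown earlier.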

When we execute this piece of code as part of a stored procedure and pass in the value 123, the final generated query will look like the following.

SELECT *
FROM [dbo].[Customers]
WHERE [Customers].[CustomerID] = 123

Databricks Example

In Databricks, there are situations where using SQL is a better choice than Python or Scala for certain actions, e.g. when working with Delta tables and merging new data into them.
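A minimal sketch of that merge scenario, assuming a Delta table silver.Product already exists, a temporary view updatesTemporaryView holds the incoming rows, and ProductID is the key column (all three names are illustrative):

# The target table and join column could be decided at runtime, so the MERGE text is built dynamically.
vTargetTable = "silver.Product"
vJoinColumn = "ProductID"

vSqlCommand = f"""
MERGE INTO {vTargetTable} AS target
USING updatesTemporaryView AS source
ON target.{vJoinColumn} = source.{vJoinColumn}
WHEN MATCHED THEN UPDATE SET *
WHEN NOT MATCHED THEN INSERT *
"""

# Inspect the generated statement, then hand it to the SQL engine.
print(vSqlCommand)
spark.sql(vSqlCommand)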

Some prerequisites for the main example below:

  1. You need a dataframe already defined earlier in the notebook and saved as a temporary view called sourceTemporaryView.
  2. The dataframe must have a field called ProductCategory to partition on. This can also be made dynamic by passing in the field name to partition on at runtime.
  3. You must have a linked data lake where the data for this Delta table will be stored. This location is defined by the vDeltaTablePath variable value and should ideally be the path to a mounted data lake location.

Here is an example of Python code that dynamically generates a new Delta table using SQL, based on the data in a pre-defined dataframe.
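This is a minimal sketch under the prerequisites above; vTableName and vPartitionColumn are illustrative names.

# Dynamic values: the table name, lake path and partition column could all be decided at runtime.
vTableName = "silver.Product"
vPartitionColumn = "ProductCategory"
vDeltaTablePath = "/mnt/silver/WorldWideImporters/Batch/Product/"

# Build the CREATE TABLE statement as a string, substituting the dynamic values.
vSqlCommand = f"""
CREATE TABLE IF NOT EXISTS {vTableName}
USING DELTA
LOCATION '{vDeltaTablePath}'
PARTITIONED BY ( {vPartitionColumn} )
AS
SELECT * FROM sourceTemporaryView
"""

# Inspect the generated statement, then execute it.
print(vSqlCommand)
spark.sql(vSqlCommand)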

The final SQL query generated and executed can look something like this:

CREATE TABLE IF NOT EXISTS silver.Product 
USING DELTA
LOCATION '/mnt/silver/WorldWideImporters/Batch/Product/'
PARTITIONED BY ( ProductCategory )
AS
SELECT * FROM sourceTemporaryView

Closing Arguments

This article only gave a very high-level overview of the potential that dynamic SQL holds.

But remember, with great power comes great responsibility.

Only use dynamic SQL when necessary as it comes with a cost — see the considerations above.

Ok, good luck, have fun, and don't accidentally drop the production database using this information. I take no responsibility.

Tags

#analytics, #automation, #data, #databricks, #synapse-analytics, #synapse-analytics-serverless, #t-sql

Thank you for reading my ramblings. If you want to, you can buy me a coffee here: Support Dian Germishuizen on Ko-fi! ❤️

My Socials

My Blog: diangermishuizen.com

Linked In: Dian Germishuizen | LinkedIn

Twitter: Dian Germishuizen (@D_Germishuizen) / Twitter

Credly: Dian Germishuizen — Badges — Credly

