SQL Variables in Databricks

Ryan Chynoweth
2 min readOct 6, 2023

--

Last December we published a blog providing an overview of Converting Stored Procedures to Databricks. In that blog we showed that users could set and use variables with the following example.

Setting Variables

CREATE WIDGET TEXT CostCenter DEFAULT '';

SET var.year_variable = YEAR(CURRENT_DATE());

SET var.total_qty = (
SELECT sum(qty)
FROM my_schema.my_staging_source_table
WHERE YEAR(ModifiedDate) = ${var.year_variable} and CostCenter = $CostCenter;
);

CREATE TABLE my_schema.cost_center_qty_agg
AS
WITH cte as (
SELECT year, sum(qty) as summed_qty
FROM my_schema.my_staging_source_table
WHERE CostCenter = $CostCenter
GROUP BY YEAR
)

SELECT *
FROM cte
WHERE summed_qty > ${var.total_qty} ;

In the above code, I am making use of the year_variable variable. This was a slight workaround previously and followed an odd syntax with using ${var.variable_name}. This workaround did not work in the Databricks SQL interface and only in the notebooks. So now that we have the official implementation, users can leverage variables regardless of execution environment. Please note that 14.1 is currently the latest runtime and may not be deployed in Databricks SQL current channel at this time.

Fortunately, with the introduction of Databricks Runtime 14.1, variables are now fully supported! To gain a deeper understanding of this feature, please refer to our documentation. Here is a refactored version of the above query using the new syntax.

CREATE WIDGET TEXT CostCenter DEFAULT ‘’; 

DECLARE VARIABLE year_variable TIMESTAMP;
DECLARE VARIABLE total_qty DOUBLE;

SET var year_variable = YEAR(CURRENT_DATE());

SET var total_qty = (
SELECT sum(qty)
FROM my_schema.my_staging_source_table
WHERE YEAR(ModifiedDate) = year_variable and CostCenter = $CostCenter;
);

CREATE TABLE my_schema.cost_center_qty_agg
AS
WITH cte as (
SELECT year, sum(qty) as summed_qty
FROM my_schema.my_staging_source_table
WHERE CostCenter = $CostCenter
GROUP BY YEAR
)
SELECT *
FROM cte
WHERE summed_qty > total_qty ;

Please note that syntax for using the variable. For example, in one of the where clauses you see YEAR(ModifiedDate) = year_variable in which we use the year_variable variable. I am very excited to see this feature released! Happy coding everyone!

Disclaimer: these are my own thoughts and opinions and not a reflection of my employer

--

--

Ryan Chynoweth

Senior Solutions Architect Databricks — anything shared is my own thoughts and opinions