An introduction to Dynamic SQL in BigQuery — Part 1

Tired of writing the same bits of code all over again? This step-by-step guide on how to make flexible Stored Procedures in BigQuery is for you!

Michaël Ménaché
6 min readJan 26, 2022

Stored Procedures in BigQuery (and in SQL in general) are a useful tool to wrap and re-use code… but they’re very limited: you can’t parametrize table or column.

If, like me, you’re used to Python functions or the SAS Macro Language, this can be painfully frustrating.

Fortunately, there is a way to bypass these limitations: Dynamic SQL.
…sooo what is this Dynamic SQL thing? Think of it like this: it’s a way to dynamically generate a string containing your query, then run your query.
This opens powerful new doors, as you can use anything to generate your query, including table or colum names.
Want to write a procedure to transpose a table (any table) and share it with your team? Dynamic SQL is your friend!

Stored Procedures… and their limits

Before digging into Dynamic SQL, let’s have a quick recap on Stored Procedures
Stored Procedures are a way to save and parametrize a query, then run it again without having to write it again. A bit like a function in Python or a Macro in SAS.

Let’s take a simple example:
Say you have sales data fom a retailer and you want to calculate the sum of sales by customer segment after a certain date, your query would look like this:

select
CUST_TYPE
, sum(SPEND) as SALES
from
`my-project.sandbox.TRX`
where
DATE > DATE'2018-05-16'
group by
CUST_TYPE
;

Now if you want to run this query again but with a different date, you can copy paste the query and change the date, but your code would look messy.
And if your query is long (unlike this simple example), it may not be easy to find exactly where are the elements you need to change.

Another way is to create a Stored Procedure, with the date as a parameter. like this:

-- a. Create the Stored Procedure
CREATE OR REPLACE PROCEDURE `my-project.sandbox.SP_SUM1`(_end_date DATE)
BEGIN
select
CUST_TYPE
, sum(SPEND) as SALES
from
`my-project.sandbox.TRX`
where
DATE > _end_date
group by
CUST_TYPE
;
END;

The way this works is quite simple:

  • CREATE OR REPLACE PROCEDURE `my-project.sandbox.SP_SUM1`(_end_date DATE) creates a Stored Procedure called SP_SUM1 in my-project.sandbox
    The arguments for the Stored Procedure are defined between the brackets: here we have only one argument (_end_date) and its type is DATE.
  • The query follows, between BEGIN and END

You can then run the procedure with different dates, making your code easy to read, like this:

-- b. Call the Stored Procedure
CALL `my-project.sandbox.SP_SUM1`(DATE’2018–05–16');
CALL `my-project.sandbox.SP_SUM1`(DATE’2018–05–01');
CALL `my-project.sandbox.SP_SUM1`(DATE’2018–05–10');

So far so good. We parametrized the date and it works like a charm.

But the problem comes if you want to parametrize column names (e.g. to change the column to sum or how to group the data): it doesn’t work.

Say we want to have the column we aggregate and the column we group by as parameters: based on the example above, we would try:

-- 1.3 Stored Procedure with column names as parameters
CREATE OR REPLACE PROCEDURE `my-project.sandbox.SP_SUM2`(
_sumVar STRING — column to aggregate (SPEND)
, _groupVar STRING — column to group by (CUST_TYPE)
)

BEGIN
select
_groupVar
, sum(_sumVar) as _SUM
from
`gcp-wow-supers-rtla-qtm-dev.sandbox.MM_TRX`
group by
_groupVar
;
END;

Looks good… but this will throw an error:

This is the limit of Stored Procedures: you can’t parametrize table and columns names.

Quite a bummer… but this is where Dynamic SQL saves the day!

Dynamic SQL 101 — A simple example

Let’s write the same procedure as above, but this time we’ll use Dynamic SQL:

-- 1.4 Stored Procedure with column as arguments using DYNAMIC SQL
-- a. Create the Stored Procedure
CREATE OR REPLACE PROCEDURE `my-project.sandbox.SP_SUM2`(
_sumVar STRING -- column to aggregate. Declare as STRING
, _groupVar STRING -- column to group by. Declare as STRING
)
BEGIN

-- Write the query as a String (_queryString)
DECLARE _queryString STRING;
SET _queryString = """ -- open string with 3 quotation marks
select
"""||_groupVar||""" -- insert argument
, sum("""||_sumVar||""") as _SUM -- insert argument
from
`my-project.sandbox.TRX`
group by
"""||_groupVar||""" -- insert argument
;
"""; -- close string with 3 quotation marks

select _queryString; -- displays query generated by Dynamic SQL
EXECUTE IMMEDIATE (_queryString); -- execute the query
END;

The structure is close to a normal Stored Procedure, but there are some key differences:

  • Parameters are all STRING (_sumVar STRING, _groupVar STRING)
  • We need to declare and set the string that will contain the query (DECLARE _queryString STRING; SET _queryString =)
  • Open and close the string with 3 quotations marks (""")
  • Use the concatenate operator (||) to insert the arguments in the string
  • Close the string before and open the string again after each argument ("""||_sumVar||""")
  • Use EXECUTE IMMEDIATE to, well, execute the query.

Once done, it is called like any Stored Procedure:

-- b. call the stored procedure
-- Sum of SPENDby CUST_TYPE
CALL `my-project.sandbox.SP_SUM2`(
"SPEND"
, "CUST_TYPE"
);
-- Sum of SPEND by DEPARTMENT
CALL `my-project.sandbox.SP_SUM2`(
"SPEND"
, "DEPARTMENT"
);

The Stored Procedure is now very flexible, and we can even group by 2 or more columns:

-- Sum of TOT_AMT_INCLD_GST by BMP_CLUSTER_NAME, DEPT
CALL `my-project.sandbox.SP_SUM2`(
"SPEND"
, "CUST_TYPE, DEPARTMENT"
);
Results from the Strored Procedure

Note: Having select _queryString; in the Stored Procedure beforeEXECUTE IMMEDIATE will display the query generated by Dynamic SQL in the results window, so you can check your query and correct it if needed:

You can display the query generated by Dynamic SQL

Dynamic SQL 102 — A (marginally) more complex example

All Dynamic SQL does, really, is generate some text and execute it as a query. As a result, we can use anything as arguments: table, column or dataset names, strings, dates, numbers… even functions or statements (e.g. a where clause).
This is what makes Dynamic SQL so flexible.

In our next example, we’ll illustrate this by adding dataset names, table names and aggregation functions as arguments:

-- 2.2 Write a Stored Procedure with dataset name, table name and aggregation function as argumentsCREATE OR REPLACE PROCEDURE `my-project.sandbox.SP_SUM3`(
_dataSet STRING — dataset where input table is located
, _trx_table STRING — input table
, _sumVar STRING — column to aggregate
, _groupVar STRING — column to group by
, _aggfunc STRING — aggregation function (SUM, AVERAGE, MIN…)
, _destination STRING — destination table create
)
BEGIN

DECLARE _queryString STRING;
SET _queryString = """
CREATE OR REPLACE TABLE `"""||_dataSet||"""."""||_destination||"""` as
select
"""||_groupVar||"""
, """||_aggfunc||"""("""||_sumVar||""") as SALES
from
`"""||_dataSet||"""."""||_trx_table||"""`
group by
"""||_groupVar||"""
;
""";
select _queryString;
EXECUTE IMMEDIATE (_queryString);
END;

The structure is the same as the first example, we’re just adding more parameters to make the Stored Procedure more flexible.

This allows us to use the same Stored Procedure on different tables and aggregate different columns:

-- 2.3 Call the stored proc 
-- Sum of spend by customer type and department from the TRX table
CALL `my-project.sandbox.SP_SUM3`(
“my-project.sandbox” -- data set
, “TRX” -- input table
, “SPEND” -- column to aggregate
, “CUST_TYPE, DEPARTMENT” -- column(s) to group by
, “SUM” -- aggregation function
, “SP_SUM3_DEMO1” -- destination table to create
);
-- average quantity by customer type and store type from another table (TRX2)
CALL `my-project.sandbox.SP_SUM3`(
“my-project.sandbox” -- data set
, “TRX2” -- input table
, “QTY” -- column to aggregate
, “CUST_TYPE, STORE_TYPE” -- column to group by
, “AVG” -- aggregation function
, “SP_SUM3_DEMO2” -- destination table to create
);

And that’s it: we’ve successsfully buit a flexible procedure we can use on any table!

Hopefully with this short guide you can enjoy some of the benefits of building flexible Stored Procedures in BigQuery:

  • Makes codes easier to write and read
  • Easy to share: Stored procedure can be used by anyone with acces to the dataset where they are stored

--

--

Michaël Ménaché

Lead Analyst at Quantium. Originally from Paris, now in Sydney, with 15 years of experience in retail and a passion for data. Oh, and I’m legally blind!