How to Fix an Error in the DBT Command with the Empty Flag Due to _TABLE_SUFFIX or _PARTITIONTIME
Utilize dbt variables to fix the error
Since dbt core 1.8, the empty
flag that allows us to create empty resources has been added.
When this flag is specified, a condition limit 0
is added to your query, which means that the output will be empty as well as there will be no cost of the scanning.
This function can be used in a variety of ways, for example,
- You have written some new queries and want to check that the all of SELECT statements are valid within 30 seconds.
- You want to write a yaml with dbt-osmosis, so you want to create empty tables.
I often hesitate to repeatedly execute queries that require a lot of analysis cost in DWH for debugging, so I am grateful for such cost-saving features.
However, you need to be careful when using this feature in the case of queries that select pseudo-columns such as _TABLE_SUFFIX and _PARTITIONTIME in BigQuery.
For example, suppose we have the following model.
SELECT
*
FROM
{{ source("ga4", "events_*") }}
WHERE
_TABLE_SUFFIX BETWEEN FORMAT_DATE("%Y%m%d", CURRENT_DATE("+9") -3 )
AND FORMAT_DATE("%Y%m%d", CURRENT_DATE("+9"))
Running dbt run with the empty flag on this model results in the following error.
Unrecognized name: _TABLE_SUFFIX at [6:3]
To investigate the error, Let’s look at the compiled query.
SELECT
*
FROM
(select * from `bigquery-public-data`.`ga4_obfuscated_sample_ecommerce`.`events_*` where false limit 0)
WHERE
_TABLE_SUFFIX BETWEEN FORMAT_DATE("%Y%m%d", CURRENT_DATE("+9") -3 )
AND FORMAT_DATE("%Y%m%d", CURRENT_DATE("+9"))
You can see that when adding the empty
flag, the data source referenced by source or ref is wrapped as a subquery with limit 0
.
Since the pseudo column is no longer retained after the subquery select * is executed, it is no longer possible to select _TABLE_SUFFIX in subsequent queries. That is the cause of the error.
This behavior was reported immediately after the 1.8 release. In fact, this behavior is not a bug, and therefore, a workaround should be considered.
A simple way to handle this is to use flags
, a variable that can be used in Jinja to determine whether or not cli options are specified.
We need to look at the dbt-core code to understand the details, but we can see that True/False is stored as the return value of flags
and thus can be used in the same way as is_incremental()
.
If the scan amount becomes zero by the empty
flag, there is no need to filter by _TABLE_SUFFIX, so all we need to do is to use this variable to not refer to _TABLE_SUFFIX if the empty
flag is specified.
SELECT
*
FROM
{{ source("ga4", "events_*") }}
{% if not flags.EMPTY %}
WHERE
_TABLE_SUFFIX BETWEEN FORMAT_DATE("%Y%m%d", CURRENT_DATE("+9") -3 )
AND FORMAT_DATE("%Y%m%d", CURRENT_DATE("+9"))
{% endif %}
Nevertheless, there is a slight problem with this method: the query that has require_partition_filter = true and uses _PARTITIONTIME as the partition column still cannot be executed.
In that case, There are still some other solutions you can take:
- remove require_partition_filter
- specify a real data column as the partition column instead of using a pseudo-column partition
- rewrite the behavior of
empty
flag in combination with other Jinja functions as described in the comments in the issue.
However, a better method may be available by the time dbt core 1.9 is released since dbt-core is updated quickly.
In addition, while these dbt-specific variables and functions are useful and utilizing them makes us feel smarter, you are careful of using them excessively because the learning cost of new members rapidly increases.