Translate TO_DSINTERVAL function (Teradata) in BigQuery

Abhishek Pratap Singh
Google Cloud - Community
4 min readNov 15, 2023

Introduction

Teradata contains the function TO_DSINTERVAL, which converts a given string into the Teradata specific format INTERVAL DAY(4) TO SECOND(6).

INTERVAL DAY(4) TO SECOND (6) represents a time interval which can contain up to 4 digits for representing days, and 6 digits of fractional seconds. Hence, it can be used to represent values up to “​​9999 23:59:59.999999” (which is read as 9999 days, 23 hours, 59 minutes, and 59.999999 seconds).

In this article, we will cover how to translate this function in BigQuery and the various considerations that we need to keep in mind while doing so.

Google BigQuery logo

Inputs to the TO_DSINTERVAL function in Teradata

The TO_DSINTERVAL takes a string input, which can have either format from the following:

  • SQL interval format.
  • ISO duration format.

SQL interval format

The SQL interval format is [+|-]DDDD HH:MM:SS.SSSSSS. Following are the ranges that these variables can take:

  • DDDD is an integer between (0, 9999).
  • HH is an integer between (0, 23).
  • MM is an integer between (0, 59).
  • SS is an integer between (0, 59), while the optional fractional seconds can range from .0 to .999999.

As this format is similar to the output INTERVAL DAYS(4) TO SECOND(6) format, conversion from the SQL interval format is relatively easier.

ISO duration format

The format for this is [-]P[n]DT[n]H[n]M[n][.fractional_seconds]S, where ’n’ represents the value of the corresponding element. Here, we need to keep the following in consideration:

  • Elements can be omitted if they are zero. However, all the elements can’t be skipped simultaneously — at least one element and its number is required.
  • The interval can be negative (denoted by a leading negative sign) or positive (when no negative sign is present).
  • The T designator always precedes the time elements, and it can only be omitted if neither of the hour, minute, and second elements are present.
  • Values can exceed their normal time limit. For example, ‘PT36H’ is a valid input that represents 36 hours (even though there are only 24 hours in a day). Similarly, ‘PT3H79M’ is also a valid input representing 3 hours and 79 minutes.

These considerations make translating from the ISO duration format relatively trickier.

Conversion Logic

The following custom UDF written in GoogleSQL translates the TO_DSINTERVAL function, and can be deployed in BigQuery:

CREATE or REPLACE FUNCTION dataset_name.to_dsinterval(input string)

returns string as ((

SELECT CASE

— when input is in ISO duration format

WHEN regexp_contains(input, r’(-)?P(?:([\d]+)D)?(?:T(?:([\d]+)H)?(?:([\d]+)M)?(?:([.,\d]+)S)?)?’)

THEN ((

SELECT CONCAT(neg_sign,days,’ ‘,FORMAT_TIME(‘%H:%M:%E*S’,PARSE_TIME(‘%H:%M:%E*S’, CONCAT(hours,’:’,minutes,’:’,seconds)))) from (

select neg_sign, days + add_to_days as days, hours, minutes, seconds from (

select neg_sign, days, floor((hours + add_to_hours)/24) as add_to_days, mod((hours + add_to_hours),24) as hours, minutes, seconds from(

select neg_sign, days, hours, floor((minutes + add_to_min)/60) as add_to_hours, mod((minutes + add_to_min),60) as minutes, seconds from (

select neg_sign, days, hours, minutes, floor(seconds/60) as add_to_min, mod(seconds,60) as seconds from (

select

ifnull(regexp_extract(input,r’(-)?’),’’) as neg_sign,

cast(ifnull(regexp_extract(input,r’-?P(?:([\d]+)D)’),’0') as int64) as days,

cast(ifnull(regexp_extract(input,r’-?P(?:[\d]+D)?(?:T(?:([\d]+)H))’),’0') as int64) as hours,

cast(ifnull(regexp_extract(input,r’-?P(?:[\d]+D)?(?:T(?:[\d]+H)?(?:([\d]+)M))’),’0') as int64) as minutes,

cast(ifnull(regexp_extract(input,r’-?P(?:[\d]+D)?(?:T(?:[\d]+H)?(?:[\d]+M)?(?:([.\d]+)S)?)?’),’0') as numeric) as seconds

)))))

))

— when input is in SQL duration format

WHEN regexp_contains(input, r’-?(?:[\d]+) (?:[\d]+):(?:[\d]+):(?:([\d]+))’)

THEN ((

SELECT

CASE WHEN hours >= 24 or minutes >= 60 or seconds >=60 THEN ERROR(“ERROR: Invalid Interval”)

ELSE CONCAT(neg_sign,days,’ ‘,FORMAT_TIME(‘%H:%M:%E*S’,PARSE_TIME(‘%H:%M:%E*S’, CONCAT(hours,’:’,minutes,’:’,seconds))))

END

from(

select

ifnull(regexp_extract(input,r’(-)?’), ‘’) as neg_sign,

cast(regexp_extract(input,r’-?(?:([\d]+))’) as int64) as days,

cast(regexp_extract(input,r’-?(?:[\d]+) (?:([\d]+))’) as int64) as hours,

cast(regexp_extract(input,r’-?(?:[\d]+) (?:[\d]+):(?:([\d]+))’) as int64) as minutes,

cast(regexp_extract(input,r’-?(?:[\d]+) (?:[\d]+):(?:[\d]+):(?:([.\d]+))’) as numeric) as seconds

)

))

WHEN input = ‘’ THEN ERROR(“ERROR: Invalid Interval”)

ELSE NULL

END

));

The UDF has two parts, one for translating ISO duration input and the other for translating SQL duration input. Regex expressions are used to recognize the format of the input string and implement the corresponding logic accordingly.

For the ISO duration format:

  • The innermost SELECT block extracts the element values from the input string.
  • The middle SELECT blocks implement the overflow logic.
  • The outermost SELECT block formats the output in the required format.

Similarly, for the SQL duration format:

  • The inner SELECT block extracts the element values from the input string.
  • The outer SELECT block formats the output in the required format.
  • We don’t need to implement overflow logic in this case as the input elements are defined to be within their normal time limits.

One observation to keep in mind is that this UDF returns the final output as a string which is formatted like the INTERVAL DAYS(4) TO SECOND(6) data type. In order to perform any further operations on the output generated from this UDF, one can extract the elements and cast them into the required BQ supported INTERVAL types accordingly.

--

--