Convert NORMALIZE (Teradata) to Bigquery

Anas Aslam
Google Cloud - Community
3 min readNov 11, 2022

Introduction

NORMALIZE is a Teradata extension to the ANSI SQL standard. When a SQL table includes a column that has a PERIOD data type, NORMALIZE combines values that meet or overlap in that column, to form a single period that consolidates multiple, individual period values.

To use NORMALIZE, at least one column in the SQL SELECT list must be of Teradata’s temporal PERIOD data type.

Bigquery currently doesn’t support the PERIOD data type in a table. Instead, we need to divide a Teradata PERIOD data field into two parts: eg) start_date and end_date

Prerequisites

  1. Basic knowledge of the Teradata data warehouse and its SQL syntax
  2. Good understanding of Bigquery and its SQL syntax
Teradata SQL to Bigquery SQL

Conversion Logic

Example Scenario:

Assume in the below example, we are going to Normalize on the employee_id.Here period column is split into two parts (start_date and end_date)

The sample data looks as below

Employee Details Table

Teradata SQL:

SELECT NORMALIZE ON MEETS OR OVERLAPS emp_id, duration
FROM poc_aa_cs.Normalize_Emp_Details
ORDER BY 1,2;

Output:

Normalize Output

Bigquery SQL:

SELECT S2.emp_id, MIN(S2.start_date) as new_start_date, MAX(S2.end_date) as new_end_date
FROM
( SELECT S1.*, SUM(GroupStartFlag) OVER (PARTITION BY emp_id ORDER BY start_date ROWS UNBOUNDED PRECEDING) As GroupID
FROM ( SELECT emp_id, start_date, end_date,(CASE WHEN start_date <= LAG(max_end_date) OVER (PARTITION BY emp_id ORDER BY start_date, end_date) THEN 0 ELSE 1 END) AS GroupStartFlag
FROM (select emp_id, start_date, end_date, max(end_date) OVER (PARTITION BY emp_id ORDER BY start_date, end_date ROWS UNBOUNDED PRECEDING) as max_end_date from `poc-env-aks.poc_aa_cs.Normalize_Emp_Details` )) S1
) S2
GROUP BY S2.emp_id, S2.GroupID
ORDER BY 1,2;

Translation Explanation:

  • This LAG function returns the values for a row at a given offset above (before) the current row in the partition.
  • You can use the LAG function to identify each row that begins a new period by determining if a period meets or overlaps with the previous period (0 if yes and 1 if no )
  • When this flag is cumulatively summed up, it provides a group identifier that can be used in the outer Group By clause to arrive at the desired result.
  • The output of S1 is as follows
Output of S1
  • The output of S2 is as follows
Output of S2

After these changes, Bigquery SQL generates the same output as Teradata.

--

--