Convert Group By CUBE (Teradata) to Bigquery
Introduction:
Group by CUBE is a teradata extension to the group by clause that allows performing multiple group by clauses in the single statement.
Given n dimensions, CUBE generates 2n different kinds of groups. Each group is represented as a single row.
Prerequisites
- Basic knowledge of the Teradata data warehouse and its SQL syntax
- Good understanding of Bigquery and its SQL syntax
Conversion Logic
Example Scenario:
Lets says, the requirement is to find out the effect of city and state on the sales_amount
The sample data looks as below
Teradata SQL:
SELECT city, state, SUM(sales_amount) as total_sales
FROM sales_tbl
GROUP BY CUBE (city, state);
Bigquery SQL:
SELECT
sales_tbl.city AS city,
NULL AS state,
sum(sales_tbl.sales_amount) total_sales
FROM
sales_tbl
GROUP BY 1UNION ALL
SELECT
NULL AS city,
sales_tbl.state AS state,
sum(sales_tbl.sales_amount) total_sales
FROM
sales_tbl
GROUP BY 2UNION ALL
SELECT
sales_tbl.city AS city,
sales_tbl.state AS state,
sum(sales_tbl.sales_amount) total_sales
FROM
sales_tbl
GROUP BY 1, 2UNION ALL
SELECT
NULL AS city,
NULL AS state,
sum(sales_tbl.sales_amount) total_sales,
FROM
sales_tbl
;
Translation Explanation:
Since there are two dimensions in the group by cube, (2*2) = 4 groups are created.
Group 1: Effect of City
Group 2: Effect of State
Group 3: Effect of City and State
Group 4: Effect of No City and State
Hence the final query output is a union all of all the above 4 groups
After these changes, Bigquery SQL generates the same output as Teradata.