Pivoting and Unpivoting on Multiple Columns in Google Big Query

How to regroup related data

Charlotte Patola
CodeX
6 min readSep 17, 2023

--

Working with data, one often comes across datasets that are structured in a — for a specific use case— sub-optimal way. A few weeks back, I stumbled upon a structure that was new to me. It was a contact list, containing names, phone numbers and email addresses. Instead of having one row per contact, the big query table had one row per person the contacts were related to. The contacts were stored in columns, with a number stating which contacts info the column contained. The structure is pictured below.

Raw data in Big Query

For my use case, it would need one column for guardianName, one for guardianEmail and one additional for guardianType, which holds information about weather the person is the main (1) or secondary (2) guardian.

In other words, I would like to make a new grouping based on multiple column pairs:

Desired structure

How to solve this challenge? My suggestion would be to make use of the SQL PIVOT and UNPIVOT functions.

Before getting into solution details, let´s have a look at the general syntax and logic of PIVOT and UNPIVOT.

How does PIVOT work?

The verb pivot translates into “to turn” or “to switch”. When pivoting a table, it is “turned around” so that rows become columns. The table gets more columns, hence, becomes wider¹.

In the PIVOT example below, the single column Manufacturer is divided up into two new columns: one for each value in the column.

These new columns could, for example, be used in a machine learning model, where categorical values are often decoded as dummy variables.

How does UNPIVOT work?

Unpivot is the exact opposite of pivot; columns are rotated into rows and the table gets fewer columns, hence, becomes narrower¹.

Using the previous section’s PIVOT output as a starting point, the two manufacturer dummy columns are combined into one single column named manufacturer.

In this example, on top of using UNPIVOT, we also need to make some further modifications to get back to the initial state we had before pivoting. We also need to exclude the new Values column from our SELECT statement and add a WHERE clause to exclude rows with the values 0. If we don´t do this, we get one row for each manufacturer, despite it being a dummy variable, where only rows with the value 1 are valid. See picture below.

Structure we do not want

Getting Back to the Use Case: Unpivoting on Multiple Columns

The main challenge with the contact list use case presented in the beginning of the article, is that we don´t want to merge specific column values into ONE new column, but multiple pairs of column values into MULTIPLE combined columns.

Desired structure

As we want to consolidate columns and create a “longer” dataset, we need to turn to UNPIVOT.

Starting Point

We have one row for each child.

Starting point

Desired result

We want one row for each guardian of a child.

Desired result

SQL Syntax

Let´s start building the UNPIVOT syntax step by step.

Step 1

To accomplish the desired result, we need to start the UNPIVOT clause with the two new columns we will be using to consolidate values from the two “groups of columns”

  • guardianname (for the values from guardian1name & guardian2name
  • guardianemail (for the values of guardian1email & and guardian2email)

Step 2

Then we continue with the new category column, ie. guardiantype.

Step 3

Finally, we separate the two old column sets that belong to the same person (i.e. guardian1name & guardian1email as well as guardian2name & guardian2email) and give them an alias that will be used as a value in the new category column guardiantype.

Final SQL


--- Creation of our Dataset
WITH
contactList AS (
SELECT
'Lynn Evans' AS childName,
'Annika Henricsson' AS guardian1Name,
'Steve Evans' AS guardian2Name,
'annika@hi.com' AS guardian1Email,
'stevie@ho.com' AS guardian2Email
UNION ALL
SELECT
'Steph Andersson' AS childName,
'Anders Bloom' AS guardian1Name,
'Stephan McDonalds' AS guardian2Name,
'andersa@fun.com' AS guardian1Email,
'Stephane@him.com' AS guardian2Email )

--- Select Statement with UNPIVOT
SELECT
*
FROM
contactlist
UNPIVOT
(
(guardianname, guardianemail) --- Step 1 in UNPIVOT
FOR guardiantype IN --- Step 2 in UNPIVOT
(
(guardian1name, guardian1email) AS 'Main', --- Step 3 in UNPIVOT
(guardian2name,guardian2email) AS 'Secondary' --- Step 3 in UNPIVOT
)
)

What About Pivoting on Multiple Columns?

What if we instead have data shaped like the UNPIVOT output above and want to PIVOT it? We can use the general PIVOT syntax with a few small adjustments.

SQL Syntax

Step 1

As PIVOT requires an aggregation function as the first argument, we need to provide this, even though we are not aggregating any numbers in this use case. Hence, we can use MAX() or MIN() on our columns guardianName and guardianEmail and keep the names they already have.

Step 2

Next, we define the column to separate on, i.e. the one we will create new columns from. This is the column guardianType.

Step 3

Finally, we list the values from the column we will separate on. We can also give the values new aliases if needed.

In our situation, we can actually not achieve the exact same starting point we had before UNPIVOT: ing. The reason is that the values from the column we separate on — or the aliases we choose for them — can only be added to the end of the column names from Step 1 and the separator will be an aunderline.

In other words, the column names guardianName_Main and guardianName__1 are possible, but note guardian1Name.

Final SQL

--- Creation of our Dataset
WITH
contactList AS (
SELECT
'Lynn Evans' AS childName,
'Annika Henricsson' AS guardianName,
'annika@hi.com' AS guardianEmail,
'Main' AS guardianType
UNION ALL
SELECT
'Lynn Evans' AS childName,
'Steve Evans' AS guardianName,
'stevie@ho.com' AS guardianEmail,
'Secondary' AS guardianType
UNION ALL
SELECT
'Steph Andersson' AS childName,
'Anders Bloom' AS guardianName,
'andersa@fun.com' AS guardianEmail,
'Main' AS guardianType
UNION ALL
SELECT
'Steph Andersson' AS childName,
'Stephan McDonalds' AS guardianName,
'Stephane@him.com' AS guardianEmail,
'Secondary' AS guardianType )

---Select Statement with PIVOT
SELECT
*
FROM
contactList
PIVOT
(
MAX(guardianName) AS guardianName, --- Step 1 in PIVOT
MAX(guardianEmail) AS guardianEmail --- Step 1 in PIVOT
FOR guardianType IN --- Step 2 in PIVOT
(
'Main' AS _1, --- Step 3 in PIVOT
'Secondary' AS _2 --- Step 3 in PIVOT
)
)

WrapUp

We have now:

a) gone through the general PIVOT and UNPIVOT syntax in BigQuery SQL
b) unpivoted and pivoted datasets with multiple pairs of related columns into multiple combined columns

[1] In the R tidyverse library collection, pivot is conveniently named pivot_wider() and unpivot pivot_longer().

--

--