Pay attention to cardinality & grain when UNNESTING in BigQuery!

Photo by Pawel Czerwinski on Unsplash

Whenever you’re UNNESing an ARRAY, you’re getting a Cartesian product between the row and the array contents. If you were to unnest another array, you’ll get another Cartesian product between the output of the previous unnest and the elements in the current array.

Let’s look at an example. A student has their grades stored in an ARRAY as well as their food allergies in another ARRAY.

If we are to UNNEST both array we’ll end having count_of_grades x count_of_allergies rows for each student, 4x3 in this case.

Why this happens? Well the allergies and grades have no relationship between each other, they just refer to the same student row.

Take this into account when you’re working with nested data.

Found it useful? Subscribe to my Analytics newsletter at notjustsql.com.

Originally published at https://datawise.dev on April 19, 2024.

--

--

Constantin Lungu
Data Engineer’s Notes

Software Engineer with a focus on Analytics — Data Engineering, Data Science & Business Intelligence | Opinions are my own