How to work with Arrays and Structs in Google BigQuery
If you ever get confused how to select or how to create Arrays or Structs in BigQuery then you are at the right place. Arrays and Structs are confusing, and I won’t argue on that. But before I come to the confusing part, let me first tell you a little bit about what exactly each of them is and how they are different.
What are Arrays and how are they used in BigQuery:
Arrays in BigQuery, like in any other language, are a collection of elements of the same data type.
For example, this is what an Array address_history might look like:
id:”1",
name:”abc”,
age:”20",
address_history: [“current”, “previous”, “birth”]

What are Structs and how are they used in BigQuery:
Struct is a data type which has attributes in key value pairs, just like a dictionary in Python.
Within each record, multiple attributes have their own values. These attributes can either be referred to as keys or Struct columns.
id:”1",
name:”abc”,
age:”20",
address_history: {
“status”:”current”,
“address”:”London”,
“postcode”:”ABC123D”
}

Structs can have more attributes, each with its own value, related to one key/ID. For example, in below image, row 1 has 3 attributes (“status”, “address”,”postcode”) within one address_history Struct.

Whereas Arrays can have multiple elements within one column address_history, against each key/ID, there is no pair in Arrays, it is basically a list or a collection.
address_history: [“current”, “previous”, “birth”]
# Note the braces, Structs have curly braces {} and Arrays have square braces []
What are Array of Structs and how can we use them in BigQuery:
As the name suggests, if we want to store multiple Structs against each key/ID, Array of Structs is the option.
For example: Address_history is an Array column having 3 {} Structs inside [] .
id:”1",
name:”abc”,
age:”20",
address_history: [
{ “status”:”current”, “address”:”London”, “postcode”:”ABC123D” },
{ “status”:”previous”, “address”:”New Delhi”, “postcode”:”738497" },
{ “status”:”birth”, “address”:”New York”, “postcode”:”SHI747H” }
]

Note: You can use the following code which is at the end, in sql file, for you to copy and try.
How do I know by looking at the schema if it is Array/Struct/Array of Structs:
An Array is a list, which means it has nested values. If you ever see a data type with “REPEATED” mode in the schema, then it is an Array.


A Struct on the other hand, has many values and if we want to select one value, we need to use ‘dot’. If you ever find a data type as “RECORD” in schema, then it is a Struct with Nullable mode.


An Array of Structs is a nested record. Thus, “RECORD” data type with “REPEATED” mode, then it is an Array of Structs.


Now you know the difference by looking at the schema or data.
How do I query array/struct/array_of_structs?
An Array is a nested column, if we want a grouped value then we don’t need to unnest it. But to flatten it into multiple records, we need to use unnest:


As you can see above, if you don’t unnest after the from keyword and select the Array column as is, the result of the query will be only one row.
But here in the picture below, after unnest of address_history, the output is that BQ has flattened the rows into three.


Struct, being the Record data type, doesn’t need to be unnested.


As above, address_history, a Struct data type, is selected directly and it resulted in three columns. But if you want to select partial values from the Struct data type, you can do that by using “.” such as address_history.status


In case of Array of Structs, column can be selected directly and it will result in only one row:


As above, there is only one row with multiple values for each Struct key. For example, address_history.status has three values [“current”, “previous”, “birth”]. But if you need to select partial Struct keys, you definitely need to unnest first to flatten it into multiple rows, otherwise BQ will throw this error:
Cannot access field status on a value with type ARRAY<STRUCT<status STRING, address STRING, postcode STRING>>

Only unnested Array of Structs (Record, Repeated) will result in multiple rows with all Struct key value pairs.


You can also select few columns from Array of Structs by using unnest and selecting those particular columns with “.”


For nested Structs such as Arrays having a Struct inside another Struct, use multiple unnests. Here is the basic example of an Array having a Struct within another Struct such as (Array[Struct<columns,Struct<>]).


This is another example of an Array having another Array and Struct within Struct such as (Array[Struct<Struct<>,Array[]>]).


Now you may have further questions, so keep on reading.
How do I create the schema for Arrays and Structs:
Here is a simple example of all kinds of Arrays and Structs data type that can be included in the schema’s DDL:


Here is the SQL file for you to try in BigQuery. Happy Querying :)
For more reference, head over to these: