How to work with Arrays and Structs in Google BigQuery

Deepti Garg
Feb 12 · 6 min read

If you ever get confused how to select or how to create Arrays or Structs in BigQuery then you are at the right place. Arrays and Structs are confusing, and I won’t argue on that. But before I come to the confusing part, let me first tell you a little bit about what exactly each of them is and how they are different.

What are Arrays and how are they used in BigQuery:

Arrays in BigQuery, like in any other language, are a collection of elements of the same data type.

For example, this is what an Array address_history might look like:

id:”1",
name:”abc”,
age:”20",
address_history: [“current”, “previous”, “birth”]

Image for post
Image for post

What are Structs and how are they used in BigQuery:

Struct is a data type which has attributes in key value pairs, just like a dictionary in Python.

Within each record, multiple attributes have their own values. These attributes can either be referred to as keys or Struct columns.

id:”1",
name:”abc”,
age:”20",
address_history: {
“status”:”current”,
“address”:”London”,
“postcode”:”ABC123D”
}

Structs can have more attributes, each with its own value, related to one key/ID. For example, in below image, row 1 has 3 attributes (“status”, “address”,”postcode”) within one address_history Struct.

Image for post
Image for post

Whereas Arrays can have multiple elements within one column address_history, against each key/ID, there is no pair in Arrays, it is basically a list or a collection.

address_history: [“current”, “previous”, “birth”]

# Note the braces, Structs have curly braces {} and Arrays have square braces []

What are Array of Structs and how can we use them in BigQuery:

As the name suggests, if we want to store multiple Structs against each key/ID, Array of Structs is the option.

For example: Address_history is an Array column having 3 {} Structs inside [] .

id:”1",
name:”abc”,
age:”20",
address_history: [
{ “status”:”current”, “address”:”London”, “postcode”:”ABC123D” },
{ “status”:”previous”, “address”:”New Delhi”, “postcode”:”738497" },
{ “status”:”birth”, “address”:”New York”, “postcode”:”SHI747H” }
]

Image for post
Image for post

Note: You can use the following code which is at the end, in sql file, for you to copy and try.

How do I know by looking at the schema if it is Array/Struct/Array of Structs:

An Array is a list, which means it has nested values. If you ever see a data type with “REPEATED” mode in the schema, then it is an Array.

Image for post
Image for post
Create array_demo
Image for post
Image for post
array_demo schema

A Struct on the other hand, has many values and if we want to select one value, we need to use ‘dot’. If you ever find a data type as “RECORD” in schema, then it is a Struct with Nullable mode.

Image for post
Image for post
Create struct_demo
Image for post
Image for post
struct_demo schema

An Array of Structs is a nested record. Thus, “RECORD” data type with “REPEATED” mode, then it is an Array of Structs.

Image for post
Image for post
create array_of_struct_demo
Image for post
Image for post
array_of_struct_demo schema

Now you know the difference by looking at the schema or data.

How do I query array/struct/array_of_structs?

An Array is a nested column, if we want a grouped value then we don’t need to unnest it. But to flatten it into multiple records, we need to use unnest:

Image for post
Image for post
Image for post
Image for post

As you can see above, if you don’t unnest after the from keyword and select the Array column as is, the result of the query will be only one row.

But here in the picture below, after unnest of address_history, the output is that BQ has flattened the rows into three.

Image for post
Image for post
Image for post
Image for post

Struct, being the Record data type, doesn’t need to be unnested.

Image for post
Image for post
Image for post
Image for post

As above, address_history, a Struct data type, is selected directly and it resulted in three columns. But if you want to select partial values from the Struct data type, you can do that by using “.” such as address_history.status

Image for post
Image for post
Image for post
Image for post

In case of Array of Structs, column can be selected directly and it will result in only one row:

Image for post
Image for post
Image for post
Image for post

As above, there is only one row with multiple values for each Struct key. For example, address_history.status has three values [“current”, “previous”, “birth”]. But if you need to select partial Struct keys, you definitely need to unnest first to flatten it into multiple rows, otherwise BQ will throw this error:

Cannot access field status on a value with type ARRAY<STRUCT<status STRING, address STRING, postcode STRING>>

Image for post
Image for post

Only unnested Array of Structs (Record, Repeated) will result in multiple rows with all Struct key value pairs.

Image for post
Image for post
Image for post
Image for post

You can also select few columns from Array of Structs by using unnest and selecting those particular columns with “.”

Image for post
Image for post
Image for post
Image for post

For nested Structs such as Arrays having a Struct inside another Struct, use multiple unnests. Here is the basic example of an Array having a Struct within another Struct such as (Array[Struct<columns,Struct<>]).

Image for post
Image for post
Image for post
Image for post

This is another example of an Array having another Array and Struct within Struct such as (Array[Struct<Struct<>,Array[]>]).

Image for post
Image for post
Image for post
Image for post

Now you may have further questions, so keep on reading.

How do I create the schema for Arrays and Structs:

Here is a simple example of all kinds of Arrays and Structs data type that can be included in the schema’s DDL:

Image for post
Image for post

Google Cloud - Community

Google Cloud community articles and blogs

Deepti Garg

Written by

Data Engineer

Google Cloud - Community

A collection of technical articles and blogs published or curated by Google Cloud Developer Advocates. The views expressed are those of the authors and don't necessarily reflect those of Google.

Deepti Garg

Written by

Data Engineer

Google Cloud - Community

A collection of technical articles and blogs published or curated by Google Cloud Developer Advocates. The views expressed are those of the authors and don't necessarily reflect those of Google.

Medium is an open platform where 170 million readers come to find insightful and dynamic thinking. Here, expert and undiscovered voices alike dive into the heart of any topic and bring new ideas to the surface. Learn more

Follow the writers, publications, and topics that matter to you, and you’ll see them on your homepage and in your inbox. Explore

If you have a story to tell, knowledge to share, or a perspective to offer — welcome home. It’s easy and free to post your thinking on any topic. Write on Medium

Get the Medium app

A button that says 'Download on the App Store', and if clicked it will lead you to the iOS App store
A button that says 'Get it on, Google Play', and if clicked it will lead you to the Google Play store