My BigQuery Stories-working with JSON data

Bihag Kashikar
intelia
Published in
3 min readJan 14, 2024

In this post of my BigQuery stories, I am sharing some use cases and techniques for working with JSON data in BigQuery. These are very common requirements for data analysts and engineers when exploring data of type JSON.

1. First up on the list is the use cases is using with offset in the JSON functions

Let's say you have below JSON data, a very standard structure of JSON, and a typical use case for customer data wherein, there are multiple array addresses with an element in the JSON showing one being current and other’s being previous addresses.

Now, here are a few challenges that I have always faced working with this type of data — For address standardization, data compliance, and data quality-related use cases, I had to work with every element of array of addresses. Here’s what I did-

with jsondata as (
SELECT
JSON '{"id":"2","first_name":"Jane","last_name":"Doe","dob":"1980-10-16","addresses":[{"status":"current","address":"789 Any Avenue","city":"New York","state":"NY","zip":"33333","numberOfYears":"2"},{"status":"previous","address":"321 Main Street","city":"Hoboken","state":"NJ","zip":"","numberOfYears":""}]}' as data
) SELECT addrs, _index
FROM jsondata,
unnest(json_query_array(jsondata.data,'$.addresses')) addrs with offset _index
;

Output:

with offset gave away the key type index values which came in real handy to work with individual types of address elements.

2. The second use case is the grouping of JSON values of common meaning

Consider below JSON data below showing players performances for league and country and I want to group Names and GA (goals and assists) comma-delimited values.

WITH DATA AS (
SELECT
""" { "stats": { "epl": [{"Year":2023,
"playername":"Marcus Rashford",
"GA":"33"},
{"Year":2023,
"playername":"Bruno Fernandes",
"GA":"25"},
{"Year":2023,
"playername":"Alexandro Garnacho",
"GA":"10"} ],
"country": [{"Year":2022,
"playername":"Marcus Rashford",
"GA":"10"},
{"Year":2022,
"playername":"Bruno Fernandes",
"GA":"31"} ] } } """ AS _json )
SELECT
ARRAY_TO_STRING( ARRAY(
SELECT
JSON_EXTRACT_SCALAR(x,"$.playername")
FROM
DATA, UNNEST(JSON_EXTRACT_ARRAY(_json,"$.stats.epl")) AS x ),"," ) AS playername,
ARRAY_TO_STRING( ARRAY(
SELECT
JSON_EXTRACT_SCALAR(x,"$.GA")
FROM
DATA, UNNEST(JSON_EXTRACT_ARRAY(_json,"$.stats.epl")) AS x ),"," ) AS GA

Output:

3. Finally, my third use case is parsing raw data in JSON element values

Consider the JSON data below. Typically all descriptiontext values will always have text that requires escaping in JSON— the ‘\` is included in the MMA\Boxing and the text is in the quoted string too.

'{"name": "Jonny", "descriptiontext":"\"The customer has opted for MMA\\Boxing classes at EarlyLearningStage\\option\""}'

One way to parse the description text type fields is to use it the r’ expression. Here’s the query-

SELECT
JSON_VALUE(DATA, '$.descriptiontext') AS age
FROM (
SELECT
r'{"name": "Jonny", "descriptiontext":"\"The customer has opted for MMA\\Boxing classes at EarlyLearningStage\\option\""}' AS DATA ) a

Output:

Hope the above examples are useful in your journey of working with BigQuery.

--

--

Bihag Kashikar
intelia
Editor for

Google Cloud, Data analytics, Data architecture, System design @ intelia Pty. Ltd