My BigQuery Stories-working with JSON data
In this post of my BigQuery stories, I am sharing some use cases and techniques for working with JSON data in BigQuery. These are very common requirements for data analysts and engineers when exploring data of type JSON.
1. First up on the list is the use cases is using
with offset
in the JSON functions
Let's say you have below JSON data, a very standard structure of JSON, and a typical use case for customer data wherein, there are multiple array
addresses with an element
in the JSON showing one being current
and other’s being previous
addresses.
Now, here are a few challenges that I have always faced working with this type of data — For address standardization, data compliance, and data quality-related use cases, I had to work with every element of array
of addresses. Here’s what I did-
with jsondata as (
SELECT
JSON '{"id":"2","first_name":"Jane","last_name":"Doe","dob":"1980-10-16","addresses":[{"status":"current","address":"789 Any Avenue","city":"New York","state":"NY","zip":"33333","numberOfYears":"2"},{"status":"previous","address":"321 Main Street","city":"Hoboken","state":"NJ","zip":"","numberOfYears":""}]}' as data
) SELECT addrs, _index
FROM jsondata,
unnest(json_query_array(jsondata.data,'$.addresses')) addrs with offset _index
;
Output:
with offset
gave away the key
type index values which came in real handy to work with individual types of address elements.
2. The second use case is the grouping of JSON values of common meaning
Consider below JSON data below showing players performances for league and country and I want to group Names
and GA
(goals and assists) comma-delimited values.
WITH DATA AS (
SELECT
""" { "stats": { "epl": [{"Year":2023,
"playername":"Marcus Rashford",
"GA":"33"},
{"Year":2023,
"playername":"Bruno Fernandes",
"GA":"25"},
{"Year":2023,
"playername":"Alexandro Garnacho",
"GA":"10"} ],
"country": [{"Year":2022,
"playername":"Marcus Rashford",
"GA":"10"},
{"Year":2022,
"playername":"Bruno Fernandes",
"GA":"31"} ] } } """ AS _json )
SELECT
ARRAY_TO_STRING( ARRAY(
SELECT
JSON_EXTRACT_SCALAR(x,"$.playername")
FROM
DATA, UNNEST(JSON_EXTRACT_ARRAY(_json,"$.stats.epl")) AS x ),"," ) AS playername,
ARRAY_TO_STRING( ARRAY(
SELECT
JSON_EXTRACT_SCALAR(x,"$.GA")
FROM
DATA, UNNEST(JSON_EXTRACT_ARRAY(_json,"$.stats.epl")) AS x ),"," ) AS GA
Output:
3. Finally, my third use case is parsing
raw
data inJSON element values
Consider the JSON data below. Typically all descriptiontext
values will always have text that requires escaping in JSON— the ‘\`
is included in the MMA\Boxing
and the text is in the quoted string too.
'{"name": "Jonny", "descriptiontext":"\"The customer has opted for MMA\\Boxing classes at EarlyLearningStage\\option\""}'
One way to parse the description text type fields is to use it the r’
expression. Here’s the query-
SELECT
JSON_VALUE(DATA, '$.descriptiontext') AS age
FROM (
SELECT
r'{"name": "Jonny", "descriptiontext":"\"The customer has opted for MMA\\Boxing classes at EarlyLearningStage\\option\""}' AS DATA ) a
Output:
Hope the above examples are useful in your journey of working with BigQuery.