JSON in Google BigQuery — Finally

Still in preview, the feature is promising but has limitations when it comes to data engineering production use cases.

Zach Quinn
Pipeline: Your Data Engineering Resource

--

Big(Query) News for Data Engineering Teams

Last week, Google quietly revealed a preview of a JSON data type that is compatible with BigQuery. For data engineering teams working with messy and unstructured columns, this is a big deal. In exploring BigQuery’s compatibility (or lack thereof) with JSON I stumbled upon StackOverFlow posts from 5–6 years ago that proposed workarounds to BigQuery’s lack of JSON support. Often programmatically and resource-taxing, these solutions involved advanced SQL knowledge in the form of a pUDF (persistent user defined function) or multi-step data type conversion.

Photo by Taylor Vick on Unsplash

Including a native JSON data type means that data engineers can avoid a manual conversion of JSON strings to a data type that was close but not equivalent to data stored as JSON objects. Up until this point, to store JSON data in BigQuery the choices were fairly limited. Data either had to be stored in a STRING, ARRAY or STRUCT format. When loading JSON into BigQuery from an external data source like an API, it always had to have an accompanying schema, which could either be auto-detected or explicitly generated. Google…

--

--