ETL with Dataflow & BigQuery
Extract, Transform and Load using Dataflow & BigQuery
Originally published at https://asyncq.com
Table of Contents
- Introduction
- Use Case
- What are our options?
- Conclusion
Introduction
Google BigQuery provides both storage and compute for data of any size.
BigQuery stores input data in a columnar format called Capacitor, which is persisted on Colossus, Google's distributed file system.
We can run a data transformation query directly inside BigQuery, and BigQuery will often execute it within seconds. For some use cases, however, it makes sense to perform the transformation outside BigQuery using big data tools such as Apache Spark or Apache Beam, which bring their own compute to the data stored in BigQuery.
In this blog, we will see how to perform data transformation outside BigQuery using the Apache Beam SDK, with Dataflow as the execution engine.
If you want to learn how to perform transformations inside BigQuery using a UDF, see this blog: https://medium.com/analytics-vidhya/using-npm-library-in-google-bigquery-udf-8aef01b868f4