ETL with Dataflow & BigQuery

Suraj Mishra
Published in Analytics Vidhya
4 min read · Aug 30, 2021


Extract, Transform and Load using Dataflow & BigQuery

Originally published at https://asyncq.com

Table Of Contents

  • Introduction
  • Use Case
  • What are our options?
  • Conclusion

Introduction

Google BigQuery provides storage and compute for data of any size.
It stores data in a columnar format called “Capacitor” on the “Colossus” distributed file system.
We can run a data transformation query inside BigQuery, and it will often finish within seconds. For some use cases, though, it makes sense to transform the data outside BigQuery with big data tools such as Spark or Apache Beam, which supply their own compute for the data stored in BigQuery.

In this blog, we will see how we can perform data transformation outside BigQuery using the Apache Beam SDK, with Dataflow as the execution engine.
If you want to learn how to perform transformations inside BigQuery using a UDF, here is the blog: https://medium.com/analytics-vidhya/using-npm-library-in-google-bigquery-udf-8aef01b868f4
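
To make the outside-BigQuery approach concrete, here is a minimal sketch of a Beam pipeline in Python that reads rows from BigQuery, transforms them, and writes them back, with Dataflow as the runner. It is only an illustration and not the pipeline built later in this post: the column names (name, amount_cents, amount_usd), the normalize_row function, and the parameters of run are all hypothetical placeholders.

```python
from typing import Any, Dict


def normalize_row(row: Dict[str, Any]) -> Dict[str, Any]:
    """Transform one BigQuery row (Beam hands each row over as a dict).

    The column names used here are hypothetical placeholders.
    """
    name = (row.get("name") or "").strip().title()
    cents = row.get("amount_cents") or 0
    return {"name": name, "amount_usd": round(float(cents) / 100, 2)}


def run(project: str, region: str, temp_location: str,
        source_query: str, target_table: str) -> None:
    """Extract from BigQuery, transform in Beam, load back into BigQuery."""
    # Imported inside the function so normalize_row can be tested
    # on a machine that does not have apache-beam[gcp] installed.
    import apache_beam as beam
    from apache_beam.options.pipeline_options import PipelineOptions

    options = PipelineOptions(
        runner="DataflowRunner",      # use "DirectRunner" to try it locally
        project=project,
        region=region,
        temp_location=temp_location,  # a gs:// path used for staging files
    )

    with beam.Pipeline(options=options) as pipeline:
        (
            pipeline
            | "Extract" >> beam.io.ReadFromBigQuery(
                query=source_query, use_standard_sql=True)
            | "Transform" >> beam.Map(normalize_row)
            | "Load" >> beam.io.WriteToBigQuery(
                target_table,  # "project:dataset.table"
                schema="name:STRING,amount_usd:FLOAT",
                write_disposition=beam.io.BigQueryDisposition.WRITE_APPEND,
                create_disposition=beam.io.BigQueryDisposition.CREATE_IF_NEEDED,
            )
        )
```

Keeping the transform as a plain function means it can be checked on its own, for example with normalize_row({"name": " alice smith ", "amount_cents": 1250}), before any Dataflow job is launched.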

Use Case
