GCS to BigQuery via Cloud Composer: Part 1 (Overview)

Amandeep Saluja
2 min read · Nov 23, 2023

Konnichiwa 👋

As part of our GCS to BigQuery Pipeline via Different GCP Services project, we will use Cloud Composer (Apache Airflow) to process our Excel file.

In this article, we will focus on understanding the flow of our process. So, let's get started :)

Technologies Used

  1. GCP Services
    - BigQuery
    - Cloud Functions
    - Cloud Storage
    - Cloud Composer (Apache Airflow)
    - Workload Identity Federation
  2. GitHub Actions
  3. Python
  4. Terraform

ETL Flow

Okay, let's see what we are trying to do here.

Step 1: We will be dropping Excel files to a GCS bucket.
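To get a feel for Step 1, here is a minimal sketch of dropping a file into a bucket with the google-cloud-storage client. The bucket and object names are placeholders, not the ones we will actually use in this project:

```python
# Minimal sketch: upload an Excel file to a GCS landing bucket.
# Bucket and object names below are hypothetical placeholders.
from google.cloud import storage

client = storage.Client()
bucket = client.bucket("my-landing-bucket")          # placeholder bucket
blob = bucket.blob("incoming/sales_report.xlsx")     # placeholder object path
blob.upload_from_filename("sales_report.xlsx")       # local file to upload
print(f"Uploaded to gs://{bucket.name}/{blob.name}")
```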

Step 2: Dropping the Excel file will trigger a Cloud Function, which in turn triggers an Apache Airflow DAG.
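For a rough idea of how Step 2 could work, here is a sketch of a GCS-triggered Cloud Function calling the Airflow REST API of a Composer 2 environment to start a DAG run. The web server URL and DAG ID are placeholders; we will build the real function in Part 4 of this series.

```python
# Sketch: GCS-triggered Cloud Function that triggers an Airflow DAG
# via the Composer 2 Airflow REST API. URL and DAG ID are placeholders.
import google.auth
from google.auth.transport.requests import AuthorizedSession

AUTH_SCOPE = "https://www.googleapis.com/auth/cloud-platform"
WEB_SERVER_URL = "https://example-composer-webserver.composer.googleusercontent.com"  # placeholder
DAG_ID = "gcs_to_bigquery_dag"  # placeholder

def trigger_dag(event, context):
    """Entry point for the GCS finalize event; passes file details to the DAG."""
    credentials, _ = google.auth.default(scopes=[AUTH_SCOPE])
    session = AuthorizedSession(credentials)
    endpoint = f"{WEB_SERVER_URL}/api/v1/dags/{DAG_ID}/dagRuns"
    payload = {"conf": {"bucket": event["bucket"], "file_name": event["name"]}}
    response = session.post(endpoint, json=payload)
    response.raise_for_status()
    return response.json()
```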

Step 3: Once the DAG gets triggered, it goes through 3 steps:

Step 3.1: The XLSX-to-CSV Cloud Function, which we developed in this post, is triggered.

Step 3.2: The DAG reads the CSV file produced in Step 3.1.

Step 3.3: With the CSV available, the data is loaded into BigQuery.
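To make these three DAG steps concrete, here is a rough sketch of what the DAG could look like: one task calling the XLSX-to-CSV Cloud Function over HTTP, and one task loading the resulting CSV into BigQuery with GCSToBigQueryOperator (covering Steps 3.2 and 3.3 together). All names, URLs, buckets, and table IDs are placeholders; the actual DAG is built in Part 3.

```python
# Sketch of the three DAG steps. Function URL, bucket, and table names are placeholders.
from datetime import datetime

import requests
from airflow import DAG
from airflow.operators.python import PythonOperator
from airflow.providers.google.cloud.transfers.gcs_to_bigquery import GCSToBigQueryOperator

CONVERT_FUNCTION_URL = "https://region-project.cloudfunctions.net/xlsx-to-csv"  # placeholder

def call_xlsx_to_csv(**context):
    """Step 3.1: call the XLSX-to-CSV Cloud Function for the file passed via dag_run.conf."""
    conf = context["dag_run"].conf or {}
    response = requests.post(CONVERT_FUNCTION_URL, json=conf, timeout=300)
    response.raise_for_status()

with DAG(
    dag_id="gcs_to_bigquery_dag",   # placeholder
    start_date=datetime(2023, 11, 1),
    schedule=None,                  # triggered externally by the Cloud Function
    catchup=False,
) as dag:
    convert_xlsx_to_csv = PythonOperator(
        task_id="convert_xlsx_to_csv",
        python_callable=call_xlsx_to_csv,
    )

    # Steps 3.2 and 3.3: read the CSV produced above and load it into BigQuery.
    load_csv_to_bq = GCSToBigQueryOperator(
        task_id="load_csv_to_bq",
        bucket="my-processed-bucket",            # placeholder
        source_objects=["output/data.csv"],      # placeholder
        destination_project_dataset_table="my_project.my_dataset.my_table",  # placeholder
        source_format="CSV",
        skip_leading_rows=1,
        autodetect=True,
        write_disposition="WRITE_TRUNCATE",
    )

    convert_xlsx_to_csv >> load_csv_to_bq
```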

Now that we have our steps laid out, let's jump into the development. The series is split into four parts:

  1. Overview (this post)
  2. Setup Cloud Composer Environment
  3. Create and Deploy Airflow DAG
  4. Create Cloud Function to Trigger Airflow DAG

Sayōnara 👋
