XLSX to CSV GCP Cloud Function

Amandeep Saluja
7 min read · Oct 31, 2023

Hola 👋

As discussed in the GCS to BigQuery Pipeline via Different GCP Services post, I will be working with different GCP services to build ETL/ELT pipelines that transfer data from Excel to BigQuery.

But before we do that and start using the Google-provided templates, we need to convert the Excel file to CSV. Why? Because two of the ETL pipelines will use Dataproc Serverless (Apache Spark) and a Google-provided Dataflow template (Apache Beam), and these templates do not accept Excel files as input.
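To make the conversion step concrete, here is a minimal sketch of turning an XLSX file into CSV text. It is not the code from this repo: to stay dependency-free it reads the XLSX zip archive directly with the Python standard library, and the names `xlsx_to_csv` and `column_index`, as well as the default sheet path `xl/worksheets/sheet1.xml`, are assumptions made for illustration. It only handles plain text and numeric cells in a single sheet.

```python
import csv
import io
import re
import zipfile
import xml.etree.ElementTree as ET

# SpreadsheetML namespace used by the worksheet and shared-strings XML parts.
MAIN_NS = "http://schemas.openxmlformats.org/spreadsheetml/2006/main"
NS = {"m": MAIN_NS}
TEXT_TAG = "{" + MAIN_NS + "}t"


def column_index(cell_ref: str) -> int:
    """Turn a cell reference such as 'B3' into a zero-based column index."""
    letters = re.match(r"[A-Z]+", cell_ref).group()
    index = 0
    for letter in letters:
        index = index * 26 + (ord(letter) - ord("A") + 1)
    return index - 1


def xlsx_to_csv(xlsx_bytes: bytes,
                sheet_path: str = "xl/worksheets/sheet1.xml") -> str:
    """Convert one worksheet of an XLSX file (given as bytes) to CSV text.

    Hypothetical helper: the default sheet path is an assumption and may
    differ depending on how the workbook was produced.
    """
    with zipfile.ZipFile(io.BytesIO(xlsx_bytes)) as archive:
        # Text cells usually point into a shared-strings table (optional part).
        shared = []
        if "xl/sharedStrings.xml" in archive.namelist():
            root = ET.fromstring(archive.read("xl/sharedStrings.xml"))
            for item in root.findall("m:si", NS):
                shared.append("".join(t.text or "" for t in item.iter(TEXT_TAG)))
        sheet = ET.fromstring(archive.read(sheet_path))

    output = io.StringIO()
    writer = csv.writer(output, lineterminator="\n")
    for row in sheet.findall("m:sheetData/m:row", NS):
        values = []
        for cell in row.findall("m:c", NS):
            ref = cell.get("r")
            position = column_index(ref) if ref else len(values)
            # Empty cells are omitted from the XML, so pad the gaps.
            values.extend([""] * (position - len(values)))
            kind = cell.get("t")
            if kind == "inlineStr":
                text = "".join(t.text or "" for t in cell.iter(TEXT_TAG))
            else:
                raw = cell.findtext("m:v", default="", namespaces=NS)
                # "s" cells store an index into the shared-strings table.
                text = shared[int(raw)] if kind == "s" and raw else raw
            values.append(text)
        writer.writerow(values)
    return output.getvalue()
```

In a real function, a library such as pandas or openpyxl is the more robust choice, since it also deals with dates, formulas, and multiple sheets. The sketch is only meant to show that the conversion is a small, self-contained step that can sit in front of the templates.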

So, let's start building the script. We will be using the following technologies for this service:

  1. GCP Services
    - Cloud Functions
    - Workload Identity Federation
  2. GitHub Actions
  3. Python
  4. Terraform

As I have mentioned earlier (many times), we will be using Workload Identity Federation to authenticate GitHub with GCP. Please make sure to set it up. With that said, let's start the development process.

Folder Structure

Below is how my repo is structured:

📦xlsx-to-csv-gcp-http-cloud-function
┣ 📂.github
┃ ┗ 📂workflows
┃ ┃ ┣ 📜deploy.yml
┃ ┃ ┗ 📜linter.yml # for testing purposes
┣ 📂infra
┃ ┣ 📜main.tf
┃ ┣ 📜providers.tf
┃ ┗ 📜variables.tf
┣ 📂src
┃ ┣ 📜helpers.py
┃ ┣ 📜main.py
┃ ┗ 📜requirements.txt
┗ 📜README.md
  • The infra folder contains the Terraform files, which are used to create…
