Integrating Google Sheets data into Azure Databricks Lakehouse with Logic Apps — Part 2

Vanderson Gonçalves
3 min read · Jan 18, 2024


Photo by Rubaitul Azad on Unsplash

In the first part of this story, I described how to create a Logic App to ingest data from Google Sheets into an Azure Storage Account, to be consumed in an Azure Databricks Lakehouse. In this second part, I explain how to create a pipeline in Data Factory to orchestrate the execution of that Logic App.

Data Factory

Azure Data Factory is a fully managed, serverless data integration service, provided by Microsoft Azure. This service supports various data sources, enabling users to create, schedule, and manage data pipelines for extracting, transforming, and loading data.

Create a global parameter for url_logic_app

To streamline CI/CD processes, I’ve created the global parameter url_logic_app_google_sheet, which stores the HTTP trigger URL of the Logic App. This allows the URL to change dynamically across environments such as production and UAT.
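As a reference point, global parameters are available to every pipeline through the pipeline().globalParameters expression. A minimal sketch of how the URL of the Web Activity (introduced below) can reference it:

@pipeline().globalParameters.url_logic_app_google_sheet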

Parameterize the pipeline

While developing pipelines, it’s always good to pay attention to good practices; I, for example, usually adopt the principles of DataOps. One crucial aspect of implementing DataOps is the parameterization of pipelines. It’s an important feature for ensuring the robust execution of pipelines under varying conditions. To implement this, I’ve used a Set Variable activity with a JSON expression outlining the parameters to be passed to the previously created Logic App.

In this case, I’ll pass the following parameters to the pipeline through the Set Variable activity:

{"parameters": 
{"file_name":"1yAXNP1x1ty9mhLo4ju0HToy9YurNoErUTRNTK7dGWJo",
"sheet_name":"sheet1",
"schema_sink": "gsheet",
"table_name": "table1"}
}
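For illustration only, the corresponding Set Variable activity in the pipeline JSON could look roughly like the sketch below. The activity name is a placeholder of mine, and the pipeline is assumed to declare a String variable named parameters (the same name used in the expressions later in this post):

{
  "name": "Set parameters",
  "type": "SetVariable",
  "typeProperties": {
    "variableName": "parameters",
    "value": "{\"parameters\": {\"file_name\": \"1yAXNP1x1ty9mhLo4ju0HToy9YurNoErUTRNTK7dGWJo\", \"sheet_name\": \"sheet1\", \"schema_sink\": \"gsheet\", \"table_name\": \"table1\"}}"
  }
}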

You can provide these parameters in various ways, such as reading them from a file or an Azure SQL table. Essentially, this capability allows a single pipeline to ingest any Google Sheet used within my company.
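If, for example, the parameters were read by a Lookup activity instead of being hard-coded in a variable, the values could be referenced with an expression along these lines (the activity name here is hypothetical):

@activity('Lookup parameters').output.firstRow.file_name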

Call the Logic App

The Logic App is triggered by an HTTP POST request. In the Data Factory pipeline, I’ve added a Web Activity to make this call, utilizing the global parameter url_logic_app_google_sheet for the URL.

In the body sent to the Logic App, I included the following content:

{"file_name":"@{json(variables('parameters'))['parameters']['file_name']}",
"schema_sink":"@{json(variables('parameters'))['parameters']['schema_sink']}",
"table_sink":"@{json(variables('parameters'))['parameters']['table_name']}",
"worksheet_name":"@{json(variables('parameters'))['parameters']['sheet_name']}"}

Pipeline Execution

On the Azure Logic App page, you can view the app’s run history, where you can debug and correct any problems.

The Logic App returns a count_row value that can be used to log the execution of this pipeline. Although I haven’t created logging activities here, it’s a good practice to implement them.
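Assuming the Web Activity is named 'Call Logic App' (as in the sketch above) and the Logic App returns count_row in its JSON response, a later activity could capture that value with an expression like:

@activity('Call Logic App').output.count_row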

Conclusion

While I would have preferred a direct connection from Data Factory to Google Sheets, utilizing Logic Apps has proven to be a viable and effective solution for our use cases.
