Integrating Google Sheets data into Azure Databricks Lakehouse with Logic Apps - Part 1

Vanderson Gonçalves
4 min read · Dec 20, 2023



Despite the evolution of Data Analytics and the widespread adoption of new technologies for processing data, there are still cases where users simply need a worksheet to input data and run their analyses. In a recent project involving a Lakehouse on Databricks, we ingested data from Excel sheets without any trouble. When we faced the task of ingesting data from Google Sheets, however, we discovered that this is not a trivial job in Data Factory.

Since Data Factory lacks a connector to copy this data, an alternative solution is necessary. After some research, we found that Logic Apps can bridge this gap effectively. This article describes how to integrate Logic Apps with Data Factory to ingest the data into the Lakehouse. In this first part, I’ll guide you through the creation of the Logic App; in the next part, I’ll address the orchestration of this pipeline through Azure Data Factory.

Data Ingestion Pipeline

Logic Apps

Azure Logic Apps is a cloud service designed to automate the execution of business processes. It lets users create workflows using a variety of connectors to interface with external services. In our case, we used it to connect to Google Sheets and write the retrieved data to our Storage Account.

Creating the Logic App

1) Trigger HTTP request

The first step in the Logic App is the trigger ‘When a HTTP request is received’. In this step, I set the ‘Method’ parameter to ‘POST’. The ‘Request Body JSON Schema’ was then configured with the following content, specifying the arguments the Logic App receives.

{
    "properties": {
        "file_name": {
            "type": "string"
        },
        "schema_sink": {
            "type": "string"
        },
        "table_sink": {
            "type": "string"
        },
        "worksheet_name": {
            "type": "string"
        }
    },
    "type": "object"
}
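
For reference, a request that matches this schema looks something like the payload below. The values are hypothetical; in practice they identify the Google Sheets file, the destination schema and table in the Lakehouse, and the worksheet to read.

{
    "file_name": "sales_forecast",
    "schema_sink": "bronze",
    "table_sink": "sales_forecast",
    "worksheet_name": "Sheet1"
}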

2) Setting the Variables

The subsequent steps initialize the variables that the Logic App will use.
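
The designer screenshots are omitted here, but in the workflow code view each of these steps is an ‘Initialize variable’ action. Below is a minimal sketch of the counter used later in the ‘Response’ step; it assumes an integer variable named ‘count_rows’ starting at 0 (any other variables follow the same pattern).

"Initialize_count_rows": {
    "type": "InitializeVariable",
    "inputs": {
        "variables": [
            {
                "name": "count_rows",
                "type": "integer",
                "value": 0
            }
        ]
    },
    "runAfter": {}
}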

3) Get Rows from Google Sheet

The next step is ‘Get Rows’ (Google Sheets), responsible for connecting to a Google Sheet and reading all rows from the selected worksheet. To use ‘Get Rows’, I must authenticate with a valid user who has access to the files and provide the ‘File’ and ‘Worksheet’ parameters, which the Logic App receives in the request body. It’s crucial to note that for ‘Get Rows’ to work correctly, every column in the worksheet must have a value in its first row, which is treated as the header. Additionally, a column named ‘__PowerAppsId__’ is added dynamically while the worksheet is processed, storing the unique identifier (rowid) of each row.
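
In code view, ‘Get Rows’ appears as an ApiConnection action against the Google Sheets connector. The connection name and the operation path are generated by the designer, so the fragment below is only an approximation; the part worth noting is how the ‘File’ and ‘Worksheet’ fields are filled from the trigger body.

"Get_rows": {
    "type": "ApiConnection",
    "inputs": {
        "host": {
            "connection": {
                "name": "@parameters('$connections')['googlesheet']['connectionId']"
            }
        },
        "method": "get",
        "path": "/datasets/@{encodeURIComponent(triggerBody()?['file_name'])}/tables/@{encodeURIComponent(triggerBody()?['worksheet_name'])}/items"
    },
    "runAfter": {
        "Initialize_count_rows": [ "Succeeded" ]
    }
}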

4) Write the data to Storage

The ‘Get Rows’ step produces an output called ‘Records values’ containing all the rows from the imported sheet. So, I added a ‘For Each’ step to iterate over this output and write each row to the storage as a JSON file.

To write to the storage, I added a ‘Create Blob (V2)’ step. In this step, I specified the storage account name and used the following pattern for the ‘Folder path’ in our project: /landing-zone/{type-source}/{schema}/{table_name}/{current_date}. For the ‘Blob name’, I used the current timestamp combined with the auto-generated ‘__PowerAppsId__’ column. The last parameter in ‘Create Blob’ is ‘Blob content’, where I provided the ‘Current item’ from the ‘Records values’, representing the sheet row in JSON format. Within the ‘For Each’ loop, I also included an ‘Increment variable’ action to count the rows.
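
Expressed with Logic Apps functions, the ‘Folder path’ and ‘Blob name’ fields can be built roughly as below. The keys simply mirror the designer fields, the {type-source} segment is hard-coded as 'google-sheets' for illustration, and 'For_Each' is assumed to be the loop’s internal name; adapt these to your own project.

{
    "folderPath": "@{concat('/landing-zone/google-sheets/', triggerBody()?['schema_sink'], '/', triggerBody()?['table_sink'], '/', formatDateTime(utcNow(), 'yyyy-MM-dd'))}",
    "name": "@{concat(formatDateTime(utcNow(), 'yyyyMMddHHmmss'), '_', items('For_Each')?['__PowerAppsId__'], '.json')}",
    "content": "@items('For_Each')"
}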

5) Send Response

The final step of our Logic App is the ‘Response’. In this step, I included the ‘Content-Type’ header with the value ‘application/json’. Here, we can return information to the requester; I’m using it to return the number of rows imported. For that, I used the following expression in the ‘Body’:

concat('{"type": "object",
    "properties": {
        "count_rows": "', variables('count_rows'), '"
    }
}')
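
If ‘count_rows’ ends up as 42, for example, this expression produces the response body below. Note that the count is returned as a string because of the surrounding quotes in the concat expression; dropping those quotes would return it as a number.

{
    "type": "object",
    "properties": {
        "count_rows": "42"
    }
}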

Conclusion of the First Part

In this first part, I presented the use case of importing data from Google Sheets spreadsheets into a Lakehouse on Databricks. I also demonstrated the steps to create a Logic App that connects to Google Sheets and writes its data to an Azure Storage Account. In the next part, I’ll explain how to orchestrate the execution of this pipeline in Data Factory.
