Google Cloud Function Beyond the Limits

Effective use of Cloud Functions with BigQuery computing power

Google Cloud Functions is a serverless execution environment for building and connecting cloud services. It provides fully managed infrastructure, so cloud users can focus on application code without worrying about the underlying hardware and software.

Use Case

  1. Design a data processing pipeline to load data from Cloud Storage into BigQuery
  2. Extract and load data, with basic transformations, as soon as new file(s) arrive in a Cloud Storage bucket
  3. Input file sizes can vary from 1 MB to 1000 MB

Challenges

Although several cloud services could serve this purpose, the fine-grained, on-demand nature of Cloud Functions makes it an easy choice. That said, a few of its restrictions made it challenging to carve out an optimal solution. Some examples are shown below:

Cloud Function Data Loading Stats

Solution

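In this blog, the implemented solution addresses the challenge of loading large files that exceed Cloud Function limits. The hallmark of this approach is to use the essential features of Cloud Functions while capitalizing on the compute power of BigQuery: the function only orchestrates the load, and BigQuery reads the file and writes the data, so the file contents never pass through the function's memory. The following steps explain how this is accomplished: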

  1. The Cloud Function is invoked when new files arrive in the Cloud Storage bucket
  2. A BigQuery external table is created with ExternalConfig(), pointing to the file(s) in the Cloud Storage bucket
  3. A BigQuery INSERT statement reads the data from the external table and loads it into the BigQuery transaction/master table
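The three steps above can be sketched in Python with the google-cloud-bigquery client library. This is a minimal sketch, not the code from the repository. It assumes CSV input with a header row, caller-supplied table IDs, and an external table whose columns line up with the target table so that SELECT * works. The helper names source_uri, insert_select_sql and load_via_external_table are hypothetical.

```python
def source_uri(event):
    """Build the gs:// URI of the object that triggered the function."""
    # Cloud Storage object events carry the bucket and object name.
    return f"gs://{event['bucket']}/{event['name']}"


def insert_select_sql(target_table, external_table):
    """Step 3: DML that copies every row from the external table into the target."""
    return f"INSERT INTO `{target_table}` SELECT * FROM `{external_table}`"


def load_via_external_table(event, external_table, target_table, schema):
    """Steps 2 and 3 for one new file; returns the number of inserted rows.

    schema is a list of bigquery.SchemaField objects for the external table.
    """
    # Imported here so the pure helpers above run without the client library.
    from google.cloud import bigquery

    client = bigquery.Client()

    # Step 2: point an external table at the newly arrived CSV file.
    external_config = bigquery.ExternalConfig("CSV")
    external_config.source_uris = [source_uri(event)]
    external_config.options.skip_leading_rows = 1  # assumes a header row
    table = bigquery.Table(external_table, schema=schema)
    table.external_data_configuration = external_config
    # Drop any previous definition so the table always reads the latest file.
    client.delete_table(external_table, not_found_ok=True)
    client.create_table(table)

    # Step 3: BigQuery reads the file and writes the target table server-side.
    job = client.query(insert_select_sql(target_table, external_table))
    job.result()  # block until the DML job finishes
    return job.num_dml_affected_rows
```

Because the function only issues a few API calls and then waits on the query job, its memory use should not grow with the file size. One caveat of reusing a single external table name is that concurrent uploads can race on it. A per-file table name, or a temporary definition passed through QueryJobConfig(table_definitions=...), avoids that.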

Cloud Function Source Code

The Cloud Function source code is split into multiple files:

  1. Schema File (schema.json): contains the BigQuery external table column names and data types.
  2. Configuration File (config.json): contains the input parameters for the BigQuery external and internal tables.
  3. Library File (bqlib.py): contains user-defined library functions that perform data extraction and data loading.
  4. Entry Point/Main File (bqDataLoad.py): contains the Cloud Function entry point code, which invokes the library functions.

Schema File
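
The original schema.json is in the repository linked below. As a hypothetical illustration, the file can use the same JSON layout that BigQuery uses for table schemas: a list of objects with name, type and mode. The column names here are invented. A small loader, also hypothetical, validates the file before the function creates the external table.

```python
import json

# Hypothetical schema.json content; the column names are illustrative only.
SCHEMA_JSON = """
[
  {"name": "order_id",   "type": "INTEGER", "mode": "REQUIRED"},
  {"name": "customer",   "type": "STRING",  "mode": "NULLABLE"},
  {"name": "amount",     "type": "NUMERIC", "mode": "NULLABLE"},
  {"name": "order_date", "type": "DATE",    "mode": "NULLABLE"}
]
"""

# Subset of BigQuery column types accepted by this sketch.
KNOWN_TYPES = {"STRING", "INTEGER", "INT64", "FLOAT", "FLOAT64", "NUMERIC",
               "BOOLEAN", "BOOL", "DATE", "DATETIME", "TIMESTAMP", "TIME", "BYTES"}


def load_schema(text):
    """Parse schema JSON and return a list of (name, type, mode) tuples."""
    columns = []
    for field in json.loads(text):
        ftype = field["type"].upper()
        if ftype not in KNOWN_TYPES:
            raise ValueError(f"unsupported type {ftype!r} for column {field['name']!r}")
        # Mode defaults to NULLABLE, as it does in BigQuery schemas.
        columns.append((field["name"], ftype, field.get("mode", "NULLABLE").upper()))
    return columns
```

Each tuple maps directly onto bigquery.SchemaField(name, field_type, mode=mode) in the client library.
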
Configuration File
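
The real config.json is in the repository. The following is a hypothetical example of what such a file might hold, parsed and validated in Python. Every key name below is an assumption and is not taken from the original file. The loader derives fully qualified project.dataset.table IDs for the external and target tables.

```python
import json

# Hypothetical config.json content: all key names are illustrative.
CONFIG_JSON = """
{
  "project": "my-project",
  "dataset": "sales",
  "external_table": "orders_ext",
  "target_table": "orders",
  "source_format": "CSV"
}
"""

REQUIRED_KEYS = ("project", "dataset", "external_table", "target_table")


def load_config(text):
    """Parse the config and add fully qualified table IDs (project.dataset.table)."""
    config = json.loads(text)
    missing = [key for key in REQUIRED_KEYS if key not in config]
    if missing:
        raise KeyError(f"config.json is missing: {', '.join(missing)}")
    prefix = f"{config['project']}.{config['dataset']}"
    config["external_table_id"] = f"{prefix}.{config['external_table']}"
    config["target_table_id"] = f"{prefix}.{config['target_table']}"
    # Fall back to CSV when the source format is not specified.
    config.setdefault("source_format", "CSV")
    return config
```
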
Library File
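
The library file itself is in the repository linked at the end of this post. As a hypothetical example of the "basic transformations" mentioned in the use case, a helper in bqlib.py could build the INSERT ... SELECT statement with an optional SQL expression per column, such as TRIM or SAFE_CAST. With this approach the transformation also runs inside BigQuery rather than in the function. The function name and the {col} template convention are assumptions.

```python
def build_insert_sql(target_table, external_table, column_names, transforms=None):
    """Build an INSERT ... SELECT with optional per-column SQL transformations.

    transforms maps a column name to an expression template that uses {col}
    as the placeholder, e.g. {"customer": "TRIM({col})"} (hypothetical convention).
    """
    transforms = transforms or {}
    select_items = []
    for name in column_names:
        expr = transforms.get(name, "{col}").format(col=name)
        # Alias transformed expressions so the result keeps the column name.
        select_items.append(name if expr == name else f"{expr} AS {name}")
    return (
        f"INSERT INTO `{target_table}` ({', '.join(column_names)})\n"
        f"SELECT {', '.join(select_items)}\n"
        f"FROM `{external_table}`"
    )
```

For example, the transforms {"customer": "TRIM({col})"} turn the select list for columns order_id and customer into "order_id, TRIM(customer) AS customer".
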
Entry Point/Main File
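
The actual entry point is bqDataLoad.py in the repository. The following is a minimal hypothetical sketch of what such an entry point does. A 1st-gen background function receives the Cloud Storage event as (event, context), skips objects that are not data files, and passes the gs:// URI to a loader. The load_fn parameter, the accepted file suffixes, and the bqlib.load_file name are all assumptions. The load_fn parameter is included so the entry point can be exercised without BigQuery.

```python
def is_data_file(name, suffixes=(".csv", ".csv.gz")):
    """Skip "folder" placeholder objects and files with unexpected extensions."""
    return not name.endswith("/") and name.lower().endswith(suffixes)


def bq_data_load(event, context, load_fn=None):
    """Background Cloud Function entry point with the (event, context) signature."""
    bucket, name = event["bucket"], event["name"]
    if not is_data_file(name):
        print(f"Skipping gs://{bucket}/{name}")
        return None
    uri = f"gs://{bucket}/{name}"
    if load_fn is None:
        # Hypothetical: a load function exposed by the library file bqlib.py.
        from bqlib import load_file
        load_fn = load_file
    print(f"Loading {uri}")
    return load_fn(uri)
```
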

Test Result

The test results below show a 681.2 MB file being loaded successfully by a Cloud Function.

Cloud Function Properties
Cloud Function Job Execution Stats
Memory Profiler (Python)

Source Code Repository

https://github.com/soumendra-mishra/bigquery-data-loader.git

Soumendra Mishra

Passionate Leader, Technology Enthusiast, Innovator, and Mentor