Google Cloud Function Beyond the Limits

Effective utilization of Cloud Function using BigQuery computing power

Soumendra Mishra
Google Cloud - Community
2 min read · Aug 20, 2020

Google Cloud Functions is a serverless execution environment designed for building and connecting cloud services. It provides fully managed infrastructure, so users can focus on application code without worrying about the underlying hardware and software.

Use Case

  1. Design a data processing pipeline to load data from Cloud Storage to BigQuery
  2. Perform data extraction and loading with basic transformations as soon as new file(s) arrive in a Cloud Storage bucket
  3. Input file size can vary from 1 MB to 1000 MB

Challenges

Several cloud services could serve this purpose, but the fine-grained, on-demand nature of Cloud Functions makes it an easy choice. That said, a few restrictions make it challenging to carve out an optimal solution. A few examples are depicted below:

Figure: Cloud Function Data Loading Stats

Solution

In this blog, the implemented solution addresses the challenge of loading large files that do not fit within Cloud Function limits. The hallmark of this approach is to use the essential features of Cloud Functions while capitalizing on the compute power of BigQuery. The following steps explain how it is accomplished:

  1. The Cloud Function is invoked when a new file arrives in the Cloud Storage bucket
  2. A BigQuery external table is created using ExternalConfig(), pointing to the file(s) in the Cloud Storage bucket
  3. A BigQuery INSERT statement reads data from the external table and loads it into the BigQuery transaction/master table
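
The steps above can be sketched roughly as follows. This is a minimal illustration, not the code from the repository: the function names `build_insert_sql` and `load_from_gcs`, the CSV format with one header row, and the choice to drop and recreate the external table on every run are all assumptions. It uses the `google-cloud-bigquery` client, and table IDs are expected in fully qualified `project.dataset.table` form.

```python
def build_insert_sql(target_table: str, external_table: str) -> str:
    """Build the INSERT ... SELECT that copies rows from the external table."""
    return f"INSERT INTO `{target_table}` SELECT * FROM `{external_table}`"


def load_from_gcs(uri: str, external_table: str, target_table: str,
                  columns: list) -> None:
    """Load one Cloud Storage file into BigQuery through an external table.

    `columns` is a list of dicts such as {"name": "id", "type": "INTEGER"}.
    """
    # Imported inside the function so that build_insert_sql stays usable
    # without the google-cloud-bigquery package installed.
    from google.cloud import bigquery

    client = bigquery.Client()

    # Step 2: describe the file in Cloud Storage as an external table.
    schema = [bigquery.SchemaField(c["name"], c["type"]) for c in columns]
    external_config = bigquery.ExternalConfig("CSV")  # assumed file format
    external_config.source_uris = [uri]
    external_config.options.skip_leading_rows = 1  # assumes a header row

    # Recreate the external table so it points at the newly arrived file.
    client.delete_table(external_table, not_found_ok=True)
    table = bigquery.Table(external_table, schema=schema)
    table.external_data_configuration = external_config
    client.create_table(table)

    # Step 3: BigQuery reads the file and inserts the rows itself, so the
    # file contents never pass through the Cloud Function's memory.
    client.query(build_insert_sql(target_table, external_table)).result()
```

Because the Cloud Function only submits the job and waits for it, its memory footprint should stay largely independent of the input file size; the heavy lifting happens inside BigQuery.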

Cloud Function Source Code

The Cloud Function source code is split into multiple files:

  1. Schema File (schema.json): This file contains BigQuery external table column names and data-types.
  2. Configuration File (config.json): This file contains BigQuery external and internal table input parameters.
  3. Library File (bqlib.py): This file contains user-defined library functions to perform data extractions and data loading.
  4. Entry Point/Main File (bqDataLoad.py): This file contains the Cloud Function entry point code, which invokes the library functions.
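
To give a feel for how these files could fit together, here is a hedged sketch. The JSON contents, the key names (`external_table`, `target_table`), and the function names `gcs_uri_from_event`, `build_load_request`, and `bq_data_load` are hypothetical and are not taken from the repository. The two JSON documents are inlined as strings so the example is self-contained; in a deployed function they would more likely be read from schema.json and config.json next to the source. The `event` dictionary follows the shape of a Cloud Storage trigger for background functions, which carries the `bucket` and `name` of the new object.

```python
import json

# Hypothetical contents of schema.json: external table columns and types.
SCHEMA_JSON = """
[
  {"name": "id", "type": "INTEGER"},
  {"name": "name", "type": "STRING"}
]
"""

# Hypothetical contents of config.json: external and internal table names.
CONFIG_JSON = """
{
  "external_table": "my_dataset.ext_stage",
  "target_table": "my_dataset.master"
}
"""


def gcs_uri_from_event(event: dict) -> str:
    """Turn a Cloud Storage trigger event into a gs:// URI."""
    return f"gs://{event['bucket']}/{event['name']}"


def build_load_request(event: dict, config: dict, schema: list) -> dict:
    """Collect everything a library function would need for one load."""
    return {
        "uri": gcs_uri_from_event(event),
        "external_table": config["external_table"],
        "target_table": config["target_table"],
        "columns": [col["name"] for col in schema],
    }


def bq_data_load(event: dict, context) -> dict:
    """Entry point: runs when a new object lands in the bucket."""
    config = json.loads(CONFIG_JSON)
    schema = json.loads(SCHEMA_JSON)
    request = build_load_request(event, config, schema)
    print(f"Loading {request['uri']} into {request['target_table']}")
    # A library function from bqlib.py would be called here with `request`.
    return request  # returned only to make the sketch easy to inspect
```

Keeping the entry point thin and pushing the BigQuery calls into a library module makes it possible to exercise the event parsing locally with a plain dictionary such as `{"bucket": "my-bucket", "name": "data/file.csv"}`.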

Test Result

The test result illustrated below demonstrates the successful loading of a 681.2 MB file using Cloud Function.

Figure: Cloud Function Properties
Figure: Cloud Function Job Execution Stats
Figure: Memory Profiler (Python)

Source Code Repository

https://github.com/soumendra-mishra/bigquery-data-loader.git
