Google Cloud Functions Python Overview and Data Processing Example
Event-driven serverless functions-as-a-service
Serverless computing, FaaS (Functions-as-a-Service), and Python are important knowledge areas for anyone building or using cloud services in 2019. Cloud Functions is a lightweight managed service you can use to increase agility in the cloud. You will want to consider Cloud Functions for new architectures, or when modernizing existing workflows that span multiple cloud platform services.
This article is intended to get you acclimated with Google Cloud Functions and demonstrate some practical usage with the Cloud Functions Python runtime.
Cloud Functions are serverless, so you run code without having to worry about managing or scaling servers. Cloud Functions integrate easily with Google Cloud Platform (GCP) services, and you pay for resources only when your code runs. They are invoked by triggers that you specify, and they stand by waiting for specific events or HTTP endpoint calls.
Serverless does not mean that you can take existing code, chop it up into functions, and expect lower cost and instant autoscaling. Adopting a serverless approach means using managed services and letting your provider handle the base functionality of your service or application, then using Cloud Functions as the glue between those services to deliver business value.
Cloud Functions support microservices architectures and data management pipelines, and allow you to easily integrate AI into applications. As this article will further demonstrate, they can act as glue, pulling together multiple Google Cloud Platform services to deliver a service, build insights through ML, or help data flow into BigQuery.
Since Cloud Functions typically execute in about 100 ms, they can enable near-real-time streaming pipelines. They should be quick snippets of code that can fail safely if necessary. The main prerequisite is a moderate skill level in either Node.js or Python. In this article, we will focus on the Python runtime.
There are two different types of Cloud Functions: HTTP functions and background functions. HTTP functions are invoked by requests to an HTTP endpoint, while background functions are invoked by event triggers from GCP services.
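To make the distinction concrete, here is a minimal sketch of each signature. The function and message contents are illustrative; the parameter shapes (a Flask request object for HTTP functions, an event dict plus context for background functions) are the documented ones.

```python
import base64


def hello_http(request):
    """HTTP function: receives a Flask request object and returns a response."""
    name = request.args.get('name', 'World')
    return 'Hello, {}!'.format(name)


def hello_pubsub(event, context):
    """Background function: receives the triggering event and its metadata.

    For a Pub/Sub trigger, event['data'] carries the base64-encoded message.
    """
    message = base64.b64decode(event['data']).decode('utf-8')
    print('Received: {}'.format(message))
```

The HTTP function is reachable at the URL Cloud Functions assigns on deploy; the background function never sees an HTTP request at all, only the event payload.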
What can you use Cloud Functions for?
- Application backend tasks: respond to events from within your cloud infrastructure.
Example: a message is published to Cloud Pub/Sub, which kicks off a secondary workflow.
- Data processing: run code in response to changes in data.
Example: a file is uploaded to Google Cloud Storage, which triggers an update to a metadata repository in Cloud SQL.
- Integrating AI into applications: use Google Cloud ML APIs to classify images, analyze videos, or convert speech to text.
Example: run the Natural Language API to detect sentiment on support desk ticket summaries in a CSV uploaded to Google Cloud Storage.
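As a sketch of that last example, the function below parses ticket summaries out of an uploaded CSV and scores each with the Natural Language API. The `summary` column name is an assumption, and the client-library calls follow the google-cloud-language 1.x style; the imports are deferred into the handler so the module loads without the GCP client libraries installed.

```python
import csv
import io


def parse_summaries(csv_text, field='summary'):
    """Pull the ticket-summary column out of uploaded CSV content."""
    reader = csv.DictReader(io.StringIO(csv_text))
    return [row[field] for row in reader if row.get(field)]


def analyze_tickets(data, context):
    """Background function sketch for google.storage.object.finalize."""
    # Deferred imports: only needed when the function actually runs in GCP.
    from google.cloud import storage, language

    blob = storage.Client().bucket(data['bucket']).blob(data['name'])
    summaries = parse_summaries(blob.download_as_string().decode('utf-8'))

    client = language.LanguageServiceClient()
    for text in summaries:
        document = language.types.Document(
            content=text, type=language.enums.Document.Type.PLAIN_TEXT)
        sentiment = client.analyze_sentiment(
            document=document).document_sentiment
        print('{!r}: score={:.2f}'.format(text, sentiment.score))
```

In practice you would write the scores somewhere durable (BigQuery, Cloud SQL) rather than just logging them.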
Execution Environment and Dependencies
- The Cloud Functions Python runtime is based on Python 3.7.1, as of this writing.
- The Cloud Functions service uses an execution environment based on Ubuntu 18.04, as of this writing.
- Your function code will be contained in main.py.
- Your function can use third-party libraries and dependencies; declare them in a requirements.txt file that ships with your function, one line per library.
- When you deploy your function, the Cloud Functions service downloads and installs the dependencies declared in requirements.txt using pip.
- If you need a specific version of pip, wheel, or any standard Python package, make sure to specify it in your requirements.txt file.
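For example, a minimal requirements.txt might look like this (the library choice and version pins are illustrative):

```
# requirements.txt -- one dependency per line, optionally pinned
google-cloud-bigquery==1.11.2
pip==19.0.3
```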
Calling Cloud Functions
Triggers determine how and when a function executes. When you deploy a Cloud Function, you must select the trigger that will invoke your code; these are the discrete events you identify up front so you can plan the actions that follow them.
Event Trigger Types
- Cloud Pub/Sub
- Cloud Pub/Sub and Cloud Scheduler
- Cloud Storage
- Cloud Firestore (beta as of this writing)
- Firebase Authentication (beta as of this writing)
- Google Analytics for Firebase (beta as of this writing)
- Firebase Realtime Database (beta as of this writing)
- Firebase Remote Config (beta as of this writing)
You can get creative with the native functionality of GCP services to further extend triggers, using any service that supports Pub/Sub or provides webhooks.
Make a trigger from any service that supports Cloud Pub/Sub
Because Cloud Functions can be invoked by a message sent to a Pub/Sub topic, it's easy to integrate Cloud Functions with other Google services that support Pub/Sub. For example, you can export Stackdriver Logging to a Cloud Pub/Sub topic as a sink destination; once a message is posted to that topic, your function executes. This can be interesting for integrating with a service such as PagerDuty, to stay ahead of critical logging notifications.
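A sketch of that pattern, assuming the log sink publishes Stackdriver log entries as JSON to the topic. The severity filtering is real LogEntry structure; the notification payload and the idea of paging on-call are hypothetical placeholders for whatever PagerDuty (or similar) integration you would wire in.

```python
import base64
import json


def extract_log_entry(event):
    """Decode a Pub/Sub message carrying a Stackdriver log entry as JSON."""
    payload = base64.b64decode(event['data']).decode('utf-8')
    return json.loads(payload)


def forward_critical_logs(event, context):
    """Background function triggered by the log sink's Pub/Sub topic (sketch)."""
    entry = extract_log_entry(event)
    if entry.get('severity') in ('CRITICAL', 'ALERT', 'EMERGENCY'):
        # Hypothetical notifier -- swap in a real PagerDuty Events API call.
        notification = {
            'summary': entry.get('textPayload', 'critical log event'),
            'source': entry.get('resource', {}).get('type', 'unknown'),
        }
        print('Would page on-call with: {}'.format(json.dumps(notification)))
```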
Gmail Push Notifications can be configured to send a message to a Pub/Sub topic when there are changes in Gmail inboxes. Are you using a Gmail inbox to manage invoices or activity for a department? Now you can kick off secondary workflows with Cloud Functions based upon events (messages received) in Gmail.
Example — Cloud Storage AVRO/CSV Append to BigQuery
One example I like for data processing workflows is using a Cloud Function to perform serverless data warehouse updates. I have only been able to find this scenario done in Node.js by Asa Harland, so here it is for you in Python.
The use case: say you have a Python-proficient data science team working in Jupyter notebooks or Cloud Datalab. Your organization has a data warehouse in BigQuery that you work on via notebooks. You need freshly updated data, because the sample set you are working with is static and hasn't been updated in a while. With Cloud Storage as your source, a Cloud Function in between, and BigQuery as your destination, you have a basic data pipeline that gives you fresh data for your analysis and insights.
Using the object.finalize trigger, whenever a new CSV or AVRO file is uploaded to a specified Cloud Storage bucket, the Cloud Function uses the BigQuery API to append the new rows to the table specified in the function code.
Here is how the architecture looks for this data processing pipeline:
Here is the function code to append new CSVs or AVROs in Cloud Storage to a BigQuery table:
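This is a sketch of such a function using the google-cloud-bigquery client library. The dataset and table placeholders sit on line 1; the BigQuery import is deferred into the handler so the module loads even without the client library installed.

```python
BQ_DATASET, BQ_TABLE = 'UPDATE_DATASET_HERE', 'UPDATE_TABLE_HERE'


def source_format_for(file_name):
    """Map an uploaded file's extension to a BigQuery source-format name."""
    name = file_name.lower()
    if name.endswith('.csv'):
        return 'CSV'
    if name.endswith('.avro'):
        return 'AVRO'
    return None


def append_to_bigquery(data, context):
    """Background function for the google.storage.object.finalize trigger."""
    fmt = source_format_for(data['name'])
    if fmt is None:
        return  # ignore uploads that are not CSV or AVRO

    # Deferred import: only needed when a matching file actually arrives.
    from google.cloud import bigquery

    client = bigquery.Client()
    job_config = bigquery.LoadJobConfig()
    job_config.source_format = getattr(bigquery.SourceFormat, fmt)
    job_config.write_disposition = bigquery.WriteDisposition.WRITE_APPEND
    if fmt == 'CSV':
        job_config.skip_leading_rows = 1  # first row is the column header
        job_config.autodetect = True

    uri = 'gs://{}/{}'.format(data['bucket'], data['name'])
    load_job = client.load_table_from_uri(
        uri, client.dataset(BQ_DATASET).table(BQ_TABLE),
        job_config=job_config)
    load_job.result()  # wait, so failures surface in the function logs
```

Loading directly from the gs:// URI means the function never downloads the file itself; BigQuery pulls it server-side, which keeps the function fast and memory-light.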
- Make sure to replace UPDATE_DATASET_HERE and UPDATE_TABLE_HERE in line 1 with the BigQuery dataset and table names you wish your function to append rows into.
- Make sure the CSV or AVRO you are uploading has your column header (field names) as its first row, matching the destination table in BigQuery.
Now, with your function watching your source bucket, whenever a new CSV delta is uploaded, its rows will be automatically appended to the table you specified.
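Wiring the function to the bucket happens at deploy time. A deployment command would look something like this, where the function, bucket, and file names are placeholders for your own:

```
gcloud functions deploy FUNCTION_NAME \
  --runtime python37 \
  --trigger-resource YOUR_SOURCE_BUCKET \
  --trigger-event google.storage.object.finalize
```

Run it from the directory containing main.py and requirements.txt so both are uploaded with the function.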
The goal of this article is to help you understand serverless options such as Cloud Functions when building out new, modern architectures on Google Cloud Platform. Today there are many managed options that can give you different kinds of agility when building a new application or service. Cloud Functions is the glue code that can make your managed-service-backed applications and workflows more efficient and insightful.