GCP Cloud Workflows: Orchestrate in a Declarative Way
Cloud Workflows became generally available (GA) in January 2021, almost a year ago at the time of writing. It is one of the newer orchestration options available in GCP.
Cloud Composer has been the main option on GCP for pipeline orchestration. Cloud Composer V1 required provisioning and managing an entire PaaS environment, which also meant paying to run that environment 24/7. Composer V2 has made many of the management aspects simpler.
This blog gives a detailed overview of Cloud Workflows, compares it with Cloud Composer, and covers some use cases where Cloud Workflows fits and provides a better alternative.
Introduction
Cloud Workflows is a configuration-driven (YAML/JSON), serverless orchestration tool.
Each workflow can be thought of as a sequence of steps.
Each step represents an action that needs to be performed.
A step can make execution calls, control the flow (iteration, conditional execution, and so on), handle errors, or call sub-workflows.
Cloud Workflows integrates with GCP services through connectors, and with on-prem services or other clouds through HTTP calls.
A workflow can optionally be run on a schedule by a Cloud Scheduler trigger, or react to events through Eventarc triggers.
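To make the "sequence of steps" idea concrete, here is a minimal sketch of a workflow definition in YAML. The step names (log_start, finish) and the log text are illustrative only and are not taken from any real pipeline.

```yaml
# Minimal workflow sketch: two steps executed in order.
main:
  steps:
    # Step 1: write a log entry using the built-in sys.log function.
    - log_start:
        call: sys.log
        args:
          text: "Workflow started"
          severity: INFO
    # Step 2: end the execution and return a value to the caller.
    - finish:
        return: "done"
```

The main block is the entry point; each list item under steps is one named step.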
Get started — Creation and Execution
Workflows are defined in YAML or JSON and can be created in any of the following ways:
Cloud Console (inline editor)
gcloud commands
Terraform (IaC, infrastructure as code)
As an example, we will use Cloud Workflows to load a JSON file from a GCS bucket into a BigQuery table and refresh a materialized view over the base table once the load is complete.
By default, steps are executed sequentially in the order in which they are declared.
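A workflow.yaml for this example might look roughly like the sketch below. It assumes the BigQuery connector (googleapis.bigquery.v2.jobs.insert) is used for both the load job and the refresh, and that the connector waits for each job to finish before the next step runs. The bucket, dataset, table and view names are placeholders, not values from the original setup.

```yaml
# workflow.yaml (sketch): load a JSON file from GCS, then refresh a materialized view.
main:
  steps:
    - init:
        assign:
          - project_id: ${sys.get_env("GOOGLE_CLOUD_PROJECT_ID")}
          - dataset_id: "demo_dataset"                      # placeholder
          - table_id: "events_raw"                          # placeholder
          - view_id: "events_mv"                            # placeholder
          - source_uri: "gs://example-bucket/events.json"   # placeholder
          - mv_name: ${project_id + "." + dataset_id + "." + view_id}
    # Load job: newline-delimited JSON from GCS into the base table.
    - load_json_to_bq:
        call: googleapis.bigquery.v2.jobs.insert
        args:
          projectId: ${project_id}
          body:
            configuration:
              load:
                sourceUris:
                  - ${source_uri}
                sourceFormat: "NEWLINE_DELIMITED_JSON"
                autodetect: true
                writeDisposition: "WRITE_APPEND"
                destinationTable:
                  projectId: ${project_id}
                  datasetId: ${dataset_id}
                  tableId: ${table_id}
        result: load_job
    # Query job: refresh the materialized view once the load step has returned.
    - refresh_materialized_view:
        call: googleapis.bigquery.v2.jobs.insert
        args:
          projectId: ${project_id}
          body:
            configuration:
              query:
                query: ${"CALL BQ.REFRESH_MATERIALIZED_VIEW('" + mv_name + "')"}
                useLegacySql: false
        result: refresh_job
    - done:
        return: ${refresh_job.status.state}
```

Because the refresh step is declared after the load step, it only starts once the load step has completed.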
The workflow can be deployed using the gcloud command below:
gcloud workflows deploy wf_bq_load_process --location us-central1 --source=workflow.yaml
Once the workflow is successfully deployed, it will be visible in the Workflows console.
The workflow page shows detailed metrics, logs, the configured triggers and the workflow source.
The workflow can be executed from the console or with the gcloud command below:
gcloud workflows execute wf_bq_load_process --location us-central1
The execution status can then be checked in the Cloud Console.
Types of operations
Call Step:
The heart of Cloud Workflows is the call step, which can be used to call API endpoints.
Call steps fall into three kinds:
1. Calls to API endpoints (external, on-prem or other clouds) or to connectors (for interacting with GCP services)
2. Calls to sub-workflows
3. Calls to system-defined functions such as sys.log for logging activity
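The three kinds of call can be sketched in one small workflow. The endpoint URL and the build_message sub-workflow are hypothetical. A connector call has the same shape as the HTTP call, with a googleapis-prefixed name such as googleapis.bigquery.v2.jobs.insert in the call field.

```yaml
main:
  steps:
    # 1. Call an external API endpoint over HTTP.
    - call_external_api:
        call: http.get
        args:
          url: https://example.com/api/status   # hypothetical endpoint
        result: api_response
    # 2. Call a sub-workflow defined further down in the same file.
    - call_sub_workflow:
        call: build_message
        args:
          status_code: ${api_response.code}
        result: message
    # 3. Call a system-defined function.
    - log_message:
        call: sys.log
        args:
          text: ${message}
          severity: INFO

# Sub-workflow: takes one parameter and returns a string.
build_message:
  params: [status_code]
  steps:
    - compose:
        return: ${"External API returned HTTP " + string(status_code)}
```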
Control Flows
Workflows can be thought of as programs written in a declarative way. Control flow steps cover looping, decisions, skipping steps, dependencies, and calling another workflow or sub-workflow.
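As an illustration, the sketch below combines a for loop, a switch and next jumps. It assumes the execution is started with a runtime argument of the form {"numbers": [1, 2, 3]}; the field name numbers and the thresholds are invented for the example.

```yaml
main:
  params: [input]
  steps:
    - init:
        assign:
          - total: 0
    # Loop: add up every element of the input list.
    - sum_numbers:
        for:
          value: item
          in: ${input.numbers}
          steps:
            - add:
                assign:
                  - total: ${total + item}
    # Decision: jump to a different step depending on the total.
    - decide:
        switch:
          - condition: ${total > 100}
            next: large_total
          - condition: ${total > 0}
            next: small_total
        next: empty_total   # default when no condition matches
    - large_total:
        return: "large"
    - small_total:
        return: "small"
    - empty_total:
        return: "empty"
```

The next field is what lets a workflow skip over steps instead of following declaration order.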
Error Handling Flows
Cloud Workflows handles execution errors through try, except and raise steps.
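A rough sketch of how these fit together is shown below. It also uses a retry block with the built-in http.default_retry_predicate, which is an addition for illustration. The URL and the retry numbers are assumptions.

```yaml
main:
  steps:
    - fetch_with_retry:
        try:
          call: http.get
          args:
            url: https://example.com/api/data   # hypothetical endpoint
          result: response
        # Retry transient HTTP failures a few times with exponential backoff.
        retry:
          predicate: ${http.default_retry_predicate}
          max_retries: 3
          backoff:
            initial_delay: 2
            max_delay: 60
            multiplier: 2
        # If the call still fails, log the error and re-raise it.
        except:
          as: e
          steps:
            - log_error:
                call: sys.log
                args:
                  data: ${e}
                  severity: ERROR
            - rethrow:
                raise: ${e}
    - done:
        return: ${response.body}
```

Re-raising inside except makes the execution fail visibly instead of silently continuing to the next step.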
Variables and Params (Inter step communication)
Steps communicate through variables (set in assign steps) or through runtime arguments (params) passed to a workflow or sub-workflow.
Variables can store the responses from API calls so that later steps can act on them.
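The sketch below shows a runtime argument, an assigned variable, and a stored API response being passed between steps. The argument name table and the health-check URL are hypothetical.

```yaml
main:
  params: [args]
  steps:
    # Copy a runtime argument into a variable and initialise a counter.
    - init:
        assign:
          - target_table: ${args.table}   # hypothetical runtime argument
          - attempts: 0
    # Store the HTTP response in a variable for later steps.
    - check_service:
        call: http.get
        args:
          url: https://example.com/api/health   # hypothetical endpoint
        result: health
    # Update variables based on the stored response.
    - record:
        assign:
          - attempts: ${attempts + 1}
          - status_code: ${health.code}
    - finish:
        return:
          table: ${target_table}
          status_code: ${status_code}
          attempts: ${attempts}
```

Runtime arguments are supplied as JSON when the execution starts, for example through the --data flag of gcloud workflows execute with a value such as '{"table":"events_raw"}'.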
Comparison with Cloud Composer
Cloud Composer comes with a very rich set of operators and sensors, and integrates widely with GCP products as well as external services.
Use Cases for Cloud Workflows
Typical use cases for Cloud Workflows include:
1. Event-driven orchestration: in event-driven system designs, Cloud Workflows provides a very lightweight way to connect services and orchestrate the flow
2. Orchestration with human intervention: Cloud Workflows supports callbacks, which let an execution wait for a request to a callback endpoint; this is typically used for operations with manual approval steps
3. Lightweight orchestration: for organisations with lightweight orchestration requirements for data pipelines, where Cloud Composer billing might be a concern
4. Process Automation
5. Real Time processing
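For the human-intervention use case, a callback can be sketched as below using the built-in events.create_callback_endpoint and events.await_callback functions. The one-hour timeout and the idea of sharing the URL through a log entry are assumptions; in practice the URL would more likely be sent to an approver by email or chat.

```yaml
main:
  steps:
    # Create an endpoint that an approver (or another system) can POST to.
    - create_callback:
        call: events.create_callback_endpoint
        args:
          http_callback_method: "POST"
        result: callback_details
    # Share the callback URL; here it is only written to the logs.
    - publish_callback_url:
        call: sys.log
        args:
          text: ${"Waiting for approval at " + callback_details.url}
          severity: INFO
    # Pause the execution until the endpoint is called or the timeout expires.
    - wait_for_approval:
        call: events.await_callback
        args:
          callback: ${callback_details}
          timeout: 3600   # seconds, assumed value
        result: callback_request
    - finish:
        return: ${callback_request.http_request}
```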
References
Workflow Execution, Python Client Reference: https://cloud.google.com/python/docs/reference/workflows/latest
Gcloud CLI Reference for Workflows: https://cloud.google.com/sdk/gcloud/reference/workflows
ML implementation with Cloud Workflows: https://cloud.google.com/community/tutorials/ml-pipeline-with-workflows
Eventarc and Cloud Workflows: https://cloud.google.com/community/tutorials/eventarc-workflows-integration
Cloud Workflows pricing: https://cloud.google.com/workflows/pricing
Cloud Composer pricing: https://cloud.google.com/composer/pricing
Cloud Composer vs Cloud Workflows: https://cloud.google.com/workflows/docs/choose-orchestration#detailed-feature-comparison