Run shell commands and orchestrate Compute Engine VMs with Cloud Workflows
Automate the execution of shell commands in a fully serverless and secure way without managing private keys. What a joy for a Cloud Architect to work with.
Sometimes, as part of a workflow process, it is necessary to connect to a VM to perform various tasks. We are going to cover connecting to a Linux VM from Cloud Workflows and performing tasks such as copying files or running a script to import or update a database.
This article covers how to define a Cloud Workflow that starts a VM, connects to the Compute Engine shell, and executes shell commands securely using Cloud Build and Identity-Aware Proxy.
Some automation processes don’t fit the Cloud Run stack, as they need disk access for persistence or large file handling; they are better carried out on a VM, which can be turned on only for the duration of the task. We were in one of these situations with the USPTO Trademark Search API: we had to process files daily, and we didn’t need the machine for anything else, as the rest of our architecture is serverless.
Turning on the VM, running the task, and automating it all via Cloud Workflows has changed how we run this process. This article is the result of combining workflows with VM shell commands to process daily files.
Note: To get started with Cloud Workflows, check out my introductory presentation about Cloud Workflows and my previous articles.
Problem definition
Cloud Workflows lets you define pipelines and orchestrate steps using HTTP-based services. If we want to run a command inside a Linux VM, we need secure, unassisted SSH access.
So the two biggest challenges we need to resolve first are:
- How to gain secure unassisted SSH access to run a command?
- How to trigger running the command by REST API?
The latter needs a bit of clarification. As Cloud Workflows lets you connect to HTTP-based services, the biggest challenge in running a shell command from a workflow is finding an HTTP endpoint we can call to open the SSH tunnel and then execute the command.
There are open-source libraries such as shell2http that expose a shell over the web, but this requires installing additional tools on the VM, and we want to avoid that. Such tools also raise security concerns if not configured correctly, and many enterprises prohibit installing them.
Unassisted SSH via IAP (Identity-Aware Proxy)
In the Google Cloud Platform ecosystem, there are various ways to connect to a Linux VM, either by using Cloud Shell or the gcloud command-line utility. The gcloud utility can connect to a Linux VM either over the public SSH port 22 or by using IAP (Identity-Aware Proxy).
# ssh into vm
gcloud compute ssh $INSTANCE_NAME --zone $ZONE --tunnel-through-iap
What is IAP (Identity-Aware Proxy)?
IAP lets you establish a central authorization layer for applications accessed by HTTPS or TCP, so you can use an application-level access control model instead of relying on network-level firewalls.
IAP’s TCP forwarding feature lets you control who can access administrative services like SSH and RDP on your backends from the public internet. The TCP forwarding feature prevents these services from being openly exposed to the internet. Instead, requests to your services must pass authentication and authorization checks before they get to their target resource.
You can find out more in the official Identity-Aware Proxy documentation: https://cloud.google.com/iap/docs
Using Cloud Build to run the gcloud CLI tool
We are going to use the Cloud Build service to run the shell command, as builds can be triggered by REST API. This is exactly what we are looking for, as Cloud Workflows can incorporate the build as a step inside our pipeline. The build configuration is done with JSON syntax.
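To make this concrete, here is a minimal sketch of a Cloud Build build resource. It is shown in YAML, the same shape we will later embed in the workflow (the API accepts the equivalent JSON structure); the step image is the Cloud SDK container used throughout this article, and the echoed command is purely illustrative.

```yaml
# Minimal Cloud Build build resource (illustrative values)
steps:
  - name: gcr.io/google.com/cloudsdktool/cloud-sdk
    entrypoint: /bin/sh
    args:
      - -c
      - echo "hello from Cloud Build"
timeout: 600s
```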
What is Cloud Build?
Cloud Build is a service that executes your builds on Google Cloud Platform’s infrastructure. Cloud Build can import source code from a variety of repositories or cloud storage spaces, execute a build to your specifications, and produce artifacts such as Docker containers or Java archives.
You can find out more about Cloud Build in the official documentation: https://cloud.google.com/cloud-build/docs/overview
Project Architecture
Now that we are aware of IAP and Cloud Build, let’s see how they help us connect securely to a Linux VM. The steps below are highly inspired by my fellow Romanian Gabriel Hodoroaga’s article, How to connect securely from Cloud Build to VMs using Identity-Aware Proxy.
Workflow definition
Let’s assume our VM needs to be started and stopped; it could be a regular VM or a preemptible one. Our workflow definition will have multiple steps. Once we start the VM, we need to wait a minute for it to boot up. As we know, Cloud Build jobs can be triggered by REST API, so that’s our next step. We have defined a build based on the gcr.io/google.com/cloudsdktool/cloud-sdk container, which has the gcloud utility built in. As part of the Cloud Build definition, we will run the shell command. As Cloud Build calls are asynchronous, we will have a workflow step to poll and wait until the build has finished, which also means our shell commands have finished. The shell command output will be in the Cloud Build logs.
Cloud Workflow to Start/Stop a Compute Engine VM
This workflow is extremely simple: using the Compute REST API, we have the op variable set up to either start or stop the Compute Engine VM. Authentication is built into Cloud Workflows, so we don’t need to deal with that; only the relevant permissions need to be added so Workflows can manage Compute Engine VMs.
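A minimal sketch of such a start/stop step, assuming the project, zone, instance, and op (start or stop) values are passed in as runtime arguments:

```yaml
main:
  params: [args]
  steps:
    - startStop:
        call: http.post
        args:
          url: ${"https://compute.googleapis.com/compute/v1/projects/" + args.project + "/zones/" + args.zone + "/instances/" + args.instance + "/" + args.op}
          auth:
            type: OAuth2
        result: op_result
    - returnResult:
        return: ${op_result}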
Cloud Workflow to launch a Cloud Build operation
The CloudBuildCreate sub-workflow has nothing special: it takes the project and build params and submits them to the Cloud Build API. We also use a try/except block to surface any errors that cause our build to fail. These errors will appear in the Workflow Execution logs.
The build input param is defined in the invoker step which may look like:
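As a hedged sketch, such a build parameter could be assigned in a workflow step like this; the instance name, zone, and command are placeholders:

```yaml
- assignBuild:
    assign:
      - build:
          steps:
            - name: gcr.io/google.com/cloudsdktool/cloud-sdk
              entrypoint: /bin/sh
              args:
                - -c
                - gcloud compute ssh wf-vm --zone us-central1-a --tunnel-through-iap --command="touch ~/hello.log"
          timeout: 600s
```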
This is the core of our Cloud Workflow definition.
What you see here is that we define the workflow’s main entry point calling the Cloud Build steps. The build variable is set up based on the Cloud Build API input.
The container we are launching is gcr.io/google.com/cloudsdktool/cloud-sdk, which already contains the gcloud command-line utility out of the box, so we don’t need to create or maintain a container image with these tools.
We define our container instance entrypoint as /bin/sh, and we have the args set up to execute with -c the "command" we want to run in the shell.
gcloud compute ssh ${_INSTANCE_NAME} --zone ${_ZONE} --tunnel-through-iap
This is where the gcloud utility will connect via the IAP tunnel, using the service account granted roles/iap.tunnelResourceAccessor. On top of this, there is also a firewall entry to allow SSH access from the IAP proxy to our VMs. The latter is important, as otherwise gcloud will attempt and fail to connect to our instance.
The command that gets executed:
--command="touch ~/wf_$(date +%Y_%m_%d_%H_%M_%S).log"
This command is a basic sample to let you run a hello world example, but you can use many options here:
- It can use SCP to copy files from Cloud Storage, from a container source, or from the internet
- It can execute a script to download an XML file for processing and launch a DB import
- It can trigger any script
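For instance, file copying can use the same IAP tunnel. A sketch with illustrative file names and a hypothetical bucket:

```shell
# Copy a local file to the VM through the IAP tunnel (no public IP needed)
gcloud compute scp ./import.sql $INSTANCE_NAME:~/import.sql \
  --zone $ZONE --tunnel-through-iap

# Or pull a file from Cloud Storage once inside the VM
gcloud compute ssh $INSTANCE_NAME --zone $ZONE --tunnel-through-iap \
  --command="gsutil cp gs://my-bucket/daily.xml ~/daily.xml"
```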
There is one final workflow step left: to wait for the Cloud Build operation to finish, then stop the VM.
As you see, the CloudBuildWaitOperation sub-workflow checks the operation every 10 seconds to see if the done flag has been set. To avoid unhandled failures, we have a hard stop when reaching 100 iterations.
Having this wait step is important, as it allows us to use Cloud Workflows sequencing: once it has finished, we can advance to the next step, which in our final example is stopping the VM.
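The polling logic described above can be sketched as a sub-workflow like this; the operation param and field names follow the Cloud Build operations API, and the structure is an assumption based on the description, not the author's exact definition:

```yaml
CloudBuildWaitOperation:
  params: [operation]
  steps:
    - init:
        assign:
          - i: 0
    - check:
        call: http.get
        args:
          url: ${"https://cloudbuild.googleapis.com/v1/" + operation}
          auth:
            type: OAuth2
        result: op
    - isDone:
        switch:
          - condition: ${"done" in op.body and op.body.done}
            return: ${op.body}
    - guard:
        switch:
          - condition: ${i > 100}
            raise: "Cloud Build operation timed out"
    - wait:
        call: sys.sleep
        args:
          seconds: 10
    - increment:
        assign:
          - i: ${i + 1}
        next: check
```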
The shell command output is piped to Cloud Build logging, so to debug or troubleshoot errors, check the latest Cloud Build logs. There are also custom params to use a different bucket for Cloud Build; you could leverage those in the build definition.
Authorization and authentication
We haven’t gone into much permission detail, as that is explained in the linked article. But here is a quick summary:
- Enable the IAP API.
- To allow IAP to connect to your VM instances, create a firewall rule allowing ingress traffic from IAP (IP range 35.235.240.0/20) on TCP port 22.
- For the service account used by Cloud Build to be able to transfer the managed keys, add the roles: roles/compute.instanceAdmin.v1, roles/compute.viewer, roles/iam.serviceAccountUser.
- Also add roles/iap.tunnelResourceAccessor to grant the Cloud Build service account permission to use IAP.
Note: the above permissions are for the service account used by the Cloud Build command. The default is PROJECT_NUMBER@cloudbuild.gserviceaccount.com.
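The setup above can be sketched with gcloud; the network, rule name, project, and service account values are placeholders:

```shell
# Allow IAP's published range to reach SSH on the VMs
gcloud compute firewall-rules create allow-ssh-from-iap \
  --network=default --direction=INGRESS --action=ALLOW \
  --rules=tcp:22 --source-ranges=35.235.240.0/20

# Grant the Cloud Build service account the required roles
SA="PROJECT_NUMBER@cloudbuild.gserviceaccount.com"
for ROLE in roles/compute.instanceAdmin.v1 roles/compute.viewer \
            roles/iam.serviceAccountUser roles/iap.tunnelResourceAccessor; do
  gcloud projects add-iam-policy-binding PROJECT_ID \
    --member="serviceAccount:$SA" --role="$ROLE"
done
```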
To execute the Cloud Workflow, you can trigger it by API or by Cloud Scheduler. If you set it up via Cloud Scheduler, you need to specify a service account that has the roles/workflows.invoker role, plus roles/compute.instanceAdmin.v1 to be able to start the VMs.
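A Cloud Scheduler trigger might look like this, calling the Workflow Executions API directly; the job name, schedule, region, workflow name, and service account are illustrative:

```shell
gcloud scheduler jobs create http run-daily-workflow \
  --schedule="0 5 * * *" \
  --uri="https://workflowexecutions.googleapis.com/v1/projects/PROJECT_ID/locations/us-central1/workflows/my-workflow/executions" \
  --http-method=POST \
  --oauth-service-account-email="scheduler-sa@PROJECT_ID.iam.gserviceaccount.com"
```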
Conclusion
I hope this overview of automating the execution of shell commands with Cloud Workflows, in a fully serverless and secure way without managing private keys, gave you a picture of what can be done to automate this process.
No maintenance of SDK tools, no updates to libraries, all managed with enterprise security. What a joy for a Cloud Architect to work with.
To deploy your workflow, you need the source YAML file, which is at the end of the article. You can deploy it using the Cloud Console, by API, or with the gcloud command-line utility.
We recommend using VSCode, where you can set up the GCP Project Switcher extension and define IDE tasks to automate deploying, executing, and describing executions.
Wrap Up
In the meantime, if you want to check it out, here are some links:
- Using Cloud Workflows to load Cloud Storage files into BigQuery
- Workflows overview
- Workflows docs
- Quickstarts
- Sample VSCode/.tasks.json file for deploying to Workflows
- [video] Serverless Orchestration and Automation with GCP Workflows
Feel free to reach out to me on Twitter @martonkodok or read my previous posts on medium/@martonkodok
Complete YAML workflow definition.
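As a hedged reconstruction of what the full definition combines, here is a sketch of the start, build, wait, and stop steps described above; the project, zone, instance, and command values are placeholders, not the author's original gist:

```yaml
main:
  steps:
    - init:
        assign:
          - project: my-project
          - zone: us-central1-a
          - instance: wf-vm
    - startVM:
        call: http.post
        args:
          url: ${"https://compute.googleapis.com/compute/v1/projects/" + project + "/zones/" + zone + "/instances/" + instance + "/start"}
          auth:
            type: OAuth2
    - waitForBoot:
        call: sys.sleep
        args:
          seconds: 60
    - createBuild:
        call: http.post
        args:
          url: ${"https://cloudbuild.googleapis.com/v1/projects/" + project + "/builds"}
          auth:
            type: OAuth2
          body:
            steps:
              - name: gcr.io/google.com/cloudsdktool/cloud-sdk
                entrypoint: /bin/sh
                args:
                  - -c
                  - ${"gcloud compute ssh " + instance + " --zone " + zone + " --tunnel-through-iap --command=\"touch ~/hello.log\""}
            timeout: 600s
        result: buildOp
    - waitForBuild:
        call: CloudBuildWaitOperation
        args:
          operation: ${buildOp.body.name}
    - stopVM:
        call: http.post
        args:
          url: ${"https://compute.googleapis.com/compute/v1/projects/" + project + "/zones/" + zone + "/instances/" + instance + "/stop"}
          auth:
            type: OAuth2

CloudBuildWaitOperation:
  params: [operation]
  steps:
    - check:
        call: http.get
        args:
          url: ${"https://cloudbuild.googleapis.com/v1/" + operation}
          auth:
            type: OAuth2
        result: op
    - isDone:
        switch:
          - condition: ${"done" in op.body and op.body.done}
            return: ${op.body}
    - wait:
        call: sys.sleep
        args:
          seconds: 10
        next: check
```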