Run shell commands and orchestrate Compute Engine VMs with Cloud Workflows

Márton Kodok
Dec 11, 2020 · 7 min read

Automate the execution of shell commands in a fully serverless and secure way without managing private keys. What a joy for a Cloud Architect to work with.

Cloud Workflows — automate the execution of shell commands on VMs

Sometimes, as part of a workflow process, it is necessary to connect to a VM to perform various tasks. We are going to cover connecting to a Linux VM from Cloud Workflows and performing tasks such as copying files or running a script to import or update a database.

This article covers how to define a Cloud Workflow that starts a VM, connects to the Compute Engine instance's shell, and executes shell commands securely using Cloud Build and Identity-Aware Proxy.

There are automation processes that don't fit the Cloud Run stack because they need disk access for persistence or large file handling. They are better carried out on a VM, which can be turned on only for the duration of the task. We were in one of these situations with the USPTO Trademark Search API, where we had to process files daily but didn't need the machine for anything else, as the rest of the architecture runs on serverless.

Turning on the VM, running the task, and automating the whole thing via Cloud Workflows changed how we run this today. This article is the result of combining Workflows with VM shell commands to process daily files.

Note: To get started with Cloud Workflows, check out my introductory presentation on Cloud Workflows and my previous articles.

Problem definition

These are the two biggest challenges we need to resolve first:

  • How to gain secure unassisted SSH access to run a command?
  • How to trigger running the command by REST API?

The latter needs a bit of clarification. Cloud Workflows lets you connect to HTTP-based services, so the biggest challenge in running a shell command from a workflow is finding an HTTP endpoint we can call to open the SSH tunnel and then execute the command.

There are open-source libraries such as shell2http that expose a shell over the web, but they require installing additional tools on the VM, which we want to avoid. They also raise security concerns if not configured correctly, and many enterprises prohibit installing such tools.

Unassisted SSH via IAP (Identity-Aware Proxy)

# ssh into vm
gcloud compute ssh $INSTANCE_NAME --zone $ZONE --tunnel-through-iap

What is IAP (Identity-Aware Proxy)?

IAP lets you establish a central authorization layer for applications accessed by HTTPS or TCP, so you can use an application-level access control model instead of relying on network-level firewalls.

IAP’s TCP forwarding feature lets you control who can access administrative services like SSH and RDP on your backends from the public internet. The TCP forwarding feature prevents these services from being openly exposed to the internet. Instead, requests to your services must pass authentication and authorization checks before they get to their target resource.

You can find out more in the official Identity-Aware Proxy documentation: https://cloud.google.com/iap/docs

Using Cloud Build to run the gcloud CLI tool

What is Cloud Build?

Cloud Build is a service that executes your builds on Google Cloud Platform’s infrastructure. Cloud Build can import source code from a variety of repositories or cloud storage spaces, execute a build to your specifications, and produce artifacts such as Docker containers or Java archives.

You can find out more about Cloud Build in the official documentation: https://cloud.google.com/cloud-build/docs/overview

Project Architecture

Automate the execution of shell commands in a fully serverless and secure way

Workflow definition

Let's assume our VM needs to be started and stopped; it could be a regular VM or a preemptible one. Our workflow definition will have multiple steps. Once we start the VM, we need to wait about a minute for it to boot up. Cloud Build can be triggered via a REST API call, so that's our next step: we define a build based on the gcr.io/google.com/cloudsdktool/cloud-sdk container, which has the gcloud utility built in, and as part of the Cloud Build definition we run the shell command. Because Cloud Build calls are asynchronous, we add a workflow step that polls and waits until the build has finished, which also means our shell commands have finished. The shell command output ends up in the Cloud Build logs.

Cloud Workflow to Start/Stop a Compute Engine VM

This workflow is extremely simple: using the Compute Engine REST API, we set up the op variable to either start or stop the Compute Engine VM. Authentication is built into Cloud Workflows, so we don't need to deal with that; we only need to grant the relevant permissions so Workflows can manage Compute Engine VMs.
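A minimal sketch of such a start/stop sub-workflow, calling the Compute Engine REST API directly, might look like the following (the sub-workflow name startStopVM and its parameter names are illustrative, not taken from the original gist):

startStopVM:
  params: [project, zone, instance, op]
  steps:
    - compute_api_call:
        call: http.post
        args:
          # op is either "start" or "stop", e.g. .../instances/instance-1/start
          url: ${"https://compute.googleapis.com/compute/v1/projects/" + project + "/zones/" + zone + "/instances/" + instance + "/" + op}
          auth:
            type: OAuth2   # Workflows injects a token for its own service account
        result: response
    - done:
        return: ${response}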

Cloud Workflow to launch a Cloud Build operation

The CloudBuildCreate sub-workflow has nothing special about it other than taking the project and build params and submitting them to the Cloud Build API. We also use a try/except block to surface anything that causes our build to fail; these errors show up in the Workflow Execution logs.
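A sketch of how such a sub-workflow could be written, using the CloudBuildCreate name from above and illustrative step names:

CloudBuildCreate:
  params: [project, build]
  steps:
    - create_build:
        try:
          call: http.post
          args:
            url: ${"https://cloudbuild.googleapis.com/v1/projects/" + project + "/builds"}
            auth:
              type: OAuth2
            body: ${build}
          result: createResult
        except:
          as: e
          steps:
            - surface_error:
                # re-raise so the failure is visible in the Workflow Execution logs
                raise: ${e}
    - return_operation:
        # the response body is a long-running Operation; we poll it later
        return: ${createResult}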

The build input param is defined in the invoker step, which may look like this:
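Here is a minimal sketch of that invoker step together with the main entrypoint. The project, zone, and instance values are placeholders, and the shell command is built with string concatenation for readability; the article's actual definition passes the instance and zone to gcloud through Cloud Build substitutions (_INSTANCE_NAME, _ZONE) instead:

main:
  steps:
    - init:
        assign:
          # illustrative values; replace with your own project, zone and instance
          - project: "my-project"
          - zone: "us-central1-a"
          - instance: "instance-1"
    - start_vm:
        call: startStopVM
        args:
          project: ${project}
          zone: ${zone}
          instance: ${instance}
          op: "start"
    - wait_for_boot:
        call: sys.sleep
        args:
          seconds: 60
    - run_shell_commands:
        call: CloudBuildCreate
        args:
          project: ${project}
          # Cloud Build "build" resource: a single step that opens the IAP
          # tunnel with gcloud and runs our shell command on the VM
          build:
            steps:
              - name: "gcr.io/google.com/cloudsdktool/cloud-sdk"
                entrypoint: "/bin/sh"
                args:
                  - "-c"
                  - ${"gcloud compute ssh " + instance + " --zone " + zone + " --tunnel-through-iap --command='touch ~/wf_$(date +%Y_%m_%d_%H_%M_%S).log'"}
        result: createResult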

This is the core of our Cloud Workflow definition.

What you see here is that we define the workflow's main entrypoint, which calls the Cloud Build steps. The build variable is set up to match the Cloud Build API's input.

The container we are launching, gcr.io/google.com/cloudsdktool/cloud-sdk, already contains the gcloud command-line utility out of the box, so we don't need to create or maintain a container image with these tools.

We set the container's entrypoint to /bin/sh and set up the args to execute, via -c, the command we want to run in the shell.

gcloud compute ssh ${_INSTANCE_NAME} --zone ${_ZONE} --tunnel-through-iap

This is where the gcloud utility connects via the IAP tunnel, using a service account that has roles/iap.tunnelResourceAccessor. On top of this, there must also be a firewall rule allowing SSH access from the IAP range to our VMs. The latter is important, as otherwise gcloud will attempt and fail to connect to our instance.

The command that gets executed:

--command="touch ~/wf_$(date +"%Y_%m_%d_%H_%M_%S").log"

This command is a basic sample to give you a hello-world example, but you can use many options here:

  • It can use SCP to copy files from Cloud Storage, from a container source, or from the internet
  • It can execute a script to download an XML file for processing and launch a DB import
  • It can trigger any script

There is one final workflow step left: wait for the Cloud Build operation to finish, then stop the VM.
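A sketch of such a polling sub-workflow, using the CloudBuildWaitOperation name described just below and illustrative step names, and assuming the Cloud Build Operation name is passed in as a parameter:

CloudBuildWaitOperation:
  params: [operation]
  steps:
    - init:
        assign:
          - iterations: 0
    - sleep:
        call: sys.sleep
        args:
          seconds: 10
    - get_operation:
        call: http.get
        args:
          url: ${"https://cloudbuild.googleapis.com/v1/" + operation}
          auth:
            type: OAuth2
        result: operationResult
    - check_done:
        switch:
          # the Operation resource carries done: true once the build has finished
          - condition: ${"done" in operationResult.body}
            next: build_finished
    - increment:
        assign:
          - iterations: ${iterations + 1}
    - check_iterations:
        switch:
          # hard stop so a stuck build cannot make the workflow loop forever
          - condition: ${iterations < 100}
            next: sleep
    - timed_out:
        raise: "Cloud Build operation did not finish after 100 checks"
    - build_finished:
        return: ${operationResult.body}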

As you can see, the CloudBuildWaitOperation sub-workflow checks the operation every 10 seconds to see whether the done flag has been set. To avoid unhandled failures, we add a hard stop after reaching 100 iterations.

Having this wait step is important, as it lets us use Cloud Workflows' sequencing: once the build has finished, we can advance to the next step, which in our final example is stopping the VM.
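To finish the main workflow sketch started above, the remaining steps wait for the build and then stop the VM (again with the illustrative names used earlier):

    - wait_for_build:
        call: CloudBuildWaitOperation
        args:
          # the Operation name returned when the build was created
          operation: ${createResult.body.name}
    - stop_vm:
        call: startStopVM
        args:
          project: ${project}
          zone: ${zone}
          instance: ${instance}
          op: "stop"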

The shell command output is piped to Cloud Build logging, so to debug or troubleshoot errors, check the latest Cloud Build logs. There are also custom params to use a different bucket for Cloud Build, which you could leverage in the build definition.

Authorization and authentication

  1. Enable IAP API
  2. To allow IAP to connect to your VM instances, create a firewall rule to allow ingress traffic from IAP (IP Range 35.235.240.0/20) for TCP port 22.
  3. For the service account used by Cloud Build to be able to transfer the managed SSH keys, you need to add the roles: roles/compute.instanceAdmin.v1, roles/compute.viewer, roles/iam.serviceAccountUser
  4. You also need to add roles/iap.tunnelResourceAccessor to grant the Cloud Build service account permission to use IAP.

Note: the above permissions are for the service account used by the Cloud Build command. The default one is PROJECT_NUMBER@cloudbuild.gserviceaccount.com

To execute the Cloud Workflow, you can trigger it via the API or with Cloud Scheduler. If you set it up via Cloud Scheduler, you need to specify a service account that has the roles/workflows.invoker role, plus roles/compute.instanceAdmin.v1 so it can start the VMs.

Conclusion

No maintenance of SDK tools, no updates to libraries, all managed with enterprise security. What a joy for a Cloud Architect to work with.

To deploy your workflow, you need the source YAML file, which is at the end of the article. You can deploy it using the Cloud Console, via the API, or with the gcloud command-line utility.

We recommend using VS Code, where you can set up the GCP Project Switcher extension and define IDE tasks to automate deploying, executing, and describing executions.

Wrap Up

Feel free to reach out to me on Twitter @martonkodok or read my previous posts on medium/@martonkodok

Complete YAML workflow definition.

Google Cloud - Community

A collection of technical articles and blogs published or curated by Google Cloud Developer Advocates. The views expressed are those of the authors and don't necessarily reflect those of Google.

Márton Kodok

Written by

Speaker at conferences, Google Developer Expert, top user on Stack Overflow, software architect at REEA.net, co-founder of IT Mures, life-long learner, mentor
