GCP and Prefect Cloud — from Docker Container to Cloud VM on Google Compute Engine

A repository template with GitHub Actions that deploys your first Python dataflows to Google Cloud in minutes

Anna Geller
Published in The Prefect Blog
10 min read · Dec 19, 2022


Little baby Marvin slowly getting acquainted with Prefect and Google Cloud

⚠️ Note: I no longer work at Prefect. This post may be completely outdated. Refer to Prefect docs and website to stay up-to-date.

Google Compute Engine allows running virtual machines (VMs) on GCP. It’s a scalable platform for running a wide range of workloads, including custom (Python) applications, data processing, and machine learning — an ideal execution layer for Prefect flows. This post will help you get started with Prefect and Google Cloud by leveraging the prefect-cloud-gcp repository template.

Create a new repository

First, create a new repository from the prefect-cloud-gcp template. Then, inspect the repository structure. It consists of the following:

  1. flows: example flows to get started with Prefect, executed both locally from your machine and remotely on GCP
  2. prefect_utils: a module serving as a placeholder for any custom business logic and reusable code
  3. setup.py and requirements.txt: used to package prefect_utils and other dependencies for both local and remote execution
  4. Dockerfile: combines your flow code, custom logic, and external dependencies into a portable, deployable format
  5. .github: a directory with all GitHub Actions workflows that deploy flows and the underlying remote infrastructure, and that provide CI/CD pipelines for a robust engineering process around your deployments

Now that you are familiar with what this repository consists of, let’s get hands-on and deploy the entire project to Prefect and Google Cloud.

Prefect Cloud: create a workspace & API key

If you don’t have a Prefect Cloud account yet, you can sign up for a free account (aka “Freefect” 😎) at app.prefect.cloud. From here, create your first workspace (or an organization) and your API key. Check the Cloud getting started documentation for more information.

Install Prefect and run flows locally

The easiest way to familiarize yourself with Prefect is to run your first flows. Clone the repository to your local or Cloud IDE (e.g., Google Cloud Shell, as shown in the section below) or work on it directly from GitHub Codespaces.

GitHub Codespaces setup to run your flows in GitHub's Cloud IDE

Install Prefect

With a single command, you can install Prefect and all other dependencies needed for this demo:

pip install -e .
Install Prefect and all other required dependencies

Now, log in to Prefect Cloud from your CLI:

prefect cloud login -k YOUR_API_KEY

This command will prompt you to select your workspace (you can have many of them to separate different environments and teams). Once you do that, your development setup is ready, and you can start building your flows.

Hello-world flow

Let’s run a hello-world flow:

python flows/hello.py

You can observe that flow run from the Prefect Cloud UI.

📓 Prefect makes debugging easy. You can run your flows from a local terminal and still observe their execution state in the Prefect Cloud UI. Environment parity between your local (or cloud) IDE and remote execution in the production cloud environment is finally painless.

GitHub Actions: Prefect secrets

So far, you authenticated your local terminal with Prefect Cloud — it’s time to do the same with your GitHub repository.

You can retrieve the PREFECT_API_KEY and PREFECT_API_URL by running the following command in your local (already authenticated) terminal:

prefect config view --show-secrets

You should see PREFECT_API_KEY and PREFECT_API_URL. Add those as GitHub Actions secrets. To do that, go to your repository Settings, then Secrets → Actions → New Repository Secret. Add those as shown here:

GitHub Actions secrets configuration
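
If you would rather not copy the two values by hand, here is a minimal sketch that picks them out of the command output. It assumes each setting is printed on its own line as NAME='value', optionally followed by a source note such as (from profile); that line shape and the helper name extract_secrets are assumptions for illustration, not part of the repository template.

```python
import re

# Assumed line shape: NAME='value' optionally followed by " (from ...)".
_SETTING = re.compile(
    r"^(?P<name>[A-Z0-9_]+)='(?P<value>.*)'(?:\s+\(from .*\))?\s*$"
)

# The two settings this post adds as GitHub Actions secrets.
REQUIRED = ("PREFECT_API_KEY", "PREFECT_API_URL")


def extract_secrets(config_output: str) -> dict:
    """Return the required settings found in the given CLI output text."""
    settings = {}
    for line in config_output.splitlines():
        match = _SETTING.match(line.strip())
        if match:
            settings[match["name"]] = match["value"]

    missing = [name for name in REQUIRED if name not in settings]
    if missing:
        raise ValueError(f"missing settings: {', '.join(missing)}")

    # Return only the values needed for the GitHub secrets.
    return {name: settings[name] for name in REQUIRED}
```

You could feed it the text captured from prefect config view --show-secrets. Treat the result like any other credential: avoid printing it in shared terminals or committing it to the repository.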

GCP: create a project & launch Cloud Shell

This section assumes you’ve already signed up for the Google Cloud Platform and have a GCP project. If not, follow this guide.

To interact with your GCP project programmatically (from Prefect Cloud and GitHub Actions), you need to create a service account. You can do that from the Google Cloud console, Google Cloud Shell, or from a local terminal authenticated with the gcloud CLI. Google Cloud Shell is the easiest option because it has a preconfigured and authenticated terminal and a built-in Cloud IDE.

Activate Cloud Shell as follows:

Activate Google Cloud Shell terminal from the GCP console

This will open a terminal. Even though you could create the service account from that CLI, let’s open the editor to make things easier to navigate.

Open Editor to get an interactive Cloud IDE directly from the Google console

This will provide a cloud development environment similar to GitHub Codespaces shown before:

Running commands in Cloud Shell IDE

To verify gcloud from here, run:

gcloud services list --enabled --limit 5

This will prompt you to authorize the CLI — confirm with “Authorize”.

Make sure to authorize Google to run commands with the gcloud CLI

Then, run the same command again to validate the CLI is working — if so, this should display five services enabled in your project.

Now, run the following bash commands from that Cloud Shell terminal to create a service account (customize any names based on your needs, especially the project name):

# Create GCP account + project => here we use project named "prefect-community" - replace it with your project name
# This will also set default project and region:
export CLOUDSDK_CORE_PROJECT="prefect-community"
export CLOUDSDK_COMPUTE_REGION=us-east1
export GCP_AR_REPO=prefect
export GCP_SA_NAME=prefect

# enable required GCP services:
gcloud services enable iamcredentials.googleapis.com
gcloud services enable artifactregistry.googleapis.com
gcloud services enable run.googleapis.com
gcloud services enable compute.googleapis.com

# create service account named e.g. prefect:
gcloud iam service-accounts create $GCP_SA_NAME
export MEMBER=serviceAccount:"$GCP_SA_NAME"@"$CLOUDSDK_CORE_PROJECT".iam.gserviceaccount.com
gcloud projects add-iam-policy-binding $CLOUDSDK_CORE_PROJECT --member=$MEMBER --role="roles/run.admin"
gcloud projects add-iam-policy-binding $CLOUDSDK_CORE_PROJECT --member=$MEMBER --role="roles/compute.instanceAdmin.v1"
gcloud projects add-iam-policy-binding $CLOUDSDK_CORE_PROJECT --member=$MEMBER --role="roles/artifactregistry.admin"
gcloud projects add-iam-policy-binding $CLOUDSDK_CORE_PROJECT --member=$MEMBER --role="roles/iam.serviceAccountUser"

# create JSON credentials file as follows, then copy-paste its content into your GHA Secret + Prefect GcpCredentials block:
gcloud iam service-accounts keys create prefect.json --iam-account="$GCP_SA_NAME"@"$CLOUDSDK_CORE_PROJECT".iam.gserviceaccount.com
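
The four role bindings above differ only in the role name, so they can also be generated in a loop. The following is a rough sketch rather than part of the template: it only builds and prints the gcloud commands (it does not run them), and the helper names are made up for illustration. The roles and the member format are taken from the commands above.

```python
import shlex

# Roles granted to the service account in the commands above.
ROLES = (
    "roles/run.admin",
    "roles/compute.instanceAdmin.v1",
    "roles/artifactregistry.admin",
    "roles/iam.serviceAccountUser",
)


def service_account_email(sa_name: str, project: str) -> str:
    """Build the service account email from its name and the project ID."""
    return f"{sa_name}@{project}.iam.gserviceaccount.com"


def binding_commands(sa_name: str, project: str, roles=ROLES) -> list:
    """Return one add-iam-policy-binding command (as an argv list) per role."""
    member = f"serviceAccount:{service_account_email(sa_name, project)}"
    return [
        [
            "gcloud", "projects", "add-iam-policy-binding", project,
            f"--member={member}", f"--role={role}",
        ]
        for role in roles
    ]


if __name__ == "__main__":
    # Print the commands for review instead of executing them.
    for argv in binding_commands("prefect", "prefect-community"):
        print(shlex.join(argv))
```

Printing first makes it easy to review what would be granted before running anything against your project.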

This will generate a JSON key file, which will appear in the Cloud editor. Open that JSON file and copy its entire content. It should have the following format:

{
"type": "service_account",
"project_id": "prefect-community",
"private_key_id": "uuid",
"private_key": "-----BEGIN PRIVATE KEY-----\n a looooooong string",
"client_email": "prefect@prefect-community.iam.gserviceaccount.com",
"client_id": "numbers",
"auth_uri": "https://accounts.google.com/o/oauth2/auth",
"token_uri": "https://oauth2.googleapis.com/token",
"auth_provider_x509_cert_url": "https://www.googleapis.com/oauth2/v1/certs",
"client_x509_cert_url": "https://www.googleapis.com/robot/v1/metadata/x509/sa_name_and_project_name.iam.gserviceaccount.com"
}
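
Before pasting the key anywhere, it can help to confirm that the copied text is complete. Here is a small sketch that checks for the fields listed above; the function name check_key is hypothetical, and extra fields in the file are simply ignored.

```python
import json

# Fields shown in the key format above.
EXPECTED_FIELDS = (
    "type",
    "project_id",
    "private_key_id",
    "private_key",
    "client_email",
    "client_id",
    "auth_uri",
    "token_uri",
    "auth_provider_x509_cert_url",
    "client_x509_cert_url",
)


def check_key(raw: str) -> list:
    """Return a list of problems; an empty list means the key looks complete."""
    try:
        data = json.loads(raw)
    except json.JSONDecodeError as exc:
        return [f"not valid JSON: {exc.msg}"]
    if not isinstance(data, dict):
        return ["top-level value is not a JSON object"]

    # Report every expected field that is absent or empty.
    problems = [
        f"missing field: {field}"
        for field in EXPECTED_FIELDS
        if not data.get(field)
    ]
    if data.get("type") and data["type"] != "service_account":
        problems.append(f"unexpected type: {data['type']}")
    return problems
```

For example, check_key(open("prefect.json").read()) should return an empty list for a freshly generated key. A truncated copy-paste usually shows up as invalid JSON.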

Then, paste this file’s content into the following:

1) GitHub Action secret named GCP_CREDENTIALS:

Add GCP credentials as a GitHub repository secret

2) Prefect GCP Credentials block named default:

Paste GCP credentials to a Prefect GCP credentials block

With that, we are all set and ready for Cloud deployment.

GitHub Actions: Plan of Action

The “Getting Started” GitHub action does the following:

  1. It creates an Artifact Registry repository if one doesn’t exist yet.
  2. It builds a Docker image based on the Dockerfile and pushes it to that Artifact Registry repository.
  3. It deploys one VM and a Docker container running a Prefect agent process that deploys flow runs. If a VM with the same name already exists, it gets deleted before the new VM is created; to create several VMs, run this action multiple times with different VM and queue names. By default, the flows are configured to be deployed as serverless containers using Google Cloud Run jobs. This makes it easy to scale your project as your needs grow, with no underlying infrastructure to monitor or maintain: serverless containers get spun up based on the provided Artifact Registry image, and the resource allocation can be adjusted at any time on the CloudRunJob block, even from the Prefect UI.
  4. It automatically deploys your first Prefect blocks.
  5. It automatically deploys your first Prefect flows.

GitHub Actions: Action!

Let’s run the “Getting Started” workflow. Once executed, everything you need to run your flows remotely on Google Cloud will be provisioned for you.

Run getting started GitHub Action to provision all required Google Cloud resources

As described in the image above, go to Actions, select the “All-in-one” workflow, and run it. You can directly modify any default values; for instance, you can pick a bigger instance or run the VM in another region.

Run in progress

If everything goes smoothly, you should see a screen similar to the one below. If an action succeeded but you see error annotations, don’t worry: some actions are configured to, e.g., delete a stale resource if one already exists. During a first run, there is no existing resource to redeploy (i.e., to delete before creating a new one), hence the error annotation.

Finished GitHub Action workflow run

Prefect Cloud resources we’ve just provisioned

It may take up to 5 minutes for the GitHub Action to finish. After that, the VM may need another 5 minutes to start the agent container. Once all that finishes, you should see that your Prefect queue running on Google Cloud is ready to deploy flow runs.

Healthy Prefect work queue validates that the GCP setup is working as expected

Apart from that queue, you can validate that new GitHub and CloudRunJob blocks have been created for you:

GitHub block

You can cross-check that the GitHub block points to the same repository from which you triggered that action, and that the CloudRunJob block has been tagged with your latest Git commit SHA.

Lastly, you should see several deployments corresponding to all flows in your repository:

Prefect deployments

🔒 Add a Personal Access Token if your GitHub repo is private

Note that the GitHub block provisioned as part of this GitHub Action assumes that you use a public repository. If you want to leverage a private repository, you would need to add an access token to your GitHub block. This previous blog post demonstrates how to do that.

Google Cloud resources we’ve just provisioned

Let’s investigate the resources that we’ve just provisioned. First, if you go to your Artifact Registry, you should see a new container image tagged with both the Git commit SHA and latest:

Artifact registry image
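
To make the two tags concrete, here is a small sketch of how such image references are typically composed. Artifact Registry Docker images follow the LOCATION-docker.pkg.dev/PROJECT/REPOSITORY/IMAGE:TAG pattern; the image name flows and the example commit SHA below are made up for illustration, and the names the template actually uses may differ.

```python
def image_uris(region: str, project: str, repository: str,
               image: str, commit_sha: str) -> list:
    """Return the commit-tagged and latest-tagged references for one image."""
    base = f"{region}-docker.pkg.dev/{project}/{repository}/{image}"
    return [f"{base}:{commit_sha}", f"{base}:latest"]


if __name__ == "__main__":
    # Region, project, and repository match the exports used earlier in this
    # post; "flows" and the SHA are placeholder values.
    for uri in image_uris("us-east1", "prefect-community", "prefect",
                          "flows", "0123abc"):
        print(uri)
```

Pinning a deployment to the commit-tagged reference keeps it reproducible, while latest always points at the most recent build.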

Next, type VM in your search bar to validate the VM instance is running:

Prefect VM on Google Compute Engine

You can SSH to that instance from your browser window — we’ll do that to validate the execution of flow runs.

Trigger first flow runs

All resources are provisioned. Now we can trigger a couple of flow runs. To do that from the terminal, you can use the following:

prefect deployment run marvin/default
prefect deployment run hello/default
prefect deployment run quote/default
prefect deployment run parametrized/default
prefect deployment run maintenance/default
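
If you prefer to trigger all of them in one go, the same commands can be wrapped in a short script. This is only a sketch: the helper names are hypothetical, it defaults to a dry run that prints the commands, and running it for real assumes the prefect CLI is installed and authenticated in your terminal.

```python
import shlex
import subprocess

# Deployment names from the commands above, in FLOW_NAME/DEPLOYMENT_NAME form.
DEPLOYMENTS = (
    "marvin/default",
    "hello/default",
    "quote/default",
    "parametrized/default",
    "maintenance/default",
)


def run_command(deployment: str) -> list:
    """Build the CLI command (as an argv list) for one deployment."""
    flow_name, sep, deployment_name = deployment.partition("/")
    if not (flow_name and sep and deployment_name):
        raise ValueError(f"expected FLOW_NAME/DEPLOYMENT_NAME, got {deployment!r}")
    return ["prefect", "deployment", "run", deployment]


def trigger_all(deployments=DEPLOYMENTS, dry_run: bool = True) -> list:
    """Print (dry run) or execute the run command for every deployment."""
    commands = [run_command(name) for name in deployments]
    for argv in commands:
        if dry_run:
            print(shlex.join(argv))
        else:
            # check=True stops at the first command that fails.
            subprocess.run(argv, check=True)
    return commands


if __name__ == "__main__":
    trigger_all()  # pass dry_run=False to actually trigger the runs
```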

And to trigger a custom run from the UI, go to the Deployments page, select the parametrized deployment and click on “Custom run”:

Custom run from deployment

From here, set your name (or the name of someone else you want to greet from Prefect) and click “Run”:

Custom parametrized run

Soon, you should see several flow runs executed on Google Cloud. The run that we triggered from the UI should give you similar log output:

Flow run logs

Inspect the logs on the Google Compute Engine VM

Let’s also SSH to the VM instance and cross-check the execution there.

SSH to a Cloud VM

In this terminal, type:

docker ps # find the Prefect container ID
docker logs CONTAINER_ID

The output should look similar to the following:

Prefect agent logs on a Google Compute Engine VM

With that, congrats! 🎉 You’ve completed the initial setup on Prefect and Google Cloud.

CI/CD pipeline for new flows

You may have noticed that we haven’t deployed any flows manually — the automated setup did it for us.

Flows deployed as part of the initial setup

If you want to add more flows, add those to the flows folder and commit the changes. The CI/CD pipeline will automatically create deployments for those flows. For instance, let’s add a flow named new.py to the flows directory, and then commit and push this change to the main branch:

from faker import Faker
from prefect import flow


@flow(log_prints=True)
def new():
    fake = Faker()
    print(f"Let's {fake.bs()} 🚀")


if __name__ == "__main__":
    new()

The CI/CD pipeline ensures that all flow deployments are always up-to-date, versioned with your Git commit SHA, and deployed to your Prefect Cloud workspace:

Regular CI/CD pipeline
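
To get a feel for which deployments to expect after a push, here is a rough sketch that maps flow files to deployment names. It rests on two assumptions that hold for the examples in this post but are not guaranteed by the template: the flow is named after its file (new.py defines the flow new), and every deployment is called default. The helper name deployment_names is made up for illustration.

```python
from pathlib import PurePath


def deployment_names(flow_files, deployment_name: str = "default") -> list:
    """Map flow file paths to FLOW_NAME/DEPLOYMENT_NAME strings."""
    names = []
    for file in flow_files:
        path = PurePath(file)
        # Skip non-Python files and private modules such as __init__.py.
        if path.suffix != ".py" or path.name.startswith("_"):
            continue
        names.append(f"{path.stem}/{deployment_name}")
    return sorted(names)


if __name__ == "__main__":
    print(deployment_names(["flows/hello.py", "flows/new.py"]))
```

In a real checkout, you could pass it the result of Path("flows").glob("*.py") and compare the output with the Deployments page in the Prefect Cloud UI.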

Next steps

This was a demo on how to get started with Prefect and Google Cloud using a repository template with preconfigured GitHub Actions.

If anything discussed in this post is unclear, feel free to tag me when asking a question in the Prefect Community Slack or Prefect Discourse.

Thanks for reading, and happy engineering!
