GCP and Prefect Cloud — from Docker Container to Cloud VM on Google Compute Engine
A repository template with GitHub Actions deploys your first Python dataflows to Google Cloud in minutes
⚠️ Note: I no longer work at Prefect. This post may be completely outdated. Refer to Prefect docs and website to stay up-to-date.
Google Compute Engine allows running virtual machines (VMs) on GCP. It’s a scalable platform for running a wide range of workloads, including custom (Python) applications, data processing, and machine learning — an ideal execution layer for Prefect flows. This post will help you get started with Prefect and Google Cloud by leveraging the prefect-cloud-gcp repository template.
Create a new repository
First, create a new repository from the prefect-cloud-gcp template. Then, inspect the repository structure. It consists of the following:
- flows: example flows to get started with Prefect, covering both local execution from your machine and remote execution on GCP
- prefect_utils: a module serving as a placeholder for any custom business logic and reusable code
- setup.py and requirements.txt: used to package prefect_utils and other dependencies for both local and remote execution
- Dockerfile: combines your flow code, custom logic, and external dependencies into a portable, deployable format
- .github: a directory that includes all GitHub Actions workflows to deploy flows and the underlying remote infrastructure, and provides CI/CD pipelines to build a robust engineering process for your deployments
Now that you are familiar with what this repository consists of, let’s get hands-on and deploy the entire project to Prefect and Google Cloud.
Prefect Cloud: create a workspace & API key
If you don’t have a Prefect Cloud account yet, you can sign up for a free account (aka “Freefect” 😎) at app.prefect.cloud. From there, create your first workspace (or an organization) and your API key. Check the Cloud getting started documentation for more information.
Install Prefect and run flows locally
The easiest way to familiarize yourself with Prefect is to run your first flows. Clone the repository to your local or Cloud IDE (e.g., Google Cloud Shell, as shown in the section below) or work on it directly from GitHub Codespaces.
Install Prefect
With a single command, you can install Prefect and all other dependencies needed for this demo:
pip install -e .
Now, log in to Prefect Cloud from your CLI:
prefect cloud login -k YOUR_API_KEY
This command will prompt you to select your workspace (you can have many of them to separate different environments and teams). Once you do that, your development setup is ready, and you can start building your flows.
Hello-world flow
Let’s run a hello-world flow:
python flows/hello.py
You can observe that flow-run from the Prefect Cloud UI.
📓 Prefect makes debugging easy. You can run your flows from a local terminal and still observe their execution state in the Prefect Cloud UI. Environment parity between your local (or cloud) IDE and remote execution in the production cloud environment is finally painless.
GitHub Actions: Prefect secrets
So far, you authenticated your local terminal with Prefect Cloud — it’s time to do the same with your GitHub repository.
You can retrieve the PREFECT_API_KEY and PREFECT_API_URL by running the following command in your local (already authenticated) terminal:
prefect config view --show-secrets
You should see PREFECT_API_KEY and PREFECT_API_URL in the output. Add both as GitHub Actions secrets: go to your repository Settings, then Secrets → Actions → New Repository Secret.
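Before relying on these two values in a pipeline, it can help to confirm that both are present and that the URL looks like a workspace URL. The following is a minimal sketch, not part of the template: check_prefect_settings is a hypothetical helper, and the URL pattern assumes the Prefect Cloud format https://api.prefect.cloud/api/accounts/ACCOUNT_ID/workspaces/WORKSPACE_ID.

```python
import os
import re

# Assumed shape of a Prefect Cloud API URL (account and workspace IDs are UUIDs).
API_URL_PATTERN = re.compile(
    r"^https://api\.prefect\.cloud/api/accounts/[0-9a-f-]{36}/workspaces/[0-9a-f-]{36}$"
)

# The two settings that the post stores as GitHub Actions secrets.
REQUIRED = ("PREFECT_API_KEY", "PREFECT_API_URL")


def check_prefect_settings(env):
    """Return a list of human-readable problems; an empty list means OK."""
    problems = [f"{name} is not set" for name in REQUIRED if not env.get(name)]
    url = env.get("PREFECT_API_URL")
    if url and not API_URL_PATTERN.match(url):
        problems.append(
            "PREFECT_API_URL does not look like a Prefect Cloud workspace URL"
        )
    return problems


if __name__ == "__main__":
    # In a GitHub Actions job, secrets are typically exposed as environment variables.
    for problem in check_prefect_settings(os.environ):
        print(problem)
```

The helper takes a plain mapping rather than reading the environment directly, so the same check works on values copied from the CLI output.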
GCP: create a project & launch Cloud Shell
This section assumes you’ve already signed up for the Google Cloud Platform and have a GCP project. If not, follow this guide.
To interact with your GCP project programmatically (from Prefect Cloud and GitHub Actions), you need to create a service account. You can do that from the Google Cloud console, Google Cloud Shell, or from a local terminal authenticated with the gcloud CLI. Google Cloud Shell is the easiest option because it has a preconfigured and authenticated terminal and a built-in Cloud IDE.
Activate Cloud Shell as follows:
This will open a terminal. Even though you could create the service account from that CLI, let’s open the editor to make things easier to navigate.
This will provide a cloud development environment similar to GitHub Codespaces shown before:
To verify that gcloud works from here, run:
gcloud services list --enabled --limit 5
This will prompt you to authorize the CLI — confirm with “Authorize”.
Then, run the same command again to validate the CLI is working — if so, this should display five services enabled in your project.
Now, run the following bash commands from that Cloud Shell terminal to create a service account (customize any names based on your needs, especially the project name):
# Create GCP account + project => here we use project named "prefect-community" - replace it with your project name
# This will also set default project and region:
export CLOUDSDK_CORE_PROJECT="prefect-community"
export CLOUDSDK_COMPUTE_REGION=us-east1
export GCP_AR_REPO=prefect
export GCP_SA_NAME=prefect
# enable required GCP services:
gcloud services enable iamcredentials.googleapis.com
gcloud services enable artifactregistry.googleapis.com
gcloud services enable run.googleapis.com
gcloud services enable compute.googleapis.com
# create service account named e.g. prefect:
gcloud iam service-accounts create $GCP_SA_NAME
export MEMBER=serviceAccount:"$GCP_SA_NAME"@"$CLOUDSDK_CORE_PROJECT".iam.gserviceaccount.com
gcloud projects add-iam-policy-binding $CLOUDSDK_CORE_PROJECT --member=$MEMBER --role="roles/run.admin"
gcloud projects add-iam-policy-binding $CLOUDSDK_CORE_PROJECT --member=$MEMBER --role="roles/compute.instanceAdmin.v1"
gcloud projects add-iam-policy-binding $CLOUDSDK_CORE_PROJECT --member=$MEMBER --role="roles/artifactregistry.admin"
gcloud projects add-iam-policy-binding $CLOUDSDK_CORE_PROJECT --member=$MEMBER --role="roles/iam.serviceAccountUser"
# create JSON credentials file as follows, then copy-paste its content into your GHA Secret + Prefect GcpCredentials block:
gcloud iam service-accounts keys create prefect.json --iam-account="$GCP_SA_NAME"@"$CLOUDSDK_CORE_PROJECT".iam.gserviceaccount.com
This will generate a JSON key file, which will appear in the Cloud editor. Open that JSON file and copy its entire content. It should have the following format:
{
  "type": "service_account",
  "project_id": "prefect-community",
  "private_key_id": "uuid",
  "private_key": "-----BEGIN PRIVATE KEY-----\n a looooooong string",
  "client_email": "prefect@prefect-community.iam.gserviceaccount.com",
  "client_id": "numbers",
  "auth_uri": "https://accounts.google.com/o/oauth2/auth",
  "token_uri": "https://oauth2.googleapis.com/token",
  "auth_provider_x509_cert_url": "https://www.googleapis.com/oauth2/v1/certs",
  "client_x509_cert_url": "https://www.googleapis.com/robot/v1/metadata/x509/sa_name_and_project_name.iam.gserviceaccount.com"
}
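A truncated copy-paste is an easy mistake to make with a key this long, so a quick sanity check of the file can save a failed pipeline run. This is a minimal sketch under stated assumptions: validate_key is a hypothetical helper, and the set of required fields is simply taken from the example format shown in this post, not from an official schema.

```python
import json

# Fields visible in the example key format; treated here as required (an assumption).
EXPECTED_FIELDS = {
    "type",
    "project_id",
    "private_key_id",
    "private_key",
    "client_email",
    "client_id",
    "token_uri",
}


def validate_key(raw, project):
    """Return a list of problems found in a service-account key given as JSON text."""
    try:
        key = json.loads(raw)
    except json.JSONDecodeError as exc:
        return [f"not valid JSON: {exc}"]
    if not isinstance(key, dict):
        return ["top-level JSON value is not an object"]

    problems = [f"missing field: {name}" for name in sorted(EXPECTED_FIELDS - key.keys())]
    if key.get("type") != "service_account":
        problems.append("type is not 'service_account'")
    if key.get("project_id") != project:
        problems.append(f"project_id is not '{project}'")
    # Service account emails follow NAME@PROJECT.iam.gserviceaccount.com.
    email = key.get("client_email", "")
    if not email.endswith(f"@{project}.iam.gserviceaccount.com"):
        problems.append("client_email does not belong to the expected project")
    return problems


if __name__ == "__main__":
    # prefect.json and prefect-community are the names used in this post.
    try:
        with open("prefect.json", encoding="utf-8") as handle:
            print(validate_key(handle.read(), "prefect-community") or "key looks OK")
    except FileNotFoundError:
        print("prefect.json not found in the current directory")
```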
Then, paste this file’s content into the following:
1) GitHub Actions secret named GCP_CREDENTIALS
2) Prefect GCP Credentials block named default
With that, we are all set and ready for Cloud deployment.
GitHub Actions: Plan of Action
The “Getting Started” GitHub action does the following:
- It creates an Artifact Registry repository if one doesn’t exist yet.
- It builds a Docker image based on the Dockerfile and pushes it to that Artifact Registry repository.
- It deploys one VM and a Docker container running a Prefect agent process that deploys flow runs. If a VM with the same name already exists, it gets deleted before a new one is created; to create several VMs, run this action multiple times with different VM and queue names. By default, the flows are configured to be deployed as serverless containers using Google Cloud Run jobs. This makes it easy to scale your project as your needs grow, with no need to monitor and maintain the underlying infrastructure: serverless containers get spun up based on the provided Artifact Registry image, and the resource allocation can be adjusted at any time on the CloudRunJob block, even from the Prefect UI.
- It automatically deploys your first Prefect blocks.
- It automatically deploys your first Prefect flows.
GitHub Actions: Action!
Let’s run the “Getting Started” workflow. Once executed, everything you need to run your flows remotely on Google Cloud will be provisioned for you.
Go to Actions, select the “All-in-one” workflow, and run it. You can directly modify any default values; for instance, you can pick a bigger instance or run the VM in another region.
If everything goes smoothly, you should see a screen similar to the one below. If an action succeeded but shows error annotations, don’t worry: some actions are configured to, e.g., delete a stale resource if one already exists. During a first run, there is no resource to redeploy (i.e., delete before creating a new one), hence the error annotation.
Prefect Cloud resources we’ve just provisioned
It may take up to 5 minutes for the GitHub Action to finish. After that, the VM may need another 5 minutes to start the agent container. Once all that finishes, you should see that your Prefect queue running on Google Cloud is ready to deploy flow runs.
Apart from that queue, you can validate that new GitHub and CloudRunJob blocks have been created for you:
You can cross-check that the GitHub block points to the same repository from which you triggered that action, and that the CloudRunJob block has been tagged with your latest Git commit SHA.
Lastly, you should see several deployments corresponding to all flows in your repository:
🔒 Add a Personal Access Token if your GitHub repo is private
Note that the GitHub block provisioned as part of this GitHub Action assumes that you use a public repository. If you want to leverage a private repository, you would need to add an access token to your GitHub block. This previous blog post demonstrates how to do that.
Google Cloud resources we’ve just provisioned
Let’s investigate the resources that we’ve just provisioned. First, if you go to your Artifact Registry, you should see a new container image tagged with both the Git commit SHA and the latest tag.
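To make the two tags concrete, here is a minimal sketch of how the full image references are composed. Artifact Registry Docker images follow the REGION-docker.pkg.dev/PROJECT/REPOSITORY/IMAGE:TAG layout; the image name "flows" and the shortened commit SHA below are placeholders chosen for illustration, and the template may name the image and tag differently.

```python
def image_references(region, project, repo, image, commit_sha):
    """Build the two references for one image: the commit-specific tag and 'latest'."""
    base = f"{region}-docker.pkg.dev/{project}/{repo}/{image}"
    return [f"{base}:{commit_sha}", f"{base}:latest"]


if __name__ == "__main__":
    # Region, project, and repository match the environment variables set earlier;
    # "flows" and "3f2a1bc" are hypothetical values.
    for reference in image_references(
        "us-east1", "prefect-community", "prefect", "flows", "3f2a1bc"
    ):
        print(reference)
```

Pinning the commit-specific reference rather than latest makes it possible to tell exactly which code a given flow run executed.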
Next, type VM in your search bar to validate the VM instance is running:
You can SSH to that instance from your browser window — we’ll do that to validate the execution of flow runs.
Trigger first flow runs
All resources are provisioned. Now we can trigger a couple of flow runs. To do that from the terminal, you can use the following:
prefect deployment run marvin/default
prefect deployment run hello/default
prefect deployment run quote/default
prefect deployment run parametrized/default
prefect deployment run maintenance/default
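If you prefer to script these calls, the same commands can be driven from Python. This is a minimal sketch rather than anything shipped with the template: run_command and trigger_all are hypothetical helpers that only wrap the CLI invocation shown above, and dry_run defaults to True so that nothing is triggered by accident.

```python
import subprocess

# Flow names taken from the CLI commands in this post.
DEPLOYMENTS = ["marvin", "hello", "quote", "parametrized", "maintenance"]


def run_command(flow_name, deployment="default"):
    """Return the argv list for `prefect deployment run FLOW/DEPLOYMENT`."""
    return ["prefect", "deployment", "run", f"{flow_name}/{deployment}"]


def trigger_all(flow_names, dry_run=True):
    """Print each command, or execute it when dry_run is False."""
    commands = [run_command(name) for name in flow_names]
    for command in commands:
        if dry_run:
            print(" ".join(command))
        else:
            # Requires an installed and authenticated Prefect CLI.
            subprocess.run(command, check=True)
    return commands


if __name__ == "__main__":
    trigger_all(DEPLOYMENTS)
```

Calling trigger_all(DEPLOYMENTS, dry_run=False) from an authenticated terminal would schedule all five runs.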
And to trigger a custom run from the UI, go to the Deployments page, select the parametrized deployment and click on “Custom run”:
From here, set your name (or the name of someone else you want to greet from Prefect) and click “Run”:
Soon, you should see several flow runs executed on Google Cloud. The run that we triggered from the UI should give you similar log output:
Inspect the logs on the Google Compute Engine VM
Let’s also SSH to the VM instance and cross-check the execution there.
In this terminal, type:
docker ps # find the Prefect container ID
docker logs CONTAINER_ID
The output should look similar to the following:
With that, congrats! 🎉 You’ve completed the initial setup on Prefect and Google Cloud.
CI/CD pipeline for new flows
You may have noticed that we haven’t deployed any flows manually — the automated setup did it for us.
If you want to add more flows, add those to the flows folder and commit the changes. The CI/CD pipeline will automatically create deployments for those flows. For instance, let’s add a flow named new.py to the flows directory, and then commit and push this change to the main branch:
from faker import Faker
from prefect import flow


@flow(log_prints=True)
def new():
    fake = Faker()
    print(f"Let's {fake.bs()} 🚀")


if __name__ == "__main__":
    new()
The CI/CD pipeline ensures that all flow deployments are always up to date, versioned with your Git commit SHA, and deployed to your Prefect Cloud workspace.
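To check which deployments to expect after a push, you can list the flow files yourself. The sketch below rests on an assumption that matches the examples in this post but is not confirmed by the template: each file in the flows folder holds one flow named after the file, deployed under the name default. deployment_names is a hypothetical helper.

```python
from pathlib import Path


def deployment_names(flow_files, deployment="default"):
    """Map flow file paths to the deployment names expected for them."""
    names = []
    for file in flow_files:
        path = Path(file)
        # Skip non-Python files and private modules such as __init__.py.
        if path.suffix == ".py" and not path.name.startswith("_"):
            names.append(f"{path.stem}/{deployment}")
    return sorted(names)


if __name__ == "__main__":
    # Run from the repository root; prints e.g. new/default for flows/new.py.
    for name in deployment_names(Path("flows").glob("*.py")):
        print(name)
```

Comparing this list with the Deployments page is a quick way to spot a flow that the pipeline did not pick up.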
Next steps
This was a demo on how to get started with Prefect and Google Cloud using a repository template with preconfigured GitHub Actions.
If anything discussed in this post is unclear, feel free to tag me when asking a question in the Prefect Community Slack or Prefect Discourse.
Thanks for reading, and happy engineering!