How to use Kubernetes with Prefect
Part 1: Up and running
Kubernetes is a workhorse! It's a great way to declaratively scale containerized infrastructure up and down for long-running processes. Kubernetes can also be a bear. It has a lot of moving parts that can be cumbersome to run in harmony.
Prefect is awesome! It lets you coordinate your workflows: running them on a schedule with automatic retries, caching, reusable configuration, a collaborative UI, and more. Prefect also provides observability into what's happening across your data stack with automatic logging and notifications.
Prefect's built-in Kubernetes support makes it easier to run your Python dataflow code at scale. You get all of that orchestration and observability goodness with scalable infrastructure for long-running processes.
If you aren't familiar with Kubernetes (AKA K8s), check out my introductory posts on K8s concepts and commands.
If you aren't familiar with Prefect v2, I suggest doing the Prefect tutorials and checking out the concept docs for flows, tasks, blocks, deployments, and code storage.
Why would you use K8s with Prefect?
If you're already using Kubernetes to run your Python data engineering, ML training, or other backend workflows, you can benefit from all the observability and orchestration that Prefect provides.
If you're already using Prefect and looking to scale up your code infrastructure with lots of customizability, Kubernetes is a great choice. For example, if you want to run compute-heavy ML training processes, you might want to use K8s.
Using infrastructure
Prefect makes it easy for you to run your dataflow code on a variety of infrastructure. By default, your code runs in a local subprocess on your machine. Alternatively, you can create a deployment and specify the infrastructure for your code to run in. Prefect provides pre-built infrastructure blocks for integrating with Docker, Kubernetes, AWS ECS, GCP Cloud Run, and Azure Container Instances.
As we'll see in future articles, there are several ways you can use Kubernetes with Prefect. In this post we'll get our feet wet with our agent and K8s infrastructure running locally, our Python flow code in AWS S3 remote storage, and Prefect Cloud as our orchestration engine.
Let's do this!
Use the default Kubernetes Job infrastructure
We'll start off with a basic setup and iterate in future posts.
Setup
Install Docker Desktop and enable K8s
Let's run Kubernetes locally. If you don't have K8s installed, I suggest using the version that ships with Docker Desktop.
If needed, download Docker Desktop for your operating system and fire it up. Then, enable Kubernetes. The Kubernetes menu is found by clicking on the gear icon in the top right of Docker Desktop.
Download and install Prefect
In your Python virtual environment, install the latest version of Prefect:
pip install -U prefect
Or use the version in this post:
pip install prefect==2.6.8
Set up Prefect Cloud
Sign up for a free Prefect Cloud account if you don't have one yet.
If you haven't connected your machine to Prefect Cloud before, make an API key by clicking on your account icon and creating a key.
Copy the command line code snippet that appears when you create your key. Run the snippet in your terminal to save your key to your local Prefect profile.
Create flow code
We just want to demonstrate that things are working as expected. Let's use some basic code that logs information about the network and instance.
```python
from prefect import flow, get_run_logger
from platform import node, platform


@flow
def check():
    logger = get_run_logger()
    logger.info(f"Network: {node()}. ✅")
    logger.info(f"Instance: {platform()}. ✅")


if __name__ == "__main__":
    check()
```
We'll upload this code to our S3 bucket.
Create AWS S3 bucket
Create an AWS S3 bucket with the default settings.
Set IAM User
For this tutorial, you could leave your bucket unsecured, but that's not a great practice. Instead, let's use the AWS credentials of an IAM user with access to interact with S3.
If you don't have one yet, create a user with S3 read and write permissions in the AWS console.
Make an access key for the new user. Then copy the Access key ID and Secret access key; you'll use them when you create your Prefect S3 block.
Create S3 remote storage block
You can create a Prefect block from the UI or Python code. Here's how you can create an S3 block from the UI.
From the Blocks menu, click on the + button and select the S3 block.
Give the block a unique name and your bucket path. Then input your Access key ID and Secret access key that you created earlier.
Create your Deployment
Next we'll build and apply our deployment from the command line. Alternatively, we could define our deployment in a Python file, as I showed here.
prefect deployment build flows.py:check -n k8sjob -sb s3/myawsblock -i kubernetes-job --override env.EXTRA_PIP_PACKAGES=s3fs -a
Let's break this down.
The Python flow code is found in the flows.py file. The entrypoint flow function in that file is check.
We named our deployment k8sjob.
We specified the S3 storage block we created above, named myawsblock. All the files in the current local directory get uploaded to our S3 bucket.
The flow will run in the default kubernetes-job infrastructure, with the most recent Prefect Docker image and a set of basic Kubernetes presets. We'll see those presets in a minute.
Note that to use S3 with K8s, you need to use the override flag and pass the environment variable EXTRA_PIP_PACKAGES with s3fs. Otherwise, the container will not have the Python package it needs to grab your remote storage flow code from S3.
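To picture what the override flag is doing, it helps to see the dotted key env.EXTRA_PIP_PACKAGES expanded into a nested structure on the infrastructure block. Here's a minimal, stdlib-only sketch of that idea; this illustrates the semantics, not Prefect's actual CLI implementation:

```python
def parse_override(override: str) -> dict:
    """Expand a dotted 'key.path=value' override into a nested dict.

    Illustrates the idea behind `--override env.EXTRA_PIP_PACKAGES=s3fs`;
    Prefect's real override handling lives inside its CLI.
    """
    key_path, value = override.split("=", 1)
    keys = key_path.split(".")
    result: dict = {}
    current = result
    for key in keys[:-1]:
        current = current.setdefault(key, {})
    current[keys[-1]] = value
    return result


print(parse_override("env.EXTRA_PIP_PACKAGES=s3fs"))
# {'env': {'EXTRA_PIP_PACKAGES': 's3fs'}}
```

The nested dict mirrors how the extra environment variable ends up on the Kubernetes Job's container spec, where it triggers the pip install at container startup.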
The -a flag tells Prefect to send the deployment info to the server.
Results
Let's see the results of creating our deployment. In the UI, click on Deployments and then click on check/k8sjob. Then click on the link to the anonymous infrastructure block that looks like this:
You'll see the details of the default K8s manifest.
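For reference, at the time of writing (Prefect 2.6) the default Kubernetes Job manifest on the anonymous infrastructure block looks roughly like this; check your own block in the UI for the exact values:

```yaml
apiVersion: batch/v1
kind: Job
metadata:
  labels: {}
spec:
  template:
    spec:
      completions: 1
      containers:
        - env: []
          name: prefect-job
      parallelism: 1
      restartPolicy: Never
```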
We'll look at how to adjust these fields in the next post. Follow me to make sure you don't miss it!
A Kubernetes deployment will be created in the default namespace. Note that a Kubernetes deployment and a Prefect deployment are different things.
Start your agent
Your agent will run locally, on your machine, polling your Prefect Cloud default work queue. Here's the command to fire it up:
prefect agent start -q 'default'
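Under the hood, an agent is essentially a polling loop: ask the work queue for scheduled runs, then submit each one to the configured infrastructure. Here's a stdlib-only sketch of that idea; the queue and submit function are stand-ins, not Prefect's actual agent code:

```python
from collections import deque

# Stand-in work queue holding scheduled flow run names.
# Prefect's real work queue lives on the server (Prefect Cloud here).
work_queue = deque(["flow-run-1", "flow-run-2"])


def submit_to_infrastructure(flow_run: str) -> str:
    """Stand-in for creating a Kubernetes Job for a flow run."""
    return f"submitted {flow_run} as a Kubernetes Job"


def poll_once(queue: deque) -> list:
    """One polling cycle: drain scheduled runs and submit each one."""
    submitted = []
    while queue:
        submitted.append(submit_to_infrastructure(queue.popleft()))
    return submitted


print(poll_once(work_queue))
# A real agent sleeps between cycles and repeats indefinitely.
```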
All the pieces are in place. Let's run it!
Schedule a flow run
We'll create an ad hoc flow run. We could use the UI or the CLI. Let's use the CLI. Open another terminal window and run the following command:
prefect deployment run check/k8sjob
In the terminal running your agent you should see output that looks like this:
The Prefect adjective-animal name given to this flow run is super-rabbit.
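Prefect builds these run names by pairing a random adjective with a random animal. Here's a toy, stdlib-only version of the idea; the word lists below are made up for illustration, and Prefect ships its own, much longer ones:

```python
import random

# Hypothetical word lists; Prefect's real lists are much longer.
ADJECTIVES = ["super", "brave", "quiet", "shiny"]
ANIMALS = ["rabbit", "otter", "falcon", "newt"]


def random_run_name() -> str:
    """Return an adjective-animal name like 'super-rabbit'."""
    return f"{random.choice(ADJECTIVES)}-{random.choice(ANIMALS)}"


print(random_run_name())
```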
It will take a little time for K8s to work its magic, and then the output should wrap up like this:
Woo hoo, we did it!
Let's look more closely at what happened.
Flow run details
- When our agent sees that there is a deployment scheduled to run in the work queue, it starts the flow run on our local K8s infrastructure.
- The specified Prefect Docker image is pulled. You can see the image history in Docker Desktop.
- The extra pip package s3fs we specified is downloaded and installed.
- The K8s pod starts, the code runs, and the pod exits.
- You can see the current pod status in your terminal with kubectl get pods.
- And you can see all the details about your pod with kubectl describe pods <your pod name here>. My pod name was super-rabbit-97p8g-6wrvq.
- Logs are available in the Prefect CLI, as you saw earlier. Or you can check them out in the UI.
- Just click on the flow run's adjective-animal name in the UI to see the logs; that's super-rabbit in my case.
And that's it! You just used Prefect and Kubernetes to run a flow with your code stored on S3.
Wrap
You've seen how to run your Prefect flows with Kubernetes without creating a custom infrastructure block.
I hope you found this guide useful. If you did, please share it on your favorite social media so other people can find it, too.
In the next article in this series, we'll see how to customize our K8s setup.
Got questions? Reach out on Prefect's 20,000+ member Community Slack.
Happy engineering!