How to use Kubernetes with Prefect

Part 1: Up and running

Jeff Hale
The Prefect Blog
8 min read · Nov 21, 2022



Kubernetes is a workhorse! 🐎 It's a great way to declaratively scale containerized infrastructure up and down for long-running processes. Kubernetes can also be a bear. 🐻 It has a lot of moving parts that can be cumbersome to run in harmony.


Prefect is awesome! 🚀 It lets you coordinate your workflows: running them on a schedule with automatic retries, caching, reusable configuration, a collaborative UI, and more. Prefect also provides observability into what's happening across your data stack with automatic logging and notifications.

Prefect's built-in Kubernetes support makes it easier to run your Python dataflow code at scale. You get all that orchestration and observability goodness with scalable infrastructure for long-running processes. 🙂

If you aren't familiar with Kubernetes (AKA K8s), check out my introductory posts on K8s concepts and commands.

If you aren't familiar with Prefect v2, I suggest doing the Prefect tutorials and checking out the concept docs for flows, tasks, blocks, deployments, and code storage.

Why would you use K8s with Prefect?

If you're already using Kubernetes to run your Python data engineering, ML training, or other backend workflows, you can benefit from all the observability and orchestration that Prefect provides.

If you're already using Prefect and looking to scale up your code infrastructure with lots of customizability, Kubernetes is a great choice. For example, if you want to run compute-heavy ML training processes, you might want to use K8s.

Using infrastructure

Prefect makes it easy for you to run your dataflow code on a variety of infrastructure. By default, your code runs in a local subprocess on your machine. Alternatively, you can create a deployment and specify the infrastructure for your code to run in. Prefect provides pre-built infrastructure blocks for integrating with Docker, Kubernetes, AWS ECS, GCP Cloud Run, and Azure Container Instances.

[Diagram: how Prefect deployments and flow runs work]
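As a taste of what those blocks look like in code, here's a minimal sketch of defining a Kubernetes infrastructure block in Python. The block name my-k8s-job and the image tag are illustrative assumptions, not something we use later in this post.

from prefect.infrastructure import KubernetesJob

# Define a Kubernetes Job infrastructure block and save it to the server.
# The block name and image tag below are just examples.
k8s_job = KubernetesJob(
    image="prefecthq/prefect:2.6.8-python3.10",
    namespace="default",
)
k8s_job.save("my-k8s-job", overwrite=True)

Below, we'll stick with the default Kubernetes Job infrastructure rather than a saved custom block.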

As we'll see in future articles, there are several ways you can use Kubernetes with Prefect. In this post we'll get our feet wet with our agent and K8s infrastructure running locally, our Python flow code in AWS S3 remote storage, and Prefect Cloud as our orchestration engine.

Let's do this! 🚀

Use the default Kubernetes Job infrastructure

We'll start off with a basic setup and iterate in future posts.

Setup

Install Docker Desktop and enable K8s

Let's run Kubernetes locally. If you don't have K8s installed, I suggest using the version that ships with Docker Desktop.

If needed, download Docker Desktop for your operating system and fire it up. Then, enable Kubernetes. The Kubernetes menu is found by clicking on the gear icon in the top right of Docker Desktop.

[Screenshot: Docker Desktop menu to enable Kubernetes]
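Once Kubernetes is running, you can verify the local cluster from your terminal:

kubectl get nodes

You should see a single docker-desktop node with a Ready status.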

Download and install Prefect

In your Python virtual environment, install the latest version of Prefect with pip install -U prefect or use the version in this post with pip install prefect==2.6.8.
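You can confirm the installed version with:

prefect version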

Set up Prefect Cloud

Sign up for a free Prefect Cloud account if you don't have one yet.

If you haven't connected your machine to Prefect Cloud before, make an API key by clicking on your account icon and creating a key.

[Screenshot: form to create a Prefect API key]

Copy the command line code snippet that appears when you create your key. Run the snippet in your terminal to save your key to your local Prefect profile.
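The snippet looks roughly like this, with your real API key in place of the placeholder:

prefect cloud login -k pnu_xxxxxxxxxxxxxxxxxxxxxxxx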

Create flow code

We just want to demonstrate that things are working as expected. Let's use some basic code that logs information about the network and instance. 🙂

from prefect import flow, get_run_logger
from platform import node, platform

@flow
def check():
    logger = get_run_logger()
    logger.info(f"Network: {node()}. ✅")
    logger.info(f"Instance: {platform()}. ✅")

if __name__ == "__main__":
    check()

We'll upload this code to our S3 bucket.
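Save the code in a file named flows.py; that's the file our deployment command will reference later. For a quick sanity check before uploading, you can run the flow locally:

python flows.py

You should see the two ✅ log lines with your machine's network and platform details.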

Create AWS S3 bucket

Create an AWS S3 bucket with the default settings.

[Screenshot: S3 bucket creation page]

Set up an IAM user

For this tutorial, you could leave your bucket unsecured, but that's not a great practice. Instead, let's use the AWS credentials of an IAM user with access to interact with S3.

If you don't have one yet, create a user with S3 read and write permissions in the AWS console.

[Screenshot: create IAM user]
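If you want to scope the user's permissions to a single bucket rather than granting broad S3 access, a policy along these lines should do it. This is a sketch; the bucket name my-prefect-bucket is a placeholder for your own.

{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Action": ["s3:ListBucket"],
            "Resource": "arn:aws:s3:::my-prefect-bucket"
        },
        {
            "Effect": "Allow",
            "Action": ["s3:GetObject", "s3:PutObject"],
            "Resource": "arn:aws:s3:::my-prefect-bucket/*"
        }
    ]
}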

Make an access key for the new user. Then copy the Access key ID and Secret access key; you'll use them when you create your Prefect S3 block.

[Screenshot: create access key]

Create S3 remote storage block

You can create a Prefect block from the UI or Python code. Here's how you can create an S3 block from the UI.

From the Blocks menu, click on the + button and select the S3 block.

[Screenshot: create S3 block form in Prefect]

Give the block a unique name and your bucket path. Then input your Access key ID and Secret access key that you created earlier.

[Screenshot: Prefect S3 remote storage block form]
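Alternatively, here's a rough sketch of creating the same block in Python. The bucket path and credentials are placeholders; the block name myawsblock matches the one we'll reference in the deployment command below.

from prefect.filesystems import S3

# Create and save an S3 storage block; replace the placeholders
# with your bucket path and the IAM user's credentials.
s3_block = S3(
    bucket_path="my-prefect-bucket/flows",
    aws_access_key_id="YOUR_ACCESS_KEY_ID",
    aws_secret_access_key="YOUR_SECRET_ACCESS_KEY",
)
s3_block.save("myawsblock", overwrite=True)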

Create your Deployment

Next we'll build and apply our deployment from the command line. Alternatively, we could have defined our deployment in a Python file, as I showed here.

prefect deployment build flows.py:check -n k8sjob -sb s3/myawsblock \
  -i kubernetes-job --override env.EXTRA_PIP_PACKAGES=s3fs -a

Let's break this down. ⬇️

The Python flow code is found in the flows.py file. The entrypoint flow function in that file is check.

We named our deployment k8sjob.

We specified the S3 storage block we created above, named myawsblock. All the files in the current local directory get uploaded to our S3 bucket.

The flow will run in the default kubernetes-job infrastructure, with the most recent Docker image and a bunch of basic Kubernetes presets. We'll see those presets in a minute.

Note that to use S3 with K8s, you need the --override flag to pass the environment variable EXTRA_PIP_PACKAGES=s3fs. Otherwise, the container won't have the Python package it needs to fetch your flow code from S3 remote storage.

The -a tells Prefect to send the deployment info to the server. 🚀
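For reference, a Python version of this deployment would look roughly like the sketch below, assuming the myawsblock storage block from earlier. It's an equivalent under those assumptions, not the exact file from my other post.

from prefect.deployments import Deployment
from prefect.filesystems import S3
from prefect.infrastructure import KubernetesJob

from flows import check

# Build a deployment equivalent to the CLI command above:
# S3 storage for the flow code and a Kubernetes Job for infrastructure.
deployment = Deployment.build_from_flow(
    flow=check,
    name="k8sjob",
    storage=S3.load("myawsblock"),
    infrastructure=KubernetesJob(env={"EXTRA_PIP_PACKAGES": "s3fs"}),
)

if __name__ == "__main__":
    deployment.apply()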

Results

Let's see the results of creating our deployment. In the UI, click on Deployments and then click on check/k8sjob. Then click on the link to the anonymous infrastructure block that looks like this:

[Screenshot: anonymous infrastructure block example]

You'll see the details of the default K8s manifest.

We'll look at how to adjust these fields in the next post. Follow me to make sure you don't miss it! 🚀

A Kubernetes job will be created in the default namespace when a flow run starts. Note that a Kubernetes job (or a Kubernetes deployment) and a Prefect deployment are different things. ⚠️
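Once a flow run has kicked off, you can list those jobs with:

kubectl get jobs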

Start your agent

Your agent will run locally, on your machine, polling your Prefect Cloud default work queue. Here's the command to fire it up:

prefect agent start -q 'default'

[Screenshot: agent output]

All the pieces are in place. Let's run it! 🔄

Schedule a flow run

We'll create an ad hoc flow run. We could use the UI or the CLI. Let's use the CLI. Open another terminal window and run the following command:

prefect deployment run check/k8sjob

In the terminal running your agent, you should see output showing the flow run being picked up and submitted.

The Prefect adjective-animal name given to this flow run is super-rabbit. 🐇

It will take a little time for K8s to work its magic, and then the agent output should show the flow run completing.

Woo hoo, we did it! 🎉

Let's look more closely at what happened.

Flow run details

  • When our agent sees a scheduled flow run in the work queue, it starts the flow run on our local K8s infrastructure.
  • The specified Prefect Docker image is pulled. You can see the image history in Docker Desktop.
  • The extra pip package s3fs we specified is downloaded and installed.
  • The K8s pod starts, the code runs, and the pod exits.
  • You can see the current pod status in your terminal with kubectl get pods.
  • And you can see all the details about your pod with kubectl describe pods <your pod name here>. My pod name was super-rabbit-97p8g-6wrvq.
  • Logs are available in the Prefect CLI, as you saw earlier. You can also check them out in the UI, or grab pod-level logs with kubectl (see the example after this list).
[Screenshot: deployment in the Prefect UI]
  • Just click on the flow run's adjective-animal name to see the logs in the UI. That's super-rabbit in my case.
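As mentioned above, you can also pull pod-level logs straight from Kubernetes; substitute your own pod name:

kubectl logs super-rabbit-97p8g-6wrvq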

And that's it! You just used Prefect and Kubernetes to run a flow with your code stored on S3. 🚀

Wrap

You've seen how to run your Prefect flows with Kubernetes without creating a custom infrastructure block.

I hope you found this guide useful. If you did, please share it on your favorite social media so other people can find it, too. 🚀

In the next article in this series, we'll see how to customize our K8s setup. 👍

Got questions? Reach out on Prefect's 20,000+ member Community Slack.


Happy engineering! ⚒️



I write about data things. Follow me on Medium and join my Data Awesome mailing list to stay on top of the latest data tools and tips: https://dataawesome.com