How to use Kubernetes with Prefect
Part 1: Up and running
Kubernetes is a workhorse! It's a great way to declaratively scale containerized infrastructure up and down for long-running processes. Kubernetes can also be a bear. It has a lot of moving parts that can be cumbersome to run in harmony.
Prefect is awesome! It lets you coordinate your workflows: running them on a schedule with automatic retries, caching, reusable configuration, a collaborative UI, and more. Prefect also provides observability into what's happening across your data stack with automatic logging and notifications.
Prefect's built-in Kubernetes support makes it easier to run your Python dataflow code at scale. You get all of that orchestration and observability goodness with scalable infrastructure for long-running processes.
If you aren't familiar with Kubernetes (AKA K8s), check out my introductory posts on K8s concepts and commands.
If you aren't familiar with Prefect v2, I suggest doing the Prefect tutorials and checking out the concept docs for flows, tasks, blocks, deployments, and code storage.
Why would you use K8s with Prefect?
If you're already using Kubernetes to run your Python data engineering, ML training, or other backend workflows, you can benefit from all the observability and orchestration that Prefect provides.
If you're already using Prefect and looking to scale up your code infrastructure with lots of customizability, Kubernetes is a great choice. For example, if you want to run compute-heavy ML training processes, you might want to use K8s.
Using infrastructure
Prefect makes it easy for you to run your dataflow code on a variety of infrastructure. By default, your code runs in a local subprocess on your machine. Alternatively, you can create a deployment and specify the infrastructure for your code to run in. Prefect provides pre-built infrastructure blocks for integrating with Docker, Kubernetes, AWS ECS, GCP Cloud Run, and Azure Container Instances.
As we'll see in future articles, there are several ways you can use Kubernetes with Prefect. In this post we'll get our feet wet with our agent and K8s infrastructure running locally, our Python flow code in AWS S3 remote storage, and Prefect Cloud as our orchestration engine.
Let's do this!
Use the default Kubernetes Job infrastructure
We'll start off with a basic setup and iterate in future posts.
Setup
Install Docker Desktop and enable K8s
Let's run Kubernetes locally. If you don't have K8s installed, I suggest using the version that ships with Docker Desktop.
If needed, download Docker Desktop for your operating system and fire it up. Then, enable Kubernetes. The Kubernetes menu is found by clicking on the gear icon in the top right of Docker Desktop.
Download and install Prefect
In your Python virtual environment, install the latest version of Prefect:
pip install -U prefect
Or use the version in this post:
pip install prefect==2.6.8
Set up Prefect Cloud
Sign up for a free Prefect Cloud account if you don't have one yet.
If you haven't connected your machine to Prefect Cloud before, make an API key by clicking on your account icon and creating a key.
Copy the command line code snippet that appears when you create your key. Run the snippet in your terminal to save your key to your local Prefect profile.
Create flow code
We just want to demonstrate that things are working as expected. Let's use some basic code that logs information about the network and instance.
```python
from prefect import flow, get_run_logger
from platform import node, platform


@flow
def check():
    logger = get_run_logger()
    logger.info(f"Network: {node()}. ✅")
    logger.info(f"Instance: {platform()}. ✅")


if __name__ == "__main__":
    check()
```
We'll upload this code to our S3 bucket.
Create AWS S3 bucket
Create an AWS S3 bucket with the default settings.
Set IAM User
For this tutorial, you could leave your bucket unsecured, but that's not a great practice. Instead, let's use the AWS credentials of an IAM user with access to interact with S3.
If you don't have one yet, create a user with S3 read and write permissions in the AWS console.
Make an access key for the new user. Then copy the Access key ID and Secret access key; you'll use them when you create your Prefect S3 block.
Create S3 remote storage block
You can create a Prefect block from the UI or Python code. Here's how you can create an S3 block from the UI.
From the Blocks menu, click on the + button and select the S3 block.
Give the block a unique name and your bucket path. Then input your Access key ID and Secret access key that you created earlier.
Create your Deployment
Next we'll build and apply our deployment from the command line. Alternatively, we could define our deployment in a Python file, as I showed here.
prefect deployment build flows.py:check -n k8sjob -sb s3/myawsblock -i kubernetes-job --override env.EXTRA_PIP_PACKAGES=s3fs -a
Let's break this down.
The Python flow code is found in the flows.py file. The entrypoint flow function in that file is check.
We named our deployment k8sjob.
We specified the S3 storage block we created above, named myawsblock. All the files in the current local directory get uploaded to our S3 bucket.
The flow will run in the default kubernetes-job infrastructure, with the most recent Prefect Docker image and a set of basic Kubernetes presets. We'll see those presets in a minute.
Note that to use S3 with K8s, you need to use the override flag and pass the environment variable EXTRA_PIP_PACKAGES with s3fs. Otherwise, the container will not have the Python package it needs to grab your remote storage flow code from S3.
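To picture what the override flag is doing, it helps to see the dotted key env.EXTRA_PIP_PACKAGES expanded into a nested structure on the infrastructure block. Here's a minimal, stdlib-only sketch of that idea; this illustrates the semantics, not Prefect's actual CLI implementation:

```python
def parse_override(override: str) -> dict:
    """Expand a dotted 'key.path=value' override into a nested dict.

    Illustrates the idea behind `--override env.EXTRA_PIP_PACKAGES=s3fs`;
    Prefect's real override handling lives inside its CLI.
    """
    key_path, value = override.split("=", 1)
    keys = key_path.split(".")
    result: dict = {}
    current = result
    for key in keys[:-1]:
        current = current.setdefault(key, {})
    current[keys[-1]] = value
    return result


print(parse_override("env.EXTRA_PIP_PACKAGES=s3fs"))
# {'env': {'EXTRA_PIP_PACKAGES': 's3fs'}}
```

The nested dict mirrors how the extra environment variable ends up on the Kubernetes Job's container spec, where it triggers the pip install at container startup.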
The -a flag tells Prefect to send the deployment info to the server.
Results
Let's see the results of creating our deployment. In the UI, click on Deployments and then click on check/k8sjob. Then click on the link to the anonymous infrastructure block that looks like this:
You'll see the details of the default K8s manifest.
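For reference, at the time of writing (Prefect 2.6) the default Kubernetes Job manifest on the anonymous infrastructure block looks roughly like this; check your own block in the UI for the exact values:

```yaml
apiVersion: batch/v1
kind: Job
metadata:
  labels: {}
spec:
  template:
    spec:
      completions: 1
      containers:
        - env: []
          name: prefect-job
      parallelism: 1
      restartPolicy: Never
```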
We'll look at how to adjust these fields in the next post. Follow me to make sure you don't miss it!
A Kubernetes deployment will be created in the default namespace. Note that a Kubernetes deployment and a Prefect deployment are different things.
Start your agent
Your agent will run locally, on your machine, polling your Prefect Cloud default work queue. Here's the command to fire it up:
prefect agent start -q 'default'
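Under the hood, an agent is essentially a polling loop: ask the work queue for scheduled runs, then submit each one to the configured infrastructure. Here's a stdlib-only sketch of that idea; the queue and submit function are stand-ins, not Prefect's actual agent code:

```python
from collections import deque

# Stand-in work queue holding scheduled flow run names.
# Prefect's real work queue lives on the server (Prefect Cloud here).
work_queue = deque(["flow-run-1", "flow-run-2"])


def submit_to_infrastructure(flow_run: str) -> str:
    """Stand-in for creating a Kubernetes Job for a flow run."""
    return f"submitted {flow_run} as a Kubernetes Job"


def poll_once(queue: deque) -> list:
    """One polling cycle: drain scheduled runs and submit each one."""
    submitted = []
    while queue:
        submitted.append(submit_to_infrastructure(queue.popleft()))
    return submitted


print(poll_once(work_queue))
# A real agent sleeps between cycles and repeats indefinitely.
```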
All the pieces are in place. Let's run it!
Schedule a flow run
We'll create an ad hoc flow run. We could use the UI or the CLI. Let's use the CLI. Open another terminal window and run the following command:
prefect deployment run check/k8sjob
In the terminal running your agent you should see output that looks like this:
The Prefect adjective-animal name given to this flow run is super-rabbit.
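Prefect builds these run names by pairing a random adjective with a random animal. Here's a toy, stdlib-only version of the idea; the word lists below are made up for illustration, and Prefect ships its own, much longer ones:

```python
import random

# Hypothetical word lists; Prefect's real lists are much longer.
ADJECTIVES = ["super", "brave", "quiet", "shiny"]
ANIMALS = ["rabbit", "otter", "falcon", "newt"]


def random_run_name() -> str:
    """Return an adjective-animal name like 'super-rabbit'."""
    return f"{random.choice(ADJECTIVES)}-{random.choice(ANIMALS)}"


print(random_run_name())
```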
It will take a little time for K8s to work its magic, and then the output should wrap up like this:
Woo hoo, we did it!
Let's look more closely at what happened.
Flow run details
- When our agent sees that there is a deployment scheduled to run in the work queue, it starts the flow run on our local K8s infrastructure.
- The specified Prefect Docker image is pulled. You can see the image history in Docker Desktop.
- The extra pip package s3fs we specified is downloaded and installed.
- The K8s pod starts, the code runs, and the pod exits.
- You can see the current pod status in your terminal with kubectl get pods.
- And you can see all the details about your pod with kubectl describe pods <your pod name here>. My pod name was super-rabbit-97p8g-6wrvq.
- Logs are available in the Prefect CLI, as you saw earlier. Or you can check them out in the UI.
- Just click on the flow run's adjective-animal name in the UI to see the logs; that's super-rabbit in my case.
And that's it! You just used Prefect and Kubernetes to run a flow with your code stored on S3.
Wrap
You've seen how to run your Prefect flows with Kubernetes without creating a custom infrastructure block.
I hope you found this guide useful. If you did, please share it on your favorite social media so other people can find it, too.
In the next article in this series, we'll see how to customize our K8s setup.
Got questions? Reach out on Prefect's 20,000+ member Community Slack.
Happy engineering!