How to use Kubernetes with Prefect: Part 2

Creating custom Kubernetes Job blocks

Jeff Hale
The Prefect Blog
8 min read · Nov 30, 2022


In the first post in this series, we used Prefect with Kubernetes to run our Python flow code without creating a custom Kubernetes Job block. In this post, we’ll show three more ways to use Prefect with K8s.

boats in Amsterdam
Source: pixabay.com

Create a Kubernetes Job block

Recall that our deployment in the previous article automatically created an anonymous Kubernetes Job block, so we didn't have to create a block explicitly. When we created our deployment, we specified that we wanted to use kubernetes-job for infrastructure and used an override flag to specify which additional Python package to install.

In this post, we’ll be making Kubernetes Job blocks to tailor our K8s settings in three different ways. In each example, we’ll create a Kubernetes Job block prior to building our deployment.

We’ll keep the rest of our setup the same as last time: a local single-node Kubernetes cluster, flow code on S3, and Prefect Cloud. Then, when we build our deployment, we’ll specify that our block should be used to create the infrastructure for our flow code to run in.

Let’s get cracking! 🚀

Example 1: K8s block with EXTRA_PIP_PACKAGES and a few tweaks

We can create our blocks via Python code or through the UI. Let’s use the UI for our first example. In Prefect Cloud, go to Blocks, hit the + button, and select the Kubernetes Job block.

Check out all the knobs we can turn! 🎛

blank block creation form for prefect k8s

The Hitchhiker’s Guide to the Galaxy is popular around Prefect, to put it mildly. Marvin is the lovable, misanthropic robot in the series, so let’s name our block marvin1. 🤖

Enter the EXTRA_PIP_PACKAGES environment variable pair (with the value s3fs) that we passed to prefect deployment build as an override in our previous example. To install more packages, separate their names with spaces inside the quotation marks around the value.
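To make the space-separated format concrete, here is a minimal Python sketch. It assumes, based on our reading of the Prefect image's entrypoint script, that the container splits EXTRA_PIP_PACKAGES on whitespace and passes the pieces to pip install at startup. The helpers make_env and pip_install_args are hypothetical illustrations, not Prefect APIs, and pandas is just a placeholder second package.

```python
def make_env(packages):
    """Build the env mapping for the block: one string, space-separated."""
    return {"EXTRA_PIP_PACKAGES": " ".join(packages)}


def pip_install_args(env):
    """Approximate (an assumption) what the container does at startup:
    split the value on whitespace and hand the pieces to pip install."""
    packages = env.get("EXTRA_PIP_PACKAGES", "").split()
    return ["pip", "install", *packages] if packages else []


if __name__ == "__main__":
    env = make_env(["s3fs", "pandas"])  # pandas is a hypothetical extra
    print(env)                    # {'EXTRA_PIP_PACKAGES': 's3fs pandas'}
    print(pip_install_args(env))  # ['pip', 'install', 's3fs', 'pandas']
```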

Let’s make a few other changes to the default form values:

  • Set Image to prefecthq/prefect:2.6.8-python3.10. If you don’t specify a Docker image tag, Prefect creates a block with an image that matches the Python and Prefect versions inferred from your environment. As the Prefect docs state, “It’s a good practice to use Docker images with specific Prefect versions in production.” So here we show how to specify the image version. See more discussion of Prefect’s Docker images in the docs.
  • Set KubernetesImagePullPolicy to Always. This policy ensures you get the freshest version of the specified image, so long as the image repository is available. Docker’s layer caching is smart, so this choice shouldn’t be a performance drag if your image is designed with caching in mind.
  • Change Job Watch Timeout Seconds to 6000. This raises the limit to 100 minutes.
  • Change Pod Watch Timeout Seconds to 6000. This also raises the limit to 100 minutes.
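If you later want to create the same block from code, as we do in Example 3, it helps to see how the form fields map to keyword arguments. This minimal sketch collects the marvin1 values in a plain dict whose keys match the KubernetesJob arguments used in the Example 3 script. The check_settings helper is our own hypothetical addition; the allowed pull policies are the three values Kubernetes defines.

```python
# Form values for marvin1, keyed by the KubernetesJob keyword arguments
# used in the Python block-creation script in Example 3.
MARVIN1_SETTINGS = {
    "image": "prefecthq/prefect:2.6.8-python3.10",
    "env": {"EXTRA_PIP_PACKAGES": "s3fs"},
    "image_pull_policy": "Always",
    "job_watch_timeout_seconds": 6000,
    "pod_watch_timeout_seconds": 6000,
}

# Kubernetes accepts exactly these imagePullPolicy values.
VALID_PULL_POLICIES = {"Always", "IfNotPresent", "Never"}


def check_settings(settings):
    """Return a list of problems found in the settings (empty if none)."""
    problems = []
    if settings["image_pull_policy"] not in VALID_PULL_POLICIES:
        problems.append(f"bad pull policy: {settings['image_pull_policy']}")
    for key in ("job_watch_timeout_seconds", "pod_watch_timeout_seconds"):
        if settings[key] <= 0:
            problems.append(f"{key} must be positive")
    return problems


if __name__ == "__main__":
    print(check_settings(MARVIN1_SETTINGS))  # []
    # 6000 seconds / 60 = 100 minutes
    print(MARVIN1_SETTINGS["job_watch_timeout_seconds"] // 60)  # 100
```

With Prefect 2.x installed, KubernetesJob(**MARVIN1_SETTINGS).save("marvin1") should produce an equivalent block, following the same pattern as the marvin3 script in Example 3.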

Click Create and you’ve got a Kubernetes Job infrastructure block! 🧱

Create deployment

Next, let’s build and apply our deployment from the command line. We’ll use the same flow file and s3 remote storage block that we created in the previous post.

prefect deployment build flows.py:check -n k8sjob-block -sb s3/myawsblock -ib kubernetes-job/marvin1 -a

We named this deployment k8sjob-block. Notice how we use the -ib flag this time with the block slug in place of the -i and the --override flags that we used in the previous article.
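The deployments in this post differ only in their entrypoints, names, and infrastructure blocks, so it can help to see the command as structured pieces. Here is a minimal Python sketch that assembles the argument list for prefect deployment build from the flags used in the command above (-n, -sb, -ib, -a), plus the flow-name/deployment-name identifier that prefect deployment run takes. The helper names are hypothetical, and the sketch only builds lists; it does not call the Prefect CLI.

```python
import shlex


def build_command(entrypoint, name, storage_block, infra_block):
    """Argument list for `prefect deployment build`.

    -n names the deployment, -sb picks the storage block,
    -ib picks the infrastructure block, and -a applies the deployment.
    """
    return [
        "prefect", "deployment", "build", entrypoint,
        "-n", name,
        "-sb", storage_block,
        "-ib", infra_block,
        "-a",
    ]


def run_command(flow_name, deployment_name):
    """Argument list for `prefect deployment run <flow>/<deployment>`."""
    return ["prefect", "deployment", "run", f"{flow_name}/{deployment_name}"]


if __name__ == "__main__":
    cmd = build_command("flows.py:check", "k8sjob-block",
                        "s3/myawsblock", "kubernetes-job/marvin1")
    # prefect deployment build flows.py:check -n k8sjob-block -sb s3/myawsblock -ib kubernetes-job/marvin1 -a
    print(shlex.join(cmd))
    # prefect deployment run check/k8sjob-block
    print(shlex.join(run_command("check", "k8sjob-block")))
```

Swapping in kubernetes-job/marvin2 or kubernetes-job/marvin3 (and a new name) gives the commands for the later examples. If you want to execute a command from Python, you could pass the list to subprocess.run.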

Let’s make our agent poll from the default work queue with prefect agent start -q default.

Create a run from the UI or from the CLI with prefect deployment run check/k8sjob-block.

The flow run should look nearly identical to the one in our previous article — the same things are happening. Let’s check our results in the UI.

screenshot of completed flow run

LGTM! Let’s explore a second way to set up K8s to run our flow code.

Example 2: Custom Docker Image

Using an environment variable to specify packages means that we download and install the s3fs package into the container every time we run our deployment. That’s a bit inefficient. ⏳

Let’s bake the package into a custom image to save time when we run our flows. 🎉

Here are the steps:

  1. Create the Dockerfile and requirements.txt files.
  2. Build the Docker image.
  3. Push the Docker image to your Docker registry.
  4. Create a K8s block and specify your image in the Image field.
  5. Build and apply your deployment. Note that we need to rebuild our deployment if we change our infrastructure block. ⚠️
  6. Start your agent.
  7. Run your deployment.
  8. Profit. 🤑

Create Dockerfile and requirements.txt

Here’s our Dockerfile:

FROM prefecthq/prefect:2.6.8-python3.10
COPY requirements.txt .
RUN pip install -r requirements.txt --trusted-host pypi.python.org --no-cache-dir

We’ll start with a Prefect image as our base. Prefect images use the official Python slim images as their bases, so most of the heavy lifting is done for us. 🏋️

Our requirements.txt file in the same directory as our Dockerfile has one line: s3fs.
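If you prefer to script step 1, here is a minimal Python sketch that produces the same Dockerfile and a one-line requirements.txt. The function names and the choice to pass packages as a list are our own illustrative assumptions; the file contents match the ones shown above.

```python
from pathlib import Path

BASE_IMAGE = "prefecthq/prefect:2.6.8-python3.10"


def render_build_context(packages):
    """Return the Dockerfile and requirements.txt contents as strings."""
    dockerfile = "\n".join([
        f"FROM {BASE_IMAGE}",
        "COPY requirements.txt .",
        "RUN pip install -r requirements.txt"
        " --trusted-host pypi.python.org --no-cache-dir",
    ]) + "\n"
    # pip reads one requirement per line.
    requirements = "\n".join(packages) + "\n"
    return {"Dockerfile": dockerfile, "requirements.txt": requirements}


def write_build_context(directory, packages):
    """Write both files into `directory`, creating it if needed."""
    context = Path(directory)
    context.mkdir(parents=True, exist_ok=True)
    for filename, text in render_build_context(packages).items():
        (context / filename).write_text(text)
    return context
```

Calling write_build_context(".", ["s3fs"]) writes the two files from this section into the current directory.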

If you want more customization for your Docker image, go to town. 😎

Build image

Let’s build our image with the following command:

docker image build . -t discdiver/prefect:2.6.8-python-3.10-s3fs

Note that discdiver is my Docker Hub username. My image tag is descriptive, but Prefect doesn’t require it to follow any particular pattern. If you want to learn about Docker commands, check out this article in my Docker series.
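Prefect may not care about the tag’s pattern, but Docker does: a tag may contain only ASCII letters, digits, underscores, periods, and hyphens, may not start with a period or hyphen, and is limited to 128 characters. Here is a minimal Python sketch that checks a reference against that rule. It assumes the reference has no registry host with a port (such as localhost:5000/...) and no @digest suffix; the helper names are hypothetical.

```python
import re

# Docker tag grammar: a word character followed by up to 127
# word characters, periods, or hyphens. Colons are not allowed.
TAG_PATTERN = re.compile(r"[A-Za-z0-9_][A-Za-z0-9_.-]{0,127}")


def split_reference(reference):
    """Split 'user/repo:tag' into (repository, tag).

    Splits on the first colon, so any extra colon ends up in the tag.
    Docker uses 'latest' when no tag is given.
    """
    repository, sep, tag = reference.partition(":")
    return repository, (tag if sep else "latest")


def has_valid_tag(reference):
    """True if the tag part of the reference follows Docker's grammar."""
    _, tag = split_reference(reference)
    return TAG_PATTERN.fullmatch(tag) is not None
```

For example, has_valid_tag("discdiver/prefect:2.6.8-python-3.10-s3fs") is True, while a tag with an extra colon, such as 2.6.8-python-3:10-s3fs, is rejected.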

Push image

Let’s push our image to Docker Hub with this command:

docker image push discdiver/prefect:2.6.8-python-3.10-s3fs

Substitute your own username. Note that the push fails if you aren’t logged in to Docker Hub, so run docker login first if needed. ⚠️

Create K8s block

Now it’s time to create our Kubernetes Job block and specify our recently created image. Let’s name this block marvin2.

block form for creating the block that shows the custom image

If you make a mistake when creating your block, like I did, you can click the three-dot menu in the upper right of the page to edit the block. 😉

Build deployment

Build and apply the deployment with our new block.

prefect deployment build flows.py:check -n k8sjob-block-custom-image -sb s3/myawsblock -ib kubernetes-job/marvin2 -a

I named my deployment k8sjob-block-custom-image.

Start agent

If you shut down your agent, start it again with prefect agent start -q default.

Run deployment

Run the deployment with prefect deployment run check/k8sjob-block-custom-image.

terminal output screenshot showing flow ran

By shifting our package installs to the image build step, we save time on each flow run. 🔥

Example 3: Custom K8s manifest

Want to tweak something in the Kubernetes Job manifest YAML? You could write one from scratch, but Prefect gives you a convenience command you can use as a starting point. 🙌

Create K8s Base Job Manifest

In your CLI, run prefect kubernetes manifest flow-run-job. The output should look like this:

apiVersion: batch/v1
kind: Job
metadata:
  # labels are required, even if empty
  labels: {}
spec:
  template:
    spec:
      completions: 1
      containers: # the first container is required
      - env: [] # env is required, even if empty
        name: prefect-job
      parallelism: 1
      restartPolicy: Never
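The comments in that output flag three required pieces: metadata.labels, at least one container, and an env list on the first container. As a sanity check before you save an edited manifest, here is a minimal Python sketch that mirrors the base manifest as a dict and reports any of those pieces that are missing. The function is our own hypothetical helper, not part of Prefect.

```python
# The base manifest from the CLI output, as a Python dict.
BASE_MANIFEST = {
    "apiVersion": "batch/v1",
    "kind": "Job",
    "metadata": {"labels": {}},  # labels are required, even if empty
    "spec": {
        "template": {
            "spec": {
                "completions": 1,
                "containers": [{"env": [], "name": "prefect-job"}],
                "parallelism": 1,
                "restartPolicy": "Never",
            }
        }
    },
}


def missing_required_fields(manifest):
    """List the required pieces (per the CLI comments) that are absent."""
    missing = []
    if "labels" not in manifest.get("metadata", {}):
        missing.append("metadata.labels")
    pod_spec = manifest.get("spec", {}).get("template", {}).get("spec", {})
    containers = pod_spec.get("containers", [])
    if not containers:
        missing.append("containers[0]")
    elif "env" not in containers[0]:
        missing.append("containers[0].env")
    return missing
```

Here missing_required_fields(BASE_MANIFEST) returns an empty list.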

Let’s save this code to the file k8s_flow_run_job_manifest.yaml with the following command:

prefect kubernetes manifest flow-run-job > k8s_flow_run_job_manifest.yaml

Create K8s Job Block

Let’s create a block using Python code this time, because copying all this into the UI with the correct formatting is a bit cumbersome. Here’s the code to create a block in my create-k8s-flow-block.py file:

from prefect.infrastructure import KubernetesJob

k8s_custom_block = KubernetesJob(
    image="prefecthq/prefect:2-python3.10",
    env={"EXTRA_PIP_PACKAGES": "s3fs"},
    job=KubernetesJob.job_from_file("./k8s_flow_run_job_manifest.yaml"),
    image_pull_policy="Always",
    job_watch_timeout_seconds=6000,
    pod_watch_timeout_seconds=6000,
)

if __name__ == "__main__":
    k8s_custom_block.save("marvin3", overwrite=True)

Note that we use the job_from_file method to copy the base manifest into the block. Also note that we added our earlier tweaks to our Kubernetes Job block.
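Because job_from_file reads the manifest file into the block, any edits you make to k8s_flow_run_job_manifest.yaml before running the script end up in the Base Job Manifest. Here is a minimal standard-library sketch of one such edit: it adds a label and a memory request to the first container, then writes the result back. The label and the 512Mi value are hypothetical placeholders. The sketch writes JSON, which YAML parsers generally accept, and assumes job_from_file parses the file with a standard YAML loader, so check the saved block in the UI afterward.

```python
import copy
import json
from pathlib import Path


def with_tweaks(manifest, labels=None, memory_request=None):
    """Return a copy of `manifest` with extra labels and/or a memory
    request on the first container. The input dict is not modified."""
    tweaked = copy.deepcopy(manifest)
    tweaked.setdefault("metadata", {}).setdefault("labels", {}).update(labels or {})
    if memory_request:
        container = tweaked["spec"]["template"]["spec"]["containers"][0]
        resources = container.setdefault("resources", {})
        resources.setdefault("requests", {})["memory"] = memory_request
    return tweaked


def save_manifest(manifest, path):
    """Write the manifest as indented JSON (a YAML-compatible form)."""
    Path(path).write_text(json.dumps(manifest, indent=2) + "\n")
```

For example, you could call with_tweaks(manifest, labels={"team": "data"}, memory_request="512Mi") and pass the result to save_manifest(..., "k8s_flow_run_job_manifest.yaml") before running create-k8s-flow-block.py.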

Run the code with python create-k8s-flow-block.py, and you should see the marvin3 block in the UI. Here’s the Base Job Manifest section:

k8s base job manifest prefect block section

Build deployment

Specify a new deployment name and your new infrastructure block when you build your deployment.

prefect deployment build long_flow.py:check -n custom-base -sb s3/myawsblock -ib kubernetes-job/marvin3 -a

Run deployment

Then, with your agent up, kick off a run, for example with prefect deployment run check/custom-base. You should see output similar to our first example.

output from flow run in terminal screenshot

Recap

In this article, you’ve seen three ways to customize Prefect Kubernetes Job blocks for running flows. In each example, we used Prefect Cloud and ran our agent and our K8s cluster infrastructure locally.

In the first example, we created a block with a few fields adjusted. You saw how to add Python packages to the block with the EXTRA_PIP_PACKAGES environment variable.

In the second example, we created a custom Docker image with our Python packages baked into it. Then, we specified this image in our K8s block. This method saves us time when we run our flows.

In the third example, we made a custom Base Job Manifest using YAML code created by a Prefect convenience command. This method provides flexibility for any K8s fields not otherwise provided by the block.

You could combine the second and third examples — using a custom Docker image and a custom Base Job Manifest — for maximum flexibility! 🤸

Wrap

I hope you found this guide to customizing Kubernetes Job blocks for flow runs with Prefect useful. If you did, please share it on your favorite social media so other people can find it, too. 🚀

Got questions? Reach out on Prefect’s 20,000+ member Community Slack.

There’s lots more you can do with Kubernetes and Prefect. I’ve got more articles in the works, so follow me to make sure you don’t miss them! 🙂

brightly colored houses on water with small boats in front
Source: pixabay.com

Happy engineering!
