Pod Disruption Budget: The Practical Guide

Arie Bregman
6 min read · Mar 19, 2023


In my previous article on Pod Disruption Budget, we mostly explored PDB theoretically. Now it’s time to get our hands dirty and see not only how to create a PDB but also put it to the test to see whether it actually protects our Pods from disruptions as expected.


What are the requirements for being able to use “Pod Disruption Budget”?

Let’s start from the very beginning: what exactly do you need in order to use PDB? Well, the list is quite short

  • Kubernetes ≥ 1.21
  • To create and apply a PDB, you have to specify which Pods it applies to. Make sure you label your Pods accordingly so it’s easy to target the ones the PDB should cover (see the example right after this list)
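
For example, here is a minimal sketch of a Deployment that carries such a label (the Deployment name, image and replica count are placeholders; the only part that matters for the PDB is the app: super-critical-app label on the Pod template):

apiVersion: apps/v1
kind: Deployment
metadata:
  name: super-critical-app
spec:
  replicas: 1
  selector:
    matchLabels:
      app: super-critical-app
  template:
    metadata:
      labels:
        app: super-critical-app   # the PDB selector will match on this label
    spec:
      containers:
        - name: app
          image: nginx:1.25   # placeholder image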

How to create PDB?

Let’s go over a few of the ways to create a PDB object.

Kubectl Create

To immediately apply a PDB to a certain workload, run the following kubectl command

kubectl create poddisruptionbudget app-pdb --min-available=1 \
--selector=app=super-critical-app

Let’s break it down:

  • poddisruptionbudget is the Kubernetes API resource type we would like to create, i.e. the “Pod Disruption Budget” resource. You could also use the short name pdb
  • app-pdb is the name of the PDB resource we created for the app super-critical-app
  • --min-available=1 makes sure at least 1 replica of our app is always available
  • --selector=app=super-critical-app is how we target the Pods the PDB should be applied to
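
If the command succeeds, kubectl should print a short confirmation, along the lines of

poddisruptionbudget.policy/app-pdb created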

Another way to express the very same thing, assuming you have 1 replica, is to use max-unavailable, like this

kubectl create poddisruptionbudget app-pdb --max-unavailable=0 \
--selector=app=super-critical-app

Why is it the same? Because if you have 1 replica and it must always be available, then the maximum number of replicas allowed to be unavailable is 0.
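
The same reasoning extends to any replica count. For example, assuming a hypothetical deployment with 3 replicas, the following two commands (you would run only one of them) express the same budget, since keeping at least 2 replicas available is the same as allowing at most 1 to be unavailable:

kubectl create poddisruptionbudget app-pdb --min-available=2 \
--selector=app=super-critical-app

kubectl create poddisruptionbudget app-pdb --max-unavailable=1 \
--selector=app=super-critical-app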

YAML definition

Another way to create PDB objects, of course, is using YAML files that describe the PDB’s configuration. Let’s see an example of the very same PDB from the previous section, using minAvailable

apiVersion: policy/v1
kind: PodDisruptionBudget
metadata:
  name: app-pdb
spec:
  minAvailable: 1
  selector:
    matchLabels:
      app: super-critical-app

Running kubectl apply -f <YAML_FILE> will create the very same PDB. Let’s also look at an example using maxUnavailable

apiVersion: policy/v1
kind: PodDisruptionBudget
metadata:
  name: app-pdb
spec:
  maxUnavailable: 0
  selector:
    matchLabels:
      app: super-critical-app
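
Both minAvailable and maxUnavailable also accept percentages, which can be handy when the replica count is not fixed (for example, with autoscaling). A small sketch, assuming the same label:

apiVersion: policy/v1
kind: PodDisruptionBudget
metadata:
  name: app-pdb
spec:
  minAvailable: "50%"
  selector:
    matchLabels:
      app: super-critical-app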

Helm Charts

If you are using Helm charts, you should know that some charts have built-in support for creating a PDB by modifying the chart’s values. For example, both the mongodb and rabbitmq charts support the following values to create a PDB for their workloads

pdb:
  create: true
  minAvailable: 1
  maxUnavailable: 0

When deploying the charts with these values, they will create a PDB very much like the one we saw earlier, but applied specifically to the app deployed by the chart. Keep in mind that a single PDB can only specify one of minAvailable and maxUnavailable, so in practice you will usually want to set only one of those two values.
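
For instance, assuming the Bitnami rabbitmq chart (added under the bitnami repository) and a release name of my-rabbitmq, the same values could be passed directly on the command line; note that only one of the two budget values is set here:

helm install my-rabbitmq bitnami/rabbitmq \
  --set pdb.create=true \
  --set pdb.minAvailable=1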

Verify PDB created and applied

Let’s start with listing our PDB objects. There should be one, the one created in the previous section called app-pdb

$ kubectl get pdb
NAME      MIN AVAILABLE   MAX UNAVAILABLE   ALLOWED DISRUPTIONS   AGE
app-pdb   1               0                 0

Great, we have a PDB in our cluster and it is set to make sure our application always has a minimum of 1 available replica. Since we run 1 replica by default, no replica can be voluntarily disrupted (hence ALLOWED DISRUPTIONS is 0), or else our application would stop running.

The question is, how do we know it actually applies to our application Pod? To answer that, let’s first check our PDB definition

$ kubectl describe pdb app-pdb

Name:           app-pdb
Namespace:      my-namespace
Min available:  1
Selector:       app=super-critical-app
Status:
    Allowed disruptions:  0
    Current:              1
    Desired:              1
    Total:                1

We can see our PDB applies to Pods that have the label app=super-critical-app. Let’s see if we can actually find a Pod with this label

$ kubectl get pods -l "app=super-critical-app"

NAME                 READY   STATUS    RESTARTS   AGE
super-critical-app   1/1     Running   0          17h

Great! We see that our PDB actually applies to an existing, running Pod. Now let’s proceed to prove that PDB actually does what it is supposed to do.
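
If you prefer a scripted check over reading describe output, the PDB status can also be queried directly with jsonpath (just one possible way to do it); for our single-replica app this should print 0:

kubectl get pdb app-pdb -o jsonpath='{.status.disruptionsAllowed}'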

Test out PDB

To fully understand PDB’s power, the best thing you can do is to actually put it to the test with different scenarios where it is supposed to “protect” the app from dropping below a certain number of replicas.

Node Drain

Let’s start with draining a node, and not just any node, but the node on which our app replicas are running. A node drain evicts all the Pods from a certain node after marking it as “cordoned”, which means no new Pods can be scheduled on that node.

Let’s say we executed kubectl get po -o wide | grep -i super-critical-app and we found out the node name is some-node-name (yes, original name, I know). Let’s drain that node.
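
As a side note, if grepping isn’t your thing, the node name can also be pulled out with a label selector and jsonpath (a sketch, assuming a single matching Pod):

kubectl get pod -l app=super-critical-app -o jsonpath='{.items[0].spec.nodeName}'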

$ kubectl drain some-node-name --ignore-daemonsets
node/some-node-name cordoned

That’s a good start. First, we see our node is cordoned, which means no new workloads will be scheduled there. Let’s see how the output continues

evicting pod default/some-app
evicting pod default/super-critical-app
evicting pod default/meh-app

Interesting. All the Pods on the node are going to be evicted. But don’t be mistaken, this doesn’t mean the PDB doesn’t work; that’s just a message about what kubectl plans to do. Let’s see what comes next

evicting pod default/super-critical-app
error when evicting pods/"super-critical-app" -n "default" (will retry after 5s): Cannot evict pod as it would violate the pod's disruption budget.
evicting pod default/telegraf-bb85f9d4-pkhcp
error when evicting pods/"super-critical-app" -n "default" (will retry after 5s): Cannot evict pod as it would violate the pod's disruption budget.

PDB to the rescue! Kubernetes is unable to evict the Pod (the one our newly created PDB targets) from the node, because it evaluates the defined PDB and concludes that evicting it would drop the number of available Pods from 1 to 0, which is lower than minAvailable=1.
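
By the way, once you are done experimenting, remember the node is still cordoned. To make it schedulable again, run

kubectl uncordon some-node-name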

Kubernetes Node Pool Upgrade

Let’s test PDB as part of another workflow: upgrading a node pool of a GKE cluster in GCP (Google Cloud Platform) that has one node, with minAvailable=1. The way such a process usually works is by cordoning the nodes (so no new workloads are scheduled on them) and then draining them to move workloads to the new node with the updated Kubernetes version. In theory, PDB should prevent this, as the number of available replicas would drop to 0 while the Pods are evicted from one node to another. Let’s test it out.

First, upgrade the Kubernetes version. This can be done easily through the web UI (GKE -> Node Pool -> Edit -> Change under “Node Version”) or using the gcloud CLI

gcloud container clusters upgrade CLUSTER_NAME --node-pool=NODE_POOL_NAME --cluster-version VERSION

And the result is…no upgrade! Well, not really. Yes, at first your workload will not move to the new node and will basically be the only thing remaining on the old node (assuming you don’t have PDBs on other workloads). But then GCP shows you an interesting message about it.

When going to the documentation to learn more, you’ll notice the following sentence: “During automatic or manual node upgrades, PodDisruptionBudgets (PDBs) and Pod termination grace period are respected for a maximum of 1 hour.” In other words, it’s fine that you used a PDB to prevent disruptions, but after one hour GKE is going to pretend it’s not there. An interesting case in my opinion, because it shows that some exceptions always exist, so you shouldn’t assume any solution is completely bulletproof.

More scenarios to test out!

Want to keep playing with PDB? Up for the challenge of running more complex scenarios? Good, here are a few proposals (hint: PDB won’t work for every scenario listed here! A sketch for starting one of them follows the list)

  • Node Memory Pressure
  • Kubectl Pod Delete
  • Complete Node (or Node Pool) Removal
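
As a starting point for the second one, you could simply delete the Pod directly and watch what happens (keep in mind that a plain delete does not go through the Eviction API, which is what PDBs actually guard):

kubectl delete pod super-critical-app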

Summary

While in the previous article we mostly deep-dived into the theoretical part of PDB, understanding what exactly it is and when it should (or shouldn’t) be used, in this article we got our hands dirty: we not only demonstrated how to create it, but also put it to the test to see how it behaves in different scenarios. Overall, PDB is a powerful concept and can help you tremendously in making your Kubernetes cluster more resilient than ever.
