Scheduling in Kubernetes, Part 2: Pod Affinity
The previous installment covered scheduling via Node Affinity. Node Affinity facilitates scheduling based on Node labels, which know nothing about the structure of your application. Another construct is needed to support application-aware scheduling.
Previously, we asked:
Where should I run this Pod?
Node Affinity narrowed this question down to:
Should I run my Pod on this Node?
This question is about a Node. The scheduler doesn’t factor in any outside information — just the Node itself.
What if “where should I run this pod” depends on where the rest of the application is running? The “rest of the application” is made of Pods. We want the question to be about these Pods:
Should I run my Pod in the same place as this other Pod?
Pod Selector
The first step is to define what “other pod” we’re talking about. This part looks just like how we previously defined what Node we were talking about — selection based on labels.
For example, if we’re interested in Pods with the app label web-frontend:
app=web-frontend
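For comparison, here is a rough sketch of how the same selection might look in a native Kubernetes manifest, where it appears as a labelSelector inside an affinity term. This is only the selector fragment, not a complete manifest.

```yaml
# Fragment of a pod affinity term (sketch):
# selects Pods that carry the label app=web-frontend.
labelSelector:
  matchLabels:
    app: web-frontend
```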
Topology
The next step is to define what “in the same place” means. In the diagram below, are the two Pods in the same place?
If we use the kubernetes.io/hostname label, “in the same place” means “on the same host”. Here, the Pods are in different “places”:
We can use any Node label for this notion of “place”. Another option is failure-domain.beta.kubernetes.io/zone. The Pods are in the same zone:
Custom topologies can be encoded as user-defined Node labels. For example, you might label Nodes with the rack they belong to. Here’s how a generic custom_topology label creates groups of “co-located” Nodes:
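As a minimal sketch of that grouping, the three Node objects below carry a custom_topology label. The Node names and label values are hypothetical, chosen only for illustration. Nodes that share a value count as the same “place”.

```yaml
# Hypothetical Nodes; names and custom_topology values are made up.
apiVersion: v1
kind: Node
metadata:
  name: node-a
  labels:
    custom_topology: group-1
---
apiVersion: v1
kind: Node
metadata:
  name: node-b
  labels:
    custom_topology: group-1   # same value as node-a: same "place"
---
apiVersion: v1
kind: Node
metadata:
  name: node-c
  labels:
    custom_topology: group-2   # different value: a different "place"
```

In practice you would rarely write Node objects by hand. A label like this is usually attached to an existing Node, for example with kubectl label nodes node-a custom_topology=group-1.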
Should I run my Pod in the same place as this other Pod?
The first and second steps gave us this:
pod: app=web-frontend
topology: kubernetes.io/hostname
i.e. Should I run my Pod in the same hostname as a web-frontend Pod?
The third step is whether the answer is Yes or No. Yes is called Affinity. No is called Anti-Affinity.
Yes
Here’s what Yes looks like (it’s the same as above):
pod: app=web-frontend
topology: kubernetes.io/hostname
i.e. My Pod should run in the same hostname as a web-frontend Pod.
This rule is useful if you want to run your web-store (“My Pod”) on the same host as a web-frontend instance.
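For readers more familiar with native manifests, the fragment below sketches how this rule could be expressed in a Pod spec. It assumes the rule is a hard requirement, which is why it sits under requiredDuringSchedulingIgnoredDuringExecution.

```yaml
# Pod spec fragment (sketch): hard rule to run on the same host
# as at least one Pod labeled app=web-frontend.
affinity:
  podAffinity:
    requiredDuringSchedulingIgnoredDuringExecution:
    - labelSelector:
        matchLabels:
          app: web-frontend
      topologyKey: kubernetes.io/hostname   # "place" = host
```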
No
(note the anti_pod key)
anti_pod: app=web-frontend
topology: kubernetes.io/hostname
i.e. My Pod should not run in the same hostname as a web-frontend Pod.
This rule is useful if you want to make sure your web-frontend instances all run on different hosts.
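A native-manifest sketch of the same anti-affinity rule might look like the fragment below, again assuming a hard requirement. The only structural change from the affinity case is the podAntiAffinity key.

```yaml
# Pod spec fragment (sketch): hard rule to avoid any host that
# already runs a Pod labeled app=web-frontend.
affinity:
  podAntiAffinity:
    requiredDuringSchedulingIgnoredDuringExecution:
    - labelSelector:
        matchLabels:
          app: web-frontend
      topologyKey: kubernetes.io/hostname
```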
Hard, Soft, Combining Rules
Just like Node Affinity rules, Pod Affinity rules come in hard and soft variations, and it’s possible to have any combination of hard and soft rules. The semantics are identical.
e.g. Prefer not to run in the same zone as a web Pod:
anti_pod: app=web:soft # note the ':soft'
topology: failure-domain.beta.kubernetes.io/zone
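In a native manifest, a soft rule lives under preferredDuringSchedulingIgnoredDuringExecution and carries a weight. The fragment below is a sketch of the rule above. The weight value is an assumption, since the Short form shown here does not state one.

```yaml
# Pod spec fragment (sketch): soft rule to avoid zones that
# already run a Pod labeled app=web.
affinity:
  podAntiAffinity:
    preferredDuringSchedulingIgnoredDuringExecution:
    - weight: 100          # allowed range is 1-100; chosen arbitrarily here
      podAffinityTerm:
        labelSelector:
          matchLabels:
            app: web
        topologyKey: failure-domain.beta.kubernetes.io/zone
```

Because the rule is only a preference, the scheduler can still place the Pod in a zone with a web Pod when no better option exists.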
Here are two co-located Deployments that each spread their Pods across different Nodes:
deployment:
  ...
  selector:
    app: web-store
  affinity:
  - anti_pod: app=web-store          # spread out store Pods
    topology: kubernetes.io/hostname
---
deployment:
  ...
  selector:
    app: web-frontend
  affinity:
  - anti_pod: app=web-frontend       # spread out frontend Pods
    topology: kubernetes.io/hostname
  - pod: app=web-store:soft          # co-locate with store Pods
    topology: kubernetes.io/hostname
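To show how the pieces combine, here is a hedged sketch of the web-frontend Deployment as a native Kubernetes manifest. The replica count, container name, and image are placeholders that the example above elides. The weight on the soft rule is likewise an assumption.

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: web-frontend
spec:
  replicas: 3                        # assumed; not given in the original
  selector:
    matchLabels:
      app: web-frontend
  template:
    metadata:
      labels:
        app: web-frontend
    spec:
      affinity:
        # Hard rule: spread out frontend Pods, one per host.
        podAntiAffinity:
          requiredDuringSchedulingIgnoredDuringExecution:
          - labelSelector:
              matchLabels:
                app: web-frontend
            topologyKey: kubernetes.io/hostname
        # Soft rule: prefer hosts that already run a store Pod.
        podAffinity:
          preferredDuringSchedulingIgnoredDuringExecution:
          - weight: 100              # assumed weight
            podAffinityTerm:
              labelSelector:
                matchLabels:
                  app: web-store
              topologyKey: kubernetes.io/hostname
      containers:
      - name: web-frontend
        image: example.com/web-frontend:latest   # placeholder image
```

With the hard anti-affinity rule in place, the number of schedulable replicas is capped by the number of eligible hosts. Extra replicas stay Pending until another host becomes available.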
What’s Next?
Between Node Affinity and Pod Affinity, we’ve now covered the primary mechanisms for user-defined scheduling. For additional reference material, see the Kubernetes and Koki Short docs on affinities.
This isn’t the end, though. There’s some exciting recent work that adds even greater expressiveness to Kubernetes scheduling. More on that soon!