Scheduling in Kubernetes, Part 2: Pod Affinity

Kynan Rilee
Published in Koki
Dec 21, 2017 · 4 min read

The previous installment covered scheduling via Node Affinity. Node Affinity facilitates scheduling based on Node labels, which know nothing about the structure of your application. Another construct is needed to support application-aware scheduling.

Previously, we asked:

Where should I run this Pod?

Node Affinity narrowed this question down to:

Should I run my Pod on this Node?

This question is about a Node. The scheduler doesn’t factor in any outside information — just the Node itself.

What if “where should I run this pod” depends on where the rest of the application is running? The “rest of the application” is made of Pods. We want the question to be about these Pods:

Should I run my Pod in the same place as this other Pod?

Scheduling is about finding hardware to run your code. (photo source)

Pod Selector

The first step is to define what “other pod” we’re talking about. This part looks just like how we previously defined what Node we were talking about — selection based on labels.

For example, to select Pods whose app label is web-frontend, the selector is: app=web-frontend
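For comparison, here is a sketch of how the same selector is usually written in the native Kubernetes manifest format. It is only a fragment of a larger affinity term, and the set-based matchExpressions form in the comments is an equivalent alternative rather than something the original example uses.

```yaml
# Fragment only: a labelSelector lives inside a pod affinity term.
labelSelector:
  matchLabels:
    app: web-frontend        # same meaning as the selector app=web-frontend

# Equivalent set-based form (use one form or the other):
# labelSelector:
#   matchExpressions:
#   - key: app
#     operator: In
#     values:
#     - web-frontend
```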

Topology

The next step is to define what “in the same place” means. In the diagram below, are the two Pods in the same place?

If we use the kubernetes.io/hostname label, “in the same place” means “on the same host”. Here, the Pods are in different “places”:

We can use any Node label for this notion of “place”. Another option is failure-domain.beta.kubernetes.io/zone. The Pods are in the same zone:

Custom topologies can be encoded as user-defined Node labels. For example, you might label Nodes with the rack they belong to. Here’s how a generic custom_topology label creates groups of “co-located” Nodes:

[Diagram 1: node-4 and node-3 are co-located in foo. pod-a is in foo. pod-b is in bar.]
[Diagram 2: pod-a and pod-b are both in bar.]
[Diagram 3: pod-a and pod-b are both in foo.]
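As a rough sketch, this is what the Node labels behind such a grouping could look like. The label key custom_topology, the value foo, and the names node-3 and node-4 come from the text above; everything else is assumed for illustration. In practice Nodes are registered by the kubelet, and labels are usually added with kubectl label rather than by applying a Node manifest.

```yaml
# Illustrative only: labels as they might appear on two Node objects.
apiVersion: v1
kind: Node
metadata:
  name: node-3
  labels:
    kubernetes.io/hostname: node-3   # each Node is its own "place" under this key
    custom_topology: foo             # user-defined grouping
---
apiVersion: v1
kind: Node
metadata:
  name: node-4
  labels:
    kubernetes.io/hostname: node-4
    custom_topology: foo             # same value as node-3, so co-located under this key
```

With custom_topology as the topology, node-3 and node-4 count as the same place. With kubernetes.io/hostname, they are two different places.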

Should I run my Pod in the same place as this other Pod?

The first and second steps gave us this:

pod: app=web-frontend
topology: kubernetes.io/hostname

i.e. Should I run my Pod on the same host as a web-frontend Pod?

The third step is deciding whether the answer is Yes or No. Yes is called Affinity. No is called Anti-Affinity.

Yes

Here’s what Yes looks like (it’s the same as above):

pod: app=web-frontend
topology: kubernetes.io/hostname

i.e. My Pod should run on the same host as a web-frontend Pod.

This rule is useful if you want to run your web-store (“My Pod”) on the same host as a web-frontend instance.
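For readers working with plain Kubernetes manifests, the following is a minimal sketch of the Yes rule, assuming it maps to a hard (required) podAffinity term. The Pod name and the container image are hypothetical placeholders.

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: web-store
  labels:
    app: web-store
spec:
  affinity:
    podAffinity:
      # "required..." is the hard form: the Pod is only scheduled
      # where the rule can be satisfied.
      requiredDuringSchedulingIgnoredDuringExecution:
      - labelSelector:
          matchLabels:
            app: web-frontend              # the "other Pod"
        topologyKey: kubernetes.io/hostname  # "same place" = same host
  containers:
  - name: web-store
    image: example.com/web-store:1.0       # hypothetical image
```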

No

Here’s what No looks like (note the anti_pod key):

anti_pod: app=web-frontend
topology: kubernetes.io/hostname

i.e. My Pod should not run on the same host as a web-frontend Pod.

This rule is useful if you want to make sure your web-frontend instances all run on different hosts.
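In native Kubernetes terms, the No rule would look roughly like the fragment below, assuming it maps to a hard podAntiAffinity term. It is a fragment of a Pod spec, not a complete manifest.

```yaml
# Fragment of a Pod spec (for example, a Deployment's template.spec).
affinity:
  podAntiAffinity:
    requiredDuringSchedulingIgnoredDuringExecution:
    - labelSelector:
        matchLabels:
          app: web-frontend                # Pods to stay away from
      topologyKey: kubernetes.io/hostname  # at most one such Pod per host
```

Because this is a hard rule, replicas that cannot find a host without a web-frontend Pod stay unscheduled (Pending) rather than doubling up.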

Hard, Soft, Combining Rules

Just like Node Affinity rules, Pod Affinity rules come in hard and soft variations, and it’s possible to have any combination of hard and soft rules. The semantics are identical.

e.g. Prefer not to run in the same zone as a web Pod:

anti_pod: app=web:soft # note the ':soft'
topology: failure-domain.beta.kubernetes.io/zone
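A sketch of the soft variant as a native Kubernetes fragment, assuming ':soft' corresponds to a preferred (rather than required) term. The weight value is an assumption; the native format requires one in the range 1 to 100, and the example above does not specify any.

```yaml
# Fragment of a Pod spec.
affinity:
  podAntiAffinity:
    # "preferred..." is the soft form: the scheduler tries to honor it
    # but will still place the Pod if it cannot.
    preferredDuringSchedulingIgnoredDuringExecution:
    - weight: 100                          # assumed value, range 1-100
      podAffinityTerm:
        labelSelector:
          matchLabels:
            app: web
        # Zone label as used in this article; newer clusters use
        # topology.kubernetes.io/zone instead.
        topologyKey: failure-domain.beta.kubernetes.io/zone
```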

Here are two co-located Deployments that each spread their Pods across different Nodes:

deployment:
  ...
  selector:
    app: web-store
  affinity:
  - anti_pod: app=web-store  # spread out store Pods
    topology: kubernetes.io/hostname
---
deployment:
  ...
  selector:
    app: web-frontend
  affinity:
  - anti_pod: app=web-frontend  # spread out frontend Pods
    topology: kubernetes.io/hostname
  - pod: app=web-store:soft  # co-locate with store Pods
    topology: kubernetes.io/hostname
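For reference, here is a sketch of how the web-frontend Deployment might be written as a native Kubernetes manifest, combining one hard anti-affinity rule with one soft affinity rule. The replica count, weight, and container image are assumptions added to make the manifest complete; they are not taken from the example above.

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: web-frontend
spec:
  replicas: 3                              # assumed value
  selector:
    matchLabels:
      app: web-frontend
  template:
    metadata:
      labels:
        app: web-frontend
    spec:
      affinity:
        podAntiAffinity:
          # Hard rule: spread out frontend Pods, one per host.
          requiredDuringSchedulingIgnoredDuringExecution:
          - labelSelector:
              matchLabels:
                app: web-frontend
            topologyKey: kubernetes.io/hostname
        podAffinity:
          # Soft rule: prefer hosts that already run a store Pod.
          preferredDuringSchedulingIgnoredDuringExecution:
          - weight: 100                    # assumed value, range 1-100
            podAffinityTerm:
              labelSelector:
                matchLabels:
                  app: web-store
              topologyKey: kubernetes.io/hostname
      containers:
      - name: web-frontend
        image: example.com/web-frontend:1.0   # hypothetical image
```

The web-store Deployment would follow the same shape with only the anti-affinity rule, selecting app: web-store.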

What’s Next?

Between Node Affinity and Pod Affinity, we’ve now covered the primary mechanisms for user-defined scheduling. For additional reference material, see the Kubernetes and Koki Short docs on affinities.

This isn’t the end, though. There’s some exciting recent work that adds even greater expressiveness to Kubernetes scheduling. More on that soon!
