Scheduling in Kubernetes, Part 1: Node Affinity

Kynan Rilee
Published in Koki · Dec 19, 2017

In Kubernetes, a Pod is the basic unit of work, so scheduling workloads reduces to the question:

Where should I run this Pod?

The answer is a Node. But which one?

Figure: By default, the scheduler packs containers in a uniform fashion.

Here are some Node labels to help you choose:

# provided by Kubernetes (k8s.io here stands for kubernetes.io in the actual label keys):
k8s.io/hostname
failure-domain.beta.k8s.io/zone
failure-domain.beta.k8s.io/region
beta.k8s.io/instance-type
beta.k8s.io/os
beta.k8s.io/arch
# user-defined (cluster admin, cloud provider, etc.):
rack, disktype, ...
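For reference, these labels live in each Node object's metadata.labels. The sketch below shows roughly what that looks like with the keys spelled out using the full kubernetes.io prefix. The node name, the rack value, and every other value here are hypothetical, and which built-in labels actually appear depends on the cluster and cloud provider.

```yaml
# Hypothetical excerpt of a Node object (as printed by `kubectl get node <name> -o yaml`).
apiVersion: v1
kind: Node
metadata:
  name: node-a1                      # hypothetical node name
  labels:
    # Built-in labels (2017-era beta keys); all values are illustrative.
    kubernetes.io/hostname: node-a1
    failure-domain.beta.kubernetes.io/zone: us-east-1a
    failure-domain.beta.kubernetes.io/region: us-east-1
    beta.kubernetes.io/instance-type: m4-large
    beta.kubernetes.io/os: linux
    beta.kubernetes.io/arch: amd64
    # User-defined labels, e.g. added by a cluster admin.
    rack: rack-12                    # hypothetical value
    disktype: ssd
```

A user-defined label such as disktype can be attached with kubectl label nodes <node-name> disktype=ssd.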

Let’s select a specific Node: k8s.io/hostname=host_named_foo

Or a specific region: failure-domain.beta.k8s.io/region=us-east-1

Either of two regions:
failure-domain.beta.k8s.io/region=us-east-1,us-east-2

And make sure the Node has an SSD:
disktype=ssd&failure-domain.beta.k8s.io/region=us-east-1,us-east-2
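In the standard Kubernetes API (rather than the short syntax used here), that last selector, treated as a hard requirement, would become a single node selector term whose matchExpressions are ANDed together, with the comma-separated regions turned into an In list. A minimal sketch of that fragment of a Pod spec, with keys written using the full kubernetes.io prefix:

```yaml
# Fragment of a Pod spec: disktype=ssd AND region in {us-east-1, us-east-2}.
affinity:
  nodeAffinity:
    requiredDuringSchedulingIgnoredDuringExecution:
      nodeSelectorTerms:
      - matchExpressions:             # expressions within one term are ANDed
        - key: disktype
          operator: In
          values: ["ssd"]
        - key: failure-domain.beta.kubernetes.io/region
          operator: In                # matches if the label equals any listed value
          values: ["us-east-1", "us-east-2"]
```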

As a real-world example, let’s run our PostgreSQL primary and standby servers in different zones:

pod:
  name: postgres-primary
  ...
  affinity:
  - node: failure-domain.beta.k8s.io/zone=us-east-1a
---
pod:
  name: postgres-standby
  ...
  affinity:
  - node: failure-domain.beta.k8s.io/zone=us-east-1b
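For comparison, here is a sketch of the postgres-primary Pod written as a plain Kubernetes manifest. The container name and image are hypothetical placeholders for the parts elided above, and the zone key is spelled with the full kubernetes.io prefix. The postgres-standby manifest would be identical apart from its name and the us-east-1b zone.

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: postgres-primary
spec:
  affinity:
    nodeAffinity:
      # Hard rule: only schedule onto Nodes in zone us-east-1a.
      requiredDuringSchedulingIgnoredDuringExecution:
        nodeSelectorTerms:
        - matchExpressions:
          - key: failure-domain.beta.kubernetes.io/zone
            operator: In
            values: ["us-east-1a"]
  containers:
  - name: postgres                   # hypothetical container name
    image: postgres                  # hypothetical image reference
```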

Selecting Nodes by their labels gives us the ability to talk about which Nodes are right for a Pod.

Hard and Soft

Sometimes, it’s OK to schedule a Pod on a Node that doesn’t have the labels you want. That’s why there are two different ways to use the selectors we defined earlier.

Hard Rule

This Pod needs an amd64 Node: beta.k8s.io/arch=amd64

If no amd64 Nodes are available, the Pod cannot be scheduled; it stays Pending until a matching Node appears.
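In standard Kubernetes YAML, a Hard rule goes under requiredDuringSchedulingIgnoredDuringExecution. A minimal sketch of the amd64 requirement as a Pod spec fragment, with the key written using the full kubernetes.io prefix:

```yaml
# Fragment of a Pod spec: require an amd64 Node.
affinity:
  nodeAffinity:
    requiredDuringSchedulingIgnoredDuringExecution:
      nodeSelectorTerms:
      - matchExpressions:
        - key: beta.kubernetes.io/arch
          operator: In
          values: ["amd64"]
```

Running kubectl describe pod on a Pod stuck this way typically shows a FailedScheduling event.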

Soft Rule

This Pod prefers m4-large Nodes: beta.k8s.io/instance-type=m4-large:soft

Annotating the rule with “:soft” indicates that it is only a preference, not a requirement.

If an m4-large Node is available, the scheduler will favor it (other scoring factors also count, so this is a strong preference rather than a guarantee).
If no m4-large Nodes are available, the Pod can still be scheduled elsewhere.
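In plain Kubernetes YAML, a Soft rule goes under preferredDuringSchedulingIgnoredDuringExecution and must carry an explicit weight between 1 and 100. The sketch below assumes a weight of 1, since the short syntax above does not state one; the key uses the full kubernetes.io prefix.

```yaml
# Fragment of a Pod spec: prefer (but do not require) m4-large Nodes.
affinity:
  nodeAffinity:
    preferredDuringSchedulingIgnoredDuringExecution:
    - weight: 1                      # assumed weight; valid range is 1-100
      preference:
        matchExpressions:
        - key: beta.kubernetes.io/instance-type
          operator: In
          values: ["m4-large"]
```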

Combining Multiple Scheduling Rules

The Node Affinity system allows a list of rules for each Pod. If there are any Hard rules, at least one of them must be satisfied. If there are any Soft rules, the scheduler tries to satisfy as many of them as possible, weighted: each Node earns the weights of the Soft rules it matches, and Nodes with higher totals are preferred. Any combination of Hard and Soft rules is allowed.

For example, a Pod may be happiest with m4-large, but m4-medium is still preferable to other alternatives:

pod:
  ...
  affinity:
  - node: beta.k8s.io/instance-type=m4-large:soft:2
  - node: beta.k8s.io/instance-type=m4-medium:soft:1

Note that both rules are Soft and that the preferred choice has a higher weight.
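The same two weighted preferences in the standard API might look like the sketch below (keys use the full kubernetes.io prefix). For each candidate Node, the scheduler adds up the weights of the preferences it matches, so on this criterion an m4-large Node scores 2, an m4-medium Node 1, and any other Node 0; that score is then normalized and combined with the scheduler's other priorities.

```yaml
# Fragment of a Pod spec: m4-large preferred over m4-medium, both over anything else.
affinity:
  nodeAffinity:
    preferredDuringSchedulingIgnoredDuringExecution:
    - weight: 2                      # higher weight for the favorite
      preference:
        matchExpressions:
        - key: beta.kubernetes.io/instance-type
          operator: In
          values: ["m4-large"]
    - weight: 1                      # acceptable fallback
      preference:
        matchExpressions:
        - key: beta.kubernetes.io/instance-type
          operator: In
          values: ["m4-medium"]
```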

Multiple Hard rules look like this:

pod:
  ...
  affinity:
  - node: failure-domain…/region=us-west-1&label_key_baz=foo
  - node: failure-domain…/region=us-east-1&label_key_baz=bar

The Pod can only run on foo Nodes in us-west-1 or bar Nodes in us-east-1.
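In the standard API, each Hard rule becomes its own entry in nodeSelectorTerms: the terms are ORed, while the expressions inside one term are ANDed. The sketch below assumes the abbreviated failure-domain…/region key refers to the region label from the list at the top (written here with the full kubernetes.io prefix); label_key_baz is the placeholder key from the example above.

```yaml
# Fragment of a Pod spec: (us-west-1 AND foo) OR (us-east-1 AND bar).
affinity:
  nodeAffinity:
    requiredDuringSchedulingIgnoredDuringExecution:
      nodeSelectorTerms:
      - matchExpressions:            # term 1: foo Nodes in us-west-1
        - key: failure-domain.beta.kubernetes.io/region   # assumed full key
          operator: In
          values: ["us-west-1"]
        - key: label_key_baz
          operator: In
          values: ["foo"]
      - matchExpressions:            # term 2: bar Nodes in us-east-1
        - key: failure-domain.beta.kubernetes.io/region   # assumed full key
          operator: In
          values: ["us-east-1"]
        - key: label_key_baz
          operator: In
          values: ["bar"]
```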

Proper Nomenclature

In the Kubernetes API, these concepts have their own special names.

A “Hard rule” is a NodeSelectorTerm in NodeAffinity.RequiredDuringSchedulingIgnoredDuringExecution, which means the rule is “required during scheduling” but has no effect on an already-running Pod.

A “Soft rule” is a PreferredSchedulingTerm in NodeAffinity.PreferredDuringSchedulingIgnoredDuringExecution, which means the rule is “preferred during scheduling” but likewise has no effect on an already-running Pod.
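To make the mapping concrete, here is an annotated skeleton showing where each kind of rule sits in a Pod spec. Field names follow the Kubernetes API as serialized in YAML; the key and values are placeholders.

```yaml
spec:
  affinity:
    nodeAffinity:
      requiredDuringSchedulingIgnoredDuringExecution:   # a NodeSelector
        nodeSelectorTerms:           # "Hard rules" (NodeSelectorTerm); ORed
        - matchExpressions:          # requirements within a term are ANDed
          - key: example-key         # placeholder
            operator: In             # other operators: NotIn, Exists, DoesNotExist, Gt, Lt
            values: ["example-value"]
      preferredDuringSchedulingIgnoredDuringExecution:  # "Soft rules"
      - weight: 1                    # PreferredSchedulingTerm weight, 1-100
        preference:                  # a NodeSelectorTerm
          matchExpressions:
          - key: example-key         # placeholder
            operator: In
            values: ["example-value"]
```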

Together, these rules are called NodeAffinity because (in a figurative sense) they indicate a Pod’s “attraction” to certain Nodes.

For additional reference material, see the “Assigning Pods to Nodes” page in the Kubernetes documentation.

What’s Next?

Node Affinity introduces scheduling rules based on Node metadata, but Node metadata doesn’t contain application-specific information. If we want application-aware scheduling like co-locating Pods from certain services (e.g. running backend and database instances on the same Node), we need another tool: Pod Affinity. Stay tuned for Part 2!

In the meantime, learn more about Node Affinity and the syntax I’ve used (or skip ahead to the Pod Affinity and Anti-Affinity sections): docs.koki.io/short/resources/pod/#node-affinity
