Working With Node Affinity In Kubernetes

How Does Node Affinity Work in Kubernetes?

@pramodchandrayan · SysopsMicro · Oct 25, 2020

Welcome to this brand new piece in the Working with Kubernetes series.

In the previous piece, we discussed how a pod carrying a particular type of workload can be scheduled onto a specific node using the simplicity and power of nodeSelector.

Let’s revisit the example we discussed in the last piece:

apiVersion: v1
kind: Pod
metadata:
  name: nginx
  labels:
    env: staging
spec:
  containers:
  - name: nginx
    image: nginx
  nodeSelector:
    size: large

Here we are trying to place our pod using the nodeSelector key-value pair

size: large

One thing that limits the node selector approach of placing a specific pod on a specific node is its inability to handle cases like:

  • size: large or medium
  • size: not big
  • size: not small, etc.

So, if someone wants to provision a pod on a node based on such conditions, they will not be able to handle the scenario with the node selector alone. What can one do? Well, the answer lies in the concept of

“Node Affinity”

What is Node Affinity?

Node affinity performs the same task as nodeSelector: it allows you to constrain which nodes your pod is eligible to be scheduled on, based on labels on the node. Where it differs is in its ability to make this constraint far more expressive.

The affinity language offers matching rules based on logical AND/OR operations, NOT operations, and more.
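For instance, in the sketch below (the label keys size and disktype are purely illustrative), all expressions listed under a single matchExpressions entry must match (a logical AND), multiple entries under nodeSelectorTerms are alternatives (a logical OR), and the NotIn operator gives you a NOT:

affinity:
  nodeAffinity:
    requiredDuringSchedulingIgnoredDuringExecution:
      nodeSelectorTerms:
      # first term: size=large AND disktype=ssd (expressions are ANDed)
      - matchExpressions:
        - key: size
          operator: In
          values:
          - large
        - key: disktype
          operator: In
          values:
          - ssd
      # second term: any node whose size label is not "small" (terms are ORed)
      - matchExpressions:
        - key: size
          operator: NotIn
          values:
          - small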

Understanding Node Affinity & Its Types by Example:

Let’s write the Pod definition

apiVersion: v1
kind: Pod
metadata:
  name: node-affinity-demo
  labels:
    env: staging
spec:
  containers:
  - name: node-affinity-demo
    image: nginx
  affinity:
    nodeAffinity:
      requiredDuringSchedulingIgnoredDuringExecution:
        nodeSelectorTerms:
        - matchExpressions:
          - key: size
            operator: In
            values:
            - large
            - small

With node selector, we used the nodeSelector field under spec; here in the pod definition it is replaced by a more elaborate block, as shown below

affinity:
  nodeAffinity:
    requiredDuringSchedulingIgnoredDuringExecution:
      nodeSelectorTerms:
      - matchExpressions:
        - key: size
          operator: In
          values:
          - large
          - small

affinity defines the nodeAffinity parameter, which contains the long term

requiredDuringSchedulingIgnoredDuringExecution:

This term contains nodeSelectorTerms,

which in turn holds the field

- matchExpressions:
  - key: size
    operator: In
    values:
    - large
    - small

This is where you define the logical AND, OR, and NOT expressions that make the pod's constraints more expressive. If you carefully observe

- matchExpressions

you will see that it has:

  • key: defines the label key, e.g. size
  • operator: here we use the operator In; it can also take values like NotIn and Exists (there are more operators, which you can look up in the Kubernetes docs; a short sketch of the most common ones follows below)
  • values: defines the values for the key, like large, medium, "medium or small", "not small", and so on
  • It can hold multiple values, like:
values:
- large
- small

The In operator places the pod on a node whose value for the given label key matches one of the listed values; as shown above, it can match against multiple values.
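As a quick sketch of the other common operators (the disktype key here is purely illustrative): NotIn excludes nodes whose label value is in the list, while Exists only checks that the key is present on the node and takes no values field at all.

- matchExpressions:
  - key: size
    operator: NotIn    # exclude nodes labelled size=small
    values:
    - small
  - key: disktype
    operator: Exists   # the node just needs to carry a disktype label, any value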

Now that we have understood how to create a pod definition using the node affinity concept, it is imperative to understand:

What happens when there are no node labels matching the kind of expression which we defined above?

There can be a situation where no node label matches the expression values (like large, small, etc.) defined in the pod's affinity. How does Kubernetes handle the pod in that scenario?

There can also be situations where the pod is already scheduled on a node, but someone changes the node's label knowingly or accidentally. How does Kubernetes handle that?

To answer these questions, we have to understand the types of node affinity.

Node Affinity Types:

The node affinity types define how the scheduler behaves with respect to node affinity at each stage of the pod's lifecycle.

There are two major categories of Node Affinity

  1. Available:

It includes two types of node affinity:

requiredDuringSchedulingIgnoredDuringExecution

and

preferredDuringSchedulingIgnoredDuringExecution

  2. Planned:

requiredDuringSchedulingRequiredDuringExecution

Let's understand each of these in detail by breaking the terms apart so they make more sense.

Available:

- requiredDuringSchedulingIgnoredDuringExecution:

- preferredDuringSchedulingIgnoredDuringExecution:

Fig 1.0: Node Affinity Type Explanation

If we break down the Available node affinity types as shown in Fig 1.0, we can clearly see that a pod has two important states in its lifecycle with respect to node affinity:

  • DuringScheduling
  • DuringExecution

DuringScheduling:

This is the state when the pod is being created for the first time.

If, at the time of pod creation, there is no matching node label, Kubernetes refers to the type parameter shown in the table above to decide how to handle the pod's scheduling.

required:

If the parameter is of the required type, the scheduler mandates that the pod be scheduled according to the given affinity rule. If no node has a matching label, the pod will not be scheduled. That is why this is called a "hard" enforcement rule.

preferred:

If the parameter is of the preferred type, the scheduler will try to enforce the rule but does not guarantee it, so the pod may still be scheduled on a node that does not match. That is why it is known as a "soft" rule: we are telling the scheduler that running the workload matters more than where the pod is placed.

DuringExecution:

This is the state where the pod is already running on a given node.

In this state, if someone removes the node label by mistake, the already-running pod is handled according to the available node affinity parameter

IgnoredDuringExecution, as shown in Fig 1.0,

which tells Kubernetes to ignore the change and keep the pod running on the node.
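If you want to see this behaviour for yourself (assuming a node named minikube labelled size=large, as used later in this piece), you can remove the label and watch the pod keep running:

$ kubectl label nodes minikube size-         # the trailing "-" removes the size label
$ kubectl get pods                           # the already-scheduled pod stays in Running state
$ kubectl label nodes minikube size=large    # put the label back before continuing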

There is also another type of node affinity, which is planned but not yet available:

Planned Type:

" requiredDuringSchedulingrequiredDuringExecution"

Here, one can also make the execution phase of type RequiredDuringExecution

instead of IgnoredDuringExecution,

which tells Kubernetes to evict the running pod if the node it was scheduled on has been modified and no matching label is available any more.

Let's understand all that we discussed through the pod definition below:

node-affinity-demo.yaml

apiVersion: v1
kind: Pod
metadata:
  name: node-affinity-demo
  labels:
    env: staging
spec:
  containers:
  - name: node-affinity-demo
    image: nginx
  affinity:
    nodeAffinity:
      requiredDuringSchedulingIgnoredDuringExecution:
        nodeSelectorTerms:
        - matchExpressions:
          - key: size
            operator: In
            values:
            - large
            - small
      preferredDuringSchedulingIgnoredDuringExecution:
      - weight: 1
        preference:
          matchExpressions:
          - key: Zone
            operator: In
            values:
            - Zone1
            - Zone2

Let's understand:

preferredDuringSchedulingIgnoredDuringExecution

The weight field in preferredDuringSchedulingIgnoredDuringExecution takes a value in the range 1-100. When ranking nodes, the scheduler adds this weight to the score of every node carrying the label Zone=Zone1 or Zone=Zone2. So a node with size=large and Zone=Zone1 is preferred over one that has size=large but no Zone label, or one that points to a different zone.

The weight gives a matching node a relatively higher score than other nodes; the higher the number, the stronger the preference.
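As a rough sketch of how weights combine (the Zone and disktype keys are illustrative), you can list several preferred terms with different weights; the scheduler adds up the weights of every term a node satisfies, so a node matching both terms below outranks one matching only the lighter term:

preferredDuringSchedulingIgnoredDuringExecution:
- weight: 80
  preference:
    matchExpressions:
    - key: Zone
      operator: In
      values:
      - Zone1
- weight: 20
  preference:
    matchExpressions:
    - key: disktype
      operator: In
      values:
      - ssd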

Understanding the covered concepts through examples:

Case 1: There is a node with matching labels

Step 1: Label the node

By default, your minikube cluster will have one master node.

Let's retrieve the node and describe it to see whether there are any labels attached.
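Something like the following should do the job (assuming the default node name minikube):

$ kubectl get nodes                  # list the cluster's nodes
$ kubectl describe node minikube     # inspect the node; its labels appear in the Labels section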

In my previous article on node selector, I had already labeled this node, as shown in the highlighted area of the output: the minikube master node carries the label "size=large".

But if you don't have such a label yet, you can label any node with the following steps:

  • Run kubectl get nodes to get the names of your cluster's nodes, and pick the one you want to add a label to.
  • Run kubectl label nodes <node-name> <label-key>=<label-value> to add a label to the node you've chosen.

For example, if the node name is "minikube" (the default master node of my cluster) and you want to attach the key-value pair "size=large", you would type the command shown below:

$ kubectl label nodes minikube size=large
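To double-check that the label has been applied, you can list the nodes together with their labels:

$ kubectl get nodes --show-labels    # the minikube node should now include size=large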

Step 2: Let's create a pod using the definition file we already discussed above, node-affinity-demo.yaml. Let me write it down once again for convenience.

node-affinity-demo.yaml

apiVersion: v1
kind: Pod
metadata:
  name: node-affinity-demo
  labels:
    env: staging
spec:
  containers:
  - name: node-affinity-demo
    image: nginx
  affinity:
    nodeAffinity:
      requiredDuringSchedulingIgnoredDuringExecution:
        nodeSelectorTerms:
        - matchExpressions:
          - key: size
            operator: In
            values:
            - large
            - small
      preferredDuringSchedulingIgnoredDuringExecution:
      - weight: 1
        preference:
          matchExpressions:
          - key: Zone
            operator: In
            values:
            - Zone1
            - Zone2

Now let's create the pod with the CLI command below:

$ kubectl apply -f node-affinity-demo.yaml

Let's check whether our pod has been scheduled according to the node affinity rules by listing all pods:

$ kubectl get pods

output:

(base) Pramods-MacBook-Air:~ prammobibt$ kubectl get pods
NAME                      READY   STATUS             RESTARTS   AGE
nginx                     1/1     Running            0          42h
nginx-6799fc88d8–9d75l    1/1     Running            1          10d
node-affinity-demo        1/1     Running            0          3m6s
web-app                   1/1     Running            1          10d
webapp-7694cb7f68-crwj8   0/1     ImagePullBackOff   0          10d
webapp-7694cb7f68-kjnng   0/1     ImagePullBackOff   0          10d
webapp-7694cb7f68-r8wnq   0/1     ImagePullBackOff   0          10d

You can see the output line "node-affinity-demo 1/1 Running 0 3m6s", which shows that our pod node-affinity-demo has been scheduled successfully on our master node. This happened because the minikube master node carries the matching label "size=large".
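If you also want to confirm which node the pod actually landed on, the wide output adds a NODE column:

$ kubectl get pod node-affinity-demo -o wide    # the NODE column should show minikube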

Case 2: Now suppose there is a label mismatch in the pod's affinity definition file,

node-affinity-test-2.yaml:

apiVersion: v1
kind: Pod
metadata:
  name: node-affinity-demo-2
  labels:
    env: staging
spec:
  containers:
  - name: node-affinity-demo-2
    image: nginx
  affinity:
    nodeAffinity:
      requiredDuringSchedulingIgnoredDuringExecution:
        nodeSelectorTerms:
        - matchExpressions:
          - key: disk-size
            operator: In
            values:
            - medium
            - small

Here we have created a pod definition file where matchExpressions asks for the key-value pair

"disk-size = medium or small"

- matchExpressions:
  - key: disk-size
    operator: In
    values:
    - medium
    - small

As no node has a label with this key-value pair, it will be interesting to see how the scheduler treats this pod, node-affinity-demo-2. Let's see.

Create the pod from node-affinity-test-2.yaml:

$ kubectl apply -f node-affinity-test-2.yaml

output:

Fig 3.0

As can be seen in the output highlighted in blue in Fig 3.0, our newly created pod

"node-affinity-demo-2" has a Pending status and has not been scheduled. The reason is that the minikube master node has no label matching the expression defined in our pod file:

- matchExpressions:
  - key: disk-size
    operator: In
    values:
    - medium
    - small
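To see the scheduler's own explanation, describe the pod; its Events section will report that no node matched the pod's node affinity (the exact wording varies with the Kubernetes version):

$ kubectl describe pod node-affinity-demo-2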

So our pod has not been scheduled. This shows how the scheduler treats a pod based on the node affinity type. It would be interesting to explore:

What happens when we use preferredDuringSchedulingIgnoredDuringExecution?

node-affinity-test-3.yaml

apiVersion: v1
kind: Pod
metadata:
  name: node-affinity-demo-3
  labels:
    env: staging
spec:
  containers:
  - name: node-affinity-demo-3
    image: nginx
  affinity:
    nodeAffinity:
      preferredDuringSchedulingIgnoredDuringExecution:
      - weight: 1
        preference:
          matchExpressions:
          - key: size
            operator: In
            values:
            - medium
            - small

Let's create this pod using the apply command shown below:

$ kubectl apply -f node-affinity-test-3.yaml

Output:

As can be seen from the output below, highlighted in blue, our new pod node-affinity-demo-3 has been successfully placed on the master node, even though the master node has no label matching the node affinity matchExpressions.

This goes to show that when the node affinity type is preferredDuringSchedulingIgnoredDuringExecution, the scheduler will still place the pod on a node: it gives preference to getting the workload running, regardless of any mismatch between the expression and the node labels.

I hope these use cases have clarified how node affinity works.

What’s Next?

In the upcoming part of our Working with Kubernetes series, we will cover

  • Taints & Tolerations vs. Node Affinity
  • We will revise the concepts and work through some practical examples

Thanks a lot for reading!

