Understanding Taints, Tolerations, and Node Affinity in Azure Kubernetes Service (AKS)

huzefa qubbawala
3 min read · Mar 24, 2023

Are you running workloads on Azure Kubernetes Service (AKS)? If so, you may be familiar with the concepts of taints, tolerations, and node affinity. These concepts are essential to understand if you want to optimize workload scheduling and improve the performance of your AKS clusters.

Taints, Tolerations, and Node Affinity: What are They?

Taints are properties that you apply to nodes in your AKS cluster. Each taint consists of a key, an optional value, and an effect such as NoSchedule, and it marks the node as reserved for particular workloads, for example because of specialized hardware or security requirements. Kubernetes will not schedule a pod on a tainted node unless the pod has a matching toleration. Tolerations are settings that you apply to pods to allow them to be scheduled on nodes with specific taints. Using taints and tolerations together, you can keep pods off nodes that are not meant for them.

Node affinity, on the other hand, is a way to influence scheduling decisions based on the labels that you apply to your nodes. With node affinity, you can specify which nodes your workloads should be scheduled on, based on characteristics such as geographic location, hardware capabilities, or other custom labels.

Why Use Taints, Tolerations, and Node Affinity in AKS?

Taints, tolerations, and node affinity are powerful tools that can help you optimize your AKS cluster’s performance and improve the reliability of your workloads. By using taints and tolerations, you can reserve nodes for the workloads that need them and keep every other workload off those nodes, which can help you avoid performance issues and resource contention.

Similarly, by using node affinity, you can ensure that your workloads are scheduled on nodes with the resources and characteristics required to run them effectively. For example, if you have a workload that requires GPUs, you can use node affinity to ensure that it is only scheduled on nodes with GPUs, which can help you avoid performance issues and reduce costs.

How to Use Taints, Tolerations, and Node Affinity in AKS?

To use taints, tolerations, and node affinity in AKS, you define tolerations and node affinity in your Kubernetes manifest files and apply taints and labels to your nodes. Taints, labels, or tags should be added to the entire node pool using az aks nodepool. Applying them to individual nodes in a node pool using kubectl is not recommended in AKS, because changes made that way can be lost when nodes are upgraded, reimaged, or replaced during scaling.

In Azure Kubernetes Service (AKS), nodes of the same configuration are grouped together into node pools. These node pools contain the underlying VMs that run your applications. Under the hood, Azure creates a Virtual Machine Scale Set (VMSS) for each node pool.

1. Apply Taints and Labels to the Node Pool

To create a node pool with a taint and a label, use az aks nodepool add. Specify the name taintnodepool, use the --node-taints parameter to specify sku=gpu:NoSchedule for the taint, and use the --labels parameter to specify sku=gpu for the label that node affinity will match on.

az aks nodepool add \
    --resource-group myResourceGroup \
    --cluster-name myAKSCluster \
    --name taintnodepool \
    --enable-cluster-autoscaler \
    --min-count 1 \
    --max-count 3 \
    --node-taints sku=gpu:NoSchedule \
    --labels sku=gpu \
    --no-wait

The above command creates a node pool and ensures that at least one node is always running. We have also enabled the cluster autoscaler, so the node pool scales out to a maximum of 3 nodes when more pods need to be scheduled. Because of --no-wait, the command returns before the node pool has finished provisioning.

2. Apply a Toleration and Node Affinity to the Pod

affinity:
  nodeAffinity:
    requiredDuringSchedulingIgnoredDuringExecution:
      nodeSelectorTerms:
      - matchExpressions:
        - key: sku
          operator: In
          values:
          - gpu
tolerations:
- key: "sku"
  operator: "Equal"
  value: "gpu"
  effect: "NoSchedule"

The above specification, placed under the spec of your Pod, ensures that the pod is only scheduled on a node that is labeled with sku=gpu, and allows it onto nodes tainted with sku=gpu:NoSchedule.
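The affinity and tolerations settings are only part of a Pod specification. As a minimal sketch of how they fit into a complete manifest, the following writes one to a file. The pod name gpu-pod, the container name app, and the nginx:stable image are placeholders chosen for illustration and are not part of the setup described in this article.

```shell
# Write a complete Pod manifest that combines the toleration and node affinity.
# Assumptions: the names "gpu-pod" and "app" and the image "nginx:stable" are
# placeholders for illustration only; substitute your own workload.
cat > gpu-pod.yaml <<'EOF'
apiVersion: v1
kind: Pod
metadata:
  name: gpu-pod
spec:
  containers:
  - name: app
    image: nginx:stable
  affinity:
    nodeAffinity:
      requiredDuringSchedulingIgnoredDuringExecution:
        nodeSelectorTerms:
        - matchExpressions:
          - key: sku
            operator: In
            values:
            - gpu
  tolerations:
  - key: "sku"
    operator: "Equal"
    value: "gpu"
    effect: "NoSchedule"
EOF

# Then, against your own cluster:
#   kubectl apply -f gpu-pod.yaml
#   kubectl get pod gpu-pod -o wide   # the NODE column shows where it landed
```

If no node carries the sku=gpu label, the pod should stay in the Pending state, because the required node affinity cannot be satisfied.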

Tolerations allow pods to be scheduled on nodes with matching taints, but they do not prevent pods from being scheduled on nodes without taints. A pod that tolerates a specific taint can still be scheduled on any untainted node, whether or not the cluster has nodes with that taint.
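To observe this behavior yourself, a sketch like the following writes a manifest for a pod that carries only the toleration and no node affinity. The pod name tolerant-pod, the container name app, and the nginx:stable image are hypothetical placeholders.

```shell
# Write a Pod manifest with a toleration only (no node affinity).
# Assumptions: "tolerant-pod", "app", and "nginx:stable" are placeholders.
cat > tolerant-pod.yaml <<'EOF'
apiVersion: v1
kind: Pod
metadata:
  name: tolerant-pod
spec:
  containers:
  - name: app
    image: nginx:stable
  tolerations:
  - key: "sku"
    operator: "Equal"
    value: "gpu"
    effect: "NoSchedule"
EOF

# Then, against your own cluster:
#   kubectl apply -f tolerant-pod.yaml
#   kubectl get pod tolerant-pod -o wide   # check the NODE column
```

Depending on available capacity, the scheduler may place this pod on a node in any pool, tainted or not, since nothing in the manifest ties it to the tainted node pool.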

In the absence of any taints, all pods can be scheduled on any node by default. A taint keeps pods without a matching toleration off a node; it does not attract tolerating pods to that node.

To make sure our pods are always scheduled on this node pool, we must also add node affinity that matches the node pool's label. Hence, we have added affinity to our Pod specification.

Conclusion

Taints, tolerations, and node affinity are essential concepts to understand when working with AKS. By using these tools effectively, you can optimize the performance and reliability of your workloads and ensure that they are scheduled on nodes that meet specific criteria. If you’re not already using taints, tolerations, and node affinity in AKS, it’s worth exploring how these concepts can benefit your workloads.

huzefa qubbawala

Senior Architect @ CTO office Icertis | Problem Solver | Cloud Solution | Azure Kubernetes | Serverless | API Management | Cognitive Service | Applied AI