K8s Pod Anti-affinity

Daniel Weiskopf
Aug 19, 2022


How to ensure high availability when scheduling pods on a Kubernetes cluster.

The Problem

Let’s talk about a very common situation that arises when scheduling pods on a Kubernetes cluster.

In case you aren’t familiar, it’s the job of the kube-scheduler, a Kubernetes control plane component, to assign pods to the various nodes that make up your cluster. The scheduler takes various details into consideration, such as resource requests and node availability.

But let’s say you have multiple applications running on your cluster in a microservice architecture, and your cluster is comprised of multiple nodes that host your application pods. While the scheduler will ensure all of your pods get placed on healthy nodes with sufficient resources, it will not take into account which application is running in a pod when deciding where to place it. This means you could end up in a situation where all the pods from a particular deployment are placed on the same node. The glaring downside of such a situation is that it puts that application at risk of failure if that node goes down.

What we really want is for the pods to be spread out among all the nodes, so that no single node becomes a point of failure. It’s far less likely for multiple nodes to go down all at once than for one node to go down. We are going to use pod anti-affinity to direct the scheduler to place our pods with a strategy for high availability.

Node and Pod Affinity and Anti-affinity

In the world of Kubernetes, affinity refers to the priorities we set for the scheduler on where we want our pods placed. There are two main types of affinity:

  1. Node affinity
  2. Pod affinity

Node affinity allows us to direct the scheduler based on the attributes of the nodes. For instance, we can direct the scheduler to only place certain pods on nodes with a specific label or nodes in a certain availability zone. [This can be coupled with taints and tolerations to ensure pods only run on the nodes we dictate.] This works well when we have different types of nodes that are designed to host certain pods, for instance nodes with larger instance types to host larger applications, or nodes with attached volumes.
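As a quick illustration, here is what a node affinity rule could look like inside a pod spec. (The zone label value `us-east-1a` is just an example; substitute whatever labels your nodes carry.)

```yaml
# Fragment of a pod spec (under spec.template.spec in a Deployment).
# Restricts scheduling to nodes in a specific availability zone.
affinity:
  nodeAffinity:
    requiredDuringSchedulingIgnoredDuringExecution:
      nodeSelectorTerms:
        - matchExpressions:
            - key: topology.kubernetes.io/zone
              operator: In
              values:
                - us-east-1a   # illustrative zone; use your own node labels
```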

The other type of affinity, and the one we are going to use for our use case, is pod affinity. Pod affinity allows us to set priorities for which nodes to place our pods on based on the attributes of other pods running on those nodes. This works well for grouping pods together on the same node.
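For example, a pod affinity rule like the following sketch would tell the scheduler to colocate a pod with pods carrying some other label (here `app: cache`, an illustrative name, with each node treated as its own topology domain via the hostname label):

```yaml
# Fragment of a pod spec: schedule this pod onto a node that
# already runs a pod labeled app: cache.
affinity:
  podAffinity:
    requiredDuringSchedulingIgnoredDuringExecution:
      - labelSelector:
          matchLabels:
            app: cache   # illustrative label of the pods to colocate with
        topologyKey: kubernetes.io/hostname
```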

Pod anti-affinity allows us to accomplish the opposite, ensuring certain pods don’t run on the same node as other pods. We are going to use this to make sure our pods that run the same application are spread among multiple nodes. To do this, we will tell the scheduler to not place a pod with a particular label onto a node that contains a pod with the same label.

Let’s take a look at an example of a manifest that uses pod anti-affinity.
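A Deployment along these lines (the name, image, and replica count are illustrative) shows the idea:

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: green-deployment   # illustrative name
spec:
  replicas: 3
  selector:
    matchLabels:
      app: green
  template:
    metadata:
      labels:
        app: green
    spec:
      affinity:
        podAntiAffinity:
          # Do not schedule this pod onto a node (hostname topology domain)
          # that already runs a pod labeled app: green.
          requiredDuringSchedulingIgnoredDuringExecution:
            - labelSelector:
                matchLabels:
                  app: green
              topologyKey: kubernetes.io/hostname
      containers:
        - name: green
          image: nginx   # illustrative image
```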

As you can see, we have a pod with the label app: green. In our affinity instructions, we specify a podAntiAffinity rule that tells the scheduler not to place this pod on a node with an existing pod labeled app: green. This ensures that the pods will be spread over multiple nodes.

Preferred vs. Required

There is one more thing we need to discuss: the difference between preferredDuringSchedulingIgnoredDuringExecution and requiredDuringSchedulingIgnoredDuringExecution.

Preferred means that the scheduler will try to place the pod according to the affinity rule, but if it is unable to find a suitable node, it will place the pod anyway. Required means that if it can’t find a suitable node, it will not schedule the pod and the deployment will not reach full capacity.

This is important for our use case when there are more pods in the replica set than there are nodes in the cluster, for instance a replica set of 5 pods on a cluster of only 3 nodes. In such a situation, as long as the pods are spread among the 3 nodes, we don’t mind if two of the pods are colocated on the same node. If we are not careful, we could prevent the scheduler from placing the full replica set of pods in the cluster, because it can’t place the 5 pods on 5 separate nodes. To prevent this, we must be careful to specify that our affinity rule is preferred and not required; this way the scheduler will still make sure to place all the pods in the replica set.
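The preferred form uses a slightly different shape: each rule carries a weight (1-100) and wraps the selector in a podAffinityTerm. A sketch of the relevant fragment, reusing the app: green label from before:

```yaml
# Fragment of a pod spec: spread pods labeled app: green across nodes
# when possible, but still schedule them if no empty node is available.
affinity:
  podAntiAffinity:
    preferredDuringSchedulingIgnoredDuringExecution:
      - weight: 100   # higher weight = stronger preference
        podAffinityTerm:
          labelSelector:
            matchLabels:
              app: green
          topologyKey: kubernetes.io/hostname
```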

Closing Thoughts

I wanted to create this post because this is a very common situation I run into when working with Kubernetes. Although it may not be the highest risk or top priority when setting up a cluster, I believe it is good practice to set pod anti-affinity to ensure high availability. Sometimes nodes fail, and having all of our application pods on a single node creates a single point of failure even when working with a container orchestrator like Kubernetes. I hope this article was informative and helps increase the availability of your k8s architecture. Thanks for reading!
