Overcoming Node failure for pods which can run only one replica on K8s
Feb 11, 2022
Photo by Ian Taylor on Unsplash
When a node goes offline, all pods on that node are terminated and new ones spun up on a healthy node.
The default time that it takes from a node being reported as not-ready to the pods being moved is 5 minutes.
This isn’t really a problem if you have multiple pods running under a single deployment: the pods on the healthy nodes will handle any requests made whilst the pod(s) on the downed node are waiting to be moved. But sometimes we come across an application which can run only one instance, and for such an application those 5 minutes are downtime.
The simplest way to shorten this delay is to add the following tolerations to your deployment.
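As a minimal sketch of what such tolerations can look like: the two taint keys below are the ones Kubernetes places on a node that is not ready or unreachable, and the 5-minute default corresponds to a `tolerationSeconds` of 300. The value of 30 seconds is only an illustrative assumption, not a recommendation; pick whatever suits your application.

```yaml
# Fragment of a Deployment manifest; only the fields relevant to eviction timing are shown.
spec:
  template:
    spec:
      tolerations:
        # Tolerate a not-ready node for 30s (assumed value) before the pod is evicted.
        - key: "node.kubernetes.io/not-ready"
          operator: "Exists"
          effect: "NoExecute"
          tolerationSeconds: 30
        # Same window for a node the control plane can no longer reach.
        - key: "node.kubernetes.io/unreachable"
          operator: "Exists"
          effect: "NoExecute"
          tolerationSeconds: 30
```

A very small value can cause pods to be moved during brief network blips, so it is probably worth leaving some margin.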