Overcoming Node failure for pods which can run only one replica on K8s

Apoorv Anand

Published in

Developerworld

Feb 11, 2022

Photo by Ian Taylor on Unsplash

When a node goes offline, all pods on that node are terminated and new ones are spun up on a healthy node.

The default time that it takes from a node being reported as not-ready to the pods being moved is 5 minutes.

This really isn’t a problem if you have multiple pods running under a single deployment: the pods on the healthy nodes will handle any requests made whilst the pod(s) on the downed node are waiting to be moved. But sometimes we come across an application which can run only one instance, and then the whole wait is downtime.

The simplest way to adjust this is to add tolerations to your deployment that shorten how long its pods stay bound to a node that is not-ready or unreachable.
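The snippet is not reproduced here, so what follows is only a minimal sketch of what such tolerations usually look like. It assumes the two standard node taints, `node.kubernetes.io/not-ready` and `node.kubernetes.io/unreachable`, and uses 30 seconds as an illustrative value rather than a figure taken from this article. Kubernetes applies 300 seconds when nothing is specified.

```yaml
# Fragment of a Deployment manifest; only the pod template's tolerations are shown.
spec:
  template:
    spec:
      tolerations:
        # Evict the pod 30s after the node is tainted not-ready (default: 300s).
        - key: "node.kubernetes.io/not-ready"
          operator: "Exists"
          effect: "NoExecute"
          tolerationSeconds: 30   # illustrative value, tune for your workload
        # Same for a node the control plane can no longer reach.
        - key: "node.kubernetes.io/unreachable"
          operator: "Exists"
          effect: "NoExecute"
          tolerationSeconds: 30   # illustrative value, tune for your workload
```

With something like this in place, the single-replica pod should be rescheduled roughly `tolerationSeconds` after the node is marked not-ready or unreachable, on top of the time the control plane needs to detect the failure in the first place. Setting the value too low can cause needless rescheduling during short network blips.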
