Kubernetes Tip: How To Make Kubernetes React Faster When Nodes Fail?
We understand what happens to pods when nodes fail, but we would also like the Kubernetes system to react faster when a node fails, making the system more robust in terms of availability and reliability.
Let’s understand the flow of events and learn about related timers before tweaking them.
Events Observed When a Node Becomes Unhealthy.
Derived Retry Parameter.
node-status-update-frequency is a kubelet configuration, while node-monitor-grace-period is a controller manager configuration. Together, they work as a retry parameter. If you want the kubelet to try N times before the node is declared unhealthy, set the following.
node-monitor-grace-period = (N - 1) * node-status-update-frequency
The default values (10s and 40s, respectively) are set in such a way that the node's kubelet tries 5 times before the node is declared unhealthy.
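The relation between the defaults can be sanity-checked with a quick calculation (values taken from the defaults above; the variable names here are just for illustration):

```shell
# Derive node-monitor-grace-period from the desired number of retries.
N=5        # desired number of kubelet status-update attempts
freq=10    # node-status-update-frequency in seconds (default)
grace=$(( (N - 1) * freq ))
echo "node-monitor-grace-period=${grace}s"   # prints node-monitor-grace-period=40s
```

The same formula tells you how far you can shrink the grace period for a given update frequency while still allowing N attempts.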
pod-eviction-timeout is a controller manager configuration that acts as a grace period before pods are deleted from an unhealthy node. By default, it is set to 5 minutes, which is probably too high a value.
node-monitor-period is also a controller manager configuration; it determines how often the controller manager wakes up to check the status of nodes. For the system to behave deterministically, this parameter should be less than node-monitor-grace-period, and node-monitor-grace-period should be a multiple of it.
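That constraint is easy to check for any pair of values; here it is applied to the defaults (5s and 40s):

```shell
# node-monitor-period should be smaller than node-monitor-grace-period,
# and node-monitor-grace-period should be an exact multiple of it.
period=5    # node-monitor-period in seconds (default)
grace=40    # node-monitor-grace-period in seconds (default)
if [ "$period" -lt "$grace" ] && [ $(( grace % period )) -eq 0 ]; then
  echo "consistent"   # prints consistent: 40 is a multiple of 5
fi
```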
Failure and Recovery Time.
After a node fails, the Kubernetes system takes a total of node-monitor-grace-period + pod-eviction-timeout to get back to steady-state. It’s 5 minutes and 40 seconds for the default values.
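The 5 minutes and 40 seconds figure is just the sum of the two default timers:

```shell
# Total reaction time = node-monitor-grace-period + pod-eviction-timeout
grace=40       # node-monitor-grace-period in seconds (default)
eviction=300   # pod-eviction-timeout in seconds (default, 5 minutes)
total=$(( grace + eviction ))
echo "${total}s"   # prints 340s, i.e. 5 minutes 40 seconds
```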
During those 5 minutes and 40 seconds, the amount and kind of failure depend on the availability of the pods on the unhealthy node. If a pod is down, requests directed towards it receive no response; otherwise, the pod keeps servicing its clients without problems.
One could modify these parameters to the following values to make the Kubernetes system react faster to failures. These numbers reduce the reaction time from 5 minutes 40 seconds to 36 seconds.
kubelet: node-status-update-frequency=4s (down from the default 10s)
controller-manager: node-monitor-period=4s (down from the default 5s)
controller-manager: node-monitor-grace-period=16s (down from the default 40s)
controller-manager: pod-eviction-timeout=20s (down from the default 5m)
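On a self-managed cluster, these values are typically passed as command-line flags. A sketch of what that could look like follows; how you actually set the flags depends on your distribution (systemd unit, static pod manifest, kubeadm config, etc.), and note that on clusters where taint-based evictions are enabled, pod eviction is governed by tolerationSeconds on the not-ready/unreachable taints rather than pod-eviction-timeout:

```shell
# kubelet (e.g. in its systemd unit), plus your existing flags:
kubelet --node-status-update-frequency=4s

# kube-controller-manager (e.g. in its static pod manifest),
# plus your existing flags:
kube-controller-manager \
  --node-monitor-period=4s \
  --node-monitor-grace-period=16s \
  --pod-eviction-timeout=20s
```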
Can we modify these on Managed Services?
Azure’s AKS provides hooks to modify a few controller manager parameters, while I did not find any way to change these on Google’s GKE or Amazon’s EKS. If you know of one for GKE or EKS, please leave a comment.
It may be quite tempting to lower these parameters so that the Kubernetes system reacts faster, but doing so can have a cascading effect when many nodes with many pods start failing at the same time. Tailwinds recommends defining error budgets based on SLOs and working backward to arrive at an acceptable time for the system to detect and react to pod failures.
P.S.: Thanks to Aaron for suggesting this post.