Fine-tuning a Kubernetes cluster

anoop vijayan maniankara
Google Cloud - Community
3 min read · Dec 30, 2018

As we all know, the API server, scheduler, controller manager and kubelet components form the backbone of a Kubernetes cluster. Here we are going to see how to tune these components to our needs, with example cases.

A man tuning a vacuum tube TRF, 1925 (source: https://commons.wikimedia.org/wiki/File:Tuning_a_TRF_radio_1925.jpg)

Clusters created with Kubeadm

Source: https://image.slidesharecdn.com/2-kubernetes-training-171016154824/95/orchestrating-microservices-with-kubernetes-37-638.jpg?cb=1508173523

Clusters provisioned with kubeadm have the kubelet implemented as a systemd service, while kube-scheduler, kube-apiserver and kube-controller-manager run as containers (static pods) that load their configuration from manifests in /etc/kubernetes/manifests/ on the host system. Any update to these manifests takes effect on the cluster automatically: the kubelet notices the change and recreates the corresponding pod, so no manual restart is needed.
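For orientation, this is roughly what it looks like on a kubeadm control-plane node (the paths are the kubeadm defaults):

    # the control-plane components run as static pods; their manifests
    # live on the host:
    ls /etc/kubernetes/manifests/
    # etcd.yaml  kube-apiserver.yaml  kube-controller-manager.yaml  kube-scheduler.yaml

    # editing a manifest is enough; the kubelet watches this directory and
    # recreates the corresponding pod with the new flags:
    sudo vi /etc/kubernetes/manifests/kube-controller-manager.yaml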

Necessity for tuning a Kubernetes cluster

A Kubernetes cluster with its default set of config values suffices for almost all normal user needs. In most situations you do not have to tune the Kubernetes configuration parameters at all. However, there are cases when you do, and some of them are discussed below.

Case - 1: Faster detection/response on node failure

Here is a use case where the default config set is not enough and the cluster needs tuning: running/testing Kubernetes node failures on a Raspberry Pi cluster.

As mentioned by @v1ktoor in this post, the kube-controller-manager marks a node as unavailable roughly 40 seconds after its last status update (node-monitor-grace-period, i.e. node-status-update-frequency x (N-1)), after which it waits another 5 minutes (pod-eviction-timeout) before it starts evicting the pods. This means it takes 340 seconds (40s + 5m) before any action is taken after the node has failed. It can be made to react faster by lowering the default values, e.g. by setting the following in the Raspberry Pi Kubernetes cluster:

  • kubelet: node-status-update-frequency=4s (from 10s)
  • kube-controller-manager: node-monitor-period=2s (from 5s)
  • kube-controller-manager: node-monitor-grace-period=16s (from 40s)
  • kube-controller-manager: pod-eviction-timeout=30s (from 5m)

Ensure that the formula pod-eviction-timeout > (node-status-update-frequency x (N-1) == node-monitor-grace-period) still holds. After modifying the above values, the controller-manager started evicting pods about half a minute after the failure occurred (unplugging a random Raspberry Pi from power).
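On a kubeadm cluster these flags live in the controller-manager's static pod manifest. Below is a minimal sketch of that part; only the changed lines are shown, the rest of the manifest stays as kubeadm generated it:

    # /etc/kubernetes/manifests/kube-controller-manager.yaml (excerpt)
    spec:
      containers:
      - command:
        - kube-controller-manager
        - --node-monitor-period=2s
        - --node-monitor-grace-period=16s
        - --pod-eviction-timeout=30s
        # ...all the other kubeadm-generated flags stay untouched

The kubelet runs as a systemd service, so --node-status-update-frequency=4s can be passed through KUBELET_EXTRA_ARGS (for example in /etc/default/kubelet on Debian-based kubeadm installs), followed by a kubelet restart.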

Case - 2: Running more than 110 pods per node

If you have a big cluster, or if you are testing with a larger number of pods, you will soon run into the default limit of 110 pods per node: the extra pods stay in the Pending state because the scheduler cannot place them anywhere.

On a two node + master cluster created with kubeadm, this limit is configured as maxPods in /var/lib/kubelet/config.yaml, which is passed to the kubelet via --config (KUBELET_CONFIG_ARGS). It can be tuned to the desired value, with a kubelet restart for the change to take effect.
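For illustration, a sketch of the relevant part of that file (maxPods: 150 is just an example value, not a recommendation):

    # /var/lib/kubelet/config.yaml (excerpt)
    apiVersion: kubelet.config.k8s.io/v1beta1
    kind: KubeletConfiguration
    maxPods: 150    # default is 110; raise only as far as the node's resources allow

Then restart the kubelet on that node (sudo systemctl restart kubelet).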

More configuration parameters

Apart from the above, the full set of configuration flags for each Kubernetes component is documented in the official command-line tools reference: https://kubernetes.io/docs/reference/command-line-tools-reference/

Ensure that these services are restarted after the modifications where needed.
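On a kubeadm-style node that typically boils down to the following (a sketch, using the same paths as earlier in this post):

    # the kubelet is a plain systemd service and needs an explicit restart:
    sudo systemctl daemon-reload
    sudo systemctl restart kubelet

    # kube-apiserver, kube-scheduler and kube-controller-manager run as
    # static pods, so the kubelet recreates them automatically as soon as
    # their manifests under /etc/kubernetes/manifests/ are modified;
    # no separate restart is needed for them.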

Cloud hosted solutions — limitations

Almost all the major cloud vendors provide managed/hosted Kubernetes solutions. There the user does not have to care about maintaining the cluster; the drawback is that there is little or no possibility of accessing or tweaking these config parameters. Then again, the whole idea of a hosted solution is not having to care about the complexity of Kubernetes.

Conclusion

As the Kubernetes community grows and its uses spread into different fields, it is hard for a single default configuration to fit everyone, and sometimes you need to adjust the config parameters for your own needs. Fortunately, the community has kept a large number of configuration parameters tunable from outside the source code.

References

  1. Post about quicker detection of a node failure — https://fatalfailure.wordpress.com/2016/06/10/improving-kubernetes-reliability-quicker-detection-of-a-node-down/
