Does your application kill AWS EKS worker nodes? Set Node Allocatables
This article assumes that you have a basic understanding of the AWS EKS service, or of Kubernetes in general: what it is, why we need it, and so on. If you are new to Kubernetes or EKS, you can refer to the official AWS documentation.
In this blog, we discuss the stability of EKS worker nodes: which factors affect node stability, and how to address them by correctly configuring Node Allocatable values in the worker node configuration. While the configuration shown here is specific to EKS worker nodes, Node Allocatable applies to Kubernetes clusters in general.
By default, pods can be scheduled up to a node's full capacity and can consume all of the resources available on it. This is an issue because nodes typically run quite a few system daemons that power the OS and Kubernetes itself. Unless resources are set aside for these system daemons, pods and daemons compete for resources, which leads to resource starvation. This usually happens when a worker node is running close to capacity, and it can potentially cause the worker node to drop out of the cluster.
Set Node Allocatables
We need a way to make scheduling more reliable and to minimize resource overcommitment on nodes. Fortunately, the kubelet (a service that runs on each Kubernetes worker node) exposes a feature named Node Allocatable. Allocatable on a Kubernetes node is defined as the amount of compute resources (memory, CPU, ephemeral-storage) available for pods after resources have been reserved for the system's services and for the services that power the Kubernetes cluster itself. In short, the scheduler treats Allocatable as the capacity available for pods.
The following picture depicts the relationship between Node Capacity and Allocatable.
While Node Allocatable can be set on nodes of any kind of Kubernetes cluster, this tutorial covers the configuration specific to EKS clusters.
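Before moving to the EKS-specific setup, the relationship between capacity and Allocatable can be sketched as simple arithmetic. As described in the Kubernetes documentation, Allocatable is the node capacity minus kube-reserved, system-reserved and the hard eviction threshold. The function name `allocatable` and the sample numbers below are illustrative assumptions, not values read from a real node.

```shell
#!/bin/bash
# Sketch: Allocatable = Capacity - kube-reserved - system-reserved - hard eviction threshold.
# All arguments must use the same unit (for example millicores, or Mi of memory).
# The eviction threshold is optional because it does not apply to CPU.
allocatable() {
  local capacity=$1 kube_reserved=$2 system_reserved=$3 eviction_hard=${4:-0}
  echo $(( capacity - kube_reserved - system_reserved - eviction_hard ))
}

# Hypothetical node: 4 cores (4000m), 500m reserved for each of kube and system daemons.
allocatable 4000 500 500   # prints 3000
```

Keeping every value in one unit avoids having to parse Kubernetes quantity suffixes in the shell.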
Configure Node allocatable for EKS worker nodes
AWS provides a bootstrap script that runs on each worker node during its initial boot and helps the worker node register itself with the EKS cluster.
#!/bin/bash -xe
sudo /etc/eks/bootstrap.sh --apiserver-endpoint 'CLUSTER-ENDPOINT' --b64-cluster-ca 'CERTIFICATE_AUTHORITY_DATA' 'CLUSTER_NAME'
The bootstrap script mentioned above ships by default with the official AWS EKS worker node AMIs. The configuration above is the bare minimum you need to provision worker nodes successfully.
We can get the following data from the AWS EKS cluster console: CLUSTER-ENDPOINT, CERTIFICATE_AUTHORITY_DATA and CLUSTER_NAME.
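As an alternative to the console, the same values can usually be read with the AWS CLI. The sketch below assumes the AWS CLI is installed and configured for the cluster's account and region. The helper names `cluster_endpoint` and `cluster_ca_data` are made up for this example, and `my-cluster` is a placeholder cluster name.

```shell
#!/bin/bash
# Sketch: look up the values the bootstrap script needs via the AWS CLI.
# Assumes credentials and a default region are already configured.

# Prints the API server endpoint of the cluster named in $1.
cluster_endpoint() {
  aws eks describe-cluster --name "$1" --query 'cluster.endpoint' --output text
}

# Prints the base64-encoded certificate authority data of the cluster named in $1.
cluster_ca_data() {
  aws eks describe-cluster --name "$1" --query 'cluster.certificateAuthority.data' --output text
}

# Usage (my-cluster is a placeholder):
#   cluster_endpoint my-cluster
#   cluster_ca_data my-cluster
```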
Update bootstrap script for EKS worker node
Add --kubelet-extra-args to the bootstrap script to set Allocatable.
#!/bin/bash -xe
sudo /etc/eks/bootstrap.sh --apiserver-endpoint 'CLUSTER-ENDPOINT' --b64-cluster-ca 'CERTIFICATE_AUTHORITY_DATA' 'CLUSTER_NAME' \
--kubelet-extra-args "--kube-reserved cpu=500m,memory=1Gi,ephemeral-storage=1Gi --system-reserved cpu=500m,memory=1Gi,ephemeral-storage=1Gi --eviction-hard \
Once worker nodes are launched with the above changes, the Allocatable values should be reflected in the worker node's configuration, as shown in the output given below.
Capacity:
  cpu: 4
  ephemeral-storage: 104845292Ki
  hugepages-1Gi: 0
  hugepages-2Mi: 0
  memory: 16424204Ki
  pods: 44
Allocatable:
  cpu: 3
  ephemeral-storage: 92330453652
  hugepages-1Gi: 0
  hugepages-2Mi: 0
  memory: 13815052Ki
For example, a t2.xlarge node (4 cores, 16 GB memory) was used for this configuration. After reserving 500m CPU each for kube-reserved and system-reserved, the CPU left for pods is 3 cores. You can verify this from the output given above, which is available via the following command.
kubectl describe node <NODE_NAME>
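The memory figures in the node output can be checked the same way as the CPU. The sketch below uses a hypothetical helper, `reserved_mi`, which takes the Capacity and Allocatable memory values in Ki and prints how much is held back from pods, in Mi. For the values shown above the result is 2548Mi: the 2Gi set aside through kube-reserved and system-reserved, plus a further 500Mi. That remainder would be consistent with a hard eviction threshold on memory, but the eviction flag value is not shown in full here, so treat it as an inference.

```shell
#!/bin/bash
# Sketch: memory held back from pods, in Mi, given Capacity and Allocatable in Ki.
reserved_mi() {
  # Strip the trailing "Ki" suffix so the values can be used in shell arithmetic.
  local capacity_ki=${1%Ki} allocatable_ki=${2%Ki}
  echo $(( (capacity_ki - allocatable_ki) / 1024 ))
}

# Values copied from the node output above.
reserved_mi 16424204Ki 13815052Ki   # prints 2548

# On a live cluster the two inputs can be read with kubectl, for example:
#   kubectl get node <NODE_NAME> -o jsonpath='{.status.capacity.memory} {.status.allocatable.memory}'
```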
We have been running EKS clusters in production for quite some time now, and configuring Allocatable has been very effective in avoiding node termination due to inadequate resource availability. In this article, we discussed how to configure Allocatable on EKS worker nodes and why it is a must-have configuration from a node stability point of view.
Please let us know your feedback in the comments below, and if you liked the article, don't forget to clap.