Cluster Autoscaler for higher scalability (K8s Part 2)
In this fast-moving world we need high availability and scalability to serve the growing traffic of a D2C brand. On AWS, the Kubernetes Cluster Autoscaler (CA) brings the solution for scaling nodes to meet the requirements of your k8s pods. CA runs inside your k8s cluster as a deployment and, with the help of your HPA (Horizontal Pod Autoscaler), it regulates scale-in and scale-out of cluster nodes.
Deploying the Cluster Autoscaler is quite easy: you need to fill in a few details in the chart or YAMLs (based on your mode of execution), such as the cluster name, the operating region, and the Auto Scaling group names associated with each node group along with their respective minimum and maximum scaling capacity.
You can download a sample CA YAML from its official GitHub repo and make the necessary changes.
Find and make the changes shown below; you can always adjust them further as your application requires.
containers:
  - image: k8s.gcr.io/autoscaling/cluster-autoscaler:v1.2***   # make sure it matches your cluster version
    name: cluster-autoscaler
    resources:
      limits:
        cpu: 100m        # based on your application size
        memory: 100Mi    # based on your application size
      requests:
        cpu: 100m        # based on your application size
        memory: 100Mi    # based on your application size
    command:
      - ./cluster-autoscaler
      - --v=4
      - --stderrthreshold=info
      - --cloud-provider=aws
      - --skip-nodes-with-local-storage=false
      - --expander=priority
      - --nodes=1:20:sample-node-group-1
      - --nodes=1:20:sample-node-group-2
      - --scale-down-unneeded-time=1m0s        # how long a node must be unneeded before it is scaled in
      - --scale-down-unready-time=2m0s
      - --scale-down-delay-after-add=2m0s      # delay after a scale-out before scale-in resumes, avoids thrashing between fast scale-in and scale-out
      - --scale-down-utilization-threshold=0.7
      - --balance-similar-node-groups
      - --max-total-unready-percentage=50      # CA pauses operations if more than 50% of nodes are unready
      - --ok-total-unready-count=20
      - --max-empty-bulk-delete=30
      - --max-node-provision-time=5m0s         # wait up to 5 minutes for a new node to provision
    env:
      - name: AWS_REGION
        value: # your region
Instead of hardcoding ASG names, it is preferable to use auto-discovery when maintaining a large number of node groups in an EKS cluster.
To enable auto-discovery, use the --node-group-auto-discovery flag; your Auto Scaling groups must be tagged with a unique key-value pair.
For example, --node-group-auto-discovery=asg:tag=k8s.io/cluster-autoscaler/enabled,k8s.io/cluster-autoscaler/<cluster-name>,custom-tag=value
CA will then discover the Auto Scaling groups carrying the specified tags and values and manage them for your EKS cluster. This makes launching new node groups in the cluster easier and faster.
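For reference, a minimal sketch of how the command section changes once auto-discovery replaces the hardcoded --nodes flags (only a subset of the flags from the sample above is shown, and <cluster-name> is a placeholder for your own cluster):
command:
  - ./cluster-autoscaler
  - --v=4
  - --cloud-provider=aws
  - --expander=priority
  - --balance-similar-node-groups
  - --skip-nodes-with-local-storage=false
  # ASGs tagged with these keys are discovered automatically; no --nodes flags needed
  - --node-group-auto-discovery=asg:tag=k8s.io/cluster-autoscaler/enabled,k8s.io/cluster-autoscaler/<cluster-name>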
Once you have made the edits, deploy your CA along with its priority expander ConfigMap:
apiVersion: v1
kind: ConfigMap
metadata:
  name: cluster-autoscaler-priority-expander
  namespace: kube-system
data:
  priorities: |-
    20:
      - .*spot.*        # can define a regex based on your node group nomenclature
    10:
      - .*ondemand.*    # can define a regex based on your node group nomenclature
To get a safe fallback when you are using a Spot fleet with your EKS cluster, please make sure you have created on-demand managed node groups and added them at a lower priority in your ConfigMap, as illustrated in the map above.
The ConfigMap accepts regular expressions, and you can add more priority tiers as your application needs.
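As a sketch, if you also ran a dedicated node group for, say, batch workloads (the .*batch.* pattern here is purely hypothetical), you could give it its own tier by extending the priorities block; the highest number wins:
priorities: |-
  50:
    - .*batch.*       # hypothetical node group pattern, picked first when it matches
  20:
    - .*spot.*
  10:
    - .*ondemand.*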
Also, CA needs proper IAM permissions to control Auto Scaling groups and instances (describing, setting desired capacity and terminating instances in them). If your CA fails or gets stuck in CrashLoopBackOff, please look into its logs to find the trace.
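A quick way to pull those logs, assuming CA is deployed as a deployment named cluster-autoscaler in the kube-system namespace as in the sample YAML:
kubectl -n kube-system logs -f deployment/cluster-autoscaler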
Apply your YAMLs and voila!! Your CA is ready to rock!!
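For example, assuming you saved the edited manifests as priority-expander.yaml and cluster-autoscaler.yaml (the file names are illustrative):
kubectl apply -f priority-expander.yaml
kubectl apply -f cluster-autoscaler.yaml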
To test it out, you can run a load test matching your traffic requirements using any available tool.
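If you do not have a load-testing tool handy, a crude way to see CA in action is to bump the replica count of a sample deployment until pods go Pending and new nodes get provisioned (the deployment name and replica count here are only illustrative):
kubectl scale deployment sample-app --replicas=100
kubectl get pods --field-selector=status.phase=Pending   # pending pods should trigger a scale-out
kubectl get nodes -w                                      # watch new nodes join the cluster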
It is often observed that node group registration with the EKS cluster runs into challenges.
Just in case you missed part 1 of this article, here you go -
More in the next article; till then, you can refer to the references below.