Cluster Autoscaler (CA) and Horizontal Pod Autoscaler (HPA) on Kubernetes

Sheikh Vazid
Published in Tensult Blogs · Aug 20, 2019

This blog has moved from Medium to blogs.tensult.com, where all the latest content will be available.

HPA and CA Architecture

Now our Kubernetes cluster and Application Load Balancer are ready, but we still need to set up autoscaling on the cluster to run the infrastructure successfully on the AWS cloud.

Part 3: Horizontal Pod Autoscaler and Cluster Autoscaler

Horizontal Pod Autoscaler

Autoscaling at the pod level is handled by the Horizontal Pod Autoscaler (HPA), which scales the Pods in a Deployment or ReplicaSet. It is implemented as a Kubernetes API resource and a controller. The controller manager queries resource utilization against the metrics specified in each HorizontalPodAutoscaler definition, obtaining the metrics from either the resource metrics API (for per-pod resource metrics) or the custom metrics API (for all other metrics).
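For illustration, here is a minimal HorizontalPodAutoscaler manifest (autoscaling/v1), equivalent to the kubectl autoscale command we will run later in this post:

apiVersion: autoscaling/v1
kind: HorizontalPodAutoscaler
metadata:
  name: php-apache
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: php-apache
  minReplicas: 1
  maxReplicas: 10
  targetCPUUtilizationPercentage: 50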

Cluster Autoscaler

Autoscaling at the cluster level is handled by the Cluster Autoscaler (CA), which manages scalability by adjusting the number of nodes in your cluster.

Cluster Autoscaler is a tool that automatically adjusts the size of the Kubernetes cluster when one of the following conditions is true:

  • there are pods that failed to run in the cluster due to insufficient resources,
  • there are nodes in the cluster that have been underutilized for an extended period of time and their pods can be placed on other existing nodes.

The Cluster Autoscaler on AWS scales worker nodes within any specified Auto Scaling group and runs as a Deployment in your cluster.

Deploying Metrics Server

Deploy a Metrics Server so that the HPA can scale Pods in a Deployment based on the CPU/memory data provided by the API (as described above). The metrics.k8s.io API is usually served by the metrics-server, which collects CPU and memory metrics from the Summary API exposed by the kubelet on each node.

helm install stable/metrics-server \
--set rbac.create=true \
--set args[0]="--kubelet-insecure-tls=true" \
--set args[1]="--kubelet-preferred-address-types=InternalIP" \
--set args[2]="--v=2" \
--name metrics-server

Horizontal Pod Autoscaler

We will create a Horizontal Pod Autoscaler that maintains between 1 and 10 replicas of the Pods controlled by the php-apache deployment we create below. Roughly speaking, the HPA increases and decreases the number of replicas (via the deployment) to maintain an average CPU utilization across all Pods of 50%. Since each pod requests 200 milli-cores (via kubectl run), this means an average CPU usage of 100 milli-cores.
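Under the hood, the HPA controller computes the desired replica count from the ratio of the current metric value to the target, as documented in the Kubernetes HPA docs:

desiredReplicas = ceil( currentReplicas * currentMetricValue / desiredMetricValue )

For example, if average CPU utilization is 100% against our 50% target, the replica count doubles; if it is 25%, the count is halved.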

Confirm the Metrics API is available

kubectl get apiservice v1beta1.metrics.k8s.io -o yaml

If all is well, you should see a status similar to the one below in the response, which means the Metrics API is working fine.

status:
  conditions:
  - lastTransitionTime: "2019-08-20T09:33:01Z"
    message: all checks passed
    reason: Passed
    status: "True"
    type: Available
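You can also confirm that metrics are actually flowing by querying resource usage directly; both commands are served by the metrics-server we just installed:

kubectl top nodes
kubectl top pods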

Now we will scale a deployed application

Deploy a sample app and Create HPA resources

We will deploy an application and expose it as a Service on TCP port 80. The application is a custom-built image based on the php-apache image; its index.php page performs CPU-intensive calculations to generate load.

kubectl run php-apache --image=k8s.gcr.io/hpa-example --requests=cpu=200m --expose --port=80
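Since kubectl run with --expose creates both a Deployment and a Service named php-apache, you can verify them with:

kubectl get deployment,service php-apache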

Create an HPA resource

This HPA scales up when CPU exceeds 50% of the allocated container resource.

kubectl autoscale deployment php-apache --cpu-percent=50 --min=1 --max=10

View the HPA using kubectl. You will probably see <unknown>/50% for 1-2 minutes, and then you should see 0%/50%.

kubectl get hpa

Increase the load by hitting the app's Kubernetes Service from a separate load-generator pod.

kubectl run -i --tty load-generator --image=busybox /bin/sh

Execute a while loop that keeps fetching http://php-apache:

while true; do wget -q -O - http://php-apache; done

The HPA should now start to scale the number of Pods in the deployment as the load increases. This scaling takes place according to what is specified in the HPA resource. At some point, the new Pods fall into the Pending state while waiting for extra resources.

Within a minute or so, we should see the higher CPU load by executing:

kubectl get hpa -w

Here, CPU consumption has risen above the request, so the deployment has been resized to more replicas:

kubectl get deployment php-apache

You will see the HPA scale the pods from 1 up to the configured maximum (10) until the average CPU utilization is below the target (50%).

Configure Cluster Autoscaler

Cluster Autoscaler for AWS provides integration with Auto Scaling groups and lets users choose from four deployment options: one Auto Scaling group, multiple Auto Scaling groups, Auto-Discovery, and control-plane (master) node setup. Here we use Auto-Discovery, which finds the right Auto Scaling groups via tags.

In the Add/Edit Auto Scaling Group Tags window, enter the following tags, replacing awsExampleClusterName with the name of your EKS cluster:

Key: k8s.io/cluster-autoscaler/enabled
Key: k8s.io/cluster-autoscaler/awsExampleClusterName
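Alternatively, the same tags can be applied from the CLI. A sketch, assuming your worker Auto Scaling group is named <asg-name>:

aws autoscaling create-or-update-tags --tags \
  ResourceId=<asg-name>,ResourceType=auto-scaling-group,Key=k8s.io/cluster-autoscaler/enabled,Value=true,PropagateAtLaunch=true \
  ResourceId=<asg-name>,ResourceType=auto-scaling-group,Key=k8s.io/cluster-autoscaler/awsExampleClusterName,Value=true,PropagateAtLaunch=true

Auto-Discovery matches on the tag keys, so the values just need to be present.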

The worker node running the Cluster Autoscaler needs access to certain resources and actions. Attach an IAM policy to the node group instead of using AWS credentials directly, unless you have special requirements.

Create an IAM policy called ClusterAutoScaler based on the following example to give the worker node running the Cluster Autoscaler access to the required resources and actions.

{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": [
        "autoscaling:DescribeAutoScalingGroups",
        "autoscaling:DescribeAutoScalingInstances",
        "autoscaling:DescribeLaunchConfigurations",
        "autoscaling:DescribeTags",
        "autoscaling:SetDesiredCapacity",
        "autoscaling:TerminateInstanceInAutoScalingGroup"
      ],
      "Resource": "*"
    }
  ]
}
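Assuming the JSON above is saved as cluster-autoscaler-policy.json, and with placeholders for your account ID and worker node role name, the policy can be created and attached like this:

aws iam create-policy \
  --policy-name ClusterAutoScaler \
  --policy-document file://cluster-autoscaler-policy.json

aws iam attach-role-policy \
  --role-name <worker-node-role-name> \
  --policy-arn arn:aws:iam::<account-id>:policy/ClusterAutoScaler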

After adding the tags and the IAM policy, we will run the Helm chart to create the Cluster Autoscaler:

helm install stable/cluster-autoscaler \
--name <release-name> \
--set awsRegion=<region> \
--set sslCertHostPath=/etc/ssl/certs/ca-bundle.crt \
--set autoDiscovery.clusterName=<cluster-name> \
--set rbac.create=true \
--set extraArgs.scale-down-enabled=true
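To confirm the release deployed, check the Helm release status and look for the autoscaler pod (the exact pod name comes from the chart, so grep is the simplest filter):

helm status <release-name>
kubectl get pods | grep cluster-autoscaler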

To check the logs of the Cluster Autoscaler pod:

kubectl logs <pod-name> --tail=50

In the logs we can see the autoscaler at work as it scales the worker nodes in the Kubernetes cluster up and down.
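Besides the logs, the Cluster Autoscaler records its view of the cluster in a status ConfigMap named cluster-autoscaler-status; adjust the namespace below to wherever the chart was installed:

kubectl get configmap cluster-autoscaler-status -n kube-system -o yaml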

Test Cluster Autoscaler

To see the current number of worker nodes, run the following command:

kubectl get nodes

To increase the number of worker nodes, run the following commands:

kubectl create deployment demo --image=nginx
kubectl scale deployment demo --replicas=50

This creates an NGINX deployment on the Kubernetes cluster and then scales it to 50 pods.

As the pods get scheduled, check that the number of worker nodes is increasing.
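Pods that cannot be scheduled yet sit in the Pending phase, which is exactly what triggers the Cluster Autoscaler to add nodes; you can list them with:

kubectl get pods --field-selector=status.phase=Pending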

To watch the nodes:

kubectl get nodes -w

Delete the NGINX deployment; the Cluster Autoscaler will then scale the worker nodes back down (by default it waits about 10 minutes before removing an underutilized node):

kubectl delete deployment demo

Conclusion

We have now successfully deployed autoscaling for Kubernetes on the AWS cloud. Next, we will set up the Kubernetes Dashboard on AWS EKS.

Reference

https://blogs.tensult.com/2019/08/20/cluster-autoscalerca-and-horizontal-pod-autoscalerhpa-on-kubernetes/
