Ensuring the right amount of resource utilization in the cloud: Auto-Scaling

Jun 20, 2019

When starting an IT-related business or creating and deploying applications, the first problem that comes to mind is physical resources.

What size of load will it cater to?
How many servers are needed?
How much will they cost?

But to decide the number of physical resources, we must know the actual load that needs to be served.

So we have to come up with estimates for the minimum, maximum, and average load.

Traditionally, there are two ways to plan for changes in capacity.

Add enough servers for the maximum capacity needed. This guarantees high availability, since the servers will not fail even when traffic is at its highest. But resource utilization is poor: most of the time some resources stay idle, which is a waste of resources.
Add an average number of servers, or slightly more than the average, so that resource utilization is better than in the first approach. Resources are idle only when there is less traffic than expected. But this cannot cater to the maximum needed capacity, so if there is more traffic than average, availability is not guaranteed and there is a risk of reduced availability.
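To make the trade-off concrete, here is a small sketch that compares the two plans on a made-up load profile. The hourly load numbers, the per-server capacity of 100 requests per second, and the function names are all assumptions chosen for illustration, not measurements.

```python
import math

# Hypothetical hourly load (requests/s) and per-server capacity; illustrative numbers only.
HOURLY_LOAD = [120, 80, 60, 200, 450, 900, 700, 300]
SERVER_CAPACITY = 100  # requests/s that one server can handle (assumed)


def servers_for(load: float) -> int:
    """Smallest number of servers able to carry the given load."""
    return math.ceil(load / SERVER_CAPACITY)


def evaluate(servers: int, loads: list) -> tuple:
    """Return (average utilization, number of hours with more load than capacity)."""
    capacity = servers * SERVER_CAPACITY
    served = sum(min(load, capacity) for load in loads)
    utilization = served / (capacity * len(loads))
    overloaded_hours = sum(1 for load in loads if load > capacity)
    return utilization, overloaded_hours


peak_plan = servers_for(max(HOURLY_LOAD))                        # plan 1: size for the peak
average_plan = servers_for(sum(HOURLY_LOAD) / len(HOURLY_LOAD))  # plan 2: size for the average

for name, servers in [("peak", peak_plan), ("average", average_plan)]:
    utilization, overloaded = evaluate(servers, HOURLY_LOAD)
    print(f"{name}: {servers} servers, {utilization:.0%} utilized, {overloaded} overloaded hours")
```

With these made-up numbers, the peak plan needs 9 servers that are about 39% utilized with no overloaded hours, while the average plan needs 4 servers that are about 61% utilized but overloaded in 3 of the 8 hours.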

But what we are actually looking for is to optimize both availability and resource utilization. This is where the concept of ‘Auto-Scaling’ comes in.

Auto-Scaling

Auto-Scaling refers to monitoring application usage and dynamically scaling in and scaling out depending on the needed capacity.

If you have idle instances, it automatically terminates them.
If your resources are highly utilized and traffic seems to be increasing, it automatically launches more instances, so that the application stays highly available.

So basically this ensures that the correct number of resources/instances is available.
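The idea can be sketched in a few lines: given the current load, work out how many instances are needed and let the count follow the load up and down. The load trace, the capacity of 100 requests per second per instance, and the function name are assumptions for illustration.

```python
import math


def instances_needed(load: float, capacity_per_instance: float) -> int:
    """Instances required to serve the load, keeping at least one running."""
    return max(1, math.ceil(load / capacity_per_instance))


# Hypothetical load trace in requests/s; each instance is assumed to handle 100.
trace = [120, 450, 900, 300, 60]
counts = [instances_needed(load, 100) for load in trace]
print(counts)  # [2, 5, 9, 3, 1]
```

The instance count grows as traffic rises (scale out) and shrinks again as it falls (scale in).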

This concept is used in cloud computing because the cloud provides a farm of computing resources, which helps us launch, terminate, allocate, and re-allocate instances easily.

How does Auto-Scaling work?

Auto-Scaling group: A collection of instances subject to auto-scaling.

We first have to decide the minimum and maximum number of instances our application would need. Depending on that, we can assign an auto-scaling group to our application.

Additional instances in the auto-scaling group then become active whenever necessary.
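As a rough sketch of what such a group keeps track of, the class below holds a minimum, a maximum, and a desired number of instances, and never lets the desired number leave the configured bounds. The class and field names are hypothetical and do not correspond to any particular provider's API.

```python
from dataclasses import dataclass


@dataclass
class AutoScalingGroup:
    """Hypothetical model of an auto-scaling group's size settings."""
    min_size: int
    max_size: int
    desired_capacity: int

    def __post_init__(self):
        if not 0 <= self.min_size <= self.max_size:
            raise ValueError("need 0 <= min_size <= max_size")
        # Bring the initial desired capacity inside the bounds as well.
        self.set_desired(self.desired_capacity)

    def set_desired(self, requested: int) -> int:
        """Clamp the requested capacity to [min_size, max_size] and store it."""
        self.desired_capacity = max(self.min_size, min(self.max_size, requested))
        return self.desired_capacity


group = AutoScalingGroup(min_size=2, max_size=6, desired_capacity=3)
print(group.set_desired(10))  # 6, capped at max_size
print(group.set_desired(0))   # 2, raised to min_size
```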

To distribute load across these instances, auto-scaling groups use a load balancer.

Now you may wonder how the auto-scaler knows whether your application needs more instances. Besides monitoring usage, the auto-scaler performs health checks on the attached instances to see whether they are functioning properly. It checks whether these instances are still registered and in service with the associated load balancer.
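A minimal sketch of that health check might look like the following. The `Instance` fields, the `launch_instance` helper, and the instance IDs are hypothetical. In a real setup the registration and in-service status would come from the load balancer rather than from fields set by hand.

```python
from dataclasses import dataclass
from itertools import count
from typing import Callable, List


@dataclass
class Instance:
    instance_id: str
    registered: bool  # still registered with the load balancer
    in_service: bool  # load balancer reports it as in service


def is_healthy(instance: Instance) -> bool:
    """An instance counts as healthy only if it is registered and in service."""
    return instance.registered and instance.in_service


def replace_unhealthy(instances: List[Instance], launch: Callable[[], Instance]) -> List[Instance]:
    """Keep healthy instances and launch one replacement per unhealthy one."""
    healthy = [i for i in instances if is_healthy(i)]
    replacements = [launch() for _ in range(len(instances) - len(healthy))]
    return healthy + replacements


_ids = count(100)  # hypothetical ID generator for newly launched instances


def launch_instance() -> Instance:
    return Instance(f"i-{next(_ids)}", registered=True, in_service=True)


fleet = [
    Instance("i-1", registered=True, in_service=True),
    Instance("i-2", registered=True, in_service=False),
    Instance("i-3", registered=False, in_service=False),
]
fleet = replace_unhealthy(fleet, launch_instance)
print([i.instance_id for i in fleet])  # ['i-1', 'i-100', 'i-101']
```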

Sometimes the number of instances in the group will become insufficient, or all the instances will become idle. In those cases you may need to change the boundaries of the auto-scaling group.

The auto-scaler can scale in or scale out:

automatically, if you have already defined auto-scaling policies
manually
by schedule: you can schedule a time or season at which to scale out automatically

Auto-scaling policies: specify changes to the auto-scaling group's desired capacity through scale-out and scale-in policies.
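One way to picture such policies is as a table of steps that maps a metric reading to a change in desired capacity. The sketch below assumes CPU utilization as the metric, and the thresholds and step sizes are invented for illustration rather than recommended values.

```python
# Hypothetical step policies: (CPU utilization threshold, change in desired capacity).
SCALE_OUT_STEPS = [(0.90, 3), (0.75, 1)]   # checked from the highest threshold down
SCALE_IN_STEPS = [(0.10, -2), (0.25, -1)]  # checked from the lowest threshold up


def capacity_change(cpu_utilization: float) -> int:
    """Return how many instances to add (positive) or remove (negative)."""
    for threshold, change in SCALE_OUT_STEPS:
        if cpu_utilization >= threshold:
            return change
    for threshold, change in SCALE_IN_STEPS:
        if cpu_utilization <= threshold:
            return change
    return 0  # utilization is in the comfortable range: leave capacity alone


for cpu in (0.95, 0.80, 0.50, 0.20, 0.05):
    print(cpu, capacity_change(cpu))
```

The returned change would then be added to the group's desired capacity, subject to the group's minimum and maximum.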

Auto-Scaling Approaches

Auto-scaling normally happens in response to real-time changes. But that approach does not work well when there is a sudden spike in load, because new instances take time to launch. To handle such cases, auto-scaling has the following options.

Predictive auto-scaling: here the auto-scaler uses predictive analysis based on recent usage trends and historical data to predict future usage.

Scheduled auto-scaling: scaling is scheduled for a specific time period.
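Both options can be sketched briefly. For the predictive case, the example below uses a simple moving average over recent load as a stand-in for the forecasting a real auto-scaler would do; it is not any provider's actual method. For the scheduled case, it uses a hypothetical timetable. All names and numbers are assumptions.

```python
import math
from datetime import time


def predicted_instances(recent_load: list, capacity_per_instance: float, window: int = 3) -> int:
    """Forecast the next load as the mean of the last `window` samples, then size for it."""
    recent = recent_load[-window:]
    forecast = sum(recent) / len(recent)
    return math.ceil(forecast / capacity_per_instance)


# Hypothetical schedule: (start, end, desired instances) for known busy periods.
SCHEDULE = [(time(9, 0), time(18, 0), 8)]
DEFAULT_CAPACITY = 2  # used outside every scheduled period


def scheduled_instances(now: time) -> int:
    """Return the scheduled capacity for the given time of day."""
    for start, end, desired in SCHEDULE:
        if start <= now < end:
            return desired
    return DEFAULT_CAPACITY


print(predicted_instances([100, 200, 300, 400], capacity_per_instance=100))  # 3
print(scheduled_instances(time(10, 0)))  # 8
print(scheduled_instances(time(20, 0)))  # 2
```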

Benefits of Auto-Scaling

As discussed above, the two main benefits of auto-scaling are better availability and cost-effectiveness. Fault tolerance is a third.

Better availability: auto-scaling ensures that your application always has the right amount of resources to manage incoming traffic gracefully.
Cost-effective: auto-scaling aims to keep only the resources you need running, so there are few idle resources and no need to pay for capacity you do not use.
Fault tolerance: auto-scaling can detect when an instance is unhealthy, and it will automatically launch an instance to replace the unhealthy instance.

Implementing Auto-Scaling

In Kubernetes, there are two ways of implementing auto-scaling.

Horizontal pod auto-scaling (HPA): this changes the number of pods in a deployment, while the auto-scaling group size (the number of nodes) remains unchanged.
Cluster auto-scaling (CA): this changes the size of the auto-scaling group, that is, the number of nodes in the cluster.
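For the horizontal pod case, the Kubernetes documentation describes the core calculation as desiredReplicas = ceil(currentReplicas * currentMetricValue / desiredMetricValue). The sketch below applies that formula and bounds the result. The function name and the default bounds are assumptions, and details of the real controller such as tolerance and stabilization are left out.

```python
import math


def hpa_desired_replicas(current_replicas: int, current_metric: float, target_metric: float,
                         min_replicas: int = 1, max_replicas: int = 10) -> int:
    """Scale the replica count in proportion to how far the metric is from its target."""
    desired = math.ceil(current_replicas * current_metric / target_metric)
    return max(min_replicas, min(max_replicas, desired))


# 4 pods averaging 90% CPU against a 60% target: scale out.
print(hpa_desired_replicas(4, current_metric=90, target_metric=60))  # 6
# 4 pods averaging 30% CPU against a 60% target: scale in.
print(hpa_desired_replicas(4, current_metric=30, target_metric=60))  # 2
```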

Conclusion

In summary, auto-scaling is an optimization between utilization and availability that is available in cloud computing. You have to choose an appropriate auto-scaling group depending on the expected load. This will help most business owners overcome the issues of low resource utilization and server-side failures.