EXPEDIA GROUP TECHNOLOGY — SOFTWARE
Autoscaling in Kubernetes: A Primer
First of a three-part series exploring application autoscaling in Kubernetes
I will look at some key drivers for autoscaling, and by the end of this article, you will be able to create a set of acceptance criteria to evaluate the suitability of any autoscaling solution, including the ones designed for Kubernetes.
Life without autoscaling
Consider a Kubernetes service with 2 replicas, each capable of handling a maximum of 600 requests/sec. The service distributes the load evenly* across its replicas, so in total it can handle 1200 requests/sec.
* Even load distribution is rarely a given in any system, and Kubernetes is no exception. If you are interested in the factors that influence load distribution in Kubernetes, read this excellent article by Vinod Canumalla.
Case 1: peak workload < maximum available capacity
Take the example where the peak workload on the service is 1000 requests/sec.
Maximum available capacity for the service = 2 * 600 requests/sec
= 1200 requests/sec
With the peak workload < the maximum available capacity, the state of the service can be illustrated as shown below.
With even load distribution, the load per replica (500 requests/sec) is less than the maximum capacity per replica (600 requests/sec), so the service handles the entire workload without any distress.
Case 2: peak workload > maximum available capacity
In the same example, let’s assume that a market event leads to an increase in the peak workload from 1000 requests/sec to 1500 requests/sec. In this case:
Peak workload (1500 requests/sec) > Maximum available capacity of the service (1200 requests/sec)
Under these conditions, the following image best illustrates the state of the service.
As illustrated above, the service is subject to more load (750 requests/sec per replica) than it can handle (600 requests/sec per replica). As a result, a significant portion (20%) of customer requests are delayed at best and fail outright at worst, an undesirable outcome either way.
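To make the arithmetic concrete, here is a small Python sketch of the capacity math above. The numbers come straight from the example; the variable names are my own:

```python
# Numbers from the example: 2 replicas, 600 req/s each,
# and a peak workload of 1500 req/s after the market event.
replicas = 2
capacity_per_replica = 600          # requests/sec one replica can handle
peak_workload = 1500                # requests/sec arriving at the service

total_capacity = replicas * capacity_per_replica   # 1200 req/s in total
load_per_replica = peak_workload / replicas        # 750 req/s per replica
excess = max(0, peak_workload - total_capacity)    # 300 req/s over capacity

print(f"load per replica: {load_per_replica:.0f} req/s")
print(f"requests at risk: {excess / peak_workload:.0%}")  # 20%
```

The 20% figure quoted above is simply the excess load (300 requests/sec) divided by the peak workload (1500 requests/sec).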
Mitigating scalability failures
Without an autoscaling solution in place, the traditional approach to mitigating such scalability failures involves:
- an alert (on degradation/failures)
- intervention by a human operator
- root cause analysis
- scaling out the number of replicas
This approach does work but has the following problems:
- There is likely a delay between the alert and the intervention. Even if the operator is on call 24x7, it can take time to act; the operator might have to log in to production first, or might simply have stepped away for a cup of coffee.
- Scalability failure is not the only risk to system reliability. A human operator therefore usually needs to perform a root cause analysis, however brief, to identify and understand the cause of the failures. This further delays any action.
- Finally, it is unlikely that human operators watch a single service they know everything about; more likely, they are monitoring everything. So when a service suffers a scalability failure, the operator first needs to gather information about it, such as peak capacity and current load, before calculating the number of replicas required to handle that load.
Number of replicas required = current load / capacity per replica (rounded up)
This process assumes several things: that the capacity of each service is documented, updated regularly, and readily available to the operators handling scalability failures. Each of these assumptions is a risk to the entire process, even before considering human errors in calculation and/or operation.
A manual scaling approach is not only slow but also error-prone.
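The replica calculation the operator performs can be sketched in a few lines of Python. Note that the result must be rounded up, since a fractional replica cannot be provisioned (the function name is illustrative):

```python
import math

def replicas_required(current_load: float, capacity_per_replica: float) -> int:
    """Smallest replica count whose combined capacity covers the load.

    Rounds up, since a fractional replica cannot be provisioned.
    """
    return max(1, math.ceil(current_load / capacity_per_replica))

# With the numbers from the example: 1500 req/s at 600 req/s per replica.
print(replicas_required(1500, 600))   # 3
```

This is exactly the calculation an autoscaler codifies, which is why it can act without the delays described above.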
How can autoscaling help?
With autoscaling, the role of the human operator is taken by a (set of) software component(s), the autoscaler.
In this case, the autoscaler monitors the current level of usage, calculates additional scalability requirements based on the codified capacity, and finally scales out the number of replicas to handle the additional workload.
The benefits of this are:
- The peak capacity of services tends to be codified instead of documented. As with most things in code, this tends to be reviewed regularly and is less likely to be outdated, as compared to stand-alone documentation.
- Unlike humans, the autoscaler does not need a coffee break. It is expected to be always on the spot to respond to any scalability triggers.
- The autoscaler concerns itself with one task and one task only: responding to scalability triggers. For example, if the autoscaler is configured to scale on CPU usage and the target utilization exceeds the configured threshold, it scales out the number of replicas. Because of this single concern, no root cause analysis delays its response.
- Finally, because the entire process of monitoring, decision making, and scaling carries such low operational overhead, autoscaling is useful not just during unexpected workload increases; it can also reduce infrastructure cost, even under a steady workload. The figures below illustrate the cost benefits of autoscaling.
As illustrated in the above figure, even with a steadily increasing/decreasing workload, autoscaling helps the service operate more efficiently by scaling in and out as required. Without autoscaling, the service must be provisioned for the expected peak workload. In the above example, 4 pods would be provisioned at all times, even when the actual load could be handled by 1 pod (9:00–10:30 AM, for example).
Note: In the above example, the pods are configured to operate at an average CPU usage of 80%. If the average CPU usage across all pods goes above 80%, additional pod(s) will be spun up by the autoscaler. Hence, at 10:30 AM, even though the CPU usage across pods is <100%, an additional pod comes up because the target usage threshold of 80% has been crossed.
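The note above describes the same proportional rule the Kubernetes Horizontal Pod Autoscaler uses: desiredReplicas = ceil(currentReplicas × currentMetricValue / targetMetricValue). A minimal Python sketch of that calculation, with illustrative numbers:

```python
import math

def desired_replicas(current_replicas: int,
                     current_usage: float,
                     target_usage: float) -> int:
    """Proportional scaling rule used by the Kubernetes HPA:
    desired = ceil(current_replicas * current_usage / target_usage)."""
    return math.ceil(current_replicas * current_usage / target_usage)

# 3 pods averaging 90% CPU against an 80% target: scale out to 4.
print(desired_replicas(3, 0.90, 0.80))   # 4
# 2 pods averaging 40% CPU against an 80% target: scale in to 1.
print(desired_replicas(2, 0.40, 0.80))   # 1
```

This explains the behaviour at 10:30 AM: the average usage crosses the 80% target, the ratio exceeds 1, and the ceiling pushes the desired replica count up by one even though no pod has hit 100% CPU.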
An autoscaling solution can make scaling faster, less error-prone, and more cost-efficient.
Acceptance criteria for any autoscaling solution
As detailed so far in this post, autoscaling offers several benefits over over-provisioning or manual scaling. The corollary is that a good autoscaling solution should deliver the following:
- Reliability — Must guarantee scalability
- Efficiency — Must reduce infrastructure cost compared with over-provisioning or manual scaling
- Responsiveness — Must scale out fast enough to successfully handle an increase in workload
- Resilience — Must protect against malicious traffic (elaborated in subsequent posts)
These can act as the acceptance criteria to drive and evaluate the Kubernetes-based autoscaling solutions that will be the focus of further posts in this series.