Traffic-Based Autoscaling for GKE Clusters Based on Requests per Second (RPS) by Using the Internal Gateway Controller
A workaround for Google Kubernetes Engine (GKE) autoscaling that uses the Internal Gateway Controller to produce an RPS custom metric for the HPA
Preface
When we talk about autoscaling in Kubernetes, we usually mean the Horizontal Pod Autoscaler (HPA) or the Vertical Pod Autoscaler (VPA). In this article, we will focus on the HPA. The HPA needs metrics to determine the number of replicas for a Deployment, but by default only CPU and memory metrics are available to it. These metrics are not representative for applications whose utilization is not tied to CPU or memory. That is where Requests per Second (RPS) comes in: a metric that is very useful for measuring the inbound traffic to an application. RPS can be collected with traffic-based monitoring solutions (e.g., a service mesh, gateway, or load balancer).
Previously, I explored Istio, Prometheus, and the Prometheus adapter to implement traffic-based autoscaling in Kubernetes on top of a service mesh architecture. But that turned out to be exhausting, and I found too much complexity by the time I had finished my exploration. Istio implements a service mesh in Kubernetes by injecting a sidecar (Envoy) next to every Pod we deploy, and Prometheus also needs extra resources to run its server in the cluster. This clearly requires a bigger cluster and raises the cost estimate on Google Cloud Platform or any other cloud provider. We concluded that the service mesh solution is exhausting and not cost-friendly, so we moved to an alternative solution using GKE Gateway.
In this article, we will:
- Deploy a Gateway to implement traffic-based autoscaling.
- Test the functionality of the Gateway to forward traffic to the service.
- Verify that RPS works as a custom metric for the HPA.
- Run a load test to see the HPA in action.
Prerequisites
Requirements
- A GKE Cluster running version 1.24 or later.
- Gateway API enabled in the GKE Cluster.
- Use one of these supported single-cluster GatewayClasses (gke-l7-rilb, gke-l7-gxlb, gke-l7-global-external-managed).
Limitations
- Not supported by the multi-cluster GatewayClasses (gke-l7-rilb-mc, gke-l7-gxlb-mc, and gke-l7-global-external-managed-mc).
Read more about the Google Kubernetes Engine (GKE) Gateway API on this page: Gateway | Google Kubernetes Engine (GKE) | Google Cloud.
Hands-On!
To meet requirements 1 and 2 from the prerequisites, follow the steps below.
Enable Gateway API in a GKE Cluster
If you don’t have a GKE Cluster in your GCP project yet, you can create one with the Gateway API enabled by using the following command.
gcloud container clusters create my-gke-cluster \
    --gateway-api=standard \
    --cluster-version=1.24.7-gke.900 \
    --region=us-central1

The above command creates a cluster named my-gke-cluster, enables the Gateway API feature, uses version 1.24.7-gke.900, and places the cluster in the us-central1 region (note that the --region flag takes a region such as us-central1, not a zone).
Or, you can also modify your existing cluster by using this command.
gcloud container clusters update my-gke-cluster \
    --gateway-api=standard \
    --region=us-central1

Wait a minute, and your cluster will have GatewayClasses. Verify the process by using the following command.
kubectl get gatewayclass

The output of the command above should be similar to this.
Set up a proxy-only subnet
In this article, I want to use an internal Gateway, so we will use an internal HTTPS load balancer.
The proxy-only subnet is used by the internal HTTPS load balancer to handle the traffic generated from the internal Gateway. The subnet provides internal IP addresses to the load balancer proxies.
You don’t need to set up a proxy-only subnet if you want to use an external HTTPS load balancer (gke-l7-gxlb).
Create a proxy-only subnet using the following command.
gcloud compute networks subnets create proxy-only \
    --purpose=REGIONAL_MANAGED_PROXY \
    --role=ACTIVE \
    --region=us-central1 \
    --network=default \
    --range=172.25.1.0/24

The above command creates a subnet named proxy-only. The --purpose flag is set to REGIONAL_MANAGED_PROXY to make it a proxy-only subnet. The --range flag is the IP address range (CIDR); the recommended subnet mask is /23, and the smallest allowed is /26.
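To get a feel for those subnet-mask limits, here is a quick bit of arithmetic (my own illustration, not from the gcloud docs) showing how many proxy addresses each mask provides:

```shell
# Addresses in an IPv4 subnet = 2^(32 - prefix length).
# /23 is the recommended size, /24 is what this article uses, /26 is the smallest allowed.
for prefix in 23 24 26; do
  awk -v p="$prefix" 'BEGIN { printf "/%d -> %d addresses\n", p, 2 ^ (32 - p) }'
done
```

So the /24 range used above (172.25.1.0/24) gives the load balancer proxies 256 addresses to draw from.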
Verify your proxy-only subnet by using the following command.
gcloud compute networks subnets describe proxy-only \
    --region=us-central1

The output of the command above should be similar to this.
Create a Sample App
The following Kubernetes YAML manifest deploys a sample app using the Google Samples container named whereami. The manifest also deploys a Service that exposes the application on an internal IP, because the Service type is ClusterIP.
apiVersion: apps/v1
kind: Deployment
metadata:
  name: sample-app
spec:
  replicas: 2
  selector:
    matchLabels:
      app: sample-app
  template:
    metadata:
      labels:
        app: sample-app
    spec:
      containers:
      - name: whereami
        image: us-docker.pkg.dev/google-samples/containers/gke/whereami:v1.2.15
        ports:
        - containerPort: 8080
---
apiVersion: v1
kind: Service
metadata:
  name: sample-app-svc
  annotations:
    networking.gke.io/max-rate-per-endpoint: "10"
spec:
  ports:
  - port: 8080
    targetPort: 8080
    name: http
  selector:
    app: sample-app
  type: ClusterIP

The manifest above creates the sample app Deployment with 2 replicas. The Service needs the annotation to define the maximum rate per endpoint (per Pod), which is used as the service capacity value: the amount of RPS each Pod can acceptably receive. The annotation networking.gke.io/max-rate-per-endpoint: "10" means that the service capacity of each Pod is 10 RPS.
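As a back-of-the-envelope check (my own arithmetic, assuming capacity scales linearly with the number of endpoints), the aggregate capacity the Gateway can distribute across the Service is the per-endpoint rate multiplied by the replica count:

```shell
# With max-rate-per-endpoint: "10" and replicas: 2 from the manifest above,
# the aggregate service capacity is roughly replicas * rate-per-endpoint.
MAX_RATE_PER_ENDPOINT=10
REPLICAS=2
awk -v r="$REPLICAS" -v c="$MAX_RATE_PER_ENDPOINT" \
    'BEGIN { printf "aggregate capacity: %d RPS\n", r * c }'
```

So at 2 replicas the Service can comfortably absorb about 20 RPS before every Pod is at its configured capacity.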
Read more about service capacity on this page: Gateway traffic management | Google Kubernetes Engine (GKE) | Google Cloud
Create a Gateway Resource
In this step, we will deploy a Gateway resource in Kubernetes to meet requirement number 3.
The following line of configuration is a YAML manifest file to create a Gateway resource in Kubernetes.
kind: Gateway
apiVersion: gateway.networking.k8s.io/v1beta1
metadata:
  name: sample-app-gateway
spec:
  gatewayClassName: gke-l7-rilb
  listeners:
  - name: http
    protocol: HTTP
    port: 80

The descriptions of the manifest above are:
- The field gatewayClassName specifies the GatewayClass to use. The GatewayClass for the regional internal HTTPS load balancer is gke-l7-rilb. If you prefer an external HTTPS load balancer, use gke-l7-gxlb; it will provide a public IP address.
- The field port specifies that the Gateway exposes port 80 (HTTP) to listen for HTTP requests.
Create an HTTPRoute
The following line of configuration is a YAML manifest file to create an HTTPRoute in Kubernetes.
kind: HTTPRoute
apiVersion: gateway.networking.k8s.io/v1beta1
metadata:
  name: sample-app-routing
  labels:
    gateway: sample-app-gateway
spec:
  parentRefs:
  - name: sample-app-gateway
  rules:
  - backendRefs:
    - name: sample-app-svc
      port: 8080

The parentRefs field attaches this HTTPRoute to the internal Gateway sample-app-gateway, so the HTTP/HTTPS traffic arriving at the Gateway is routed to the Kubernetes Service. The destination is set to the Service sample-app-svc on port 8080.
Deploy All the Resources
You can deploy all the resources above at once. Save the manifests above into a single file named gke-gateway-tutorial.yaml, then run the command below.

kubectl apply -f gke-gateway-tutorial.yaml

Test the Sample App
To test the sample app, we need to know the IP address provided by the Gateway. Use this command to retrieve it.

kubectl get gateway -o=jsonpath='{.items[?(@.metadata.name=="sample-app-gateway")].status.addresses[0].value}'

In my output, the IP address is 172.20.1.37. This IP address may vary because it depends on the VPC network that you use. Since this IP address is provided by the internal Gateway, you can only reach the sample app from a machine in the same network.
To test the sample app, I use a Google Compute Engine (GCE) instance in the same VPC network as the Gateway. I ran a simple curl to get a response from the sample app by sending an HTTP request to the Gateway IP address, http://172.20.1.37.
curl -X GET "http://172.20.1.37"

The output should be similar to this.
Create an HPA
After the testing above, we can conclude that the Internal Gateway is working properly. Create an HPA by using the following manifest file.
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: sample-app-hpa
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: sample-app
  minReplicas: 1
  maxReplicas: 10
  metrics:
  - type: Object
    object:
      describedObject:
        kind: Service
        name: sample-app-svc
      metric:
        name: "autoscaling.googleapis.com|gclb-capacity-utilization"
      target:
        averageValue: "50"
        type: AverageValue

The HPA above scales the sample app Deployment by using a custom metric from the internal Gateway. The metric name is "autoscaling.googleapis.com|gclb-capacity-utilization"; it represents the actual HTTP/HTTPS request rate as a percentage of the service capacity, which we can express in terms of RPS. Note that describedObject must reference the Service we deployed earlier, sample-app-svc. The target average value is 50%, which means the HPA triggers when the actual RPS to each Pod exceeds 50% of the maximum service capacity.
The calculation of the target utilization is described below.
Target RPS utilization = Target Percentage Utilization * Maximum Service Capacity
50% * 10 RPS = 5 RPS
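The same threshold can be checked as a one-liner (my own illustration, using the values assumed in this article: a 50% target utilization and a 10 RPS max-rate-per-endpoint):

```shell
# Per-Pod RPS threshold at which the HPA starts scaling up:
# target utilization (0.5) * max-rate-per-endpoint (10 RPS).
awk 'BEGIN { printf "target: %g RPS per Pod\n", 0.5 * 10 }'
```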
Save the manifest file above with the name sample-app-hpa.yaml and run the following command.
kubectl apply -f sample-app-hpa.yaml

Generate Traffic using Artillery
We want to test that the HPA scales up by using a traffic generator that produces HTTP requests at a rate of 10 RPS. This value will trigger the HPA to scale up. We will use an HTTP performance testing tool called Artillery.
Read more about it and how to install it on this page Installing Artillery CLI | Artillery
To run the HTTP traffic generator with Artillery, save the following configuration to a file named scenario.yaml. Replace GATEWAY_IP_ADDRESS in the target field with your Gateway IP address.
config:
  target: "http://GATEWAY_IP_ADDRESS"
  phases:
  - duration: 60
    arrivalRate: 10
    name: Initial Load
scenarios:
- name: "gateway-testing"
  flow:
  - get:
      url: "/"
      headers:
        content-type: application/json

The Artillery configuration above generates traffic at 10 RPS for a duration of 60 seconds. The destination of each HTTP request is the "/" endpoint of the target.
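As a quick sanity check on the phase configuration (my own arithmetic), the total number of requests Artillery will send is arrivalRate multiplied by duration:

```shell
ARRIVAL_RATE=10   # requests per second, from scenario.yaml
DURATION=60       # seconds, from scenario.yaml
awk -v r="$ARRIVAL_RATE" -v d="$DURATION" \
    'BEGIN { printf "expected total requests: %d\n", r * d }'
```

You can compare this figure against the request count Artillery reports at the end of the run.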
Run the Artillery performance testing with this command.
artillery run scenario.yaml

The output should be similar to this.
HPA Scaling Result
The autoscaler uses the following equation to determine the number of replicas needed to handle the 10 RPS produced by the traffic generator. The number of replicas fluctuates along with the traffic.
Replicas = ceiling [ current traffic / ( averageUtilization * max-rate-per-endpoint) ]
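The ceiling formula above can be sketched as a small shell helper (my own illustration; replicas() is a hypothetical name, not part of any tool):

```shell
# replicas CURRENT_RPS TARGET_UTILIZATION MAX_RATE_PER_ENDPOINT
# Computes ceiling[ current traffic / (averageUtilization * max-rate-per-endpoint) ].
replicas() {
  awk -v t="$1" -v u="$2" -v c="$3" \
      'BEGIN { r = t / (u * c); n = int(r); if (n < r) n++; print n }'
}

replicas 10 0.5 10     # this article's first test: 10 RPS   -> prints 2
replicas 500 0.5 100   # the later 500 RPS load test         -> prints 10
```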
This is the autoscaling behavior after testing the sample app with the 10 RPS traffic generator.
From the image above, we can see that the HPA scaled the Deployment up from 1 to 3 replicas. The calculation is described below.
Replicas = ceiling [ 10 RPS / ( 0.5 * 10 RPS) ] = ceiling [ 2 ] = 2 replicas
The calculation above gives the desired number of replicas, but in practice there can be more; in the image above there are 3. Each replica then receives about 3–4 RPS, which is less than or equal to the target utilization of 5 RPS, so the HPA does not trigger another scale-up.
Let’s Go Further, 500 RPS Load Testing!
At the production level, our application may receive far more than 10 RPS. Let’s assume the application is at its peak and receives a 500 RPS load (e.g., during a promotion or flash sale). Let’s try it!
Update the sample application Service in gke-gateway-tutorial.yaml:
metadata.annotations.networking.gke.io/max-rate-per-endpoint: "100"
Update the HPA specification in sample-app-hpa.yaml:
spec.minReplicas: 5
spec.maxReplicas: 30
Apply the changes we have made using the following commands.
kubectl apply -f sample-app-hpa.yaml
kubectl apply -f gke-gateway-tutorial.yaml

Update the Artillery configuration in scenario.yaml:
config.target: "http://GATEWAY_IP_ADDRESS"
config.phases[0].arrivalRate: 500
Run the load test again with artillery run scenario.yaml, then watch the result and the sample app replicas scale up while receiving 500 RPS.
In the image above, the Deployment autoscaled to 10 Pods. Using the HPA formula, the theoretical calculation is described below.
Replicas = ceiling [ 500 RPS / ( 0.5 * 100 RPS) ] = ceiling [ 10 ] = 10 replicas
In the end, each Pod is estimated to receive about 50 RPS of actual traffic. This will not trigger the HPA, because the actual RPS per Pod is less than or equal to the target utilization.
Conclusion
There are many different approaches to implementing traffic-based autoscaling, e.g., using Istio, Prometheus, and the Prometheus adapter on top of a service mesh architecture, but that solution is more complex and difficult. It needs a bigger cluster, which limits the resources available in the cluster and raises the cost estimate for your GKE Cluster.
We have successfully walked through all the steps to implement traffic-based autoscaling with GKE Gateway. GKE Gateway provides this as a built-in feature and simplifies the implementation of traffic-based autoscaling. The solution is also more cost-friendly because it needs only a smaller cluster. Based on my personal experience, the HPA scaling behavior with the GKE Gateway solution is more real-time than with the Istio service mesh architecture, so it is more stable in terms of response time. The response time with the GKE Gateway solution was also around 5–10 ms better than with the service mesh (Istio).
GKE Gateway gives us the advantages of a smaller cluster, lower cost, better response time, and easier implementation.
Enjoy! Feel free to leave a comment or discuss. Ciao…
Please note that this article is still limited to exploration and functional testing of the GKE Gateway solution. We will discuss the response time and performance comparison between the Istio service mesh and GKE Gateway in Part 2, coming soon…
