Scaling the Coherence cluster on OKE using KEDA and Horizontal Pod Autoscaler

Ali Mukadam
Oracle Developers
May 12, 2023

In the previous article in this series, we looked at how to monitor a stretched Coherence cluster deployed in multiple OCI regions. In this article, we’ll use KEDA and the Horizontal Pod Autoscaler (HPA) to turn those captured metrics into automatic scaling of the Coherence cluster.

But first, a brief recap as to where we are:

Deployment and monitoring of Coherence in multiple regions

What we want to see is that when there’s an increase in load, the entire system is able to respond and scale gracefully. The neat thing is that the Coherence Operator already supports HPA. Now, we want to see the operator use the captured metrics to adjust the size of the cluster when needed. Since I’ve been meaning to write about KEDA for a while, we’ll use it instead of the Prometheus adapter used in the documentation. The diagram below is what we’ll try to achieve:

Scaling Coherence with Prometheus and KEDA within a Kubernetes cluster

We’ll use JMeter to run a performance test to increase requests to the Coherence cluster. When the test is run, Prometheus will see an increase in some Coherence metrics, and we want to see those metrics used by KEDA to trigger an autoscaling event through the HPA. In turn, when the Coherence cluster size has been updated by the HPA, the Coherence Operator will scale the underlying StatefulSet, and we should see this captured in the metrics by Prometheus, which we’ll analyze in Grafana.

Whilst we are running a global Coherence cluster, the Coherence operator that will handle the scaling runs locally within a cluster. So, we’ll need to repeat this in all the connected clusters:

Global Pod Autoscaling

Installing KEDA

Since we already installed Prometheus before, we only need to install KEDA. Add the helm chart repo and download the values manifest for editing:

helm repo add kedacore https://kedacore.github.io/charts
helm show values kedacore/keda > keda.yaml

Edit the keda.yaml and change the following to allow us to also monitor KEDA:

prometheus:
  operator:
    enabled: true
    serviceMonitor:
      enabled: true
    podMonitor:
      enabled: true

Next, install KEDA:

for cluster in paris amsterdam frankfurt; do
  kubectx $cluster
  helm install keda --namespace keda kedacore/keda -f keda.yaml --create-namespace
done
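Once the install completes, it doesn’t hurt to run a quick sanity check in each cluster to confirm the KEDA operator and metrics API server pods are up (a minimal sketch, using the same contexts as above):

for cluster in paris amsterdam frankfurt; do
  kubectx $cluster
  # the keda-operator and keda-operator-metrics-apiserver pods should be Running
  kubectl --namespace keda get pods
done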

Next, we’ll define a KEDA ScaledObject using the Prometheus Scaler:

apiVersion: keda.sh/v1alpha1
kind: ScaledObject
metadata:
  name: scaledcoherence
  namespace: coherence-test
spec:
  scaleTargetRef:
    apiVersion: coherence.oracle.com/v1
    kind: Coherence
    name: storage-${CLUSTER}
  pollingInterval: 30   # Optional. Default: 30 seconds
  cooldownPeriod: 300   # Optional. Default: 300 seconds
  minReplicaCount: 2    # Optional. Default: 0
  maxReplicaCount: 20   # Optional. Default: 100
  advanced:                                 # Optional. Section to specify advanced options
    restoreToOriginalReplicaCount: true     # Optional. Default: false
    horizontalPodAutoscalerConfig:          # Optional. Section to specify HPA related options
      behavior:                             # Optional. Use to modify HPA's scaling behavior
        scaleDown:
          stabilizationWindowSeconds: 300
          policies:
          - type: Percent
            value: 50
            periodSeconds: 120
        scaleUp:
          stabilizationWindowSeconds: 180
          policies:
          - type: Percent
            value: 200
            periodSeconds: 60
  triggers:
  - type: prometheus
    metadata:
      serverAddress: http://kps-kube-prometheus-stack-prometheus.monitoring:9090/
      query: sum(vendor:coherence_memory_heap_memory_usage_used{cluster="storage"}) / sum(vendor:coherence_memory_heap_memory_usage_max{cluster="storage"}) * 100
      threshold: '10.0'
      activationThreshold: '20.0'
      namespace: monitoring

You’ll notice that the scaleTargetRef uses Coherence’s API version (coherence.oracle.com/v1) and the kind Coherence. We also provide the name of the Coherence cluster in each region; here we’ve parameterized it as storage-${CLUSTER} so the same manifest can be applied to Paris, Amsterdam and Frankfurt.

Under triggers, we set the type to prometheus and we use the cluster-local address of Prometheus. We also specify a Prometheus query in PromQL that returns a value to be evaluated against the activationThreshold and the target threshold values. What you use for your query, threshold and activationThreshold depends on your application, your use case and perhaps your non-functional requirements. The values used in this example are only meant to illustrate how you would go about doing it. Note that you can also have a list of triggers that will cause a scaling event based on different queries, as sketched below. But for now, let’s get our first query and trigger working.
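As a purely illustrative sketch, a second trigger could be appended to the list like this. The cache-size query below is hypothetical and not something we use in this article; substitute a metric that matters to your own workload:

  triggers:
  - type: prometheus
    metadata:
      serverAddress: http://kps-kube-prometheus-stack-prometheus.monitoring:9090/
      query: sum(vendor:coherence_memory_heap_memory_usage_used{cluster="storage"}) / sum(vendor:coherence_memory_heap_memory_usage_max{cluster="storage"}) * 100
      threshold: '10.0'
      activationThreshold: '20.0'
      namespace: monitoring
  # hypothetical additional trigger, e.g. on the total number of cache entries
  - type: prometheus
    metadata:
      serverAddress: http://kps-kube-prometheus-stack-prometheus.monitoring:9090/
      query: sum(vendor:coherence_cache_size{cluster="storage"})
      threshold: '1000000'
      namespace: monitoring

When several triggers are defined, the HPA scales on whichever metric demands the most replicas.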

For testing, we’ll be loading a large amount of data into the Coherence grid and we want to ensure that it has enough memory to store all the data. To achieve this, we use the Prometheus Query to look at the heap usage and calculate a percentage:

sum(vendor:coherence_memory_heap_memory_usage_used{cluster="storage"}) / sum(vendor:coherence_memory_heap_memory_usage_max{cluster="storage"}) * 100
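If you want to sanity-check this query before handing it to KEDA, you can run it directly against the Prometheus HTTP API, for example by port-forwarding to the same service the ScaledObject points at (a quick sketch):

# port-forward the kube-prometheus-stack Prometheus service locally
kubectl --namespace monitoring port-forward svc/kps-kube-prometheus-stack-prometheus 9090:9090 &

# run the same PromQL expression via the query API; the result should be the heap usage percentage
curl -sG http://localhost:9090/api/v1/query \
  --data-urlencode 'query=sum(vendor:coherence_memory_heap_memory_usage_used{cluster="storage"}) / sum(vendor:coherence_memory_heap_memory_usage_max{cluster="storage"}) * 100'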

With the thresholds above, when this value reaches 20%, a scaling event is triggered and more pods are created to spread the data out. If we start clearing the data and the value drops below 10%, some of the pods are automatically removed. Let’s apply this and run the test:

for cluster in paris amsterdam frankfurt; do
  kubectx $cluster
  CLUSTER=$cluster envsubst < scaledcoherence.yaml | kubectl apply -f -
done

Verify that the scaler in each region has been correctly defined and that the values are appropriate, e.g. you would want your threshold to be less than your activationThreshold and your minReplicaCount to be less than your maxReplicaCount:

for cluster in paris amsterdam frankfurt; do
  kubectx $cluster
  kubectl -n coherence-test get scaledobject scaledcoherence
done

✔ Switched to context "paris".
NAME              SCALETARGETKIND                     SCALETARGETNAME     MIN   MAX   TRIGGERS     AUTHENTICATION   READY   ACTIVE   FALLBACK   AGE
scaledcoherence   coherence.oracle.com/v1.Coherence   storage-paris       2     20    prometheus                    True    False    False      2m58s
✔ Switched to context "amsterdam".
NAME              SCALETARGETKIND                     SCALETARGETNAME     MIN   MAX   TRIGGERS     AUTHENTICATION   READY   ACTIVE   FALLBACK   AGE
scaledcoherence   coherence.oracle.com/v1.Coherence   storage-amsterdam   2     20    prometheus                    True    False    False      3m2s
✔ Switched to context "frankfurt".
NAME              SCALETARGETKIND                     SCALETARGETNAME     MIN   MAX   TRIGGERS     AUTHENTICATION   READY   ACTIVE   FALLBACK   AGE
scaledcoherence   coherence.oracle.com/v1.Coherence   storage-frankfurt   2     20    prometheus                    True    False    False      3m7s

You can also look at ScaledObject in more detail:

kubectl describe scaledobject scaledcoherence

and you should see something like the following:

....
Status:
  Conditions:
    Message:  ScaledObject is defined correctly and is ready for scaling
    Reason:   ScaledObjectReady
    Status:   True
    Type:     Ready
    Message:  Scaling is not performed because triggers are not active
    Reason:   ScalerNotActive
    Status:   False
    Type:     Active
    Message:  No fallbacks are active on this scaled object
    Reason:   NoFallbackFound
    Status:   False
    Type:     Fallback
  External Metric Names:
    s0-prometheus-prometheus
  Health:
    s0-prometheus-prometheus:
      Number Of Failures:  0
      Status:              Happy
  Hpa Name:                keda-hpa-coherence-scaledobject
  Original Replica Count:  3
  Scale Target GVKR:
    Group:     coherence.oracle.com
    Kind:      Coherence
    Resource:  coherence
    Version:   v1
  Scale Target Kind:       coherence.oracle.com/v1.Coherence
Events:
  Type    Reason              Age                From           Message
  ----    ------              ----               ----           -------
  Normal  KEDAScalersStarted  36s                keda-operator  Started scalers watch
  Normal  ScaledObjectReady   21s (x2 over 36s)  keda-operator  ScaledObject is ready for scaling

We are now ready to test the scalability of Coherence within a Kubernetes cluster using metrics.

Triggering the scaling event

To trigger the scaling event, we’ll use JMeter to generate the load:

kubectl apply -f coherence-jmeter.yaml --namespace coherence-test

Let it run for a while and meanwhile, we can observe the proceedings in Grafana:

Observing Coherence Cluster size changes

We can see that once the heap size reaches 20% utilization, the cluster size changes. These events are marked on the graph above. Similarly, we can also look at the overall cluster heap as the cluster size increases, and we can see that the addition of a cluster member corresponds to an increase in the overall heap available:

Increase in heap size with addition of new Coherence cluster members

The addition of a member is also followed by a partition transfer as Coherence seeks to distribute the data:

Coherence Partition transfer

On the Coherence Summary dashboard, we can see an increase in the number of Coherence cluster members at various levels of heap utilization:

On the Cache Summary Dashboard, we can see a steady increase in Cache Data memory usage:

Finally, on the KEDA dashboard, we can also see the number of HPA replicas created by KEDA.
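Besides the Grafana dashboards, you can also follow the scaling from the command line by watching the HPA that KEDA created and the Coherence resource itself (a simple sketch, run in the region you’re observing):

# watch the HPA that KEDA manages on our behalf
kubectl --namespace coherence-test get hpa --watch

# in another terminal, watch the Coherence resource as its replica count changes
kubectl --namespace coherence-test get coherence --watch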

Scaling down

So, we’ve shown that Coherence can scale out based on the metrics captured by Prometheus and using KEDA. But can the cluster be scaled back in?

Let’s stop the JMeter Put performance test and run another one that will truncate the data instead:

kubectl delete -f coherence-jmeter.yaml --namespace coherence-test
kubectl apply -f coherence-jmeter-truncate.yaml --namespace coherence-test

After stopping the performance test, this is what the cluster looks like before it starts to scale down:

As KEDA starts scaling down the cluster, we see other members departing:

Coherence scaling down

Note that the metrics and Prometheus query used here are meant to illustrate using KEDA, Prometheus and Coherence together over 3 different regions. You should not use them as-is to determine how to autoscale Coherence when running on Kubernetes. As I mentioned before, your Prometheus query, threshold values, etc. depend on your use case.
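If you want to switch the autoscaling off once you’re done experimenting, deleting the ScaledObject in each region is enough; KEDA will also remove the HPA it created (a short sketch, mirroring the way we applied it):

for cluster in paris amsterdam frankfurt; do
  kubectx $cluster
  # removes the scaler; the Coherence cluster itself is left running
  kubectl --namespace coherence-test delete scaledobject scaledcoherence
done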

Conclusion

In this article, we showed that even though the Coherence members are spread over 3 geographically separate regions, we are still able to scale each cluster based on load, using metrics from Prometheus and scaling events triggered by KEDA. As the load increases, the Coherence cluster is scaled out by adding new members, and as the load decreases, the cluster size is correspondingly reduced. We can also choose separate frequencies for scaling out and scaling back in.

I would like to conclude here and again thank my colleagues Jonathan Knight, Tim Middleton, Avi Miller, Julian Ortiz and Sherwood Zern for their contributions and ideas to this article.
