How to autoscale MuleSoft APIs deployed on Runtime Fabric (RTF)

Madhu Mallisetty
Another Integration Blog
5 min read · Jun 19, 2023

MuleSoft, with its Anypoint Platform, is a market leader in API management and integration. Flexibility in deploying and scaling APIs across various environments is one of the primary advantages of MuleSoft’s approach. Runtime Fabric (RTF) is MuleSoft’s container service for multi-cloud deployment, enabling users to deploy Mule applications and APIs in a secure, scalable, and efficient way.

In this article, we will discuss scaling MuleSoft APIs deployed on Runtime Fabric. It is assumed that you have a basic knowledge of MuleSoft, the Anypoint Platform, Runtime Fabric and Kubernetes for the purposes of this blog.

Understanding MuleSoft Runtime Fabric and Kubernetes

It is important to understand what MuleSoft Runtime Fabric (RTF) and Kubernetes are, before we get into the scaling process.

MuleSoft Runtime Fabric is a container service for Mule applications and API deployment across multiple clouds. It abstracts away the majority of the complexities of administering a Kubernetes cluster, allowing developers to concentrate on creating and deploying applications. However, when scaling, native solutions may not be sufficient. Kubernetes comes into play with its robust scaling capabilities, particularly Kubernetes autoscaling.

Kubernetes, on the other hand, is an open-source platform for automating containerized application deployment, scaling, and administration. It provides a framework for running resilient distributed systems by handling scaling and failover for your applications and providing deployment patterns, among other things.

Understanding auto-scaling in MuleSoft Runtime Fabric

Before we delve into the details, we need to understand Runtime Fabric’s auto-scaling concept. Auto-scaling is a method of automatically adjusting the number of instances of an application according to demand. Runtime Fabric, which uses Kubernetes as its orchestration platform, supports this feature. Auto-scaling ensures that the system scales out (adds more instances) or scales in (removes instances) in response to load.

How Kubernetes autoscaling works

Kubernetes autoscaling can be implemented in two ways: Horizontal Pod Autoscaler (HPA) and Vertical Pod Autoscaler (VPA).

HPA scales the number of pods within a deployment. It’s implemented as a control loop, with a target defined by CPU utilization (although custom metrics can also be used).
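As a rough sketch, the control loop’s scaling decision follows the rule desiredReplicas = ceil(currentReplicas × currentMetricValue ÷ targetMetricValue), clamped to the configured minimum and maximum. The function and variable names below are illustrative, not part of any Kubernetes API:

```python
import math

def desired_replicas(current: int, current_cpu: float, target_cpu: float,
                     min_replicas: int = 1, max_replicas: int = 10) -> int:
    """Sketch of the HPA scaling rule: scale the replica count in
    proportion to how far the observed metric is from its target."""
    desired = math.ceil(current * (current_cpu / target_cpu))
    # Clamp the result to the configured bounds
    return max(min_replicas, min(desired, max_replicas))

# 2 replicas averaging 90% CPU against a 60% target -> scale out to 3
print(desired_replicas(2, 90, 60))
```

This is why a target of 60% with pods running at 90% adds roughly half again as many replicas, while pods idling well below target are scaled back down, never below the configured minimum.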

With VPA, users no longer need to manually set resource requests and limits on containers within a pod. VPA sets resource request values for containers in a pod based on an analysis of historical resource usage.
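For illustration, a minimal VPA manifest targeting a hypothetical deployment named mulesoft-api might look like the sketch below. Note that the VPA controller is not part of core Kubernetes and must be installed separately in the cluster:

```yaml
apiVersion: autoscaling.k8s.io/v1
kind: VerticalPodAutoscaler
metadata:
  name: mulesoft-api-vpa
spec:
  targetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: mulesoft-api
  updatePolicy:
    updateMode: "Auto"   # VPA may evict pods to apply new resource requests
```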

This blog will focus on HPA as it is commonly used for autoscaling applications in Kubernetes.

Steps to Scale MuleSoft APIs Using Kubernetes Autoscaling

Now, let’s get into the actual process of scaling MuleSoft APIs using Kubernetes Autoscaling.

Step 1: Ensure Prerequisites

Before starting, ensure that you have:

  • An active MuleSoft Runtime Fabric cluster
  • A Kubernetes cluster that is up and running
  • Access to the Kubernetes command-line tool, kubectl

Step 2: Deploy Your MuleSoft API

Deploy your API onto the MuleSoft Runtime Fabric, specifying the number of replicas, the reserved CPU and memory, and the resource limits.
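Under the hood, these settings map to the replicas field and the container resource requests and limits on the Kubernetes Deployment that backs the app. A hedged sketch of the equivalent raw Kubernetes resource (the image reference and values are illustrative; RTF generates this for you from the deployment settings):

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: mulesoft-api
spec:
  replicas: 2
  selector:
    matchLabels:
      app: mulesoft-api
  template:
    metadata:
      labels:
        app: mulesoft-api
    spec:
      containers:
      - name: mulesoft-api
        image: your-registry/mulesoft-api:latest   # placeholder image
        resources:
          requests:            # HPA CPU utilization is computed
            cpu: "500m"        # relative to these requests
            memory: "1Gi"
          limits:
            cpu: "1000m"
            memory: "1.5Gi"
```

Setting CPU requests matters for the next step: the HPA expresses its target as a percentage of the requested CPU, so pods without requests cannot be autoscaled on CPU utilization.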

Step 3: Implementing HPA

First, you need to create an HPA resource in Kubernetes. The HPA will scale the number of pod replicas for our MuleSoft deployment.

Pod autoscaling using the kubectl autoscale command

Kubernetes provides kubectl, a command-line tool for interacting with the control plane of a Kubernetes cluster through the Kubernetes API.

The kubectl autoscale command is one of the easiest ways to set up autoscaling for your deployments.

Let’s take the example of a MuleSoft API deployed as a Kubernetes deployment named ‘mulesoft-api’. We want to autoscale this deployment based on the CPU utilization of its pods. Here’s how you can achieve it:

kubectl autoscale deployment mulesoft-api --cpu-percent=60 --min=2 --max=5

In this kubectl command:

  • mulesoft-api is the name of the MuleSoft deployment that you want to scale.
  • --cpu-percent=60 means that Kubernetes will add new replicas when the average CPU utilization of all pods in the deployment goes above 60%.
  • --min=2 and --max=5 specify the minimum and maximum number of replicas that Kubernetes should maintain for this deployment.

Autoscaling using the Horizontal Pod Autoscaler (HPA) manifest file

The kubectl autoscale command is easy to use, but not as flexible as defining a Horizontal Pod Autoscaler (HPA) using a manifest file. HPA manifest files allow you to define multiple metrics (not just CPU) and use custom metrics.

Here’s an example manifest file for the same mulesoft-api deployment:

  1. Create manifest file mulesoft-api-hpa-manifest.yaml
#mulesoft-api-hpa-manifest.yaml
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: mulesoft-api
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: mulesoft-api
  minReplicas: 2
  maxReplicas: 5
  metrics:
  - type: Resource
    resource:
      name: cpu
      target:
        type: Utilization
        averageUtilization: 60

2. Apply the autoscaler manifest file using the kubectl apply command.

kubectl apply -f mulesoft-api-hpa-manifest.yaml

In this manifest file:

  • scaleTargetRef refers to the MuleSoft deployment that this HPA will manage.
  • minReplicas and maxReplicas are the same as in the kubectl autoscale command.
  • metrics specifies metrics that the HPA will use to determine when to scale. In this case, we’re still using CPU utilization.
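Since this manifest uses the autoscaling/v2 API, the metrics list can contain more than one entry; for example, memory utilization could be tracked alongside CPU (the 75% figure below is illustrative):

```yaml
  metrics:
  - type: Resource
    resource:
      name: cpu
      target:
        type: Utilization
        averageUtilization: 60
  - type: Resource
    resource:
      name: memory
      target:
        type: Utilization
        averageUtilization: 75
```

With multiple metrics, the HPA computes a desired replica count for each metric independently and scales to the highest of them.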

Monitoring Autoscaling

Once autoscaling is configured, you can monitor it using the following commands:

kubectl get hpa  # Get all the HPAs
kubectl get pods # Get all the pods for the deployments

This gives you an overview of all HPAs, including current and target CPU utilization and the current and desired number of replicas.

After your Mule application has been deployed, you can use Runtime Manager to monitor its scaling events as well as its overall performance. In addition, Runtime Fabric includes built-in log aggregation and log forwarding to external systems, such as Splunk and ELK, for additional monitoring and analysis.

Conclusion

MuleSoft’s Runtime Fabric enables you to scale your APIs in a secure and efficient manner. By correctly configuring your deployment and enabling auto-scaling, you can ensure that your APIs handle fluctuating load patterns and maintain consistent performance. This can lead to cost savings, increased customer satisfaction, and higher productivity for your teams. Remember that the right auto-scaling parameters depend heavily on your individual use case, and you may need to fine-tune them over time. Good luck with your auto-scaling!
