Scaling in Cellery — How and Which?

Isuru Haththotuwa
Published in wso2-cellery
7 min read · Oct 31, 2019
Image credits - https://www.flaticon.com/authors/pixel-perfect

TL;DR

Unlike VM-based monoliths, container-based microservices can be scaled up and down as needed. This ability comes mainly from the microservices architecture itself; a smaller footprint and shorter startup times are two of the main contributors.

Cellery supports multiple scaling techniques, as well as the ability to override them at runtime. This article briefly discusses how these scaling techniques work in Cellery and how to choose the appropriate one.

Why Scaling?

Scaling is not a concept specific to microservices and containers; it was not uncommon even in the era of virtual machines. However, it was mostly a manual operation performed by a system administrator, and more often than not it required restarting the running hardware so that the newly added machines could be properly connected.

With the advent of microservices and containers, scaling techniques are used more widely than ever. The primary reason is that microservices are lightweight by nature and therefore easier to scale. What was previously a coarse-grained piece of software is transformed into a fine-grained, modular set of components, which makes it possible to scale only what is necessary.

So why is scaling actually required?

Imagine a scenario like this: a particular system experiences a peak load of twice its normal traffic during one seasonal week of the year. If this load is served by a statically provisioned system, a significant amount of resources will sit idle during all the other weeks of the year. Dynamic scaling comes into the picture in such scenarios; when the need arises, the relevant hardware and software components can be spawned to meet the demand and then de-allocated once the peak has passed. This approach results in reduced costs, minimal wastage of computing resources, and less maintenance overhead.

Different Scaling Methods

Scaling is traditionally categorized as horizontal or vertical: horizontal scaling means adding more computing units to run the workloads (such as VM nodes or containers), whereas vertical scaling means adding more resources (such as memory or CPU) to the existing units without increasing their number.

Horizontal Scaling
Vertical Scaling

However, in the context of this article, I will be referring to horizontal scaling only.

Manual Scaling

The most primitive way of scaling a system is to do it manually. A system administrator increases the number of nodes, pods, etc. to match the demand. In addition to increasing this number, the administrator may need to perform additional tasks to make sure that each new instance has initialized properly and joined the existing cluster.
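In Kubernetes terms, manual scaling often boils down to editing the replica count of a workload and re-applying it. The following is a minimal sketch, assuming a hypothetical Deployment named my-service with a placeholder image:

apiVersion: apps/v1
kind: Deployment
metadata:
  name: my-service          # hypothetical workload
spec:
  replicas: 4               # bumped by hand (e.g. from 2) to meet demand, then re-applied
  selector:
    matchLabels:
      app: my-service
  template:
    metadata:
      labels:
        app: my-service
    spec:
      containers:
      - name: my-service
        image: nginx:1.17   # placeholder image

The same effect can also be achieved imperatively, for example with kubectl scale.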

Auto Scaling

Automatic scaling involves scaling a system up and down automatically, based on a set of defined factors. However, this needs a separate ‘brain’ to evaluate those factors and make the scaling decision. For example, Kubernetes uses its built-in Horizontal Pod Autoscaler (HPA) to scale based on observed metrics, and Amazon uses its own scaling logic to automatically scale containers as well as VM instances. A base count should be provided, which defines the initial (and minimum) number of instances, and scaling happens from there onwards. When scaling down, the system will always stop at the base count, even if it is fully idle.
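As a concrete sketch, a Kubernetes HPA (autoscaling/v2beta2 API) for a hypothetical Deployment named my-service could look like the following; the thresholds and replica counts are illustrative:

apiVersion: autoscaling/v2beta2
kind: HorizontalPodAutoscaler
metadata:
  name: my-service-hpa
spec:
  scaleTargetRef:              # the workload being scaled
    apiVersion: apps/v1
    kind: Deployment
    name: my-service
  minReplicas: 2               # the 'base count'; scale-down never goes below this
  maxReplicas: 10
  metrics:
  - type: Resource
    resource:
      name: cpu
      target:
        type: Utilization
        averageUtilization: 60 # add replicas when average CPU utilization exceeds 60%

The HPA controller periodically compares the observed metric against the target and adjusts the replica count between minReplicas and maxReplicas accordingly.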

Zero Scaling

Zero scaling is the new cool trend in town. The difference between zero scaling and autoscaling is that with zero scaling you start with zero instances. When the first request hits the controlling agent that intercepts traffic, it spins up the relevant instances. Therefore, this is only suitable for lightweight microservices and container-based deployments. In fact, zero scaling is one paradigm of the serverless architecture. Knative provides support for running workloads in a zero-scaled manner.
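As an illustration, a minimal Knative Service that is allowed to scale to zero could look like this; the service name and image are placeholders taken from the Knative samples:

apiVersion: serving.knative.dev/v1
kind: Service
metadata:
  name: hello
spec:
  template:
    metadata:
      annotations:
        autoscaling.knative.dev/minScale: "0"  # allow scaling down to zero replicas when idle
        autoscaling.knative.dev/target: "10"   # target concurrent requests per replica
    spec:
      containers:
      - image: gcr.io/knative-samples/helloworld-go

When no traffic arrives, all replicas are terminated; the first incoming request is held by the Knative data plane while a new replica starts, which is why fast container startup matters here.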

How to choose the correct scaling method?

The best scaling method heavily depends on your use case and requirements: does your deployment consist of modular, lightweight microservices? Can you afford to maintain the management plane required for automatic scaling? What is the fluctuation pattern of the demand and resources required by your workloads?

Let's take a few sample scenarios.

If the load and resource requirements change predictably and very rarely, you can probably live with manual scaling, at least for the moment. If you can use automatic scaling, that is a bonus.

If there are a few components that rarely get any traffic but cannot be decommissioned, you can probably opt for zero scaling. If the demand is sporadic and there is no specific pattern involved, the preference would be autoscaling. Note that both of these methods incur the overhead of managing the control plane that takes the scaling decisions, for example the K8s HPA. In addition, for zero scaling, component startup should be fast enough that the initial request does not time out.

Scaling with Cellery

What is Cellery and Cell Architecture?

Cellery is an approach to building complex, composite applications on Kubernetes in a code-first manner. Cellery defines an architectural pattern that can consist of either a simple composite or an opinionated ‘Cell’ with a boundary and a single access point. For more details, refer to the Cellery documentation.

Cellery supports manual scaling, autoscaling, and zero scaling.

A Cell is a definition written by a developer using the integration-specialized Ballerina programming language. It wraps one or more components that are packaged as Docker images. The Cell developer knows whether a particular component can be scaled or not, and hence has the option of providing a scaling policy at Cell development time.

Consider the code snippet below:

Cellery component definition with scale policy

This is a component definition in Cellery with an autoscaling policy engaged. The minimum and maximum number of replicas, the scaling factor (CPU usage here), and the threshold value for scaling decisions are given. This is not a full sample; head over to GitHub and check out the Cellery documentation for detailed information on how to define scale policies at Cell development time.

When a Cell is deployed in the Cellery runtime, there might be a need to update this scale policy. For example, the thresholds for scaling factors such as CPU and memory usage might need to be modified. Therefore, the Cellery CLI provides the option of exporting a scaling policy, modifying it, and re-applying it. The scaling policy is exported in YAML format to be more ops-friendly.

As a sample scenario, consider a Cell instance named `mytestcell`, deployed with a developer-defined scale policy for one component. We can use the export-policy command in the Cellery CLI to export this scale policy:

$> cellery export-policy autoscale cell mytestcell

This will export any existing scale policies to a file named `cell-mytestcell-autoscalepolicy.yaml`. For the complete CLI usage reference, please see the CLI documentation.

The content of the exported file is as follows:

components:
- name: controller
  scalingPolicy:
    overridable: true
    hpa:
      minReplicas: 1
      maxReplicas: 3
      metrics:
      - type: Resource
        resource:
          name: cpu
          target:
            averageUtilization: 40
            type: Utilization
- name: catalog
- name: orders
- name: customers
gateway:
  scalingPolicy:
    replicas: 1

Let's go through the contents of this file. There are four components and one gateway listed. The component named `controller` has an autoscaling policy engaged, where the relevant scaling parameters are defined inline. Under the hood, this uses K8s HPA. Other components do not have any scale policies defined.

To update the existing autoscaling policy for the component `controller`, simply edit this file and use the following CLI command:

$> cellery apply-policy autoscale cell mytestcell cell-mytestcell-autoscalepolicy.yaml

If you need to change the scale policy from HPA to a zero-scale policy, edit the same file as follows and re-apply it with the `cellery apply-policy` command shown above.

components:
- name: controller
  scalingPolicy:
    kpa:
      maxReplicas: 3
      concurrency: 2
- name: catalog
- name: orders
- name: customers
gateway:
  scalingPolicy:
    replicas: 1

Now the controller’s scale policy has been changed to KPA (the Knative Pod Autoscaler), and the minimum number of replicas defaults to zero. Hence, the pod corresponding to the `controller` component will be terminated if there are no requests directed to it.

Finally, manual scaling. Note that none of the remaining components, nor the gateway, has an autoscaling or zero-scaling policy defined. However, we can still scale them manually: just define the number of replicas under any component and/or the gateway, and re-apply the changes with the CLI:

components:
- name: controller
  scalingPolicy:
    kpa:
      maxReplicas: 3
      concurrency: 2
- name: catalog
- name: orders
  scalingPolicy:
    replicas: 2
- name: customers
gateway:
  scalingPolicy:
    replicas: 4

In the above policy, the gateway has been manually scaled to 4 replicas and the `orders` component to 2. Now check the pod counts for the gateway and the `orders` component (for example, with `kubectl get pods`). Voila! There will be four and two pods running, respectively.

Conclusion

In this article, we looked at different scaling methods and how to choose the correct one for your requirements. In the latter part, we looked at how these scaling mechanisms work in Cellery and how we can dynamically update and switch between them using the Cellery CLI.

That’s all folks. Happy scaling!
