Scale down to zero with Cellery

Mirage Abeysekara
Published in wso2-cellery, Aug 15, 2019

Cellery is an application orchestration runtime for Kubernetes that supports organizing and managing application deployments via code. The Cellery v0.3 release includes request-driven scaling and scale-to-zero support for your applications. This was achieved by integrating Knative-Serving with Cellery.

When should you consider using scale-to-zero?

The idea behind scale-to-zero is that a container/pod can be scaled down to zero replicas when idle and brought back up when there is a request to serve. As the container/pod is not running 24/7, its resource consumption is minimal compared to always-running services. This reduces the power consumed for computing and, if you have a large number of nodes, eventually saves you money on power or on cloud provider bills.

Scale-to-zero is most useful for microservices, such as registration processing services, that have long idle periods compared to other services. However, when a request arrives at an idle service, the client may time out or see server errors such as HTTP 500 because of long service startup times. Therefore, you have to write your microservices with startup time in mind and decide carefully which microservices to scale to zero in order to get the most out of your resources.
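One way a caller can tolerate the cold-start delay described above is to retry with a growing pause between attempts. The following is only a minimal sketch, not something Cellery provides: `call_with_retry`, its parameters, and the choice of exceptions to retry on are all hypothetical names chosen for illustration.

```python
import time


def call_with_retry(send_request, attempts=4, first_delay=0.5,
                    retry_on=(TimeoutError, ConnectionError),
                    sleep=time.sleep):
    """Call send_request(), retrying with exponential backoff.

    Hypothetical helper: gives a zero-scaled service time to start
    before the caller gives up. Assumes the request is safe to repeat.
    """
    delay = first_delay
    for attempt in range(attempts):
        try:
            return send_request()
        except retry_on:
            # Out of attempts: surface the last error to the caller.
            if attempt == attempts - 1:
                raise
            sleep(delay)   # wait while the pod is (possibly) starting
            delay *= 2     # back off: 0.5s, 1s, 2s, ...
```

Retrying is only appropriate for idempotent requests; for anything else, a longer client timeout on the first call is the safer option.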

How does scale-to-zero work in Cellery?

As mentioned, Cellery uses Knative-Serving with some modifications to provide scale-to-zero support. This means the scaling is done based on the number of concurrent requests.

If you install standard Knative-Serving with Istio on your Kubernetes cluster, Knative uses the Istio ingress gateway to accept external traffic to zero-scaled pods (something has to listen at the edge in order to accept the request and start the flow). For internal traffic, Knative uses Istio's special gateway called mesh, which can be used to listen to all the traffic inside the mesh. If either of these gateways receives a request for a host that has scale-to-zero enabled, Knative creates the corresponding pod mapped to that host and forwards the request to it. This is done by a Knative system component called the activator.

In Cellery, we build applications using cells. A cell mainly consists of a cell gateway and an isolated set of components (pods) that are reachable only through the cell gateway. The following figure shows the modification made to the Knative-Serving traffic flow so that zero scaling works with cells:

Knative traffic flow inside a Cell

The cell gateway, rather than the global Istio ingress gateway, is responsible for accepting external traffic to zero-scaled components. Here, external traffic means traffic coming from outside the cell, which may come from another cell. Further, the mesh gateway is applied so that it only listens to traffic inside the cell in order to scale up zero-scaled components when required. This means zero scaling inside a cell is transparent to other cells, and in the future you can remove zero scaling from a component without worrying about its external consumers.

Enabling scale-to-zero in the Cellery system

To enable scale-to-zero, run cellery setup and choose Modify -> Autoscaler -> Scale-to-Zero -> Enable:

$ cellery setup
✔ Modify
✔ Autoscaler
[Use arrow keys]
? Select system components to modify
➤ Scale-to-Zero
Horizontal Pod Autoscaler
BACK

You can verify whether scale-to-zero is enabled by running:

$ cellery setup status

This should show the Scale to zero component as enabled, as follows:

cluster name: cellery-admin@cellery

SYSTEM COMPONENT              STATUS
---------------------------- ----------
ApiManager                    Enabled
Observability                 Enabled
Scale to zero                 Enabled
Horizontal pod auto scalar    Disabled

Writing a scale-to-zero component

Once you have enabled the scale-to-zero component, you can specify the scaling policy in a component as follows:

scalingPolicy: <cellery:ZeroScalingPolicy> {
    maxReplicas: 10,
    concurrencyTarget: 50
}

Here, cellery:ZeroScalingPolicy marks the component as zero-scaled. It has two configuration fields:

  • maxReplicas: Defines the maximum number of replicas when scaling up the component.
  • concurrencyTarget: Specifies the number of concurrent requests that one replica can handle. If the number of concurrent requests exceeds this threshold, the component scales up, to at most maxReplicas, to match the load. In the above example, if the component receives 120 concurrent requests, it scales up to three replicas, since two replicas can handle only 100.
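The arithmetic behind that example can be sketched in a few lines of Python. This is a simplified model and not Cellery's or Knative's actual autoscaler code: the real autoscaler works from concurrency averaged over a time window, and the function name `desired_replicas` is hypothetical.

```python
import math


def desired_replicas(concurrent_requests, concurrency_target, max_replicas):
    """Simplified model of concurrency-based scaling (illustrative only)."""
    if concurrency_target <= 0:
        raise ValueError("concurrency_target must be positive")
    if concurrent_requests <= 0:
        return 0  # idle: scale to zero
    # Enough replicas so that each handles at most concurrency_target
    # requests, capped at max_replicas.
    needed = math.ceil(concurrent_requests / concurrency_target)
    return min(needed, max_replicas)


# The policy above: maxReplicas = 10, concurrencyTarget = 50
print(desired_replicas(120, 50, 10))   # 3
print(desired_replicas(0, 50, 10))     # 0
print(desired_replicas(1000, 50, 10))  # 10 (capped by maxReplicas)
```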

For a more detailed step-by-step guide, you can try out the zero scaling samples.
