Seamlessly Shutting Down Asynchronous Components With Amazon Autoscaling

Published in Cloudinary Engineering Blog, Dec 4, 2018

By: Michael Greenshtein

Here at Cloudinary, we enjoy working with managed services, especially the capability of scaling them automatically. We scale servers based on metrics like CPU and bandwidth. We’ve also built asynchronous components that work with queues, which enables the scaling of our servers on the basis of queue size.

However, even though scaling up based on queue size is a viable option, scaling down is problematic. That’s because the autoscaling mechanism cannot tell whether the server or app has finished processing the job it took from the queue, so it might prematurely terminate an instance that’s still working on a job.

In addition, our cluster works with multiple queues. Any one of the queues can scale up the cluster, but we cannot scale it down just because one of them contains no messages: all of them must be empty. Complicating the situation is the fact that, currently, Amazon offers no compound CloudWatch alarms with which we could configure a more sophisticated condition, such as the following:

if queue1.isEmpty() AND queue2.isEmpty() then scaleDown()
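To make the intent of that condition concrete, here is a minimal Python sketch of the check we would like an alarm to express. The function name and the queue names are hypothetical, and the queue depths are assumed to have been collected elsewhere; this is an illustration of the logic, not something CloudWatch evaluates.

```python
def should_scale_down(queue_depths):
    """Return True only when every monitored queue is empty.

    queue_depths maps a queue name to its current message count.
    An empty mapping returns False, so that missing data never
    triggers a scale-down by accident.
    """
    return bool(queue_depths) and all(
        depth == 0 for depth in queue_depths.values()
    )


# Hypothetical queue names, for illustration only.
print(should_scale_down({"queue1": 0, "queue2": 0}))
print(should_scale_down({"queue1": 0, "queue2": 7}))
```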

The Solution

To address those issues, our solution comprises three main parts with which we can control the process of terminating instances.

1. We base our scale-down policy on the cluster’s CPU utilization, not on queue size. That gives us the flexibility to let multiple queues scale the cluster up while a single generic metric scales it down. However, that alone does not guarantee that no instance is terminated while it is still processing messages from the queue. That is where part 2 comes into play.
2. With the help of Autoscaling lifecycle hooks, an instance selected in a scale-down event is not terminated right away. Instead, a hook “marks” the instance for future termination with a timeout of X minutes. If the timeout expires and no action has been taken, the instance proceeds to termination. While it waits, the instance continues to work normally, with its lifecycle state showing “Terminating:Wait.”
3. This part locates the terminating instance, ensures that it finishes all the jobs that are in progress, and then lets it proceed to termination. Toward that end, a daemon runs on every instance and checks that instance’s own status. Alternatively, you can create a periodic Jenkins job that performs the same steps.

See the code segment below for details:
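A minimal Python sketch of such a daemon, not the original code from this post, might look like the following. It assumes an Auto Scaling client created with boto3 (`boto3.client("autoscaling")`) is passed in as `autoscaling`. The hook name, the `stop_app` and `app_is_running` callbacks, and the polling interval are hypothetical placeholders for whatever your app uses.

```python
import time

TERMINATING_WAIT = "Terminating:Wait"


def lifecycle_state(autoscaling, instance_id):
    """Return (lifecycle_state, group_name), or (None, None) if unknown."""
    resp = autoscaling.describe_auto_scaling_instances(
        InstanceIds=[instance_id]
    )
    instances = resp.get("AutoScalingInstances", [])
    if not instances:
        return None, None
    inst = instances[0]
    return inst["LifecycleState"], inst["AutoScalingGroupName"]


def run_once(autoscaling, instance_id, hook_name, stop_app, app_is_running,
             poll_seconds=5, sleep=time.sleep):
    """Do one daemon pass; return True if termination was released."""
    state, group = lifecycle_state(autoscaling, instance_id)
    if state != TERMINATING_WAIT:
        return False  # Not marked for termination; nothing to do.

    stop_app()  # Ask the app to finish its current jobs (assumed callback).
    while app_is_running():
        sleep(poll_seconds)

    # Tell the Auto Scaling group that it may terminate the instance now.
    autoscaling.complete_lifecycle_action(
        LifecycleHookName=hook_name,
        AutoScalingGroupName=group,
        LifecycleActionResult="CONTINUE",
        InstanceId=instance_id,
    )
    return True
```

In a real deployment, `run_once` would be called in a loop or from a scheduler, with the instance ID read from the instance metadata. Passing the client and callbacks as arguments keeps the logic testable without AWS access.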

Summary of the Process

In brief, the process contains six steps:

1. CloudWatch triggers Autoscaling after receiving an alert of low CPU utilization.
2. Autoscaling marks the instance, which then shows the lifecycle state “Terminating:Wait.”
3. The instance’s local daemon notes its updated status and signals the app to shut down.
4. The app completes all the running processes and shuts down.
5. The daemon ensures that the app process has stopped and then lets the instance proceed to termination.
6. The Autoscaling group receives the “complete-lifecycle-action” call and terminates the instance immediately.
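The app's side of steps 3 and 4 can be sketched in Python as a worker loop that stops taking new jobs once shutdown is requested but always finishes the job in hand. This is an illustrative sketch under assumptions: the post does not say how the daemon signals the app, so SIGTERM is assumed here, and `work_loop`, `request_shutdown`, and the in-memory queue are hypothetical stand-ins for a real queue consumer.

```python
import queue
import signal
import threading

shutdown_requested = threading.Event()


def request_shutdown(signum=None, frame=None):
    """Signal handler: ask the worker to stop after its current job."""
    shutdown_requested.set()


def install_handler():
    """Register the handler; must be called from the main thread."""
    signal.signal(signal.SIGTERM, request_shutdown)  # SIGTERM is an assumption.


def work_loop(jobs, handle, idle_timeout=1.0):
    """Process jobs until shutdown is requested; return the count handled."""
    processed = 0
    while not shutdown_requested.is_set():
        try:
            job = jobs.get(timeout=idle_timeout)
        except queue.Empty:
            continue  # Nothing to do; check the shutdown flag again.
        handle(job)  # A job that has started always runs to completion.
        processed += 1
    return processed
```

Because the flag is checked only between jobs, a job is never abandoned halfway. Jobs that were not picked up stay in the queue for other instances.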

As DevOps engineers, we must be ready to handle all error situations that might arise. The process described above ensures that even if your app somehow becomes stuck, the Autoscaling lifecycle hook eventually times out and your instance is terminated anyway.
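That safety net is set when the hook is created. As a hedged sketch, assuming a boto3 Auto Scaling client is passed in as `autoscaling`, a termination hook with an explicit timeout could be registered as follows. The function name and its arguments are hypothetical; choose a timeout longer than your longest job.

```python
def register_termination_hook(autoscaling, group_name, hook_name,
                              timeout_seconds):
    """Create or update a lifecycle hook that delays instance termination."""
    autoscaling.put_lifecycle_hook(
        LifecycleHookName=hook_name,
        AutoScalingGroupName=group_name,
        # Fire when an instance is about to be terminated.
        LifecycleTransition="autoscaling:EC2_INSTANCE_TERMINATING",
        # How long the instance stays in Terminating:Wait with no action.
        HeartbeatTimeout=timeout_seconds,
        # What happens when the timeout expires with no action taken.
        DefaultResult="CONTINUE",
    )
```

For a terminating instance, the instance is terminated once the timeout expires whether the default result is CONTINUE or ABANDON, so a stuck app cannot keep an instance alive indefinitely.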

Any questions, suggestions, or comments? Please describe them below. We always look forward to your take.

Interested in joining the Cloudinary team? Check out our job openings.

Written by Cloudinary