Autoscaling Presto in AWS Elastic Container Service

Yegor Pavlikov
Aug 16, 2020

Tempted by the prospect of improving Hive query performance, our team decided to try Presto, a well-known in-memory big data engine. This article is dedicated to building and monitoring a Presto cluster with autoscaling in the AWS environment. Given the limited time our team had, we chose the Elastic Container Service (ECS) component, as it already had all the features needed for a fast ramp-up. It allows us to configure infrastructure with ease, offers a UI for making quick changes, and provides spot-on monitoring. Another important point is the ability to do blue-green releases through CodeDeploy or any available build service such as Jenkins or GitLab. The footprint of ECS is a small agent installed on every container node to control Docker deployments and monitoring.

Initially, it was tempting to use spot EC2 instances. But as we saw later, this would involve too much instability, which Presto could not handle well. Strictly speaking, the 2-minute notice AWS gives before taking away your spot instance is not enough to gracefully shut down all worker tasks. At this point, our team planned to use Presto for our ad-hoc workload, so it was supposed to be an interactive tool, and the average query execution time is, of course, much longer than 2 minutes.

A highly available Presto cluster may have two coordinator nodes, where one is always active and the other is a standby. For my purposes, a single coordinator node was enough. To handle a single-coordinator outage, an Auto Scaling Group (ASG) was created with just one instance in it. Using a simple health check and a launch template, it takes only a few minutes to replace a dead instance.
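Such a one-instance ASG can be expressed as a short configuration sketch (the group name, launch template, subnets, and grace period below are assumptions, not our actual values):

```shell
# Configuration sketch: a one-instance ASG that replaces a dead coordinator.
aws autoscaling create-auto-scaling-group \
  --auto-scaling-group-name presto-coordinator \
  --launch-template LaunchTemplateName=presto-coordinator,Version='$Latest' \
  --min-size 1 --max-size 1 --desired-capacity 1 \
  --health-check-type EC2 --health-check-grace-period 300 \
  --vpc-zone-identifier "subnet-0abc,subnet-0def"
```

With min, max, and desired all set to 1, the ASG's only job is replacement: when the health check fails, it terminates the coordinator and launches a fresh one from the template.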

Another ASG serves Presto worker instances. Workers do all the heavy lifting in the Presto world, while the coordinator performs only dispatching and orchestration. So the worker ASG needs another launch template, with different instance types and the ability to scale in and out. Our cluster is supposed to be interactive, and most ad-hoc queries are executed during the day, so the higher daytime autoscaling threshold can be lowered at night. For this purpose, we can use Scheduled Actions to adjust the minimum, desired, and maximum number of instances. At night the cluster still needs some minimal capacity in case someone on our team has insomnia.

So, for example, let A1, B1, and C1 be the minimum, desired, and maximum number of workers for the day shift, and A2, B2, C2 the same numbers for the night shift. The following equations hold for the initial setup:

B1=A1
B2=C2
A1=C2

Here A1=C2 levels the threshold when the switch between the “day” and “night“ shifts happens at 7 AM and 7 PM respectively. If C1=100 and we are shifting into the evening period, we won’t automatically kill a bulk of instances at once. In the picture below, the dashed line shows the autoscaling maximum and the dotted line shows the minimum.
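To make the invariants concrete, here is a minimal Python sketch that derives both Scheduled Actions from the day-shift numbers. The dictionaries mirror the parameters of the EC2 Auto Scaling `put-scheduled-update-group-action` call; the action names and recurrence times are assumptions.

```python
def shift_settings(day_min, day_max, night_min):
    """Derive both Scheduled Actions for the worker ASG.

    Encodes the invariants above: B1 = A1, B2 = C2, and A1 = C2,
    so the 7 AM / 7 PM switch never kills a bulk of instances at once.
    Action names and recurrence times are assumptions.
    """
    day_desired = day_min       # B1 = A1
    night_max = day_min         # A1 = C2 levels the threshold at the switch
    night_desired = night_max   # B2 = C2
    day = {
        "ScheduledActionName": "presto-workers-day",
        "Recurrence": "0 7 * * *",   # 7 AM, cron syntax
        "MinSize": day_min,
        "DesiredCapacity": day_desired,
        "MaxSize": day_max,
    }
    night = {
        "ScheduledActionName": "presto-workers-night",
        "Recurrence": "0 19 * * *",  # 7 PM
        "MinSize": night_min,
        "DesiredCapacity": night_desired,
        "MaxSize": night_max,
    }
    return day, night
```

Passing each dictionary (plus the ASG name) to `put-scheduled-update-group-action` registers the day and night shifts.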

Scheduled Actions and autoscaling at work. The orange line is CPU usage.

An ECS cluster is configured through a task definition, a service, and the cluster itself. The task definition is the key entity for configuring your Docker container.

To run Presto we need a handful of services: a Hive metastore, Datadog for monitoring (optional), a coordinator service, and a worker service. The external Hive engine is a shared service, so we did not have to include it in our ECS cluster.

For autoscaling, I experimented with ECS’ native approach involving the capacity provider feature, but it did not work well for me. Instead, I created a CPU-based scaling policy for the worker ASG with a target value of 75. As an alternative, a memory-based policy was tested but later rejected because Presto workers did not release memory as quickly as needed.
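For reference, such a policy can be written as a target tracking configuration fragment for the worker ASG (a sketch; only the target value of 75 comes from our setup) and attached with `aws autoscaling put-scaling-policy --policy-type TargetTrackingScaling --target-tracking-configuration file://policy.json`:

```json
{
  "PredefinedMetricSpecification": {
    "PredefinedMetricType": "ASGAverageCPUUtilization"
  },
  "TargetValue": 75.0
}
```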

Since we use autoscaling, we need to take care of the tasks currently executing on a node before shutting it down. Presto provides an API for that:

http://nodename:8080/v1/info/state

We just need to send the SHUTTING_DOWN state to the above URI, and Presto will not accept any new tasks going forward, only finish the current ones. The Lambda function used for the worker ASG instance shutdown hook is a slightly modified version of the one from this article.
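As a sketch, the graceful-shutdown call can be made with nothing but the Python standard library. Presto expects a PUT whose body is the JSON string "SHUTTING_DOWN"; the hostname and the X-Presto-User value below are assumptions:

```python
import json
import urllib.request


def build_shutdown_request(host, port=8080):
    """Build the PUT /v1/info/state request that asks a worker to drain."""
    return urllib.request.Request(
        url=f"http://{host}:{port}/v1/info/state",
        data=json.dumps("SHUTTING_DOWN").encode(),  # body is the JSON string
        headers={"Content-Type": "application/json",
                 "X-Presto-User": "admin"},          # assumed user
        method="PUT",
    )


def drain_worker(host, port=8080):
    """Send the request; the worker finishes current tasks and accepts no new ones."""
    with urllib.request.urlopen(build_shutdown_request(host, port)) as resp:
        return resp.status
```

In a shutdown-hook Lambda, `drain_worker` would be called with the private hostname of the instance being terminated, before the lifecycle hook is completed.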

Since we did not want people to access our Presto cluster directly, and we wanted to control data-level permissions with the Hive engine, a dedicated security group was created with just a few UI applications in it. Also, by putting an Application Load Balancer in front of the coordinator, we mask the API path for all external calls. So in my case, the front end acts as the authentication layer for Presto.

The launch template for the worker is pretty straightforward. The interesting part is the user data section, where we define the node’s affinity by marking it for a particular ECS cluster.
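That affinity boils down to one line of ECS agent configuration in the user data (the cluster name is an assumption):

```shell
#!/bin/bash
# Register this instance with our ECS cluster so worker tasks land here.
echo "ECS_CLUSTER=presto-cluster" >> /etc/ecs/ecs.config
```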

The worker task definition includes a volume that can be used to spill Presto task memory to disk. Some basic Docker image parameters, like the port number, are also set here. The last line defines the task family, so when we update the task definition later, a new revision of the task will be created.
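A trimmed-down sketch of such a task definition follows; the image name, paths, and volume name are assumptions, not our actual values:

```json
{
  "containerDefinitions": [
    {
      "name": "presto-worker",
      "image": "example/presto:latest",
      "portMappings": [{"containerPort": 8080, "hostPort": 8080}],
      "mountPoints": [{"sourceVolume": "spill", "containerPath": "/presto/spill"}]
    }
  ],
  "volumes": [{"name": "spill", "host": {"sourcePath": "/mnt/presto-spill"}}],
  "family": "presto-worker"
}
```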

Another interesting detail is the way the latest ECS-optimized image based on Amazon Linux 2 is pulled.
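One common way to do this (a sketch, not necessarily our exact command) is to read the AMI ID from the public SSM parameter that AWS maintains for the Amazon Linux 2 ECS-optimized image, and feed it into the launch template:

```shell
aws ssm get-parameters \
  --names /aws/service/ecs/optimized-ami/amazon-linux-2/recommended/image_id \
  --query 'Parameters[0].Value' --output text
```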

A further improvement would be using spot instances with a defined duration for short autoscaling periods. This would involve a monitoring Lambda that handles the spot instances for me and takes away machines whose spot duration is over.
