Auto-Scaling on Alibaba Cloud
When you deploy your application on compute instances on-premise or in the cloud, you have to make an educated guess about the utilisation of the resources you provision. If you have an older version of the same application running with proper monitoring, you can base your guesstimate on the current usage of compute nodes. But when this is the first time your application goes to production you are out of luck. How many users will you have each day? What will users do when they start your application and how are usage peaks distributed? Do you expect growth in the number of users? How about long term? If you have answers to all of these questions, you might be well-equipped to go with your gut-feeling and just deploy the application on the number of nodes you came up with. Let’s assume you don’t have all the answers though (which is probably the case). This is where auto-scaling comes in.
In this article, I will introduce you to the concepts of auto-scaling and how it can help you deploy highly available applications on your cloud infrastructure. We will also look at Alibaba Cloud and see how auto-scaling is done there. By means of an experiment with Terraform, used to practise infrastructure as code, we will get acquainted with the moving parts in a typical auto-scaling setup.
What is auto-scaling?
Auto-scaling an application means that the number of nodes used to run that application adapts to changes in load automatically. You can scale horizontally by adding new nodes with stateless copies of the application as needed. In most settings you could also scale vertically if you wanted to: vertical scaling is the process of upgrading a single node with more CPU, memory or disk space. As vertical scaling is less common and often not the right solution to the problem, we will focus the discussion on scaling horizontally. Adding more identical nodes to your cluster is also referred to as scaling out, while removing nodes is called scaling in. Oftentimes the scaled nodes are grouped behind a load balancer that makes sure traffic is distributed evenly across the nodes. As load balancing is a whole topic in itself, I will not focus on load-balancing the application in this post.
The statelessness of the copies of the application on every node is very important. If a node stores state, for example user sessions in web applications, the state is lost when the node goes down or is removed from the auto-scaling group. This could mean that users are suddenly logged out of the application. The same goes for user-uploaded data or any other data that needs to be shared between the nodes. Normally you would use a database for user and session data or store files in network attached storage that is shared between nodes.
So what options are there for automating the scaling operations of your cluster? Most cloud providers have two choices: scheduled or demand-based. Scheduled scaling is done based on a daily, weekly or monthly schedule that defines when to scale out and when to scale in. This is applicable when you know that most users work with your application at a certain time of day. You could also use it to make sure you have enough capacity right before a new marketing campaign is launched. Demand-based scaling looks at metrics provided by the operating system of running nodes and makes decisions based on that. You could scale out when the average CPU usage is above a certain threshold, and scale in again when the CPU usage is low enough. Often you can also define custom metrics to base the scaling activity on.
Before we dive into setting up auto-scaling on Alibaba Cloud, it is important to understand what benefits auto-scaling might give you. Apart from providing the right user experience at the right time by scaling out when demand increases, scaling in appropriately can save you a lot of money on the monthly cloud bill. Although this kind of cost-optimisation is probably not the thing to start with for your new project, it can become very important as applications evolve. In any case, thinking about how to scale your application upfront can really prevent headaches when the time to cut costs arises.
As one of the lesser used cloud providers in Europe and the US, the Chinese Alibaba Cloud might not be your first choice for deploying a new project. But despite its relative anonymity, its feature set is pretty complete and has included all tools necessary to do auto-scaling since 2015. These tools are gathered under the name Auto Scaling in the services overview. With data centers in London, Frankfurt, Virginia and Silicon Valley amongst others, there should always be a region close enough to you so you can test the waters.
Implementing auto-scaling with Terraform
To follow along with the implementation of Auto Scaling in Alibaba Cloud, you will need to register an account here. We will be practicing Infrastructure as Code using Terraform, so you should make sure that you have Terraform installed on your machine. Terraform helps us describe the resources we need in the cloud declaratively in configuration files. It can then apply any needed changes for you, so you don’t have to click through the management console yourself to update your resources. This gives you the valuable ability to plan resources, repeat deployments and destroy complete environments with simple command-line calls. Alibaba Cloud has an official open-source Terraform provider that makes it easy to create these repeatable deployments of resources in your Alibaba account. Auto Scaling is also among the supported services.
The full source code is contained in a directory in this repository. I will walk you through the most important parts of the templates and the deployment so you should have a running auto-scaling experiment in just a few minutes.
Setting up the experiment
To start the experiment, we need to clone the repository, move into the relevant directory and initialize Terraform so the plugins are installed:
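The commands could look like this; the repository URL and directory name are placeholders, not the actual values:

```shell
# Clone the repository and move into the auto-scaling directory
# (URL and directory are placeholders — use the repository linked above)
git clone <repository-url>
cd <repository-directory>

# Download and install the required provider plugins
terraform init
```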
You should see the message `Terraform has been successfully initialized!`, confirming that everything works.
If you look in the directory, you will see a bunch of `.tf` files (among them `template.tf` and `vpc.tf`) and one user data file, `user-data.conf` — five files in total.
The moving parts involved in auto-scaling are all in the `template.tf` file. We will focus our discussion on that file.
Scaling group & scaling configuration
The `template.tf` file contains the following Terraform resources at the top:
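A sketch of what these two resources look like with the Alibaba Cloud Terraform provider; the resource names, image lookup, security group reference and exact property values are assumptions here, not the literal contents of the repository:

```hcl
resource "alicloud_ess_scaling_group" "scaling-group" {
  scaling_group_name = "autoscaling-experiment"
  min_size           = 1
  max_size           = 3
  # Remove the longest-running instance first when scaling in
  removal_policies   = ["OldestInstance"]
  # Two VSwitches (subnets) in separate availability zones, defined in vpc.tf
  vswitch_ids        = [alicloud_vswitch.vswitch-1.id, alicloud_vswitch.vswitch-2.id]
  # Wait time (in seconds) between two scaling activities
  default_cooldown   = 60
}

resource "alicloud_ess_scaling_configuration" "scaling-configuration" {
  scaling_group_id  = alicloud_ess_scaling_group.scaling-group.id
  image_id          = data.alicloud_images.ubuntu.images[0].id
  instance_type     = "ecs.t5-lc2m1.nano"
  security_group_id = alicloud_security_group.security-group.id
  # Boot script that installs NGINX and generates CPU load
  user_data         = file("user-data.conf")
  active            = true
  force_delete      = true
}
```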
A scaling group is the wrapper around all auto-scaling resources. It defines a minimum and maximum number of Elastic Compute Service (ECS) instances that should be in the group. It also defines which instances should be removed first via the removal policy. We also reference two VSwitches where our instances will be placed. A VSwitch is a virtual switch in an Alibaba Cloud VPC that defines a single subnet. We want to be highly available, so we created two VSwitches (subnets), each in a separate availability zone. If you point to these in the scaling group, auto-scaling will make sure that instances are balanced across the specified VSwitches. The VPC and VSwitches are defined in `vpc.tf`, should you be curious about their definition. Also note the `cooldown` property, which defines how long auto-scaling should wait before triggering the next scaling event. This makes sure that the scaling group does not go crazy on a short spike of traffic, and that scaling activities are spread out more evenly.
The scaling configuration defined below the scaling group is the template that is used to launch new instances. If you look closely, you can see that we reference the Ubuntu 18.04 image, so our instances will run that Linux distribution. The instance type is a small `t5.nano` to make sure our bill stays low for this experiment. Most other properties speak for themselves. Another important part is the `user_data` property, which points to a configuration file that is run on the machine during its boot process. If you look in the file `user-data.conf`, you can see we install NGINX, fire it up and then start a stress test on the machine to generate some load:
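A minimal sketch of what such a user data script could look like; the package names and `stress` parameters are assumptions, so check the actual `user-data.conf` in the repository:

```shell
#!/bin/bash
# Install the web server and the stress tool (assumed package names on Ubuntu 18.04)
apt-get update
apt-get install -y nginx stress

# Fire up NGINX so the instance serves the default landing page
systemctl start nginx

# Generate load: occupy one CPU core for ten minutes (600 seconds)
stress --cpu 1 --timeout 600 &
```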
The CPU load eats up a whole core for ten minutes, which will come in handy for the rest of the experiment.
Scaling rules & alarms
We have now defined a scaling group and a scaling configuration, but have not defined how to scale our instances. Those definitions are called scaling rules, and you can see two of them in `template.tf`:
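A sketch of the two rules using the provider's `alicloud_ess_scaling_rule` resource (the exact property values are assumptions):

```hcl
resource "alicloud_ess_scaling_rule" "add-instance" {
  scaling_group_id = alicloud_ess_scaling_group.scaling-group.id
  # Change the group's capacity by a fixed amount: add one instance
  adjustment_type  = "ChangeInCapacity"
  adjustment_value = 1
}

resource "alicloud_ess_scaling_rule" "remove-instance" {
  scaling_group_id = alicloud_ess_scaling_group.scaling-group.id
  # A negative adjustment value removes one instance
  adjustment_type  = "ChangeInCapacity"
  adjustment_value = -1
}
```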
The definition is straightforward. We create a rule called `add-instance` and another called `remove-instance`. These add and remove one instance, respectively.
In itself, a scaling rule does nothing. It only gives a name to a certain type of scaling activity with a set amount. To actually perform the scaling, a scaling rule has to be triggered. We spoke earlier about the ways auto-scaling can be triggered in most public clouds, and Alibaba Cloud is no different: you can trigger a scaling rule manually, on a schedule or after a customisable alarm goes off. We will choose the last option for this experiment:
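A sketch of the two alarm resources (the names and exact property values are assumptions, but the thresholds match the experiment described below):

```hcl
resource "alicloud_ess_alarm" "cpu-high" {
  name             = "cpu-high"
  scaling_group_id = alicloud_ess_scaling_group.scaling-group.id
  # Trigger the add-instance rule when the alarm goes off
  alarm_actions    = [alicloud_ess_scaling_rule.add-instance.ari]
  metric_type      = "system"
  metric_name      = "CpuUtilization"
  period           = 60
  statistics       = "Average"
  # Average CPU >= 70% for two consecutive evaluations
  threshold           = "70"
  comparison_operator = ">="
  evaluation_count    = 2
}

resource "alicloud_ess_alarm" "cpu-low" {
  name             = "cpu-low"
  scaling_group_id = alicloud_ess_scaling_group.scaling-group.id
  alarm_actions    = [alicloud_ess_scaling_rule.remove-instance.ari]
  metric_type      = "system"
  metric_name      = "CpuUtilization"
  period           = 60
  statistics       = "Average"
  # Average CPU <= 10% for two consecutive evaluations
  threshold           = "10"
  comparison_operator = "<="
  evaluation_count    = 2
}
```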
In the last part of our resource definitions, we define the alarms that will trigger our scaling rules. We have one alarm that adds instances when the average CPU usage of our instances goes above 70% over a 60-second period and that value is observed two times in a row. Every metric is evaluated every minute, so if we have two consecutive minutes of 70%+ average CPU usage, the alarm will trigger. It will then add an instance via the `add-instance` scaling rule.
As you might recall from the scaling configuration user data, we stress the single-core CPU for a full 10 minutes after each instance boots. The scaling group starts with 1 instance and can go up to a maximum of 3. So, when we start the experiment, a single instance is booted. After two minutes, a second instance is started because the average CPU usage is above 70%. Then, after another two minutes, the third is launched. A scaling group never goes above its maximum number of instances, so the experiment is capped at 3 instances.
After some time, the stress on the CPU stops and our average drops again. If you look at the second alarm defined above, you can see that we start removing instances as soon as the average CPU usage goes below 10% two times in a row. The alarm will trigger and remove the oldest instance. After some time, it will also remove the second instance, leaving us with the instance that was started last. A scaling group also never goes below its minimum number of instances, so we will keep running a single instance. This concludes the experiment.
Run the experiment
Now that we have an expectation of what will happen with our resources, let’s start the experiment:
```shell
export ALICLOUD_ACCESS_KEY="<your-access-key>"
export ALICLOUD_SECRET_KEY="<your-secret-key>"
export ALICLOUD_REGION="eu-central-1"
terraform apply
```
First we set environment variables with the access key, secret access key and region. The keys can be found in your account. I’ve set the region to `eu-central-1`, which is Frankfurt. Then run `terraform apply` to create an execution plan for the changes. Type `yes` to approve the changes and create the resources in your account.
You can go to the ECS and Auto Scaling consoles to see the running experiment. It will take a few minutes before the first instance has booted and a second instance is added. There are several buttons to see the monitoring for the scaling group, the instances and the alarms we set up. There’s also a Terraform query in place to get the running instances in the scaling group:
```shell
terraform refresh -target="data.alicloud_instances.scaled-instances"
```
This will output the currently running ECS instances in our scaling group. If you look up the public IP addresses of the instances in the output and browse to them, you should see the default NGINX landing page.
This concludes the auto-scaling experiment on Alibaba Cloud. You should now have a good idea of why we need auto-scaling for making sure our application is highly available and our costs are kept to a minimum. You’ve also created a scaling group, scaling configuration, scaling rules and alarms in Alibaba Cloud using Terraform. The next step is to put the instances behind a load-balancer. This will create a single entry point for your application and distribute all the traffic among your instances. The documentation for Server Load Balancer (SLB) resources should point you in the right direction!