The wicked problem of scalability in Cloud Computing and how to defeat it

Dinh-Cuong DUONG
Problem Solving Blog
4 min read · Aug 28, 2020

Scalability and elasticity together form one of the five essential characteristics of Cloud Computing. The National Institute of Standards and Technology (NIST) defines cloud computing as we know it today through five characteristics: on-demand self-service, broad network access, multi-tenancy and resource pooling, rapid elasticity and scalability, and measured service.

What is Scalability?

Scalability is the capacity of an object to be changed in size or capability. A system is scalable if its resources can be resized, or if it can be stretched out by adding more elements to increase its total capacity.

Figure 1. The two dimensions of scalability.

In Cloud Computing, everything is a virtualized resource. A virtualized computing resource needs four essential components to be able to run software: CPU, memory, disk, and network. It is scalable if its CPU units, memory, and disk capacity can be resized (network is not counted as a scalability factor because it depends on the data center's infrastructure). This is called scaling up. Scaling up is limited by the physical host machine: if a host machine has 8 CPU units and 64 GB of RAM, a VM inside this host cannot be sized beyond those numbers with real computing power.
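To make scaling up concrete, here is a minimal sketch of resizing a single EC2 VM with boto3: stop it, change its instance type, and start it again. The instance ID and target type are hypothetical, and the assumption is that a larger instance type is available.

```python
import boto3

ec2 = boto3.client("ec2")

INSTANCE_ID = "i-0123456789abcdef0"  # hypothetical instance ID
TARGET_TYPE = "m5.2xlarge"           # hypothetical bigger instance type

# Scaling up means a stop/resize/start cycle: the running VM cannot
# simply grow past what its current host can give it.
ec2.stop_instances(InstanceIds=[INSTANCE_ID])
ec2.get_waiter("instance_stopped").wait(InstanceIds=[INSTANCE_ID])

# Ask for more CPU and RAM by switching to a larger instance type.
ec2.modify_instance_attribute(
    InstanceId=INSTANCE_ID,
    InstanceType={"Value": TARGET_TYPE},
)
ec2.start_instances(InstanceIds=[INSTANCE_ID])
```

Note the downtime in the middle of the sketch: the VM has to stop before it can grow, which is one reason scaling up alone cannot answer a sudden traffic spike.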

To overcome that limitation, Cloud Computing provides the elasticity characteristic, which allows joining more physical servers into an environment without disrupting its operations.

Therefore, a set of VMs with the same characteristics can live on one or more physical servers. By grouping all VMs with the same characteristics, they can be treated as a single unit of computing power, but only if the software running on those VMs is capable of distributed processing.

Virtualization and automation software are the primary building blocks behind the elasticity and scalability of Cloud Computing. Scaling up and scaling out are the two possible options for scalability.
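On the scaling-out side, a minimal sketch (assuming boto3 and a hypothetical Auto Scaling group that holds identical VMs) is simply asking the group for more instances:

```python
import boto3

autoscaling = boto3.client("autoscaling")

GROUP_NAME = "shop-web-asg"  # hypothetical Auto Scaling group name

# Scaling out: add more identical VMs instead of growing a single one.
autoscaling.set_desired_capacity(
    AutoScalingGroupName=GROUP_NAME,
    DesiredCapacity=10,      # target number of VMs in the group
    HonorCooldown=False,     # act immediately instead of waiting
)
```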

The need for scalability

Unpredictable and varying usage means the size of a server is hard to keep optimized over time. Most of the time, a fixed-size server is underused. But at the most important time, when the business is generating revenue, that server is overloaded. So you switch to a bigger server to survive the peak time and keep the business running. This is an old story.

We need to eliminate the waste of idle running servers while guaranteeing readiness when the critical business load arrives.

In three words: waste of resources. That was the biggest problem before Cloud Computing technology arrived. About 95% of computing servers were running in idle mode, and "idle mode" means running for nothing.

What is the wicked problem of scalability?

To understand the hardest problem in designing a scalable system, let's take a user scenario from a typical e-commerce website:

During the winter sales this year, a new brand runs a campaign with a 99% discount on any product purchased between 9 AM and 10 AM on Tuesday, January 8th. The Digital Marketing team estimates there could be over 10 million customers visiting the website if the campaign succeeds, or fewer than 100,000 customers in the worst case.

What would be the best strategy in this situation? We set up a cluster of machines that can serve 10,000 customers from 8 AM, and the system scales out to handle 100,000 customers by 8:45 AM. From 8:45 AM until 10:15 AM, the system runs in auto-scaling mode to cover anything from 100,000 up to 10 million customers.
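As a sketch of what that auto-scaling mode could look like on AWS (the group and policy names are hypothetical), a target-tracking policy keeps adding or removing instances to hold an average metric near a chosen target:

```python
import boto3

autoscaling = boto3.client("autoscaling")

# Hypothetical names; the policy tracks average CPU across the group.
autoscaling.put_scaling_policy(
    AutoScalingGroupName="shop-web-asg",
    PolicyName="campaign-cpu-target",
    PolicyType="TargetTrackingScaling",
    TargetTrackingConfiguration={
        "PredefinedMetricSpecification": {
            "PredefinedMetricType": "ASGAverageCPUUtilization",
        },
        "TargetValue": 60.0,  # try to keep average CPU around 60%
    },
)
```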

Figure 2. The wicked problem of scalability.

At peak time, if you reserve capacity for 100,000 customers, your system will be dead in the water when 2 million customers arrive. Typically, on AWS, EC2 needs about 5 minutes to bootstrap an instance, plus your software stack's initialization time. During this time, your current servers are fed all the traffic from 2 million customers.

Assume you configure auto-scaling to add capacity at every threshold of 2 million users. During the first 5 minutes, the first batch of 2 million users is waiting for the first scale-up. Before that scale-up finishes, the second batch of 2 million users arrives. In total, there are now 4 million users piling onto the system while it is still performing its first scale-up, and they kill it.
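A quick back-of-envelope check (using only the numbers from the scenario above: a roughly 5 minute bootstrap and batches of 2 million users) shows how the backlog outruns the new capacity:

```python
# All values are assumptions taken from the scenario in the text.
bootstrap_minutes = 5
users_per_batch = 2_000_000

# Batches that arrive while the very first scale-up is still bootstrapping:
batches_arrived = 2
waiting_users = batches_arrived * users_per_batch

print(f"{waiting_users:,} users hit the old servers before any new "
      f"capacity is ready (bootstrap takes ~{bootstrap_minutes} minutes).")
# -> 4,000,000 users, which is the load that kills the first scale-up.
```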

With a long bootstrap time at peak time, either you reserve capacity as large as the full system you planned to have, or your system will never come back up until there is no traffic left.

How to defeat this wicked problem?

If you can reduce the bootstrap time of each server by dividing a big one into many smaller pieces, your system can be scaled more smoothly and responsively. To do so, you need another software design concept called microservices or nanoservices.

Figure 3. Divide to scale faster.
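Purely as an illustration (all numbers below are assumptions, not measurements), the same extra capacity becomes available much sooner when each unit is small and bootstraps quickly:

```python
def minutes_to_add(capacity_needed, unit_capacity, unit_bootstrap_min,
                   parallel_starts):
    """Minutes until `capacity_needed` extra users can be served,
    starting `parallel_starts` units per wave."""
    units = -(-capacity_needed // unit_capacity)   # ceiling division
    waves = -(-units // parallel_starts)
    return waves * unit_bootstrap_min

# Big VMs: 100k users each, ~5 min to boot, 10 started in parallel.
print(minutes_to_add(2_000_000, 100_000, 5, 10))    # -> 10 minutes

# Small containers: 5k users each, ~0.5 min to boot, 100 in parallel.
print(minutes_to_add(2_000_000, 5_000, 0.5, 100))   # -> 2.0 minutes
```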

Containerization and Function as a Service (FaaS) are two solutions today with better scalability characteristics than a cluster of machines. It also means you will have to deal with another wicked problem: the Cold Start.

I also wrote about a solution to the Cold Start problem in the article below, which makes your system seamlessly scalable and responsive:

Follow my articles:
The On-Demand Wakeup Pattern to Overcome AWS Lambda Cold Start

