What is Scalability?
Understanding system scalability and how it differs from system performance.
Overview
Scalability is the ability of a system to perform more or larger units of work (in other words, to service higher levels of load) by leveraging additional system resources or units/nodes. Implicit in this definition is that the higher load is serviced without adversely affecting performance.
What “higher load” refers to depends on the context, but here are some examples:
- A larger number of concurrent requests
- A higher rate of requests (throughput)
- A bigger dataset
- A larger number of application objects
Put another way, a system is scalable if increasing system resources (CPUs, memory, etc.) or units (nodes, machines, etc.) enables it to process higher loads.
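One rough way to put a number on this definition is to compare how much the serviced load grows against how many resources were added. The sketch below is only an illustration: the function name `scaling_efficiency` and the throughput figures are hypothetical, not measurements from any real system. A value close to 1.0 means throughput grew roughly in proportion to the added nodes, and lower values mean the extra nodes bought less than proportional gains.

```python
def scaling_efficiency(baseline_throughput, baseline_nodes,
                       scaled_throughput, scaled_nodes):
    """Return the fraction of the ideal (proportional) throughput gain achieved.

    1.0 means throughput grew exactly in proportion to the added nodes.
    """
    throughput_gain = scaled_throughput / baseline_throughput
    resource_gain = scaled_nodes / baseline_nodes
    return throughput_gain / resource_gain


# Hypothetical measurements: (number of nodes, requests per second served).
measurements = [(1, 1000.0), (2, 1900.0), (4, 3400.0), (8, 5200.0)]

base_nodes, base_rps = measurements[0]
for nodes, rps in measurements:
    efficiency = scaling_efficiency(base_rps, base_nodes, rps, nodes)
    print(f"{nodes} node(s): {rps:.0f} req/s, scaling efficiency {efficiency:.2f}")
```

In this made-up data set the system keeps servicing more load as nodes are added, so it is scalable in the sense used here, but each doubling of nodes yields a smaller share of the ideal gain.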
Scalability vs. Performance
The terms performance and scalability are sometimes used interchangeably, but they are distinct concepts.
- Performance, in the narrow sense of the word, is about the “speed” with which an operation (such as an HTTP request) executes under a given load.
- Scalability, on the other hand, is about the system’s ability to maintain its performance under increasing load (such as concurrent users, operations, data volume, etc.) as more resources are added.
As such, improving a system’s performance may or may not improve its scalability and vice versa.
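The distinction can be illustrated with a toy model. The sketch below assumes, purely for illustration, that each node behaves like a simple single-server queue whose mean response time is 1 / (service rate - arrival rate), and that load is spread perfectly evenly across nodes with no coordination cost. The function `mean_latency` and all the numbers are hypothetical. Under these idealized assumptions, making a node faster improves performance at a fixed load, while adding nodes keeps latency steady as the load grows.

```python
def mean_latency(load_rps, nodes, service_rate_rps):
    """Mean response time in seconds under an idealized per-node queue model.

    Assumes load is split evenly across nodes and each node is a simple
    single-server queue; returns infinity when a node is saturated.
    """
    per_node_load = load_rps / nodes
    if per_node_load >= service_rate_rps:
        return float("inf")  # the node cannot keep up with arrivals
    return 1.0 / (service_rate_rps - per_node_load)


# Performance: response time at a fixed load of 80 req/s on one node.
slow_node = mean_latency(load_rps=80, nodes=1, service_rate_rps=100)
fast_node = mean_latency(load_rps=80, nodes=1, service_rate_rps=200)

# Scalability: the load doubles to 160 req/s.
no_extra_node = mean_latency(load_rps=160, nodes=1, service_rate_rps=100)
with_extra_node = mean_latency(load_rps=160, nodes=2, service_rate_rps=100)

print("fixed load, slower node:", slow_node)
print("fixed load, faster node:", fast_node)
print("doubled load, one node:", no_extra_node)
print("doubled load, two nodes:", with_extra_node)
```

Real systems rarely split load this cleanly, so the model shows the best case: whether added nodes actually preserve latency depends on how much of the work can be spread across them.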
Note that the term performance is also sometimes used in a broader sense to refer to “more or larger units of work”, as done by Werner Vogels in this article.
References
- Systems Performance: Enterprise and the Cloud, by Brendan Gregg, Published by Prentice Hall, 2013