Scalability: key concepts and principles for a System design interview

Anderson Santana
5 min read · Mar 11, 2023


Scalability is one of the most important features to consider in System Design. In general terms, it is the ability of a system to handle an increase in demand efficiently, without compromising performance or availability. These are the main points to consider when designing scalable systems.

Horizontal architecture
Also known as “scale out”, this is a very common approach that consists of adding more nodes to the system instead of increasing the capacity of each individual node. For example, if you need to handle user concurrency in an application, you don’t need one mega-robust machine to carry the load: you can simply add more instances, identical copies of the same service, running side by side. This allows the system to handle increasing demand by distributing the load across more nodes, typically with a load balancer in front.
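As a minimal sketch of the idea, here is a round-robin dispatcher that spreads requests evenly over identical instances (the node names and the `route` method are hypothetical, for illustration only):

```python
from itertools import cycle

class RoundRobinBalancer:
    """Distributes incoming requests across identical service instances."""

    def __init__(self, instances):
        self._cycle = cycle(list(instances))

    def route(self, request):
        # Pick the next instance in rotation and pair it with the request.
        instance = next(self._cycle)
        return instance, request

balancer = RoundRobinBalancer(["node-1", "node-2", "node-3"])
targets = [balancer.route(f"req-{i}")[0] for i in range(6)]
# Each node receives every third request.
```

Adding capacity then means adding one more name to the instance list, with no change to the rest of the system.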

Vertical architecture
In this model, we add more resources to a single machine, such as CPU, memory, or faster disks. This type of architecture is expensive and somewhat risky for the simple reason that it is not fault-tolerant: if that single server becomes unavailable, the entire system is compromised and goes down.

Partitioning
This is another common technique for scalability. Basically, it consists of dividing the system into smaller, more manageable parts. Examples include database partitioning and splitting an application into microservices. We can break this concept down into several cases, such as:

Function partitioning: in this case, partitions are created based on system functionalities. A good example would be a social network application where you can have a news feed, timeline, chat, etc. Each function can be managed by a separate team, which helps increase scalability and maintenance efficiency.
Object partitioning: This approach involves dividing into partitions based on system objects, such as products, categories, etc. Each partition is managed by a separate server, an example would be a database system, which can be separated by tables, indexes, log files, etc.
Geographic region partitioning: In this type of partitioning, data is divided into partitions based on the user’s geographical location, allowing servers closer to users to handle their requests more efficiently. It also helps increase availability, because users can keep accessing the system even if one of the servers goes down. A good example is a transportation application like Uber, which needs ride requests to be processed quickly and efficiently: the application can have servers in New York, London, and São Paulo, each responsible for serving users in its respective region.
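The object-partitioning idea above can be sketched with a stable hash that maps each key to one of a fixed set of partitions (the server names are hypothetical; a production system would likely use consistent hashing so that adding a node does not remap most keys):

```python
import hashlib

# Hypothetical partition servers; each owns a subset of the keys.
PARTITIONS = ["ny-server", "london-server", "saopaulo-server"]

def partition_for(key: str) -> str:
    """Map an object key to one partition via a stable hash."""
    digest = int(hashlib.md5(key.encode()).hexdigest(), 16)
    return PARTITIONS[digest % len(PARTITIONS)]
```

Because the hash is deterministic, every request for the same key (e.g. `"user:42"`) is always routed to the same server, which is what lets each partition be managed independently.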

Redundancy
Redundancy is a common strategy used to increase the scalability and availability of systems, because it allows the system to keep functioning even if parts of it fail. For example, you can have:

Node redundancy: where multiple nodes perform the same task, so if one node fails, another can replace it without affecting the system.
Data redundancy: copies of data are maintained in multiple locations to ensure that if a system fails, the data will still be available elsewhere.
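A toy sketch of data redundancy: every write is copied to all nodes, and reads fail over to a surviving node when one goes down (the in-memory dicts stand in for real storage; node names are made up):

```python
class ReplicatedStore:
    """Keeps a copy of every write on several nodes; reads fail over."""

    def __init__(self, node_names):
        self.nodes = {name: {} for name in node_names}
        self.down = set()  # names of nodes currently unavailable

    def put(self, key, value):
        for store in self.nodes.values():
            store[key] = value  # replicate the write to every node

    def get(self, key):
        for name, store in self.nodes.items():
            if name in self.down:
                continue  # skip failed nodes
            if key in store:
                return store[key]
        raise KeyError(key)

store = ReplicatedStore(["replica-a", "replica-b"])
store.put("session:1", "alice")
store.down.add("replica-a")  # simulate a node failure
```

Even with `replica-a` down, `store.get("session:1")` still succeeds from the surviving replica, which is exactly the availability benefit redundancy buys.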

Cache
I think caching deserves a separate article, because there is a lot to talk about, but basically a cache is an important technique that involves storing frequently accessed data in fast-access locations, such as memory, instead of retrieving it from an external source every time it is requested. This helps reduce overhead in a system under high demand.
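A minimal sketch of the common cache-aside pattern: check the in-memory cache first, and only call the (slow, hypothetical) data source on a miss. The TTL value and the `fetch_fn` callable are assumptions for illustration:

```python
import time

_cache = {}
TTL_SECONDS = 60  # assumed expiry window for cached entries

def get_user(user_id, fetch_fn):
    """Cache-aside: return a cached value if still fresh, else fetch and store it."""
    entry = _cache.get(user_id)
    if entry is not None and time.monotonic() - entry[1] < TTL_SECONDS:
        return entry[0]                       # cache hit: skip the slow source
    value = fetch_fn(user_id)                 # cache miss: go to the source
    _cache[user_id] = (value, time.monotonic())
    return value
```

Repeated requests for the same user within the TTL never touch the external source, which is where the reduction in overhead comes from.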

See more about cache in this article. — soon

CAP Theorem
The CAP Theorem is an important result in the design of scalable systems. It states that it is impossible for a distributed system to simultaneously guarantee all three of Consistency, Availability, and Partition tolerance: when a network partition occurs, the system must sacrifice either consistency or availability. Therefore, it is important to choose which criteria matter most for the system and design it accordingly.

Availability
Availability is the measure of a system’s ability to remain up and fulfill user requests. It is a critical part of scalability: as demand grows, the system must continue serving requests without interruption.

Load balancing
Load balancing is an important technique for ensuring scalability, as it involves distributing the load among multiple nodes, allowing the system to handle increasing demand without compromising performance.

Latency
Latency is the measure of the time it takes for a request to be processed and a response to be returned. It is important for scalability, as an increase in latency can negatively impact the user experience.

Throughput
Throughput is the measure of a system’s ability to process requests in a given period of time. It is important for scalability, as an increase in demand may require a greater processing capacity to efficiently fulfill requests.
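Latency and throughput are easy to confuse, so a small measurement sketch may help: latency is the time per request, throughput is requests per unit of time, and the two are measured together below (the `handler` is a placeholder for any request-processing function):

```python
import time

def measure(handler, requests):
    """Return (average latency in seconds, throughput in requests/second)."""
    latencies = []
    start = time.perf_counter()
    for req in requests:
        t0 = time.perf_counter()
        handler(req)                              # process one request
        latencies.append(time.perf_counter() - t0)
    elapsed = time.perf_counter() - start
    avg_latency = sum(latencies) / len(latencies)
    throughput = len(latencies) / elapsed
    return avg_latency, throughput
```

Note that improving one does not automatically improve the other: batching requests can raise throughput while making each individual request slower.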

When scaling a system, we must take into account some trade-offs:

Consistency vs Availability: Choosing between ensuring data consistency or system availability is a common trade-off. For example, a system that uses data replication can ensure consistency but may compromise availability if a replica fails. On the other hand, a system that uses fault tolerance can ensure availability but may compromise consistency if replicas become outdated.

Latency vs Scalability: Choosing between reducing latency and increasing scalability is also a trade-off. For example, using caching can reduce latency, but may hinder scalability if the capacity of the cache is exceeded.

Complexity vs Scalability: Adopting complex solutions to ensure scalability can be counterproductive, as complexity can hinder maintainability and long-term scalability. For example, using a complex messaging system can ensure scalability, but may increase the complexity of the system and make maintenance difficult.

Cost vs Scalability: Adopting expensive solutions to ensure scalability can be counterproductive if the cost exceeds the benefit. For example, using high-performance dedicated servers can ensure scalability, but may be expensive if demand does not justify the cost.

In summary, scalability is a critical concern when designing systems, and a combination of horizontal scaling, partitioning, redundancy, caching, and load balancing can help ensure that a system handles increasing demand without compromising performance.

In a system design interview about scalability, you may be asked questions such as:

How would you plan for horizontal and vertical scalability in your system?
How would you ensure high availability in your system?
How would you handle traffic spikes in your system?
How would you implement load balancing in your system?
How would you ensure data consistency in your system?
How would you handle latency in your system?
How would you improve throughput in your system?
How would you implement partitioning in your system?
How would you ensure redundancy in your system?
How would you implement caching to improve performance?
