Fundamentals of Scalability: Vertical and Horizontal Scaling

In this article, we are going to learn the fundamentals of scalability when designing a software architecture for our projects.

We will cover the basics of scalability, vertical and horizontal scaling, and how to scale our applications when designing our e-commerce application.

  • Vertical Scaling and Horizontal Scaling of applications
  • Calculating how many concurrent requests our design can accommodate
  • Load Balancer with Consistent Hashing


Why Are We Learning Scalability?

In the last article, we learned about and designed an e-commerce application in the Clean Architecture style, and discovered problems with that design regarding the number of concurrent requests it can accommodate. Our e-commerce business is growing and needs to handle a greater number of requests per second.

  • Problem: Increased Traffic, Handling More Requests

Here you can see the problems and potential solutions for the current design:

If you look at the table in the image, you can see that we can handle only 2K concurrent requests per second. But our e-commerce business is growing, and we are required to handle a greater number of concurrent requests at the same time with acceptable latency for our users.

  • So how can we scale the application if we need to handle more users?

And the solutions are:

  • Vertical and Horizontal Scaling
  • Scale Up and Scale Out
  • Load Balancer

And as a solution, we have decided to learn about and apply scalability in our architecture.
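To make the problem concrete, here is a rough back-of-envelope estimate of how many servers horizontal scaling would require. The 2K requests/second figure comes from the current design above; the 10K target is an assumed growth number, purely for illustration.

```python
import math

# Back-of-envelope capacity estimate. The 2K figure comes from the current
# design; the 10K target is a made-up growth number for illustration.
CURRENT_CAPACITY_RPS = 2_000   # requests/second one server handles today
TARGET_RPS = 10_000            # hypothetical target after business growth

# With horizontal scaling, each added server contributes roughly the same
# capacity (ignoring load-balancer overhead and uneven load distribution).
servers_needed = math.ceil(TARGET_RPS / CURRENT_CAPACITY_RPS)
print(servers_needed)  # -> 5
```

Of course, real capacity planning also accounts for peak traffic, headroom, and failover, but the basic arithmetic is this simple.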

Understand E-Commerce Domain: Non-Functional Requirements

Before going into scalability, let me revisit the e-commerce domain and its non-functional requirements.

As you can see, no matter which architecture type you apply to your application, scalability is one of the most important non-functional requirements, and it needs to be considered from day one. That's why we put scalability at the top of our non-functional requirements list for the e-commerce application.

Introduction — Scalability

Scalability is the ability of a software application or system to handle increased load and maintain performance and responsiveness as the load increases. In other words, scalability is the ability of a system to continue to function effectively and efficiently as the number of users, requests, or data volumes grows.

Scalability is simply measured by the number of requests an application can handle successfully and support simultaneously. Once the application can no longer handle any more simultaneous requests, it has reached its scalability limit. As your business grows, you must scale your resources accordingly in order to prevent downtime and reduce latency. You can scale these resources through a combination of network bandwidth, CPU, physical memory, and hard disk adjustments.

Scaling an application is essential because as the application usage grows, the resources required to serve user requests may also increase. Without proper scalability, the application may experience performance issues, such as slow response times, crashes, or even downtime. This can lead to a poor user experience, lost revenue, and damage to the reputation of the application and the organization that develops and operates it.

There are two primary types of scalability: horizontal and vertical. Both involve adding computing resources to your infrastructure, but there are distinct differences between the two in terms of implementation and performance, so you must decide which is right for your application.

Vertical Scaling — Scale up

Vertical scaling refers to increasing the resources of a single server or node, such as its CPU, memory, or storage capacity. This approach is also known as scaling up, and it is commonly used for database systems and other types of software that require high performance.

Vertical scaling basically makes the nodes stronger. If you have one server, you make it stronger by adding more hardware; optimizing the hardware allows you to handle more requests. Vertical scaling keeps your existing infrastructure but adds more computing power. Your existing code doesn't need to change; you simply run the same code on machines with better specs. In short, vertical scaling means adding more resources, such as CPU, RAM, and disk, to a single node to cope with an increasing workload.

Basically, vertical scaling gives you the ability to increase your current hardware or software capacity, but it's important to keep in mind that you can only increase it up to the limits of a single server. By scaling up, you increase the capacity of a single machine: data lives on a single node, and scaling spreads the load across that machine's CPU and RAM.

But hardware limits how much CPU and RAM you can add, and going further becomes very expensive as you approach maximum capacity. This is the scalability limit: with vertical scaling, you will reach it at some point and get stuck.

And if you get millions of requests, one server won't be enough, because even the best hardware has maximum capacity limitations. In that case, we need to do horizontal scaling, or scaling out.

Horizontal Scaling — Scale out

Horizontal scaling refers to adding more servers or nodes to the system in order to increase its capacity to handle more traffic. This approach is also known as scaling out, and it is commonly used for web applications and distributed systems.

Horizontal scaling basically means splitting the load between different servers. It simply adds more instances of machines without changing their existing specifications. By scaling out, you share the processing power and balance the load across multiple machines.

Horizontal scaling means adding more machines to the resource pool rather than adding resources to a single machine by scaling vertically. Scaling horizontally gives you not only scalability but also reliability, because you have more redundancy; it is usually the preferred way to scale in distributed architectures.

When splitting the load across multiple servers, we need to take into consideration whether we have state or not. If your services are stateless, horizontal scaling is easy and is the best practice: we simply run the service on different servers and put a load balancer in front to split the traffic between them. But if we have state, as with database servers, we need to manage more considerations, such as the CAP theorem. In our e-commerce application, we will apply horizontal scaling in our architecture.

What is a Load Balancer?

Load balancers are a really important topic in system design interviews, especially the algorithms used to split requests evenly, such as consistent hashing.

Basically, we use load balancers to balance the traffic across all the nodes of our application. A load balancer is typically a software application that helps spread the traffic across a cluster of servers to improve the responsiveness and availability of the architecture.

Generally, a load balancer sits between the client and the servers. It accepts incoming network and application traffic and distributes it across multiple backend servers using different algorithms, consistent hashing among them. NGINX is one of the most popular open-source load balancers and is widely used in the software industry.

The main features of a load balancer are fault tolerance and improved availability. If one of the backend servers goes down, the load balancer routes all the traffic to the remaining servers accordingly. And if traffic grows rapidly, you only need to add more servers; the load balancer will route the traffic for you.

Using a Load Balancer to Split the Load with Consistent Hashing

Load balancers use different kinds of distribution algorithms to optimally distribute the load. For example, the round-robin algorithm works like a first-in, first-out (FIFO) queue: each server gets requests in sequential order.
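As a quick illustration, round-robin distribution can be sketched in a few lines of Python; the server names here are made up for the example.

```python
from itertools import cycle

# Minimal round-robin sketch: requests go to servers in sequential order,
# wrapping back to the first server after the last one.
servers = ["server-1", "server-2", "server-3"]
rotation = cycle(servers)

def route_next_request() -> str:
    """Return the server that should receive the next request."""
    return next(rotation)

assignments = [route_next_request() for _ in range(6)]
print(assignments)
# -> ['server-1', 'server-2', 'server-3', 'server-1', 'server-2', 'server-3']
```

Round robin is simple and fair when servers are identical, but it ignores how loaded each server currently is, which is one reason other algorithms exist.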

Consistent hashing is an algorithm for dividing up data between multiple machines. It works particularly well when the number of machines storing data may change, which makes it a useful trick for system design questions involving large distributed databases, which have many machines and must account for machine failure.

Consistent hashing solves the horizontal scalability problem by ensuring that every time we scale up or down, we don't have to rearrange all the keys or touch all the database servers. That's why consistent hashing is a great option when working with distributed microservices.
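To make the idea concrete, here is a minimal consistent-hash ring sketch in Python. It uses a single hash per node and no virtual nodes (which real implementations add for better balance); the node and key names are made up for illustration.

```python
import bisect
import hashlib

# Minimal consistent-hash ring sketch: nodes are placed on a ring by their
# hash, and each key belongs to the first node clockwise from the key's hash.
class HashRing:
    def __init__(self, nodes):
        self._ring = sorted((self._hash(n), n) for n in nodes)

    @staticmethod
    def _hash(key: str) -> int:
        return int(hashlib.md5(key.encode()).hexdigest(), 16)

    def get_node(self, key: str) -> str:
        # Walk clockwise to the first node at or after the key's hash,
        # wrapping around to the start of the ring if necessary.
        hashes = [h for h, _ in self._ring]
        idx = bisect.bisect(hashes, self._hash(key)) % len(self._ring)
        return self._ring[idx][1]

ring = HashRing(["cart-db-1", "cart-db-2", "cart-db-3"])
owner = ring.get_node("user-42")  # the same key always maps to the same node

# Removing a node only remaps the keys that lived on it; all other keys
# keep their owners, so scaling down does not reshuffle everything.
smaller = HashRing(["cart-db-1", "cart-db-2"])
```

Because only the removed node's position leaves the ring, keys owned by the remaining nodes are untouched, which is exactly why scaling up or down does not force a full re-shuffle of the data.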

Design Clean Architecture — E-Commerce App

If we design the e-commerce application in the Clean Architecture style, you can see the result in the image below:

There is one big monolithic application server, but the application has Clean Architecture layers: Domain, Application, Infrastructure, and Web UI. And there is one big relational database.

Design Microservice Architecture — E-Commerce App

If we design the e-commerce application with a microservice architecture, you can see the result in the image below:

The Product microservice can use a NoSQL document database, the Shopping Cart microservice can use a NoSQL key-value database, and the Order microservice can use a relational database, as per each microservice's data storage requirements.

What’s Next?

Step by Step Design Architectures w/ Course

I have just published a new course — Design Microservices Architecture with Patterns & Principles.

In this course, we're going to learn how to design a microservices architecture using design patterns, principles, and best practices. We will move step by step from a monolithic architecture to event-driven microservices, using the right architecture design patterns and techniques.


Mehmet Ozkaya
Design Microservices Architecture with Patterns & Principles

Software Architect | Udemy Instructor | AWS Community Builder | Cloud-Native and Serverless Event-driven Microservices https://github.com/mehmetozkaya