Vertical vs. Horizontal Scalability

Understanding that systems can scale vertically or horizontally, up or down, and out or in

Published in Software Architecture/Design Essentials · 3 min read · Dec 24, 2022

As defined in a previous post, scalability is the ability of a system to do more or bigger work with an increase in system resources (CPUs/cores, memory, IOPS, etc.) or units (nodes, machines, etc.), usually without materially impacting performance. Scalable systems may be vertically scalable, which means they can scale up and down. Or they may be horizontally scalable, which means they can scale out and in. Or they may be both vertically and horizontally scalable. This article explains all of these concepts.

Vertical Scalability and Scale-Up/Down

A system is said to be vertically scalable if increasing its resources enables it to do more work. For example, if a Web application can serve a larger number of concurrent requests when hosted on a bigger server (physical machine, virtual machine, container, etc.), it is vertically scalable.

A vertically scalable system may be scaled up or down:

Scaling up refers to increasing the system’s capacity by increasing the resources. For instance, an increase in the underlying server’s CPUs/cores might enable a Web application to serve more concurrent requests.
Scaling down refers to decreasing capacity by decreasing the resources.

Horizontal Scalability and Scale-Out/In

A system is said to be horizontally scalable if increasing the number of nodes/machines enables it to do more work. For example, adding brokers to a Kafka cluster generally increases the cluster’s capacity, so it is horizontally scalable.

Horizontally scalable systems may scale out or in:

Scaling out refers to adding more nodes to increase capacity.
Scaling in refers to reducing nodes to decrease capacity.
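
As a rough illustration of scaling out and in, the sketch below estimates how many nodes a target load needs and whether that means adding or removing nodes. The names nodes_needed and scaling_action, the per-node capacity of 1000 requests/second, and the assumption that node capacities simply add up are all hypothetical; real clusters rarely behave this neatly.

```python
import math


def nodes_needed(target_load: float, per_node_capacity: float) -> int:
    """Smallest node count whose combined capacity covers target_load.

    Assumes (hypothetically) that every node contributes the same
    capacity and that capacities add up without any overhead.
    """
    if per_node_capacity <= 0:
        raise ValueError("per_node_capacity must be positive")
    # Always keep at least one node, even when there is no load.
    return max(1, math.ceil(target_load / per_node_capacity))


def scaling_action(current_nodes: int, required_nodes: int) -> str:
    """Describe the horizontal scaling step needed, if any."""
    if required_nodes > current_nodes:
        return "scale out"
    if required_nodes < current_nodes:
        return "scale in"
    return "hold"


# Illustrative numbers only: 4 nodes today, each handling 1000 requests/second.
required = nodes_needed(target_load=5500, per_node_capacity=1000)
print(required, scaling_action(current_nodes=4, required_nodes=required))
```

With these made-up numbers, a load of 5500 requests/second needs six nodes, so a four-node cluster would scale out.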

Linear Scalability and Scalability Factor

A linearly scalable system is one whose capacity increases proportionately as its resources or units increase.

Say the maximum throughput of a given server application is 1000 transactions/second (TPS). Assume that doubling the underlying host’s memory and CPU cores enables it to support twice the throughput: 2000 TPS. Similarly, tripling the resources enables it to serve 3000 TPS. We can then say that the system is linearly scalable in terms of throughput, at least between 1000 and 3000 TPS.

The scalability factor refers to the factor by which capacity increases relative to an increase in resources/units. A linearly scalable system is said to have a scalability factor of 1.

Many systems that exhibit vertical or horizontal scalability are not linearly scalable: they have a scalability factor of less than 1. In other words, they exhibit sub-linear scalability.
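
To make the scalability factor concrete, here is a minimal sketch that computes it as the ratio of capacity growth to resource growth. This is one possible reading of the definition above, not a standard formula, and the function name scalability_factor is made up for illustration. The 1000/2000/3000 TPS figures come from the example earlier in this section; the 1500 TPS case is a hypothetical sub-linear system.

```python
def scalability_factor(
    base_resources: float,
    base_capacity: float,
    new_resources: float,
    new_capacity: float,
) -> float:
    """Capacity growth divided by resource growth.

    Assumption: "scalability factor" is read here as a ratio of ratios,
    so 1.0 means linear, below 1.0 sub-linear, above 1.0 super-linear.
    """
    if min(base_resources, base_capacity, new_resources) <= 0:
        raise ValueError("resources and base capacity must be positive")
    resource_ratio = new_resources / base_resources
    capacity_ratio = new_capacity / base_capacity
    return capacity_ratio / resource_ratio


# Doubling and tripling resources from the article's example (linear).
print(scalability_factor(1, 1000, 2, 2000))  # 1.0
print(scalability_factor(1, 1000, 3, 3000))  # 1.0

# Hypothetical system: doubling resources yields only 1500 TPS (sub-linear).
print(scalability_factor(1, 1000, 2, 1500))  # 0.75
```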

Even if a system does exhibit linear scalability, it may not do so indefinitely — there may be an upper limit (or a “ceiling”) on how much it can scale up/out. Most systems also have a lower limit (or a “floor”) on how much they can scale down/in.
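
The ceiling and floor can be pictured as bounds applied to whatever size is requested. The sketch below is only an illustration: clamp_units is a hypothetical helper, and the floor of 1 and ceiling of 8 are invented values, not limits of any particular system.

```python
def clamp_units(requested: int, floor: int, ceiling: int) -> int:
    """Keep a requested size (nodes, cores, etc.) within [floor, ceiling]."""
    if floor > ceiling:
        raise ValueError("floor must not exceed ceiling")
    return max(floor, min(requested, ceiling))


# Hypothetical limits: the system cannot run below 1 unit or above 8 units.
print(clamp_units(12, floor=1, ceiling=8))  # 8: capped at the ceiling
print(clamp_units(0, floor=1, ceiling=8))   # 1: held at the floor
print(clamp_units(5, floor=1, ceiling=8))   # 5: within range, unchanged
```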

Summary

A vertically scalable system can be scaled up or down by increasing or decreasing its resources (CPU, memory, etc.).

Similarly, a horizontally scalable system can be scaled out or in by increasing or decreasing the number of units (nodes, machines, etc.).

The factor by which a system’s capacity changes as its resources/nodes increase or decrease is its scalability factor. As far as I’ve seen, very few scalable systems exhibit linear scalability; most exhibit sub-linear scalability. Super-linear scalability is rare.