The scale cube is a useful visualization of a three-dimensional scalability model, shown in the figure below:
This cube is described in Martin Abbott and Michael Fisher’s excellent book, The Art of Scalability (Addison-Wesley, 2015).
The scale cube defines three separate ways to scale an application: along the X, Y, and Z axes.
X-axis scaling load-balances requests across multiple instances. It is a common way to scale a monolithic application: N identical instances of the application run behind a load balancer, which distributes requests among them. This approach improves both the capacity and the availability of the application.
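As a minimal sketch of X-axis scaling, assuming a simple round-robin policy (real load balancers offer several other policies), the balancer below cycles through N identical instances. The class name `RoundRobinBalancer` and the instance names are hypothetical, chosen only for illustration.

```python
from itertools import cycle


class RoundRobinBalancer:
    """Hypothetical balancer: spreads requests evenly over N identical instances."""

    def __init__(self, instances):
        if not instances:
            raise ValueError("at least one instance is required")
        self._instances = list(instances)
        self._next = cycle(self._instances)

    def route(self, request):
        # Any instance can serve any request, because every instance is identical,
        # so the request's contents are ignored when choosing a target.
        return next(self._next)


balancer = RoundRobinBalancer(["app-1", "app-2", "app-3"])
targets = [balancer.route({"path": "/orders"}) for _ in range(6)]
print(targets)  # ['app-1', 'app-2', 'app-3', 'app-1', 'app-2', 'app-3']
```

Adding capacity means adding names to the instance list; because the instances are interchangeable, the choice of target does not depend on the request.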
Z-axis scaling also runs multiple identical instances of the application, but each instance is responsible for only a subset of the data. The data is partitioned among the N instances, and the load balancer routes each request to the appropriate instance based on an attribute of the request. An application might, for example, route requests using userId.
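The following is a minimal sketch of Z-axis routing by userId, assuming simple hash-modulo partitioning (one of several possible partitioning schemes). `partition_for`, `ZAxisRouter`, and the shard names are hypothetical. A stable hash from `hashlib` is used because Python's built-in `hash()` on strings is randomized per process, which would send the same user to different instances after a restart.

```python
import hashlib


def partition_for(user_id: str, num_partitions: int) -> int:
    """Map a userId to a partition index using a stable hash."""
    digest = hashlib.sha256(user_id.encode("utf-8")).digest()
    # Use the first 8 bytes of the digest as an unsigned integer, then reduce modulo N.
    return int.from_bytes(digest[:8], "big") % num_partitions


class ZAxisRouter:
    """Hypothetical router: each instance owns the data for a subset of users."""

    def __init__(self, instances):
        self._instances = list(instances)

    def route(self, request):
        index = partition_for(request["userId"], len(self._instances))
        return self._instances[index]


router = ZAxisRouter(["shard-0", "shard-1", "shard-2"])
first = router.route({"userId": "alice", "path": "/cart"})
again = router.route({"userId": "alice", "path": "/orders"})
print(first == again)  # True: all of alice's requests reach the same instance
```

One design caveat: with plain modulo partitioning, changing N remaps most users to different instances, so production systems often use consistent hashing or an explicit partition lookup table instead.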
Y-axis scaling functionally decomposes an application into services (that is, microservices).
X- and Z-axis scaling improve the application’s capacity and availability. However, neither approach solves the problem of growing development and application complexity.
Y-axis scaling splits the application into multiple services, each responsible for a specific function. In this way, Y-axis scaling decomposes a large monolithic application into small services, each of which can be scaled further and independently using X-axis and Z-axis scaling.
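A minimal sketch of how Y-axis decomposition combines with X-axis scaling: a gateway first selects a service by function (here, by URL path prefix), then round-robins across that service's own pool of instances. The `ApiGateway` class, the path prefixes, and the instance counts are all hypothetical; the point is only that each service's pool can be sized independently.

```python
from itertools import cycle

# Hypothetical services, each scaled independently with its own pool of instances.
SERVICES = {
    "/orders": ["orders-1", "orders-2"],
    "/catalog": ["catalog-1", "catalog-2", "catalog-3", "catalog-4"],
    "/accounts": ["accounts-1"],
}


class ApiGateway:
    """Y-axis: pick a service by function. X-axis: round-robin within that service."""

    def __init__(self, services):
        self._pools = {prefix: cycle(instances) for prefix, instances in services.items()}

    def route(self, path):
        for prefix, pool in self._pools.items():
            # Match the exact prefix or a sub-path such as "/orders/42".
            if path == prefix or path.startswith(prefix + "/"):
                return next(pool)
        raise LookupError(f"no service handles {path!r}")


gateway = ApiGateway(SERVICES)
print(gateway.route("/orders/42"))     # orders-1
print(gateway.route("/orders/43"))     # orders-2
print(gateway.route("/catalog/shoes")) # catalog-1
```

Here the catalog service runs four instances while the accounts service runs one; each pool could likewise be partitioned by a request attribute if that service needed Z-axis scaling.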