Living the Golden Rule (of Cloud Architecture) #1: Horizontal Scaling

Thomas Soares
Software Architecture in the Clouds
3 min read · Jun 20, 2019

Having defined my Golden Rule of Cloud Architecture, “Never Pay for Unused Capacity,” the obvious next step is to answer the question, “whaaa?”

I’ll do so by examining one practical approach that you can use to “live the Golden Rule”: horizontal scaling.

If you’re reading this blog, you either (1) already know what horizontal scaling is, or (2) have waaay too much time on your hands. Or both. Probably both. So I’ll spare you a definition of horizontal scaling.

But you may not have thought of horizontal scaling as a means of controlling cost and limiting unused capacity in the Cloud — you may have viewed it primarily as a means of adaptively scaling to handle increasing workloads. And it definitely is that, particularly for the front-end, where you might have a group of instances behind a load balancer handling web requests. But if you both increase and decrease the number of active instances in response to the current workload, then voilà, it can also be an effective means of limiting the amount you pay for unused capacity. Common sense, really — but maybe not the first thing that comes to mind when you think of horizontal scaling.
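To make the “scale in as well as out” point concrete, here’s a minimal sketch in Python of the kind of target-tracking arithmetic an autoscaler does behind the scenes. The function name, the 50% utilization target, and the instance limits are all illustrative assumptions of mine, not any particular provider’s API; in practice you’d wire this up to your cloud’s monitoring and instance-group services, or simply use the built-in target-tracking policies most providers offer.

```python
import math

TARGET_UTILIZATION = 50.0   # aim to keep each instance roughly half busy
MIN_INSTANCES = 1           # keep at least one instance behind the load balancer
MAX_INSTANCES = 20          # cap spend during traffic spikes

def desired_instance_count(current_count: int, avg_utilization: float) -> int:
    """Pick an instance count that brings average utilization back toward the target.

    The same formula scales out when load rises and scales in when it falls,
    which is what keeps you from paying for idle capacity.
    """
    wanted = math.ceil(current_count * avg_utilization / TARGET_UTILIZATION)
    return max(MIN_INSTANCES, min(MAX_INSTANCES, wanted))

# Traffic spike: 4 instances running at 85% CPU -> scale out to 7.
print(desired_instance_count(4, 85.0))   # 7

# Quiet period: 7 instances loafing at 15% CPU -> scale in to 3.
print(desired_instance_count(7, 15.0))   # 3
```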

It can be a tricky approach to use though. There are several moving parts that can break or not work quite right, and it can be challenging to both increase and decrease capacity at just the right time to have enough capacity, but not too much. Using a larger number of smaller instances may help, but it is still tricky. So it isn’t necessarily one of the first approaches I would recommend for “living the Golden Rule.” If, however, you have an existing system in place that you’re trying to deal with, it may be an approach you can adopt without making fundamental changes. And that would be a start.
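One of those moving parts is the timing itself: react to every blip in your load metric and the group will “flap,” adding and removing instances over and over. A common mitigation is a bit of hysteresis (separate scale-out and scale-in thresholds) plus a cooldown period, sketched below. The thresholds and the five-minute cooldown are illustrative assumptions, not recommendations, and most managed autoscalers give you equivalent knobs out of the box.

```python
import time

class ScalingController:
    """Decide when to add or remove an instance, with hysteresis and a cooldown."""

    SCALE_OUT_ABOVE = 70.0   # add capacity only when clearly overloaded
    SCALE_IN_BELOW = 30.0    # remove capacity only when clearly idle
    COOLDOWN_SECONDS = 300   # let the last change take effect before acting again

    def __init__(self):
        self._last_change = float("-inf")

    def decide(self, avg_utilization: float, now: float = None) -> int:
        """Return +1 (scale out), -1 (scale in), or 0 (hold steady)."""
        now = time.time() if now is None else now
        if now - self._last_change < self.COOLDOWN_SECONDS:
            return 0  # still cooling down from the previous change
        if avg_utilization > self.SCALE_OUT_ABOVE:
            self._last_change = now
            return +1
        if avg_utilization < self.SCALE_IN_BELOW:
            self._last_change = now
            return -1
        return 0  # in the dead band between thresholds: do nothing

controller = ScalingController()
print(controller.decide(85.0, now=0))     # +1: overloaded, scale out
print(controller.decide(20.0, now=100))   #  0: idle, but still in cooldown
print(controller.decide(20.0, now=400))   # -1: idle and cooled down, scale in
```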

If your application does backend processing, you may find that it is easier to adopt horizontal scaling there. If you can bundle your workload into discrete tasks, or “jobs,” that can be containerized in some way, then you can potentially make use of one of a wide range of technologies that let you submit jobs to a cluster of instances. Some of these technologies require you to provision a group of instances up front that you pay for whether they’re busy or not — and that can defeat the purpose if you’re not careful. But you should be able to add and remove instances, and if you’re willing to queue up pending jobs for a little while, you should be able to maintain pretty high utilization. Just keep in mind that choosing a technology that automatically scales to handle the workload you throw at it can make your life easier.
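As a rough illustration of why the backend case is friendlier, here’s a sketch in Python of sizing a worker pool purely from queue depth. The “jobs per worker” figure and the scale-to-zero behavior are assumptions for the sake of the sketch; whether you can really drop to zero (and how quickly new workers spin up) depends on the cluster technology you choose.

```python
import math

JOBS_PER_WORKER = 10   # backlog one worker can chew through acceptably fast
MAX_WORKERS = 50       # spending cap

def desired_workers(queue_depth: int) -> int:
    """Size the worker pool from the current backlog.

    Letting jobs queue up briefly keeps each worker busy (high utilization),
    and an empty queue lets the pool drop to zero so no idle workers are billed.
    """
    if queue_depth == 0:
        return 0
    return min(MAX_WORKERS, math.ceil(queue_depth / JOBS_PER_WORKER))

print(desired_workers(0))     # 0  -> nothing queued, pay for nothing
print(desired_workers(35))    # 4  -> modest backlog, modest pool
print(desired_workers(5000))  # 50 -> big backlog, capped at the spending limit
```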

If you use horizontal scaling carefully you can realize many benefits, including adhering to the Golden Rule by reducing the amount of unused capacity that you pay for (yes, the Golden Rule says never pay for unused capacity, but who amongst us is perfect?). It isn’t necessarily the most effective approach available, but it is perhaps easier to apply to existing systems than some of the other approaches I will talk about in the future. So it is worth keeping in mind.
