The Golden Rule of Cloud Architecture

Thomas Soares
Software Architecture in the Clouds
4 min read · Jun 17, 2019

…or at least my Golden Rule of Cloud Architecture. YMMV, of course. But if I had to pick just a single rule to live by when it comes to Cloud Architecture, this would be it:

Never Pay for Unused Capacity

Simple, right?

For the pedantic among us: yes, that was pithily said, but strictly speaking it wasn’t really expressed in the form of an architecture “rule”. We can reformulate it as, “Architect your system such that you never pay for unused capacity” or aligned with my previous definition of Software Architecture, “Constrain your project’s solution space such that you will never end up paying for unused capacity.” A more correct reformulation, perhaps, but definitely less pithy.

What does it mean? Exactly what it says. Restated in simple terms: don’t pay for stuff you’re not using. So, for example, you don’t want to be paying for instances that are sitting around idle, or even partially idle some of the time. This applies to managed services that have tiered billing as well: you don’t want to pay for a higher tier that gives you more capacity than you’re actually using. And it applies to pretty much any dimension that you can be billed for, whether CPU, or storage, or bandwidth. Don’t pay for more than you use.
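To put a number on "stuff you're not using," here is a minimal sketch in Python. The function name, the fleet size, the 30% utilization, and the $0.10 hourly price are all hypothetical values chosen for illustration, not figures from any provider's price list.

```python
def unused_capacity_cost(billed_units: float, used_units: float, unit_price: float) -> float:
    """Return the amount paid for capacity that was billed but never used."""
    if billed_units < 0 or used_units < 0 or unit_price < 0:
        raise ValueError("inputs must be non-negative")
    # Usage above the billed amount is not "negative waste", so clamp at zero.
    unused_units = max(billed_units - used_units, 0.0)
    return unused_units * unit_price


# Hypothetical fleet: 10 instances billed for 730 hours each in a month,
# averaging 30% utilization, at an assumed price of $0.10 per instance-hour.
billed_hours = 10 * 730
used_hours = billed_hours * 0.30
waste = unused_capacity_cost(billed_hours, used_hours, 0.10)
print(f"Paid for unused capacity this month: ${waste:.2f}")
```

The same calculation works for any billable dimension (CPU hours, GB of storage, provisioned throughput), as long as billed and used amounts are expressed in the same unit.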

Why would I make that my Golden Rule? It certainly doesn’t explicitly cover every possible eventuality, but if you did nothing other than rigorously follow it, I think you’d be in pretty good shape, because it would indirectly help you to do The Right Thing in the vast majority of cases. Most importantly, it would help you keep a good handle on costs, which (as I’ve noted earlier) is important. Additionally, it is going to help you avoid “bad” architectural patterns and choose “good” ones. So it is a good place to start.

It is important to note that avoiding “unused capacity” doesn’t always mean that you need to achieve 100% utilization. If you push a service to 100% utilization, you may experience performance degradation or be unable to accommodate increases in load. So “full utilization” may mean 85% or 90% or whatever is appropriate for the given use case. You might have a small bit of necessary unused capacity, and that’s okay.
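One way to make "full utilization" concrete is to size capacity against a target utilization rather than against 100%. The sketch below assumes a hypothetical service where load and per-instance capacity are measured in the same unit (say, requests per second). The function name and the example numbers are illustrative only.

```python
import math


def instances_needed(peak_load: float, per_instance_capacity: float,
                     target_utilization: float = 0.85) -> int:
    """Return the smallest instance count that keeps each instance at or
    below the target utilization when serving peak_load."""
    if not 0 < target_utilization <= 1:
        raise ValueError("target_utilization must be in (0, 1]")
    if per_instance_capacity <= 0 or peak_load < 0:
        raise ValueError("capacity must be positive and load non-negative")
    # Capacity each instance may actually use before eating into its headroom.
    usable_capacity = per_instance_capacity * target_utilization
    return math.ceil(peak_load / usable_capacity)


# Hypothetical numbers: 1000 req/s of load, 100 req/s per instance.
print(instances_needed(1000, 100, 0.85))  # sized with 15% headroom
print(instances_needed(1000, 100, 1.0))   # sized with no headroom at all
```

The gap between the two results is the "necessary unused capacity": it is paid for on purpose, and anything beyond it is what the rule tells you to eliminate.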

It is also important to note that — like any rule — there are exceptions. Some cases where you might make an explicit decision to pay for unused capacity:

  • There is no possible alternative. If you don’t have a way to avoid the unused capacity, then so be it. But “possible” is tricky here. You may hear that it is not possible to avoid unused capacity given the current approach that is being used. While that may be true, if there exist other approaches that could eliminate the unused capacity then there actually are possible alternatives.
  • It is cheaper to have unused capacity. In other words, if you consider all (truly) possible alternatives and you find that they are all more expensive, then by all means, tolerate the unused capacity. There may be some cases where this happens, but I expect they are few and far between — so be cautious, and analyze carefully.
  • The cost of the unused capacity is negligible. If the cost of the unused capacity is just $5 or $10 per month, for example, and the cost and/or complexity of avoiding the unused capacity are high enough to make it undesirable to do so, then it isn’t a big deal — the ROI isn’t there. Just be cautious though, because a lot of small expenses can add up to a big one.
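The three exceptions above can be read as a small decision procedure. The following is only a sketch of that reading, with several assumptions: the names are hypothetical, each alternative's monthly cost is assumed to already include the amortized cost of building and operating it, and the $10 default threshold simply echoes the example figure above.

```python
from enum import Enum


class Decision(Enum):
    TOLERATE_NO_ALTERNATIVE = "tolerate: no possible alternative"
    TOLERATE_CHEAPER = "tolerate: every alternative costs more"
    TOLERATE_NEGLIGIBLE = "tolerate: unused capacity cost is negligible"
    ELIMINATE = "eliminate the unused capacity"


def evaluate_unused_capacity(current_monthly_cost: float,
                             alternative_monthly_costs: list[float],
                             monthly_waste: float,
                             negligible_threshold: float = 10.0) -> Decision:
    """Apply the three exceptions in order; otherwise recommend elimination."""
    # Exception 1: no (truly) possible alternative exists.
    if not alternative_monthly_costs:
        return Decision.TOLERATE_NO_ALTERNATIVE
    # Exception 2: every alternative is at least as expensive as today.
    if min(alternative_monthly_costs) >= current_monthly_cost:
        return Decision.TOLERATE_CHEAPER
    # Exception 3: the waste is too small to justify the added complexity.
    if monthly_waste <= negligible_threshold:
        return Decision.TOLERATE_NEGLIGIBLE
    return Decision.ELIMINATE


# Hypothetical case: $100/month today, $40 of it unused, one $60/month alternative.
print(evaluate_unused_capacity(100.0, [60.0], 40.0).value)
```

The hard part in practice is the input, not the logic: the list of alternatives should include approaches other than the one currently in use, and many small "negligible" amounts should be summed before being compared against the threshold.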

The good news is that cloud providers are offering a whole slew of services that make it possible to avoid unused capacity in most cases. But when I say “possible” here, I mean it in the absolute sense. I do not mean “always possible,” irrespective of the other architectural, design, and implementation decisions that you’ve made.

And therein lies the bad news. It isn’t necessarily easy to avoid unused capacity, and doing so may require an entirely different approach from the one you’re used to. The overall complexity of your solution is likely to increase as well. You have to be prepared to invest some blood, sweat, and tears in order to whittle away unused capacity. It is going to be a challenge, particularly if you’ve never looked at the problem from that perspective before.

But who ever said that living the Golden Rule is easy?

P.S. I’ve expanded on how to actually “Live the Golden Rule” (of Cloud Architecture) in some additional posts:

  1. Horizontal Scaling
  2. Functions-as-a-Service (FaaS)
  3. Serverless Analytics
  4. Fractional CPUs & the AWS T-Series Instances
